1. 22 Apr, 2013 10 commits
  2. 21 Apr, 2013 4 commits
    • qeth: fix VLAN related compilation errors · 91b1c1aa
      Patrick McHardy authored
      drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc':
      >> drivers/s390/net/qeth_l3_main.c:1662:3: error: too few arguments to function '__vlan_find_dev_deep'
         include/linux/if_vlan.h:88:27: note: declared here
         drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_add_vlan_mc6':
      >> drivers/s390/net/qeth_l3_main.c:1723:3: error: too few arguments to function '__vlan_find_dev_deep'
         include/linux/if_vlan.h:88:27: note: declared here
         drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses4':
      >> drivers/s390/net/qeth_l3_main.c:1767:2: error: too few arguments to function '__vlan_find_dev_deep'
         include/linux/if_vlan.h:88:27: note: declared here
         drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_free_vlan_addresses6':
      >> drivers/s390/net/qeth_l3_main.c:1797:2: error: too few arguments to function '__vlan_find_dev_deep'
         include/linux/if_vlan.h:88:27: note: declared here
         drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_process_inbound_buffer':
      >> drivers/s390/net/qeth_l3_main.c:1980:6: error: too few arguments to function '__vlan_hwaccel_put_tag'
         include/linux/if_vlan.h:234:31: note: declared here
         drivers/s390/net/qeth_l3_main.c: In function 'qeth_l3_verify_vlan_dev':
      >> drivers/s390/net/qeth_l3_main.c:2089:3: error: too few arguments to function '__vlan_find_dev_deep'
         include/linux/if_vlan.h:88:27: note: declared here
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91b1c1aa
    • net: vlan: fix up vlan_proto_idx() for CONFIG_BUG=n · 8da63a65
      Patrick McHardy authored
      Add missing return statement for CONFIG_BUG=n.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8da63a65
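The failure mode behind this fix can be sketched in plain userspace C. Everything below is a stand-in assumed for illustration (the no-op `BUG()` macro, the `vlan_proto_idx_sketch` name, and the use of host-order ethertypes), not the kernel's actual definitions. When `BUG()` compiles to nothing, a `default:` branch without its own `return` lets control fall off the end of a non-void function.

```c
#include <stdint.h>

/* Stand-in for CONFIG_BUG=n: BUG() expands to an empty statement. */
#define BUG() do { } while (0)

#define ETH_P_8021Q  0x8100 /* 802.1Q VLAN ethertype */
#define ETH_P_8021AD 0x88A8 /* 802.1ad VLAN ethertype */

enum { VLAN_PROTO_8021Q = 0, VLAN_PROTO_8021AD = 1 };

/* Hypothetical userspace model; takes a host-order ethertype for brevity. */
static unsigned int vlan_proto_idx_sketch(uint16_t proto)
{
	switch (proto) {
	case ETH_P_8021Q:
		return VLAN_PROTO_8021Q;
	case ETH_P_8021AD:
		return VLAN_PROTO_8021AD;
	default:
		BUG();
		return 0; /* required once BUG() no longer terminates */
	}
}
```

With the explicit `return 0;` in place, an unknown ethertype yields a defined index even when `BUG()` is compiled out.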
    • net: vlan: fix dummy function signatures for CONFIG_VLAN=n · 9fae27b3
      Patrick McHardy authored
      Fix up some function signatures for CONFIG_VLAN=n that were missed during
      the 802.1ad support patches.
      
      Found by the kbuild robot.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fae27b3
    • net: vlan: fix memory leak in vlan_info_rcu_free() · cf2c014a
      Patrick McHardy authored
      The following leak is reported by kmemleak:
      
      [   86.812073] kmemleak: Found object by alias at 0xffff88006ecc76f0
      [   86.816019] Pid: 739, comm: kworker/u:1 Not tainted 3.9.0-rc5+ #842
      [   86.816019] Call Trace:
      [   86.816019]  <IRQ>  [<ffffffff81151c58>] find_and_get_object+0x8c/0xdf
      [   86.816019]  [<ffffffff8190e90d>] ? vlan_info_rcu_free+0x33/0x49
      [   86.816019]  [<ffffffff81151cbe>] delete_object_full+0x13/0x2f
      [   86.816019]  [<ffffffff8194bbb6>] kmemleak_free+0x26/0x45
      [   86.816019]  [<ffffffff8113e8c7>] slab_free_hook+0x1e/0x7b
      [   86.816019]  [<ffffffff81141c05>] kfree+0xce/0x14b
      [   86.816019]  [<ffffffff8190e90d>] vlan_info_rcu_free+0x33/0x49
      [   86.816019]  [<ffffffff810d0b0b>] rcu_do_batch+0x261/0x4e7
      
      The reason is that in vlan_info_rcu_free() we don't take the VLAN protocol
      into account when iterating over the vlan_devices_array.
      Reported-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Tested-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf2c014a
  3. 19 Apr, 2013 26 commits
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 95a06161
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains a small batch of Netfilter
      updates for your net-next tree, they are:
      
      * Three patches that provide more accurate error reporting to
        user-space, instead of -EPERM, in IPv4/IPv6 netfilter re-routing
        code and NAT, from Patrick McHardy.
      
      * Update copyright statements in Netfilter filters of
        Patrick McHardy, from himself.
      
      * Add Kconfig dependency on the raw/mangle tables to the
        rpfilter, from Florian Westphal.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      95a06161
    • bond: add support to read speed and duplex via ethtool · bb5b052f
      Andy Gospodarek authored
      This patch adds support for the get_settings ethtool op to the bonding
      driver.  This was motivated by users who wanted to get the speed of the
      bond and compare that against throughput to understand utilization.
      The behavior before this patch was added was problematic when computing
      line utilization after trying to get link-speed and throughput via SNMP.
      
      Output from ethtool looks like this for a round-robin bond:
      
      Settings for bond0:
      	Supported ports: [ ]
      	Supported link modes:   Not reported
      	Supported pause frame use: No
      	Supports auto-negotiation: No
      	Advertised link modes:  Not reported
      	Advertised pause frame use: No
      	Advertised auto-negotiation: No
      	Speed: 11000Mb/s
      	Duplex: Full
      	Port: Other
      	PHYAD: 0
      	Transceiver: internal
      	Auto-negotiation: off
      	MDI-X: Unknown
      	Link detected: yes
      
      I tested this and verified it works as expected.  A test was also done
      on a version backported to an older kernel and it worked well there.
      
      v2: Switch to using ethtool_cmd_speed_set to set speed, added check to
      SLAVE_IS_OK for each slave in bond, dropped mode-specific calculations
      as they were not needed, and set port type to 'Other.'
      
      v3: Fix useless assignment and checkpatch warning.
      Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
      Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb5b052f
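A minimal sketch of how an aggregate bond speed could be derived, assuming the reported value is the sum of the speeds of all usable slaves. The struct and function names are hypothetical, and the `usable` flag only models the SLAVE_IS_OK check mentioned in the v2 notes. The 11000Mb/s in the sample output would be consistent with, for example, one 10000Mb/s and one 1000Mb/s slave, although the commit message does not state the slave mix.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one bonded slave. */
struct slave_sketch {
	uint32_t speed_mbps; /* link speed reported by the slave */
	bool usable;         /* models the SLAVE_IS_OK check */
};

/* Sum the speeds of all usable slaves; unusable slaves contribute nothing. */
static uint32_t bond_speed_sketch(const struct slave_sketch *slaves, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		if (slaves[i].usable)
			total += slaves[i].speed_mbps;
	return total;
}
```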
    • packet: move hw/sw timestamp extraction into a small helper · 4b457bdf
      Daniel Borkmann authored
      This patch introduces a small, internal helper function, that is used by
      PF_PACKET. Based on the flags that are passed, it extracts the packet
      timestamp in the receive path. This is merely a refactoring to remove
      some duplicate code in tpacket_rcv(), to make it more readable, and to
      enable others to use this function in PF_PACKET as well, e.g. for TX.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b457bdf
    • net: socket: move ktime2ts to ktime header api · 6e94d1ef
      Daniel Borkmann authored
      Currently, ktime2ts is a small helper function that is only used in
      net/socket.c. Move this helper into the ktime API as a small inline
      function, so that i) it's maintained together with ktime routines,
      and ii) also other files can make use of it. The function is named
      ktime_to_timespec_cond() and placed into the generic part of ktime,
      since we internally make use of ktime_to_timespec(). ktime_to_timespec()
      itself does not check the ktime variable for zero, hence, we name
      this function ktime_to_timespec_cond() for only a conditional
      conversion, and adapt its users to it.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e94d1ef
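The idea of a conditional conversion can be illustrated with a userspace analogue. This is a sketch under stated assumptions, not the kernel helper: the timestamp is modelled as a non-negative nanosecond count, zero is taken to mean "no timestamp", and `ns_to_timespec_cond` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical userspace analogue of a conditional conversion.
 * Returns false and leaves *ts untouched when ns is zero;
 * otherwise splits ns into seconds and nanoseconds. Assumes ns >= 0.
 */
static bool ns_to_timespec_cond(int64_t ns, struct timespec *ts)
{
	if (ns == 0)
		return false;
	ts->tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts->tv_nsec = (long)(ns % NSEC_PER_SEC);
	return true;
}
```

A caller can then use the return value to decide whether a timestamp exists at all, instead of testing the raw value before every conversion.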
    • net: Add missing netdev feature strings for NETIF_F_HW_VLAN_STAG_* · 2d6577f1
      David S. Miller authored
      Noticed by Ben Hutchings.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d6577f1
    • Merge branch 'qlcnic' · 92352df1
      David S. Miller authored
      Rajesh Borundia says:
      
      ====================
      * "qlcnic: Change 82xx adapter VLAN id endian type".
        - Adapter requires VLAN id in little endian. VLAN id was being
          converted to __le16 and then passed as a parameter. Pass VLAN id
          as u16 and then use cpu_to_le16 at appropriate places. It is
          appropriate for net-next as SR-IOV patches have a dependency on it.
      * "qlcnic: Fix loopback test for SR-IOV PF".
        - It is appropriate for net-next as change is needed for SRIOV PF
          only.
      * Remaining patches add enhancements to SR-IOV functionality like
        - FLR handling
        - Adapter reset recovery handling
        - iproute2 tool support for configuring MAC address, Tx rate and
          VLAN id.
        - Mailbox polling support for SR-IOV PF in case mailbox interrupts
          are disabled.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92352df1
    • Rajesh Borundia's avatar
      c6376278
    • qlcnic: Support polling for mailbox events. · 7ed3ce48
      Rajesh Borundia authored
      o When mailbox interrupt is disabled PF should be
        able to process request from VF. Enable polling
        for such cases.
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ed3ce48
    • qlcnic: Fix loopback test for SR-IOV PF. · d1a1105e
      Rajesh Borundia authored
      o Do not disable mailbox interrupts while running
        loopback test through SR-IOV PF.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1a1105e
    • qlcnic: Support VLAN id config. · 91b7282b
      Rajesh Borundia authored
      o Add support for VLAN id configuration per VF using
        iproute2 tool.
      o VLAN ids 1-4094 are treated as PVID by the PF and
        Guest VLAN tagging is not allowed by default.
      o PVID is disabled when the VLAN id is set to 0.
      o Guest VLAN tagging is allowed when the VLAN id is set to 4095.
      o Only one Guest VLAN id is supported.
      o VLAN id can be changed only when the VF driver is not loaded.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      91b7282b
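The VLAN id rules listed in this commit can be written down as a small classifier. This is only a sketch of the described policy; the enum and function names are hypothetical and do not come from the qlcnic driver.

```c
#include <stdint.h>

/* Hypothetical names for the per-VF modes described in the commit message. */
enum vf_vlan_mode {
	VF_VLAN_PVID_DISABLED, /* id 0: PVID is disabled */
	VF_VLAN_PVID,          /* ids 1-4094: treated as PVID, no guest tagging */
	VF_VLAN_GUEST_TAGGING, /* id 4095: guest VLAN tagging allowed */
	VF_VLAN_INVALID        /* ids above 4095 do not fit a 12-bit VLAN id */
};

/* Map a requested VLAN id to the mode it selects for the VF. */
static enum vf_vlan_mode vf_vlan_mode_for_id(uint16_t vlan_id)
{
	if (vlan_id == 0)
		return VF_VLAN_PVID_DISABLED;
	if (vlan_id <= 4094)
		return VF_VLAN_PVID;
	if (vlan_id == 4095)
		return VF_VLAN_GUEST_TAGGING;
	return VF_VLAN_INVALID;
}
```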
    • qlcnic: Support MAC address, Tx rate config. · 4000e7a7
      Rajesh Borundia authored
      o Add support for MAC address and Tx rate configuration
        per VF via iproute2 tool.
      o Tx rate change is allowed while the guest is running
        and the VF driver is loaded.
      o MAC address change is allowed only when VF driver
        is not loaded.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4000e7a7
    • qlcnic: VF reset recovery implementation. · f036e4f4
      Rajesh Borundia authored
      o Implement recovery mechanism for VF to recover from
        adapter resets.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f036e4f4
    • qlcnic: VF FLR implementation. · 97d8105c
      Rajesh Borundia authored
      o FLR from Hypervisor - When hypervisor issues a VF FLR request,
        adapter notifies the parent PF driver of the FLR request for PF
        driver to perform any cleanup on behalf of that VF.
      o FLR from VF Driver - VF driver may initiate a VF FLR request,
        if VF state needs to be cleaned up before a re-initialization.
        VF re-initialization during kdump is an example.
      o PF driver cleans up all resources allocated on behalf of a VF,
        on VF FLR notifications from the adapter or from the VF driver.
      Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
      Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      97d8105c
    • qlcnic: Change 82xx adapter VLAN id endian type. · f80bc8fe
      Rajesh Borundia authored
      o 82xx adapter requires VLAN id in little endian format.
        Instead of passing vlan id parameter as __le16, pass the
        parameter as u16 and use cpu_to_le16 at appropriate places.
      Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f80bc8fe
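The pattern described here, keeping the VLAN id in CPU byte order through the call chain and converting only where the value is written into a device-facing buffer, can be sketched portably in userspace. The request layout and names below are hypothetical, and the explicit byte stores stand in for the kernel's cpu_to_le16().

```c
#include <stdint.h>

/* Hypothetical request layout: two bytes the adapter reads as a little-endian VLAN id. */
struct vlan_req_sketch {
	uint8_t vlan_id_le[2];
};

/*
 * Callers pass a plain host-order id; the conversion to little endian
 * happens only here, at the point where the request is filled in.
 */
static void vlan_req_set_id(struct vlan_req_sketch *req, uint16_t vlan_id)
{
	req->vlan_id_le[0] = (uint8_t)(vlan_id & 0xff); /* low byte first */
	req->vlan_id_le[1] = (uint8_t)(vlan_id >> 8);   /* high byte second */
}
```

Because the stores are done byte by byte, the result is the same on little-endian and big-endian hosts.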
    • Merge branch 'netlink-mmap' · 42bbcb78
      David S. Miller authored
      Patrick McHardy says:
      
      ====================
      The following patches contain an implementation of memory mapped I/O for
      netlink. The implementation is modelled after AF_PACKET memory mapped I/O
      with a few differences:
      
      - In order to perform memory mapped I/O to userspace, the kernel allocates
        skbs with the data area pointing to the data area of the mapped frames.
        All netlink subsystems assume a linear data area, so for the sake of
        simplicity, the mapped data area is not attached to the paged area but
        to skb->data. This requires introduction of a special skb allocation
        function that just allocates an skb head without the data area. Since this
        is a quite rare use case, I introduced a new function based on __alloc_skb
        instead of splitting it up into head and data allocation. The alternative
        would be to introduce an __alloc_skb_head and __alloc_skb_data function,
        which would actually be useful for a specific error case in memory mapped
        netlink, but would require a couple of extra instructions for the common
        skb allocation case, so it doesn't really seem worth it.
      
        In order to get the destination memory area for skb->data before message
        construction, memory mapped netlink I/O needs to look up the destination
        socket during allocation instead of during transmission because the
        ring is owned by the receiving socket/process. A special skb allocation
        function (netlink_alloc_skb) taking the destination pid as an argument is
        used for this. All subsystems that want to support memory mapped I/O need
        to use this function; automatic fallback to the receive queue happens
        for unconverted subsystems. Dumps automatically use memory mapped I/O if
        the receiving socket has enabled it.
      
        The visible effect of looking up the destination socket during allocation
        instead of transmission is that message ordering in userspace might
        change in case allocation and transmission aren't performed atomically.
        This usually doesn't matter since most subsystems have a BKL-like lock
        like the rtnl mutex. To my knowledge the only currently existing case
        where it might matter is nfnetlink_queue combined with the recently
        introduced batched verdicts, but a) that subsystem already includes
        sequence numbers which allow userspace to reorder messages in case it
        cares to, and the reordering window is quite small, and b) with memory
        mapped transmission batching can be performed in a subsystem independent
        manner.
      
      - AF_NETLINK contains flow control for database dumps. With regular I/O,
        dump continuations are triggered based on the socket's receive queue space
        and by recvmsg() calls. Since with memory mapped I/O there are no
        recvmsg() calls under normal operation, this is done in netlink_poll(),
        under the assumption that userspace has processed all pending frames
        before invoking poll(), thus the ring is expected to have room for new
        messages. Dumps currently don't benefit as much as they could from
        memory mapped I/O because each single continuation requires a poll()
        call. A more aggressive approach seems like a good idea to me, especially
        in case the socket is not subscribed to any multicast groups (IOW only
        receiving explicitly requested data).
      
      Besides that, the memory mapped netlink implementation extends the states
      defined by AF_PACKET between userspace and the kernel by a SKIP status. This
      is intended for the case that userspace wants to queue frames (specifically
      when using nfnetlink_queue, an IDS and stream reassembly, requested by
      Eric Leblond) for a longer period of time. The kernel skips over all frames
      marked with SKIP when looking for unused frames and only fails when not finding
      a free frame or when having skipped the entire ring.
      
      Also noteworthy is memory mapped sendmsg: the kernel performs validation
      of messages before accepting and processing them. In order to prevent
      userspace from changing the message contents after validation, the
      kernel checks that the ring is only mapped once and the file descriptor
      is not shared (in order to avoid having userspace set up another mapping
      after the first mentioned check). If either condition does not hold, the
      message is copied to an allocated skb and processed as with regular I/O.
      I'd especially appreciate review of this part since I'm not really versed
      in memory, file and process management.
      
      The remaining interesting details are included in the changelogs of the
      individual patches and the documentation, so I won't repeat them here.
      
      As an example, nfnetlink_queue is converted to support memory mapped
      I/O. Other subsystems that would probably benefit are nfnetlink_log,
      audit and maybe ISCSI, not sure.
      
      Following are some numbers collected by Florian Westphal based on a
      slightly older version, which included an experimental patch for the
      nfnetlink_queue ordering issue.
      
      ===
      
      Test hardware is a 12-core machine
      Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
      ixgbe interfaces are used (i.e., multiqueue nics).
      irqs are distributed across the cpus.
      
      I've made several tests.
      
      The simple one consists of 3GBit UDP traffic, packets are 1500 bytes
      in size (i.e., no fragmentation), with a single nfqueue
      and the test client programs in libmnl examples directory.
      Packets are sent from one /24 net to another /24 net, i.e.
      there are a few hundred flows active at any given time.
      
      I've also tested with snort, but I disabled all rules.
      6Gbit UDP traffic is generated in the snort case, and
      6 nfqueues are used (i.e., 6 snorts run in parallel).
      
      I've tested with 3 different kernels, all based on 3.7.1.
      - 3.7.1, without the mmap patches
      - 3.7.1, with Patrick's mmap patches
      - 3.7.1, with mmap patches and extended spinlock to ensure packet ids are
        monotonically increasing and cannot be re-ordered.  This is what we
        currently ship in our product.
      
        [ the spinlock that is extended is the per nfqueue spinlock, it will
          be held from the time the netlink skb is allocated until the netlink
          skb is sent to userspace:
      
          http://1984.lsi.us.es/git/nf-next/commit/?h=mmap-netlink3&id=b8eb19c46650fef4e9e4fe53f367f99bbf72afc9
        ]
      
      snort is normally used in "batch mode", i.e., after processing 25 packets
      a single "batch verdict" is sent to accept the packets seen so far.
      "mmap snort" means RX_RING + sendmsg(), i.e. TX_RING is not used at this
      time (except where noted below).
      
      One reason is that snort has a reload thread, so kernel needs to copy;
      also in the snort case no payload rewrite takes place, so compared
      to the rx path the tx path is cheap.
      
      Results:
      
      3.7.1, without mmap patches, i.e. recv()+sendmsg() for everyone
      nfq-queue:           1.7 gbit out
      snort-recv-batch-25  5.1 gbit out
      snort-recv-no-batch  3.1 gbit out
      
      3.7.1 + mmap + without extended spinlocked section
      nfq-queue:           1.7 gbit out (recv/sendmsg)
      nfq-queue-mmap:      2.4 gbit out
      snort-mmap-batch-25	 5.6 gbit out  (warning: since ids can be
                                              re-ordered, this version is "broken").
      snort-recv-batch-25	 5.1 gbit out
      snort-mmap-no-batch	 4.6 gbit out (i.e., one verdict per packet)
      
      Kernel 3.7.1 + mmap + extended spinlock section:
      nfq-queue:	1.4 gbit out
      nfq-queue-mmap: 2.3 gbit out
      snort:          5.6 gbit out
      
      Conclusions:
      - The "extended spinlocked section" hurts performance in the
        single queue case; with 6 snorts there is no measurable slowdown.
      - I tried to re-write the mmap-snort to work without batch verdicts, but
        results were not very encouraging:
      
      kernel 3.7.1 + mmap (without extended spinlocked section):
      
      snort-mmap-batch-25      5.6 gbit out (what we currently ship)
      snort-recv-batch-25      5.1 gbit out (without using mmap)
      snort-mmap-batch-1       4.6 gbit out (with mmap but without batch verdicts)
      snort-mmap-txring-25     5.2 gbit out (with mmap but without batch verdicts)
      snort-mmap-txring-1      4.6 gbit out (with mmap but without batch verdicts)
      
      The difference between the last two is that in the txring-25 case, we
      put a verdict into the tx ring after every packet, but will only
      invoke sendmsg(, NULL, 0) after processing 25 packets.  So the only
      difference is the number of sendmsg calls/context switches.
      
      So, i.o.w, kernel 3.7.1 + mmap + the extra locking crap is faster
      than 3.7.1 + mmap-without-extra-locking and single-verdict-per packet.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      42bbcb78
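The SKIP handling described in the cover letter above can be sketched as a ring scan. This is an illustrative model only: the status names, the flat array representation, and `find_unused_frame` are assumptions, not the kernel's data structures. The scan steps over SKIP frames, returns the first unused frame, and fails either when it meets a frame that is still in use or when every frame in the ring was skipped.

```c
#include <stddef.h>

/* Hypothetical frame states; only UNUSED and SKIP matter to the scan. */
enum frame_status {
	FRAME_UNUSED, /* free for the kernel to fill */
	FRAME_VALID,  /* holds a message userspace has not released */
	FRAME_SKIP    /* held by userspace for longer, to be stepped over */
};

/*
 * Starting at head, step over SKIP frames and return the index of the
 * first UNUSED frame. Returns -1 if a frame still in use is reached
 * first, or if the whole ring consists of SKIP frames.
 */
static int find_unused_frame(const enum frame_status *ring, size_t n, size_t head)
{
	for (size_t i = 0; i < n; i++) {
		size_t idx = (head + i) % n;

		if (ring[idx] == FRAME_UNUSED)
			return (int)idx;
		if (ring[idx] != FRAME_SKIP)
			return -1; /* no free frame at this position */
	}
	return -1; /* skipped the entire ring */
}
```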
    • netfilter: rename netlink related "pid" variables to "portid" · ec464e5d
      Patrick McHardy authored
      Get rid of the confusing mix of pid and portid and use portid consistently
      for all netlink related socket identities.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec464e5d
    • netlink: add RX/TX-ring support to netlink diag · 4ae9fbee
      Patrick McHardy authored
      Based on AF_PACKET.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ae9fbee
    • netlink: add flow control for memory mapped I/O · cd1df525
      Patrick McHardy authored
      Add flow control for memory mapped RX. Since user-space usually doesn't
      invoke recvmsg() when using memory mapped I/O, flow control is performed
      in netlink_poll(). Dumps are allowed to continue if at least half of the
      ring frames are unused.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd1df525
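The "at least half of the ring frames are unused" rule can be expressed as a one-line predicate. The function name and the idea of passing plain frame counts are assumptions made for this sketch; how the kernel actually counts unused frames is not shown in the commit message.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: a dump may continue when at least half of the
 * ring frames are unused. Written without division so that odd ring
 * sizes are not rounded down in favour of continuing.
 */
static bool dump_may_continue(unsigned int unused_frames, unsigned int total_frames)
{
	return 2ULL * unused_frames >= total_frames;
}
```

For a ring of 8 frames, 4 unused frames allow the dump to continue and 3 do not.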
    • netlink: implement memory mapped recvmsg() · f9c22888
      Patrick McHardy authored
      Add support for mmap'ed recvmsg(). To allow the kernel to construct messages
      into the mapped area, a dataless skb is allocated and the data pointer is
      set to point into the ring frame. This means frames will be delivered to
      userspace in order of allocation instead of order of transmission. This
      usually doesn't matter since the order is either not determinable by
      userspace or message creation/transmission is serialized. The only case
      where this can have a visible difference is nfnetlink_queue. Userspace
      can't assume mmap'ed messages have ordered IDs anymore and needs to check
      this if using batched verdicts.
      
      For non-mapped sockets, nothing changes.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9c22888
    • netlink: implement memory mapped sendmsg() · 5fd96123
      Patrick McHardy authored
      Add support for mmap'ed sendmsg() to netlink. Since the kernel validates
      received messages before processing them, the code makes sure userspace
      can't modify the message contents after invoking sendmsg(). To do that
      only a single mapping of the TX ring is allowed to exist and the socket
      must not be shared. If either of these two conditions does not hold, it
      falls back to copying.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5fd96123
    • netlink: add mmap'ed netlink helper functions · 9652e931
      Patrick McHardy authored
      Add helper functions for looking up mmap'ed frame headers, reading and
      writing their status, allocating skbs with mmap'ed data areas and a poll
      function.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9652e931
    • netlink: mmaped netlink: ring setup · ccdfcc39
      Patrick McHardy authored
      Add support for mmap'ed RX and TX ring setup and teardown based on the
      af_packet.c code. The following patches will use this to add the real
      mmap'ed receive and transmit functionality.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ccdfcc39
    • netlink: add netlink_skb_set_owner_r() · cf0a018a
      Patrick McHardy authored
      For mmap'ed I/O a netlink specific skb destructor needs to be invoked
      after the final kfree_skb() to clean up state. This doesn't work currently
      since the skb's ownership is transferred to the receiving socket using
      skb_set_owner_r(), which orphans the skb, thereby invoking the destructor
      prematurely.
      
      Since netlink doesn't account skbs to the originating socket, there's no
      need to orphan the skb. Add a netlink specific skb_set_owner_r() variant
      that does not orphan the skb and use a netlink specific destructor to
      call sock_rfree().
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf0a018a