1. 05 Jun, 2014 25 commits
  2. 04 Jun, 2014 15 commits
    • Merge branch 'bonding-macvlan' · 6579867c
      David S. Miller authored
      Vlad Yasevich says:
      
      ====================
      Fix support for macvlan devices on top of bonding
      
      Currently, macvlan devices do not work well over bond interfaces.
      Everything works well until a failover is triggered in the bond
      device; the macvlan then becomes unreachable until ARP entries
      are flushed.  This series adds the functionality needed to
      handle correct notifications and update switches with mac addresses
      assigned to macvlans.
      
      The first patch simply adds the IFF_UNICAST_FLT flag to bonds since they
      already correctly manage the unicast filter list of the slaves, so
      we might as well prevent the bond from needlessly going into promiscuous
      mode.
      
      The second patch adds a notifier handler to macvlan to trigger correct
      ARP notifications.
      
      The third patch adds handling for TLB and RLB modes that use special
      ETH_P_LOOPBACK type packets to teach the switch about mac addresses.
      It also allows ARPs for the macvlan mac addresses to be handled by
      RLB mode.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Support macvlans on top of tlb/rlb mode bonds · 14af9963
      Vlad Yasevich authored
      To make TLB mode work, the patch allows learning packets
      to be sent using mac addresses assigned to macvlan devices,
      also taking into account vlans that may be between the
      bond and macvlan device.
      
      To make RLB work, all we have to do is accept ARP packets
      for addresses added to the bond dev->uc list.  Since RLB
      mode will take care to update the peers directly with
      correct mac addresses, learning packets for these addresses
      do not have to be sent to the switch.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • macvlan: Support bonding events · 4c991255
      Vlad Yasevich authored
      Bonding and team drivers generate specific events during failover
      that trigger switch updates.  When a macvlan device is configured
      on top of bonding, we want switches to learn about the macvlan
      devices as well.  This patch adds a handler to the macvlan driver to
      propagate these events to all macvlan devices.  We let the generic
      inetdev event handler do the work.
      
      This allows macvlan to operate correctly over an active-backup
      mode bond.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: Turn on IFF_UNICAST_FLT on bond devices · c565b488
      Vlad Yasevich authored
      Bonding devices manage the unicast filters of the underlying
      interfaces, but do not turn on the IFF_UNICAST_FLT flag.  Thus
      anytime a unicast address is added to the bond, the bond is
      placed in promiscuous mode.
      
      Turn on IFF_UNICAST_FLT on the bond device so that the bond does
      not go into promiscuous mode needlessly.  If an underlying device
      does not support unicast filtering, that device will automatically
      enter promiscuous mode already.
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
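The effect of the flag described in this commit can be illustrated with a small stand-alone model. This is only a sketch of the decision stated in the commit message, not kernel code: `struct model_dev`, `MODEL_UNICAST_FLT`, and `needs_promisc()` are hypothetical names invented for the example, and the real rx-mode logic in the networking core has more state than is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the IFF_UNICAST_FLT private flag. */
#define MODEL_UNICAST_FLT 0x1u

/* Hypothetical, heavily reduced model of a network device. */
struct model_dev {
	unsigned int priv_flags; /* capability flags advertised by the device */
	unsigned int uc_count;   /* number of extra unicast addresses added */
};

/*
 * Decide whether the device must fall back to promiscuous mode.
 * A device that advertises unicast filtering handles its own address
 * list; one that does not has to accept everything as soon as an
 * extra unicast address is added.
 */
static bool needs_promisc(const struct model_dev *dev)
{
	assert(dev != NULL);

	if (dev->priv_flags & MODEL_UNICAST_FLT)
		return false;

	return dev->uc_count > 0;
}
```

In this model, a bond without the flag and with one macvlan address on its list reports that it needs promiscuous mode; setting the flag makes the same device report that it does not.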
    • net: Revert "fib_trie: use seq_file_net rather than seq->private" · f830b022
      Sasha Levin authored
      This reverts commit 30f38d2f.
      
      fib_triestat is surrounded by a big lie: while it claims that it's a
      seq_file (fib_triestat_seq_open, fib_triestat_seq_show), it isn't:
      
      	static const struct file_operations fib_triestat_fops = {
      	        .owner  = THIS_MODULE,
      	        .open   = fib_triestat_seq_open,
      	        .read   = seq_read,
      	        .llseek = seq_lseek,
      	        .release = single_release_net,
      	};
      
      Yes, fib_triestat is just a regular file.
      
      A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files
      you could do seq_file_net() to get the net ptr, doing so for a regular
      file would be wrong and would dereference an invalid pointer.
      
      The fib_triestat lie claimed a victim, and trying to show the file would
      be bad for the kernel. This patch just reverts the issue and fixes
      fib_triestat, which still needs a rewrite to either be a seq_file or
      stop claiming it is.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • trivial: drivers/net/ethernet/nvidia/forcedeth.c: fix typo s/SUBSTRACT1/SUBTRACT1/ · cef33c81
      Antonio Ospite authored
      Signed-off-by: Antonio Ospite <ao2@ao2.it>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexander Gordeev <agordeev@redhat.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix the section mismatch warnings. · 898157ed
      Xiubo Li authored
      When building with CONFIG_DEBUG_SECTION_MISMATCH enabled, the following
      warning occurs:
      
        LD      drivers/net/built-in.o
      WARNING: drivers/net/built-in.o(.text+0xcd4c): Section mismatch in
      reference from the function gfar_probe() to the function
      .init.text:gfar_init_addr_hash_table()
      The function gfar_probe() references
      the function __init gfar_init_addr_hash_table().
      This is often because gfar_probe lacks a __init
      annotation or the annotation of gfar_init_addr_hash_table is wrong.
      Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'xen-netback-netfront-multiqueue' · 9ab89acc
      David S. Miller authored
      Wei Liu says:
      
      ====================
      This is a rebased version of Andrew's V8 patch series. The original cover letter:
      
      --------------------
      xen-net{back,front}: Multiple transmit and receive queues
      
      This patch series implements multiple transmit and receive queues (i.e.
      multiple shared rings) for the xen virtual network interfaces.
      
      The series is split up as follows:
      - Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in
        preparation for multi-queue support. See the patch itself for more details.
      - Patches 2 and 4 factor out the queue-specific data for netback and
        netfront respectively, and modify the rest of the code to use these
        as appropriate.
      - Patches 3 and 5 introduce new XenStore keys to negotiate and use
        multiple shared rings and event channels, and code to connect these
        as appropriate.
      - Patch 6 documents the XenStore keys required for the new feature
        in include/xen/interface/io/netif.h
      
      All other transmit and receive processing remains unchanged, i.e. there
      is a kthread per queue and a NAPI context per queue.
      
      The performance of these patches has been analysed in detail, with
      results available at:
      
      http://wiki.xenproject.org/wiki/Xen-netback_and_xen-netfront_multi-queue_performance_testing
      
      To summarise:
        * Using multiple queues allows a VM to transmit at line rate on a 10
          Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
          with a single queue.
        * For intra-host VM--VM traffic, eight queues provide 171% of the
          throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
        * There is a corresponding increase in total CPU usage, i.e. this is a
          scaling out over available resources, not an efficiency improvement.
        * Results depend on the availability of sufficient CPUs, as well as the
          distribution of interrupts and the distribution of TCP streams across
          the queues.
      
      Queue selection is currently achieved via an L4 hash on the packet (i.e.
      TCP src/dst port, IP src/dst address) and is not negotiated between the
      frontend and backend, since only one option exists. Future patches to
      support other frontends (particularly Windows) will need to add some
      capability to negotiate not only the hash algorithm selection, but also
      allow the frontend to specify some parameters to this.
      
      Note that queue selection is a decision by the transmitting system about
      which queue to use for a particular packet. In general, the algorithm
      may differ between the frontend and the backend with no adverse effects.
      
      Queue-specific XenStore entries for ring references and event channels
      are stored hierarchically, i.e. under .../queue-N/... where N varies
      from 0 to one less than the requested number of queues (inclusive). If
      only one queue is requested, it falls back to the flat structure where
      the ring references and event channels are written at the same level as
      other vif information.
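The two XenStore layouts described above can be sketched as a small path-building helper. This is an illustration only, not code from the series: the function name `queue_key_path()` and the base path and key strings passed to it are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Build the XenStore path of one per-queue key.
 *
 * base and key are caller-supplied strings (hypothetical values in
 * this sketch). queue is the queue index N, which runs from 0 to
 * num_queues - 1.
 *
 * Returns the snprintf() result: the length the full path requires.
 */
static int queue_key_path(char *buf, size_t len, const char *base,
			  const char *key, unsigned int queue,
			  unsigned int num_queues)
{
	assert(queue < num_queues);

	/* One queue: flat layout, the key sits beside the other vif keys. */
	if (num_queues == 1)
		return snprintf(buf, len, "%s/%s", base, key);

	/* Several queues: hierarchical layout under .../queue-N/... */
	return snprintf(buf, len, "%s/queue-%u/%s", base, queue, key);
}
```

With a base of `vif` and a key of `tx-ring-ref` (both placeholders), a single-queue setup yields `vif/tx-ring-ref`, while queue 3 of a four-queue setup yields `vif/queue-3/tx-ring-ref`.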
      
      V8:
      - Squash the queue error handling code into patch 3.
      - Update the documentation (patch 6) according to comments on the
        equivalent patch to Xen.
      
      V7:
      - Rebase on latest net-next, which includes the netback grant mapping
        patch series from Zoltan Kiss
      - Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0'
      - Simplify the queue hashing by using (hash % num_queues) instead of
        multiply & shift.
      - Add ratelimited warning for invalid queue selection.
      - Fix error handling to correctly tear down already setup queues.
      - Use dev->real_num_tx_queues instead of separately maintaining a
        count of the number of queues.
      
      V6:
      - Use 'max_queues' as the module param. name for both netback and netfront.
      
      V5:
      - Fix bug in xenvif_free() that could lead to an attempt to transmit an
        skb after the queue structures had been freed.
      - Improve the XenStore protocol documentation in netif.h.
      - Fix IRQ_NAME_SIZE double-accounting for null terminator.
      - Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
      - Don't initialise a local variable that is set in both branches (xspath).
      
      V4:
      - Add MODULE_PARM_DESC() for the multi-queue parameters for netback
        and netfront modules.
      - Move del_timer_sync() in netfront to after unregister_netdev, which
        restores the order in which these functions were called before applying
        these patches.
      
      V3:
      - Further indentation and style fixups.
      
      V2:
      - Rebase onto net-next.
      - Change queue->number to queue->id.
      - Add atomic operations around the small number of stats variables that
        are not queue-specific or per-cpu.
      - Fixup formatting and style issues.
      - XenStore protocol changes documented in netif.h.
      - Default max. number of queues to num_online_cpus().
      - Check requested number of queues does not exceed maximum.
      --------------------
      
      I rebased this on top of net-next. No functional change is introduced.  The
      patch that needed some extra care was "xen-netback: Factor queue-specific data
      into queue struct" because it clashed with a fix introduced in net. A simple
      test of creating a guest, running iperf, then shutting down the guest worked
      as expected.
      
      The last patch fixes a minor problem where the queue name is not initialised
      in xen-netfront, resulting in names like "-tx" "-rx" in /proc/interrupts.
      
      Changes since v9 (no functional change introduced):
      * include commit summary in the commit message of first patch
      * fold David Vrabel's Reviewed-by into last patch
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-net{back, front}: Document multi-queue feature in netif.h · a2deb8b1
      Andrew J. Bennieston authored
      Document the multi-queue feature in terms of XenStore keys to be written
      by the backend and by the frontend.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Acked-by: Ian Campbell <ian.campbell@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netfront: Add support for multiple queues · 50ee6061
      Andrew J. Bennieston authored
      Build on the refactoring of the previous patch to implement multiple
      queues between xen-netfront and xen-netback.
      
      Check XenStore for multi-queue support, and set up the rings and event
      channels accordingly.
      
      Write ring references and event channels to XenStore in a queue
      hierarchy if appropriate, or flat when using only one queue.
      
      Update the xennet_select_queue() function to choose the queue on which
      to transmit a packet based on the skb hash result.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
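The hash-based queue selection mentioned in this commit, combined with the `hash % num_queues` simplification noted in the V7 changelog, can be sketched as follows. This is a stand-alone approximation rather than the driver's `xennet_select_queue()`: the function name `select_queue()` and its plain integer arguments are assumptions, and the packet hash is taken as an input instead of being computed from an skb.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Map a precomputed packet hash onto a transmit queue index.
 * hash stands in for the skb hash; num_queues is the number of
 * queues currently in use.
 */
static uint16_t select_queue(uint32_t hash, unsigned int num_queues)
{
	assert(num_queues > 0);

	/* With a single queue there is nothing to choose. */
	if (num_queues == 1)
		return 0;

	/* Simple modulo mapping instead of multiply and shift. */
	return (uint16_t)(hash % num_queues);
}
```

Because the hash is derived from the flow's addresses and ports, all packets of one flow map to the same queue, which keeps them in order, while different flows spread across the available queues.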
    • xen-netfront: Factor queue-specific data into queue struct. · 2688fcb7
      Andrew J. Bennieston authored
      In preparation for multi-queue support in xen-netfront, move the
      queue-specific data from struct netfront_info to struct netfront_queue,
      and update the rest of the code to use this.
      
      Also adds loops over queues where appropriate, even though only one is
      configured at this point, and uses alloc_etherdev_mq() and the
      corresponding multi-queue netif wake/start/stop functions in preparation
      for multiple active queues.
      
      Finally, implements a trivial queue selection function suitable for
      ndo_select_queue, which simply returns 0, selecting the first (and
      only) queue.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Reviewed-by: David Vrabel <david.vrabel@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Add support for multiple queues · 8d3d53b3
      Andrew J. Bennieston authored
      Builds on the refactoring of the previous patch to implement multiple
      queues between xen-netfront and xen-netback.
      
      Writes the maximum supported number of queues into XenStore, and reads
      the values written by the frontend to determine how many queues to use.
      
      Ring references and event channels are read from XenStore on a per-queue
      basis and rings are connected accordingly.
      
      Also adds code to handle the cleanup of any already initialised queues
      if the initialisation of a subsequent queue fails.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
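The cleanup behaviour described in this commit (tearing down already initialised queues when a later one fails) follows a common unwind pattern, sketched below. Everything here is hypothetical scaffolding for illustration: `struct queue`, `queue_init()`, `queue_teardown()`, `setup_queues()`, and the `fail_at` parameter used to simulate a failure are not taken from xen-netback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-queue structure. */
struct queue {
	int initialised;
};

/* Hypothetical init step; fails for queue index fail_at. */
static int queue_init(struct queue *q, unsigned int id, unsigned int fail_at)
{
	if (id == fail_at)
		return -1;
	q->initialised = 1;
	return 0;
}

/* Hypothetical teardown step, the inverse of queue_init(). */
static void queue_teardown(struct queue *q)
{
	q->initialised = 0;
}

/*
 * Initialise n queues in order. If queue i fails, tear down queues
 * i-1 .. 0 in reverse order so that no partially built state is
 * left behind, then report the failure.
 */
static int setup_queues(struct queue *queues, unsigned int n,
			unsigned int fail_at)
{
	unsigned int i;

	assert(queues != NULL);

	for (i = 0; i < n; i++) {
		if (queue_init(&queues[i], i, fail_at) != 0)
			goto err;
	}
	return 0;

err:
	while (i-- > 0)
		queue_teardown(&queues[i]);
	return -1;
}
```

Unwinding in reverse order mirrors the order of construction, which matters when later queues depend on resources set up by earlier ones.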
    • xen-netback: Factor queue-specific data into queue struct · e9ce7cb6
      Wei Liu authored
      In preparation for multi-queue support in xen-netback, move the
      queue-specific data from struct xenvif into struct xenvif_queue, and
      update the rest of the code to use this.
      
      Also adds loops over queues where appropriate, even though only one is
      configured at this point, and uses alloc_netdev_mq() and the
      corresponding multi-queue netif wake/start/stop functions in preparation
      for multiple active queues.
      
      Finally, implements a trivial queue selection function suitable for
      ndo_select_queue, which simply returns 0 for a single queue and uses
      skb_get_hash() to compute the queue index otherwise.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: Move grant_copy_op array back into struct xenvif. · a55d9766
      Andrew J. Bennieston authored
      This array was allocated separately in commit ac3d5ac2 ("xen-netback:
      fix guest-receive-side array sizes") due to it being very large, and a
      struct xenvif is allocated as the netdev_priv part of a struct
      net_device, i.e. via kmalloc() but falling back to vmalloc() if the
      initial alloc. fails.
      
      In preparation for the multi-queue patches, where this array becomes
      part of struct xenvif_queue and is always allocated through vzalloc(),
      move this back into the struct xenvif.
      Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
      Acked-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>