1. 16 Jan, 2014 23 commits
    • Arnaud Ebalard's avatar
      net: mvneta: mvneta_tx_done_gbe() cleanups · 0713a86a
      Arnaud Ebalard authored
      mvneta_tx_done_gbe() return value and third parameter are no more
      used. This patch changes the function prototype and removes a useless
      variable where the function is called.
      Reviewed-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0713a86a
    • willy tarreau's avatar
      net: mvneta: implement rx_copybreak · f19fadfc
      willy tarreau authored
      calling dma_map_single()/dma_unmap_single() is quite expensive compared
      to copying a small packet. So let's copy short frames and keep the buffers
      mapped. We set the limit to 256 bytes which seems to give good results both
      on the XP-GP board and on the AX3/4.
      
      The Rx small packet rate increased by 16.4% doing this, from 486kpps to
      573kpps. It is worth noting that even the call to the function
      dma_sync_single_range_for_cpu() is expensive (300 ns) although less
      than dma_unmap_single(). Without it, the packet rate raises to 711kpps
      (+24% more). Thus on systems where coherency from device to CPU is
      guaranteed by a snoop control unit, this patch should provide even more
      gains, and probably rx_copybreak could be increased.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      f19fadfc
    • willy tarreau's avatar
      net: mvneta: convert to build_skb() · 8ec2cd48
      willy tarreau authored
      Make use of build_skb() to allocate frags on the RX path. When frag size
      is lower than a page size, we can use netdev_alloc_frag(), and we fall back
      to kmalloc() for larger sizes. The frag size is stored into the mvneta_port
      struct. The alloc/free functions check the frag size to decide what alloc/
      free method to use. MTU changes are safe because the MTU change function
      stops the device and clears the queues before applying the change.
      
      With this patch, I observed a reproducible 2% performance improvement on
      HTTP-based benchmarks, and 5% on small packet RX rate.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8ec2cd48
    • willy tarreau's avatar
      net: mvneta: prefetch next rx descriptor instead of current one · 34e4179d
      willy tarreau authored
      Currently, the mvneta driver tries to prefetch the current Rx
      descriptor during read. Tests have shown that prefetching the
      next one instead increases general performance by about 1% on
      HTTP traffic.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      34e4179d
    • willy tarreau's avatar
      net: mvneta: simplify access to the rx descriptor status · 5428213c
      willy tarreau authored
      At several places, we already know the value of the rx status but
      we call functions which dereference the pointer again to get it
      and don't need the descriptor for anything else. Simplify this
      task by replacing the rx desc pointer by the status word itself.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5428213c
    • willy tarreau's avatar
      net: mvneta: factor rx refilling code · a1a65ab1
      willy tarreau authored
      Make mvneta_rxq_fill() use mvneta_rx_refill() instead of using
      duplicate code.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a1a65ab1
    • willy tarreau's avatar
      net: mvneta: remove tests for impossible cases in the tx_done path · 6c498974
      willy tarreau authored
      Currently, mvneta_txq_bufs_free() calls mvneta_tx_done_policy() with
      a non-null cause to retrieve the pointer to the next queue to process.
      There are useless tests on the return queue number and on the pointer,
      all of which are well defined within a known limited set. This code
      path is fast, although not critical. Removing 3 tests here that the
      compiler could not optimize (verified) is always desirable.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6c498974
    • willy tarreau's avatar
      net: mvneta: replace Tx timer with a real interrupt · 71f6d1b3
      willy tarreau authored
      Right now the mvneta driver doesn't handle Tx IRQ, and relies on two
      mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx()
      and a timer. If a burst of packets is emitted faster than the device
      can send them, then the queue is stopped until next wake-up of the
      timer 10ms later. This causes jerky output traffic with bursts and
      pauses, making it difficult to reach line rate with very few streams.
      
      A test on UDP traffic shows that it's not possible to go beyond 134
      Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed
      traffic tends to observe pauses as well if the traffic is bursty,
      making it even burstier after the wake-up.
      
      It seems that this feature was inherited from the original driver but
      nothing there mentions any reason for not using the interrupt instead,
      which the chip supports.
      
      Thus, this patch enables Tx interrupts and removes the timer. It does
      the two at once because it's not really possible to make the two
      mechanisms coexist, so a split patch doesn't make sense.
      
      First tests performed on a Mirabox (Armada 370) show that less CPU
      seems to be used when sending traffic. One reason might be that we now
      call the mvneta_tx_done_gbe() with a mask indicating which queues have
      been done instead of looping over all of them.
      
      The same UDP test above now happily reaches 987 Mbps / 87.7 kpps.
      Single-stream TCP traffic can now more easily reach line rate. HTTP
      transfers of 1 MB objects over a single connection went from 730 to
      840 Mbps. It is even possible to go significantly higher (>900 Mbps)
      by tweaking tcp_tso_win_divisor.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Arnaud Ebalard <arno@natisbad.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      71f6d1b3
    • willy tarreau's avatar
      net: mvneta: add missing bit descriptions for interrupt masks and causes · 40ba35e7
      willy tarreau authored
      Marvell has not published the chip's datasheet yet, so it's very hard
      to find the relevant bits to manipulate to change the IRQ behaviour.
      Fortunately, these bits are described in the proprietary LSP patch set
      which is publicly available here :
      
          http://www.plugcomputer.org/downloads/mirabox/
      
      So let's put them back in the driver in order to reduce the burden of
      current and future maintenance.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      40ba35e7
    • willy tarreau's avatar
      net: mvneta: do not schedule in mvneta_tx_timeout · 29021366
      willy tarreau authored
      If a queue timeout is reported, we can oops because of some
      schedules while the caller is atomic, as shown below :
      
        mvneta d0070000.ethernet eth0: tx timeout
        BUG: scheduling while atomic: bash/1528/0x00000100
        Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv]
        CPU: 2 PID: 1528 Comm: bash Tainted: G        WC   3.13.0-rc4-mvebu-nf #180
        [<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc)
        [<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64)
        [<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c)
        [<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec)
        [<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118)
        [<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14)
        [<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194)
        [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24)
        [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4)
        [<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c)
        [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170)
        [<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8)
        [<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98)
        [<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60)
        [<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8)
        [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60)
      
      Ben Hutchings attempted to propose a better fix consisting in using a
      scheduled work for this, but while it fixed this panic, it caused other
      random freezes and panics proving that the reset sequence in the driver
      is unreliable and that additional fixes should be investigated.
      
      When sending multiple streams over a link limited to 100 Mbps, Tx timeouts
      happen from time to time, and the driver correctly recovers only when the
      function is disabled.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      29021366
    • willy tarreau's avatar
      net: mvneta: use per_cpu stats to fix an SMP lock up · 74c41b04
      willy tarreau authored
      Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
      when they update the stats, and as a result, it randomly happens that
      the stats freeze on SMP if two updates happen during stats retrieval.
      This is very easily reproducible by starting two HTTP servers and binding
      each of them to a different CPU, then consulting /proc/net/dev in loops
      during transfers, the interface should immediately lock up. This issue
      also randomly happens upon link state changes during transfers, because
      the stats are collected in this situation, but it takes more attempts to
      reproduce it.
      
      The comments in netdevice.h suggest using per_cpu stats instead to get
      rid of this issue.
      
      This patch implements this. It merges both rx_stats and tx_stats into
      a single "stats" member with a single syncp. Both mvneta_rx() and
      mvneta_rx() now only update the a single CPU's counters.
      
      In turn, mvneta_get_stats64() does the summing by iterating over all CPUs
      to get their respective stats.
      
      With this change, stats are still correct and no more lockup is encountered.
      
      Note that this bug was present since the first import of the mvneta
      driver.  It might make sense to backport it to some stable trees. If
      so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats
      out of the hot path".
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      74c41b04
    • willy tarreau's avatar
      net: mvneta: increase the 64-bit rx/tx stats out of the hot path · dc4277dd
      willy tarreau authored
      Better count packets and bytes in the stack and on 32 bit then
      accumulate them at the end for once. This saves two memory writes
      and two memory barriers per packet. The incoming packet rate was
      increased by 4.7% on the Openblocks AX3 thanks to this.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Tested-by: default avatarArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dc4277dd
    • Paul Gortmaker's avatar
      drivers/net: delete non-required instances of include <linux/init.h> · a81ab36b
      Paul Gortmaker authored
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.   Most are just a
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      This covers everything under drivers/net except for wireless, which
      has been submitted separately.
      Signed-off-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a81ab36b
    • David S. Miller's avatar
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · 5ff1dd24
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      This small batch contains several Netfilter fixes for your net-next
      tree, more specifically:
      
      * Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set,
        from Kristian Evensen.
      
      * Add dependency to IPV6 for NF_TABLES_INET. This one has been reported
        by the several robots that are testing .config combinations, from Paul
        Gortmaker.
      
      * Fix default base chain policy setting in nf_tables, from myself.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5ff1dd24
    • Jiri Pirko's avatar
      neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. · 89740ca7
      Jiri Pirko authored
      When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
      not initialized yet. This might cause confusion for the people who use
      NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
      usage in ndo_neigh_setup.
      Signed-off-by: default avatarJiri Pirko <jiri@resnulli.us>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      89740ca7
    • David S. Miller's avatar
      Merge branch 'ixgbe' · d6e26404
      David S. Miller authored
      Aaron Brown says:
      
      ====================
      Intel Wired LAN Driver Updates
      
      This series contains several updates from Alex to ixgbe.
      
      To avoid head of line blocking in the event a VF stops cleaning Rx descriptors
      he makes sure QDE bits are set for a VF before the Rx queues are enabled.
      
      To avoid a situation where the head write-back registers can remain set ofter
      the driver is unloaded he clears them on a VF reset.
      
      Alexander Duyck (2):
        ixgbe: Force QDE via PFQDE for VFs during reset
        ixgbe: Clear head write-back registers on VF reset
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d6e26404
    • Alexander Duyck's avatar
      ixgbe: Clear head write-back registers on VF reset · dbf231af
      Alexander Duyck authored
      The Tx head write-back registers are not cleared during an FLR or VF reset.
      As a result a configuration that had head write-back enabled can leave the
      registers set after the driver is unloaded.  If the next driver loaded doesn't
      use the write-back registers this can lead to a bad configuration where
      head write-back is enabled, but the driver didn't request it.
      
      To avoid this situation the PF should be resetting the Tx head write-back
      registers when the VF requests a reset.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarPhil Schmitt <phillip.j.schmitt@intel.com>
      Signed-off-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      dbf231af
    • Alexander Duyck's avatar
      ixgbe: Force QDE via PFQDE for VFs during reset · 87397379
      Alexander Duyck authored
      This change makes it so that the QDE bits are set for a VF before the Rx
      queues are enabled.  As such we avoid head of line blocking in the event
      that the VF stops cleaning Rx descriptors for whatever reason.
      Signed-off-by: default avatarAlexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: default avatarPhil Schmitt <phillip.j.schmitt@intel.com>
      
       drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c |   14 ++++++++++++++
       drivers/net/ethernet/intel/ixgbe/ixgbe_type.h  |    7 ++++---
       2 files changed, 18 insertions(+), 3 deletions(-)
      Signed-off-by: default avatarAaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      87397379
    • David S. Miller's avatar
      Merge branch 'noprefixroute' · e5d64023
      David S. Miller authored
      Thomas Haller says:
      
      ====================
      ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes
      
      v1 -> v2: add a second commit, handling NOPREFIXROUTE in ip6_del_addr.
      v2 -> v3: reword commit messages, code comments and some refactoring.
      v3 -> v4: refactor, rename variables, add enum
      v4 -> v5: rebase, so that patch applies cleanly to current net-next/master
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      e5d64023
    • Thomas Haller's avatar
      ipv6 addrconf: don't cleanup prefix route for IFA_F_NOPREFIXROUTE · 5b84efec
      Thomas Haller authored
      Refactor the deletion/update of prefix routes when removing an
      address. Now also consider IFA_F_NOPREFIXROUTE and if there is an address
      present with this flag, to not cleanup the route. Instead, assume
      that userspace is taking care of this route.
      
      Also perform the same cleanup, when userspace changes an existing address
      to add NOPREFIXROUTE (to an address that didn't have this flag). This is
      done because when the address was added, a prefix route was created for it.
      Since the user now wants to handle this route by himself, we cleanup this
      route.
      
      This cleanup of the route is not totally robust. There is no guarantee,
      that the route we are about to delete was really the one added by the
      kernel. This behavior does not change by the patch, and in practice it
      should work just fine.
      Signed-off-by: default avatarThomas Haller <thaller@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5b84efec
    • Thomas Haller's avatar
      ipv6 addrconf: add IFA_F_NOPREFIXROUTE flag to suppress creation of IP6 routes · 761aac73
      Thomas Haller authored
      When adding/modifying an IPv6 address, the userspace application needs
      a way to suppress adding a prefix route. This is for example relevant
      together with IFA_F_MANAGERTEMPADDR, where userspace creates autoconf
      generated addresses, but depending on on-link, no route for the
      prefix should be added.
      Signed-off-by: default avatarThomas Haller <thaller@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      761aac73
    • David S. Miller's avatar
      Revert "batman-adv: drop dependency against CRC16" · 6631c5ce
      David S. Miller authored
      This reverts commit 12afc36e.
      
      The dependency is actually still necessary.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      6631c5ce
    • wangweidong's avatar
      sctp: create helper function to enable|disable sackdelay · 0ea5e4df
      wangweidong authored
      add sctp_spp_sackdelay_{enable|disable} helper function for
      avoiding code duplication.
      Signed-off-by: default avatarWang Weidong <wangweidong1@huawei.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0ea5e4df
  2. 15 Jan, 2014 17 commits