1. 17 Mar, 2017 13 commits
    • tcp: remove tcp_tw_recycle · 4396e461
      Soheil Hassas Yeganeh authored
      tcp_tw_recycle was already broken for connections
      behind NAT, since the per-destination timestamp is not
      monotonically increasing for multiple machines behind
      a single destination address.

      After the randomization of TCP timestamp offsets
      in commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets
      for each connection), tcp_tw_recycle is broken for all
      types of connections for the same reason: the timestamps
      received from a single machine are no longer monotonically
      increasing.
      
      Remove tcp_tw_recycle, since it is not functional. Also, remove
      the PAWSPassive SNMP counter since it is only used for
      tcp_tw_recycle, and simplify tcp_v4_route_req and tcp_v6_route_req
      since the strict argument is only set when tcp_tw_recycle is
      enabled.

      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Cc: Lutz Vieweg <lvml@5t9.de>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
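The failure mode described in this commit can be illustrated with a small model. This is only a sketch, not kernel code: `accepts`, `run_connections`, the address, and the offset values are all hypothetical, and the check is a simplified stand-in for a per-destination "timestamp must not go backwards" test.

```python
import random

def accepts(cache, dest, tsval):
    """Accept a connection only if its timestamp is not older than the
    last one cached for this destination address (hypothetical check)."""
    last = cache.get(dest)
    if last is not None and tsval < last:
        return False
    cache[dest] = tsval
    return True

def run_connections(offsets, dest="203.0.113.7", n=200, seed=1):
    """Open n connections at increasing clock ticks; each connection adds
    one of the given offsets to the clock to form its timestamp.
    Returns how many connections the check rejected."""
    rng = random.Random(seed)
    cache, rejected = {}, 0
    for tick in range(n):
        tsval = tick + rng.choice(offsets)
        if not accepts(cache, dest, tsval):
            rejected += 1
    return rejected

# One shared offset: timestamps follow the clock, nothing is rejected.
same_offset = run_connections([0])
# Differing offsets (several hosts behind one NAT address, or offsets
# randomized per connection): timestamps appear to jump backwards.
mixed_offsets = run_connections([0, 10_000])
```

With a single offset the check never fires; once offsets differ between connections to the same address, legitimate connections start being rejected, which is why a per-destination timestamp check cannot work in either scenario.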
    • tcp: remove per-destination timestamp cache · d82bae12
      Soheil Hassas Yeganeh authored
      Commit 8a5bd45f6616 (tcp: randomize tcp timestamp offsets for each connection)
      randomizes TCP timestamps per connection. After this commit,
      there is no guarantee that the timestamps received from the
      same destination are monotonically increasing. As a result,
      the per-destination timestamp cache in TCP metrics (i.e., tcpm_ts
      in struct tcp_metrics_block) is broken and cannot be relied upon.
      
      Remove the per-destination timestamp cache and all related code
      paths.
      
      Note that this cache was already broken for caching timestamps of
      multiple machines behind a NAT sharing the same address.

      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Cc: Lutz Vieweg <lvml@5t9.de>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sunvnet-better-connection-management' · 8b705f52
      David S. Miller authored
      Shannon Nelson says:
      
      ====================
      sunvnet: better connection management
      
      These patches remove some problems in handling of carrier state
      with the ldmvsw vswitch, remove an xoff misuse in sunvnet, and
      add stats for debug and tracking of point-to-point connections
      between the ldom VMs.
      
      v2:
       - added ldmvsw ndo_open to reset the LDC channel
       - updated copyrights
      ====================

      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvnet: xoff not needed when removing port link · 9c5a3a1f
      Shannon Nelson authored
      The sunvnet netdev is connected to the controlling ldom's vswitch
      for network bridging.  However, for higher performance between ldoms,
      there also is a channel between each client ldom.  These connections are
      represented in the sunvnet driver by a queue for each ldom.  The driver
      uses select_queue to tell the stack which queue to use by tracking the mac
      addresses on the other end of each port.  When a connected ldom shuts down,
      the driver receives an LDC_EVENT_RESET and the port is removed from the
      driver, thus a queue with no ldom on the other end will never be selected
      for Tx.
      
      The driver was trying to reinforce the "don't use this queue" notion with
      netif_tx_stop_queue() and netif_tx_wake_queue(), which really should only
      be used to signal a Tx queue is full (aka XOFF).  This misuse of queue
      state resulted in NETDEV WATCHDOG messages and lots of unnecessary calls
      into the driver's tx_timeout handler.  Simply removing these takes care
      of the problem.
      
      Orabug: 25190537
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
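Why removing the port is enough on its own can be seen in a toy model of MAC-based queue selection. This is a sketch under assumptions: `PortTable`, its methods, the MAC addresses, and the fallback to a vswitch queue at index 0 are all hypothetical and are not taken from the driver.

```python
class PortTable:
    """Maps peer MAC addresses to Tx queue indices (illustrative model)."""

    def __init__(self, switch_queue=0):
        # Assumption: traffic for unknown peers goes to the vswitch queue.
        self.switch_queue = switch_queue
        self.by_mac = {}

    def add_port(self, mac, queue):
        self.by_mac[mac] = queue

    def remove_port(self, mac):
        # Models the handling of LDC_EVENT_RESET: forget the peer entirely.
        self.by_mac.pop(mac, None)

    def select_queue(self, dst_mac):
        # A queue whose peer has been removed can no longer be returned.
        return self.by_mac.get(dst_mac, self.switch_queue)

table = PortTable()
table.add_port("00:14:4f:aa:00:01", 1)
table.add_port("00:14:4f:aa:00:02", 2)
before = table.select_queue("00:14:4f:aa:00:02")
table.remove_port("00:14:4f:aa:00:02")
after = table.select_queue("00:14:4f:aa:00:02")
```

Once the peer is removed from the table, its queue is simply never chosen, so no additional stop/wake signalling on that queue is needed to keep the stack away from it.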
    • sunvnet: count multicast packets · b12a96f5
      Shannon Nelson authored
      Make sure multicast packets get counted in the device.
      
      Orabug: 25190537
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sunvnet: track port queues correctly · e1f1e5f7
      Shannon Nelson authored
      Track our used and unused queue indices correctly.  Otherwise, as ports
      dropped out and returned, they all eventually ended up with the same
      queue index.
      
      Orabug: 25190537
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
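One way to keep queue indices unique as ports come and go is to track explicitly which indices are in use. The following is a minimal sketch of that idea, not the driver's implementation; `QueueIndexPool` and its methods are hypothetical names.

```python
class QueueIndexPool:
    """Hands out the lowest free queue index and takes indices back."""

    def __init__(self, size):
        self.in_use = [False] * size

    def acquire(self):
        for idx, used in enumerate(self.in_use):
            if not used:
                self.in_use[idx] = True
                return idx
        raise RuntimeError("no free queue index")

    def release(self, idx):
        self.in_use[idx] = False

pool = QueueIndexPool(4)
a, b, c = pool.acquire(), pool.acquire(), pool.acquire()
# Two ports drop out and later return.
pool.release(a)
pool.release(b)
a2, b2 = pool.acquire(), pool.acquire()
```

Because released indices are marked free and each acquisition marks its index used, the returning ports get distinct indices that also differ from the one still held by the third port.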
    • sunvnet: add stats to track ldom to ldom packets and bytes · 0f512c84
      Shannon Nelson authored
      In this driver, there is a "port" created for the connection to each of
      the other ldoms; a netdev queue is mapped to each port, and they are
      collected under a single netdev.  The generic netdev statistics show
      us all the traffic in and out of our network device, but don't show
      individual queue/port stats.  This patch breaks out the traffic counts
      for the individual ports and gives us a little view into the state of
      those connections.
      
      Orabug: 25190537
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ldmvsw: better use of link up and down on ldom vswitch · 867fa150
      Shannon Nelson authored
      When an ldom VM is bound, the network vswitch infrastructure is set up for
      it, but was being forced 'UP' by the userland switch configuration script.
      When 'UP' but not actually connected to a running VM, the ipv6 neighbor
      probes fail (not a horrible thing) and start cluttering up the kernel logs.
      Funny thing: these are debug messages that never actually show up, but
      we do see the net_ratelimited messages that say N callbacks were
      suppressed.
      
      This patch defers the netif_carrier_on() until an actual link has been
      established with the VM, as indicated by receiving an LDC_EVENT_UP from
      the underlying LDC protocol.  Similarly, we take the link down when we
      see the LDC_EVENT_RESET.  Now when we see the ndo_open(), we reset the
      link to get things talking again.
      
      Orabug: 25525312
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
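The carrier handling described above amounts to a small state machine. The sketch below models it under assumptions: `VswitchPort` is a hypothetical class, events are represented as strings, and the channel reset triggered from ndo_open is reduced to a counter.

```python
class VswitchPort:
    """Toy model: carrier follows LDC events, not administrative 'UP'."""

    def __init__(self):
        self.carrier = False   # carrier stays off until the peer is connected
        self.resets = 0

    def ndo_open(self):
        # Opening the device only requests a channel reset (modeled as a
        # counter); it does not turn the carrier on by itself.
        self.resets += 1

    def ldc_event(self, event):
        if event == "LDC_EVENT_UP":
            self.carrier = True      # stands in for netif_carrier_on()
        elif event == "LDC_EVENT_RESET":
            self.carrier = False     # link is taken down again

port = VswitchPort()
port.ndo_open()
carrier_after_open = port.carrier
port.ldc_event("LDC_EVENT_UP")
carrier_after_up = port.carrier
port.ldc_event("LDC_EVENT_RESET")
carrier_after_reset = port.carrier
```

In this model an interface that is administratively up but has no running VM behind it never reports carrier, which is the condition that previously led to the failing neighbor probes.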
    • bonding: add 802.3ad support for 25G speeds · 19ddde1e
      Jarod Wilson authored
      Cut-n-paste enablement of 802.3ad bonding on 25G NICs, which currently
      report 0 as their bandwidth.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
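The "report 0 as their bandwidth" symptom is what one would expect from a lookup that only recognizes an enumerated set of link speeds. A minimal sketch of that pattern follows; `agg_bandwidth` and the speed sets are hypothetical and are an illustrative subset, not the bonding driver's actual table.

```python
def agg_bandwidth(speed_mbps, num_ports, known_speeds):
    """Aggregate bandwidth in Mb/s; unrecognized speeds count as 0."""
    if speed_mbps not in known_speeds:
        return 0
    return speed_mbps * num_ports

# Illustrative speed sets without and with a 25G entry.
before = agg_bandwidth(25000, 2, {1000, 10000, 40000})
after = agg_bandwidth(25000, 2, {1000, 10000, 25000, 40000})
```

Adding the missing speed to the recognized set is all it takes for two 25G ports to be reported as 50000 Mb/s instead of 0.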
    • tcp_westwood: fix tcp_westwood_info() style mistakes · be7164cd
      chun Long authored
      Replace commas with semicolons in tcp_westwood_info().

      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • liquidio: use meaningful names for IRQs · 0c88a761
      Rick Farrington authored
      All IRQs owned by the PF and VF drivers share the same nondescript name
      "octeon"; this makes it difficult to set up interrupt affinity.
      
      Change the IRQ names to reflect their specific purpose:
      
          LiquidIO<id>-<func>-<type>-<queue pair num>
      
      Examples:
          LiquidIO0-pf0-rxtx-3
          LiquidIO1-vf1-rxtx-0
          LiquidIO0-pf0-aux
      
      We cannot use netdev->name for naming the IRQs because:
      
          1.  Early during init, the PF and VF drivers require interrupts to
              send/receive control data from the NIC firmware; so the PF and VF
              must request IRQs long before the netdev struct is registered.
      
          2.  The IRQ name can only be specified at the time it is requested.
              It cannot be changed after that.

      Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
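The naming scheme above can be expressed as a small formatting helper. This is a sketch only: `irq_name` and its parameters are hypothetical, and treating the queue pair number as optional is an assumption based on the "aux" example.

```python
def irq_name(dev_id, func, irq_type, queue_pair=None):
    """Build a name of the form LiquidIO<id>-<func>-<type>[-<queue pair num>]."""
    name = f"LiquidIO{dev_id}-{func}-{irq_type}"
    if queue_pair is not None:
        name += f"-{queue_pair}"
    return name

names = [
    irq_name(0, "pf0", "rxtx", 3),
    irq_name(1, "vf1", "rxtx", 0),
    irq_name(0, "pf0", "aux"),
]
```

Since every component is known at the time the IRQ is requested, the name does not depend on the netdev having been registered.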
    • liquidio: remove/replace invalid code · b229487b
      Rick Farrington authored
      Remove invalid call to dma_sync_single_for_cpu() because previous DMA
      allocation was coherent--not streaming.  Remove code that references fields
      in struct list_head; replace it with calls to list_empty() and
      list_first_entry().  Also, add comment to clarify complicated if statement.

      Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
      Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: Derek Chickles <derek.chickles@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netem: apply correct delay when rate throttling · 5080f39e
      Nik Unger authored
      I recently reported on the netem list that iperf network benchmarks
      show unexpected results when a bandwidth throttling rate has been
      configured for netem. Specifically:
      
      1) The measured link bandwidth *increases* when a higher delay is added
      2) The measured link bandwidth appears higher than the specified limit
      3) The measured link bandwidth for the same very slow settings
         varies significantly across machines
      
      The issue can be reproduced by using tc to configure netem with a
      512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
      veth pair between network namespaces, and then using iperf (or any
      other network benchmarking tool) to test throughput. Complete detailed
      instructions are in the original email chain here:
      https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html
      
      There appear to be two underlying bugs causing these effects:
      
      - The first issue causes long delays when the rate is slow and no
        delay is configured (e.g., "rate 512kbit"). This is because SKBs are
        not orphaned when no delay is configured, so orphaning does not
        occur until *after* the rate-induced delay has been applied. For
        this reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
        dramatically increases the measured bandwidth.
      
      - The second issue is that rate-induced delays are not correctly
        applied, allowing SKB delays to occur in parallel. The intended
        approach is to compute the delay for an SKB and to add this delay to
        the end of the current queue. However, the code does not detect
        existing SKBs in the queue due to improperly testing sch->q.qlen,
        which is nonzero even when packets exist only in the
        rbtree. Consequently, new SKBs do not wait for the current queue to
        empty. When packet delays vary significantly (e.g., if packet sizes
        are different), then this also causes unintended reordering.
      
      I modified the code to expect a delay (and orphan the SKB) when a rate
      is configured. I also added some defensive tests that correctly find
      the latest scheduled delivery time, even if it is (unexpectedly) for a
      packet in sch->q. I have tested these changes on the latest kernel
      (4.11.0-rc1+) and the iperf / ping test results are as expected.

      Signed-off-by: Nik Unger <njunger@uwaterloo.ca>
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
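The second bug, rate-induced delays running in parallel instead of queueing behind one another, can be reproduced in a few lines. This is a simplified model, not the qdisc code: `schedule`, its `serialize` flag, and the packet sizes are hypothetical, and only the 512kbit rate comes from the report above.

```python
def schedule(packets, rate_bps, serialize=True):
    """Return delivery times (seconds) for (arrival_time, size_bytes) packets.

    serialize=True appends each packet's transmission delay to the end of
    the current queue; serialize=False applies it from the arrival time,
    so delays overlap.
    """
    times = []
    last = 0.0   # latest scheduled delivery time so far
    for arrival, size in packets:
        tx_time = size * 8 / rate_bps
        start = max(arrival, last) if serialize else arrival
        done = start + tx_time
        times.append(done)
        last = max(last, done)
    return times

rate = 512_000                               # "rate 512kbit"
burst = [(0.0, 1500)] * 4 + [(0.0, 64)]      # four large packets, one small
parallel = schedule(burst, rate, serialize=False)
serial = schedule(burst, rate, serialize=True)
```

In the overlapping case all five packets finish within one large packet's transmission time, so the measured throughput exceeds the configured rate, and the small packet overtakes the large ones. In the serialized case delivery times are strictly ordered and the fourth large packet completes at 4 * 12000 / 512000 = 0.09375 s.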
  2. 16 Mar, 2017 26 commits
  3. 15 Mar, 2017 1 commit
    • Merge branch 'dsa-check-out-of-range-ageing-time' · 02cb24e9
      David S. Miller authored
      Vivien Didelot says:
      
      ====================
      net: dsa: check out-of-range ageing time
      
      The ageing time limits supported by DSA drivers vary depending on the
      switch model. If a driver returns -ERANGE for out-of-range values, the
      switchdev commit phase will fail with the following stacktrace:
      
          # brctl setageing br0 4
          [ 8530.082179] WARNING: CPU: 0 PID: 910 at net/switchdev/switchdev.c:291 switchdev_port_attr_set_now+0xbc/0xc0
          [ 8530.090679] br0: Commit of attribute (id=5) failed.
          [ 8530.094256] Modules linked in:
          [ 8530.096032] CPU: 0 PID: 910 Comm: kworker/0:4 Tainted: G        W       4.10.0 #361
          [ 8530.102412] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
          [ 8530.107571] Workqueue: events switchdev_deferred_process_work
          [ 8530.112039] Backtrace:
          [ 8530.113224] [<8010ca34>] (dump_backtrace) from [<8010cd3c>] (show_stack+0x20/0x24)
          [ 8530.119521]  r6:00000000 r5:80834da0 r4:80ca7e48 r3:8120ca3c
          [ 8530.123908] [<8010cd1c>] (show_stack) from [<8037ad40>] (dump_stack+0x24/0x28)
          [ 8530.129873] [<8037ad1c>] (dump_stack) from [<80118de4>] (__warn+0xf4/0x10c)
          [ 8530.135545] [<80118cf0>] (__warn) from [<80118e44>] (warn_slowpath_fmt+0x48/0x50)
          [ 8530.141760]  r9:00000000 r8:81252bec r7:80f19d90 r6:9dc3c000 r5:80ca7e7c r4:80834de8
          [ 8530.148235] [<80118e00>] (warn_slowpath_fmt) from [<80670b20>] (switchdev_port_attr_set_now+0xbc/0xc0)
          [ 8530.156240]  r3:9dc3c000 r2:80834de8
          [ 8530.158539]  r4:ffffffde
          [ 8530.159788] [<80670a64>] (switchdev_port_attr_set_now) from [<80670b44>] (switchdev_port_attr_set_deferred+0x20/0x6c)
          [ 8530.169118]  r7:806705a8 r6:9dc3c000 r5:80f19d90 r4:80f19d80
          [ 8530.173500] [<80670b24>] (switchdev_port_attr_set_deferred) from [<80670580>] (switchdev_deferred_process+0x50/0xe8)
          [ 8530.182742]  r6:80ca6000 r5:81252bec r4:80f19d80 r3:80670b24
          [ 8530.187115] [<80670530>] (switchdev_deferred_process) from [<80670930>] (switchdev_deferred_process_work+0x1c/0x24)
          [ 8530.196277]  r8:00000000 r7:9ffdc100 r6:8120ad6c r5:9ddefc00 r4:81252bf4 r3:9de343c0
          [ 8530.202756] [<80670914>] (switchdev_deferred_process_work) from [<8012f770>] (process_one_work+0x120/0x3b0)
          [ 8530.211231] [<8012f650>] (process_one_work) from [<8012fa70>] (worker_thread+0x70/0x534)
          [ 8530.218046]  r10:9ddefc00 r9:8120ad6c r8:80ca6038 r7:8120ad80 r6:81211f80 r5:9ddefc18
          [ 8530.224579]  r4:8120ad6c
          [ 8530.225830] [<8012fa00>] (worker_thread) from [<80135640>] (kthread+0x114/0x144)
          [ 8530.231955]  r10:9f4e9e94 r9:9de1fe58 r8:8012fa00 r7:9ddefc00 r6:9de1fdc0 r5:00000000
          [ 8530.238497]  r4:9de1fe40
          [ 8530.239750] [<8013552c>] (kthread) from [<80108cd8>] (ret_from_fork+0x14/0x3c)
          [ 8530.245679]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:8013552c
          [ 8530.252234]  r4:9de1fdc0 r3:80ca6000
          [ 8530.254512] ---[ end trace 87475cc71b80ef73 ]---
          [ 8530.257852] br0: failed (err=-34) to set attribute (id=5)
      
      This patchset fixes this by adding ageing_time_min and ageing_time_max
      fields to the dsa_switch structure, which can optionally be set by a DSA
      driver.
      
      If provided, the DSA core will check for out-of-range values in the
      SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME prepare phase and return -ERANGE
      accordingly.
      
      Finally, set these limits in the mv88e6xxx driver.
      ====================

      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
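The prepare-phase check described in this cover letter can be sketched as follows. This is a model under assumptions, not the DSA core code: `prepare_ageing_time` is a hypothetical function, `None` stands for a limit the driver did not set, and the millisecond limits used in the example are made-up values.

```python
import errno

def prepare_ageing_time(msecs, ageing_time_min=None, ageing_time_max=None):
    """Return 0 if the ageing time is acceptable, -ERANGE otherwise.

    A limit of None means the driver did not provide it, so that bound
    is not checked.
    """
    if ageing_time_min is not None and msecs < ageing_time_min:
        return -errno.ERANGE
    if ageing_time_max is not None and msecs > ageing_time_max:
        return -errno.ERANGE
    return 0

# Hypothetical limits for illustration: 15 s minimum, 3825 s maximum.
too_short = prepare_ageing_time(4_000, 15_000, 3_825_000)
in_range = prepare_ageing_time(300_000, 15_000, 3_825_000)
unchecked = prepare_ageing_time(4_000)
```

Rejecting the value in the prepare phase lets the error be reported before the commit phase runs, instead of surfacing as the warning and "err=-34" message shown in the trace (ERANGE is 34 on Linux).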