1. 18 Jan, 2019 20 commits
  2. 17 Jan, 2019 20 commits
    • net: add a route cache full diagnostic message · 22c2ad61
      Peter Oskolkov authored
      In some testing scenarios, dst/route cache can fill up so quickly
      that even an explicit GC call occasionally fails to clean it up. This leads
      to sporadically failing calls to dst_alloc and "network unreachable" errors
      to the user, which is confusing.
      
      This patch adds a diagnostic message to make the cause of the failure
      easier to determine.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dpaa2-eth: Fix ndo_stop routine · 68d74315
      Ioana Ciocoi Radulescu authored
      In the current implementation, on interface down we disable NAPI and
      then manually drain any remaining ingress frames. This can lead
      to a situation where, under heavy traffic, the data availability
      notification for some of the channels does not get rearmed correctly.
      
      Change the implementation such that we let all remaining ingress frames
      be processed as usual and only disable NAPI once the hardware queues
      are empty.
      
      We also add a wait on the Tx side, to allow the hardware time to process
      all in-flight Tx frames before issuing the disable command.
      Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • wan: dscc4: fix various indentation issues · 5191673b
      Colin Ian King authored
      Some lines have indentation issues; fix them.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'vxlan-FDB-veto' · 039d52e1
      David S. Miller authored
      Petr Machata says:
      
      ====================
      vxlan: Allow vetoing FDB operations
      
      mlxsw does not implement handling of the more advanced types of VXLAN
      FDB entries. In order to provide visibility to users, it is important to
      be able to reject such FDB entries, ideally with an explanation passed
      in extended ack. This patch set implements this.
      
      In patches #1-#4, vxlan is gradually transformed to support vetoing of
      FDB entries added (or modified) through vxlan_fdb_update(), and the
      default FDB entry added in __vxlan_dev_create().
      
      Patches #5-#7 deal with vxlan_changelink(). The existing code recognizes
      that vxlan_fdb_update() may fail, but doesn't attempt to keep things
      intact if it does. These patches change the function in several steps to
      gracefully handle vetoes (or other failures).
      
      Then in patches #8-#11, extack arguments are added, respectively, to
      ndo_fdb_add(), mlxsw's mlxsw_sp_nve_ops.fdb_replay, the functions that
      connect to the VXLAN vetoing code, and call_switchdev_notifiers(). Note
      that call_switchdev_blocking_notifiers() already does support extack.
      
      Finally in patch #12, mlxsw is extended to add extack messages to
      rejected FDB entries. In patch #13, the functionality is tested.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Test veto of unsupported VXLAN FDBs · 7e1046fd
      Petr Machata authored
      mlxsw doesn't implement offloading of all types of FDB entries that the
      VXLAN driver supports. Test that such FDB entries are rejected. That
      makes sure that the decision made by the existing validation code in
      mlxsw propagates up the stack. It also exercises rollback functionality
      in VXLAN, and tests that extack is returned.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add extack messages to VXLAN FDB rejection · a40313d9
      Petr Machata authored
      Annotate the rejections in mlxsw_sp_switchdev_vxlan_work_prepare() with
      textual reasons.
      
      Because this code ends up being invoked for FDB replay as well, drop the
      default message from there, so that the more accurate error message is
      not overwritten.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: Add extack argument to call_switchdev_notifiers() · 6685987c
      Petr Machata authored
      A follow-up patch will enable vetoing of FDB entries. Make it possible
      to communicate details of why an FDB entry is not acceptable back to the
      user.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Add extack to switchdev operations · 4c59b7d1
      Petr Machata authored
      There are four sources of VXLAN switchdev notifier calls:
      
      - the changelink() link operation, which already supports extack,
      - ndo_fdb_add() which got extack support in a previous patch,
      - FDB updates due to packet forwarding,
      - and vxlan_fdb_replay().
      
      Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the
      switchdev message that it sends, and propagate the argument upwards to
      the callers. For the first two cases, pass in the extack obtained through
      the operation. For case #3, pass in NULL.
      
      To cover the last case, extend vxlan_fdb_replay() to take an extack
      argument, which might come from whatever operation necessitated the FDB
      replay.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: Add extack to mlxsw_sp_nve_ops.fdb_replay · d907f58f
      Petr Machata authored
      A follow-up patch will extend vxlan_fdb_replay() with an extack
      argument. Extend the fdb_replay callback in mlxsw likewise so that the
      argument is ready for the vxlan conversion.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Add extack argument to ndo_fdb_add() · 87b0984e
      Petr Machata authored
      Drivers may not be able to support certain FDB entries, and an error
      code is insufficient to give clear hints as to the reasons for rejection.
      
      In order to make it possible to communicate the rejection reason, extend
      ndo_fdb_add() with an extack argument. Adapt the existing
      implementations of ndo_fdb_add() to take the parameter (and ignore it).
      Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
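As a rough illustration of the idea in the commit above, the following user-space sketch shows how an extra "extended ack" argument can carry a textual rejection reason alongside the error code. Everything here is hypothetical: `toy_extack`, `toy_fdb_entry`, `toy_driver_fdb_add()` and `toy_rtnl_fdb_add()` are stand-ins invented for the example and are not the kernel's actual types or signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an extended-ack object: it only carries a
 * human-readable message describing why an operation was rejected. */
struct toy_extack {
    const char *msg;
};

/* Hypothetical FDB entry with just enough state for the example. */
struct toy_fdb_entry {
    unsigned char mac[6];
    int has_multiple_remotes; /* a feature this toy driver does not support */
};

/* Driver-side handler: returns 0 or a negative errno. When it rejects an
 * entry and the caller supplied an extack, it also records the reason. */
static int toy_driver_fdb_add(const struct toy_fdb_entry *entry,
                              struct toy_extack *extack)
{
    if (entry->has_multiple_remotes) {
        if (extack)
            extack->msg = "Multiple remotes are not supported";
        return -EOPNOTSUPP;
    }
    return 0;
}

/* Core-side caller: simply forwards the extack it was given, so that the
 * driver's explanation can reach whoever issued the request. */
static int toy_rtnl_fdb_add(const struct toy_fdb_entry *entry,
                            struct toy_extack *extack)
{
    return toy_driver_fdb_add(entry, extack);
}
```

Callers that have no extack to offer can pass NULL, which is why the driver checks the pointer before writing to it.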
    • vxlan: changelink: Delete remote after update · 1cdc98c2
      Petr Machata authored
      If a change in remote address prompts a change in a default FDB entry,
      that change might be vetoed. If that happens, it would then be necessary
      to reinstate the already-removed default FDB entry corresponding to the
      previous remote address.
      
      Instead, arrange to have the previous address removed only after the
      FDB is successfully vetted.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
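The ordering described in the commit above (install the new entry first, remove the old one only once the new one has been accepted) can be sketched in plain C. This is a toy model under stated assumptions: the fixed-size table, the `toy_vet_fn` veto hook and all `toy_*` names are invented for illustration and do not correspond to the vxlan driver's real data structures.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FDB_SLOTS 4

/* Hypothetical FDB: a tiny table of remote addresses, 0 meaning "empty". */
struct toy_fdb {
    unsigned int remotes[TOY_FDB_SLOTS];
};

/* Hypothetical veto hook: returns true if the remote is acceptable. */
typedef bool (*toy_vet_fn)(unsigned int remote);

static bool toy_fdb_has(const struct toy_fdb *fdb, unsigned int remote)
{
    for (size_t i = 0; i < TOY_FDB_SLOTS; i++)
        if (fdb->remotes[i] == remote)
            return true;
    return false;
}

/* Add an entry; fails if the hook vetoes it or the table is full. */
static int toy_fdb_add(struct toy_fdb *fdb, unsigned int remote, toy_vet_fn vet)
{
    if (vet && !vet(remote))
        return -EOPNOTSUPP;
    for (size_t i = 0; i < TOY_FDB_SLOTS; i++) {
        if (fdb->remotes[i] == 0) {
            fdb->remotes[i] = remote;
            return 0;
        }
    }
    return -ENOSPC;
}

static void toy_fdb_del(struct toy_fdb *fdb, unsigned int remote)
{
    for (size_t i = 0; i < TOY_FDB_SLOTS; i++)
        if (fdb->remotes[i] == remote)
            fdb->remotes[i] = 0;
}

/* Change the remote: add the new entry first and delete the old one only
 * after the add succeeded, so a veto leaves the old entry in place and
 * nothing has to be reinstated. */
static int toy_change_remote(struct toy_fdb *fdb, unsigned int old_remote,
                             unsigned int new_remote, toy_vet_fn vet)
{
    int err;

    if (old_remote == new_remote)
        return 0;
    err = toy_fdb_add(fdb, new_remote, vet);
    if (err)
        return err;
    toy_fdb_del(fdb, old_remote);
    return 0;
}

/* Example veto hook for the toy: only even-numbered remotes are accepted. */
static bool toy_accept_even(unsigned int remote)
{
    return remote % 2 == 0;
}
```

With the opposite ordering (delete, then add), a veto would leave the table without either entry, which is the situation the commit avoids.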
    • vxlan: changelink: Postpone vxlan_config_apply() · 038a5a99
      Petr Machata authored
      When an FDB entry is vetoed, it is necessary to unroll the changes that
      have already been done. To avoid having to unroll vxlan_config_apply(),
      postpone the call until after the point where the vetoing takes place.
      Since the call can't fail, it doesn't necessitate any cleanups in the
      preceding FDB update logic.
      
      Correspondingly, move down the mod_timer() call as well.
      
      References to *dst need to be replaced with references to conf.
      Additionally, old_dst and old_age_interval are no longer necessary,
      so drop them.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: changelink: Inline vxlan_dev_configure() · 8db9427d
      Petr Machata authored
      The changelink operation may cause a change in the remote address, and
      therefore an FDB update, which can be vetoed. To properly handle
      vetoing, vxlan_changelink() needs to be gradually updated.
      
      In this patch, simply replace vxlan_dev_configure() with its two
      constituent calls.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Allow vetoing of FDB notifications · 61f46fe8
      Petr Machata authored
      Change vxlan_fdb_switchdev_call_notifiers() to return the result from
      calling switchdev notifiers. Propagate the error number up the stack.
      
      In vxlan_fdb_update_existing() and vxlan_fdb_update_create() add
      rollbacks to clean up the work that was done before the veto.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
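The veto-and-rollback flow from the commit above might look roughly like the following when reduced to a self-contained sketch. The notifier chain here is a plain array of function pointers and the table holds a single entry; both are simplifications assumed for the example, and none of the `toy_*` names exist in the kernel.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical FDB entry and a single-slot table, kept minimal on purpose. */
struct toy_entry {
    unsigned int remote;
};

struct toy_table {
    struct toy_entry *slot;
};

/* Hypothetical notifier: returns 0 to accept, a negative errno to veto. */
typedef int (*toy_notifier_fn)(const struct toy_entry *entry);

/* Call every notifier in order and return the first error, so that the
 * caller learns about a veto instead of the result being discarded. */
static int toy_call_notifiers(toy_notifier_fn *chain, size_t n,
                              const struct toy_entry *entry)
{
    for (size_t i = 0; i < n; i++) {
        int err = chain[i](entry);

        if (err)
            return err;
    }
    return 0;
}

/* Create an entry, notify listeners, and roll back if anyone vetoes. */
static int toy_fdb_create(struct toy_table *table, unsigned int remote,
                          toy_notifier_fn *chain, size_t n)
{
    struct toy_entry *entry;
    int err;

    if (table->slot)
        return -EEXIST;

    entry = malloc(sizeof(*entry));
    if (!entry)
        return -ENOMEM;
    entry->remote = remote;
    table->slot = entry;

    err = toy_call_notifiers(chain, n, entry);
    if (err) {
        /* Rollback: undo the work that was done before the veto. */
        table->slot = NULL;
        free(entry);
        return err;
    }
    return 0;
}

/* Two example notifiers for the toy chain. */
static int toy_accept_all(const struct toy_entry *entry)
{
    (void)entry;
    return 0;
}

static int toy_reject_large(const struct toy_entry *entry)
{
    return entry->remote > 1000 ? -EOPNOTSUPP : 0;
}
```

The key point is that the error number travels back up to the caller of `toy_fdb_create()`, and the table is left exactly as it was before the failed attempt.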
    • vxlan: Have vxlan_fdb_replace() save original rdst value · ccdfd4f7
      Petr Machata authored
      To enable rollbacks after vetoed FDB updates, extend vxlan_fdb_replace()
      to take an additional argument where it should store the original values
      of a modified rdst. Update the sole caller.
      
      The following patch will make use of the saved value.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Split vxlan_fdb_update() in two · a76d1ca2
      Petr Machata authored
      In order to make it easier to implement rollbacks after FDB update
      vetoing, separate the FDB update code into two parts: one that deals with
      updates of existing FDB entries, and one that creates new entries.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Move up vxlan_fdb_free(), vxlan_fdb_destroy() · c2b200e0
      Petr Machata authored
      These functions will be needed for rollbacks of vetoed FDB entries. Move
      them up so that they are visible at their intended point of use.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'improving-TCP-behavior-on-host-congestion' · 12ff91c8
      David S. Miller authored
      Yuchung Cheng says:
      
      ====================
      improving TCP behavior on host congestion
      
      This patch set aims to improve how TCP handles local qdisc congestion
      by simplifying the previous implementation.  Previously, when an
      skb failed to (re)transmit due to local qdisc congestion or another
      resource issue, TCP refrained from setting the skb timestamp or the
      recovery starting time.
      
      This design made determining when to abort a stalling socket more
      complicated, as the timestamps of these transmission attempts were
      missing. The stack had to infer when the original attempt
      happened. A by-product is that a socket may disregard the system timeout
      limit (i.e. sysctl net.ipv4.tcp_retries2 or the USER_TIMEOUT option)
      and continue to retry until the transmission is successful.
      
      In data-center environments where the TCP RTO is small, this could cause
      the socket to retry frequently for a long time during qdisc congestion.
      
      The solution is to first unconditionally timestamp the skb and the
      recovery attempt, then retry more conservatively (twice a second) on
      local qdisc congestion, but abort the sockets according to the system limit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: less aggressive window probing on local congestion · c1d5674f
      Yuchung Cheng authored
      Previously, when the sender failed to send an (original) data packet or
      window probe due to congestion in the local host (e.g. throttling
      in qdisc), it would retry within an RTO or two, up to 500ms.
      
      In low-RTT networks such as data centers, the RTO is often far below
      the default minimum of 200ms. Local host congestion could then trigger
      a retry storm, pouring gas on the fire. Worse yet, the probe counter
      (icsk_probes_out) was not properly updated, so the aggressive retries
      could exceed the system limit (15 rounds) until the packet finally
      slipped through.
      
      In such rare events, it's wise to retry more conservatively
      (500ms), update the stats properly to reflect these incidents,
      and follow the system limit. Note that this is consistent with
      the behavior when a keep-alive probe or RTO retry is dropped
      due to local congestion.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: retry more conservatively on local congestion · 590d2026
      Yuchung Cheng authored
      Previously, when the sender failed to retransmit a data packet on
      timeout due to congestion in the local host (e.g. throttling in
      qdisc), it would retry within an RTO, up to 500ms.
      
      In low-RTT networks such as data centers, the RTO is often far
      below the default minimum of 200ms (and the cap of 500ms). Local
      host congestion could then trigger a retry storm, pouring gas on the
      fire. Worse yet, the retry counter (icsk_retransmits) was not
      properly updated, so the aggressive retries could exceed the system
      limit (15 rounds) until the packet finally slipped through.
      
      In such rare events, it's wise to retry more conservatively (500ms),
      update the stats properly to reflect these incidents, and follow
      the system limit. Note that this is consistent with the behavior
      when a keep-alive probe is dropped due to local congestion.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
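The difference between the old and new retry behavior described in the commit above can be modeled with a few lines of C. This is only a sketch: the 500ms interval and the 15-round limit are taken from the commit message, while `toy_sock`, the function names and the simple counter-based abort check are assumptions made for the example. The real kernel timer code is considerably more involved.

```c
#include <stdbool.h>

/* The 500ms retry interval mentioned in the commit message. */
#define TOY_PROBE_INTERVAL_MS 500u

/* Hypothetical per-socket state: the current RTO and a retry counter
 * playing the role that icsk_retransmits plays in the commit message. */
struct toy_sock {
    unsigned int rto_ms;
    unsigned int retransmits;
};

/* Old behavior as described: retry within one RTO, capped at 500ms, and
 * leave the retry counter untouched. With a small RTO this retries very
 * often and never approaches the system limit. */
static unsigned int toy_old_retry_delay_ms(const struct toy_sock *sk)
{
    return sk->rto_ms < TOY_PROBE_INTERVAL_MS ? sk->rto_ms
                                              : TOY_PROBE_INTERVAL_MS;
}

/* New behavior as described: count the failed attempt and always wait the
 * full 500ms before trying again, regardless of how small the RTO is. */
static unsigned int toy_new_retry_delay_ms(struct toy_sock *sk)
{
    sk->retransmits++;
    return TOY_PROBE_INTERVAL_MS;
}

/* Hypothetical abort check: give up once the counter reaches the limit. */
static bool toy_should_abort(const struct toy_sock *sk, unsigned int limit)
{
    return sk->retransmits >= limit;
}
```

With an RTO of 5ms, the old model retries every 5ms without ever counting the attempts, whereas the new model retries every 500ms and reaches a 15-round limit after 15 failed attempts.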