1. 03 Jan, 2020 11 commits
  2. 02 Jan, 2020 18 commits
    • Merge branch 'tcp-Add-support-for-L3-domains-to-MD5-auth' · 7a8d8a46
      David S. Miller authored
      David Ahern says:
      
      ====================
      tcp: Add support for L3 domains to MD5 auth
      
      With VRF, the scope of network addresses is limited to the L3 domain
      the device is associated with. MD5 keys are based on addresses, so proper
      VRF support requires an L3 domain to be considered for the lookups.
      
      Leverage the new TCP_MD5SIG_EXT option to add support for a device index
      to MD5 keys. The __tcpm_pad entry in tcp_md5sig is renamed to tcpm_ifindex
      and a new flag, TCP_MD5SIG_FLAG_IFINDEX, in tcpm_flags determines if the
      entry is examined. This follows what was done for MD5 prefixes in
      commits
         8917a777 ("tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix")
         6797318e ("tcp: md5: add an address prefix for key lookup")
      
      Handling both a device AND L3 domain is much more complicated for the
      response paths. This set focuses only on L3 support - requiring the
      device index to be an l3mdev (i.e., VRF). Support for slave devices can
      be added later if desired, much like the progression of support for
      sockets bound to a VRF and then bound to a device in a VRF. Kernel
      code is set up to explicitly call out that the current lookup is for an L3
      index, while the uapi just references a device index allowing its
      meaning to include other devices in the future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fcnal-test: Add TCP MD5 tests for VRF · 5cad8bce
      David Ahern authored
      Add tests for new TCP MD5 API for L3 domains (VRF).
      
      A new namespace is added to create a duplicate configuration between
      the VRF and default VRF to verify overlapping config is handled properly.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fcnal-test: Add TCP MD5 tests · f0bee1eb
      David Ahern authored
      Add tests for existing TCP MD5 APIs - both single address
      config and the new extended API for prefixes.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nettest: Add support for TCP_MD5 extensions · eb09cf03
      David Ahern authored
      Update nettest to implement TCP_MD5SIG_EXT for a prefix and a device.
      
      Add a new option, -m, to specify a prefix and length to use with MD5
      auth. The device comes from the existing -d option. If either is
      set and MD5 auth is requested, TCP_MD5SIG_EXT is used instead of
      TCP_MD5SIG.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
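      A minimal sketch of the option handling described above, assuming an IPv4 "addr/len" argument for -m. The helpers parse_md5_prefix() and md5_sockopt() are hypothetical illustrations, not nettest's actual code; the fallback values 14 and 32 are the Linux uapi values of TCP_MD5SIG and TCP_MD5SIG_EXT.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks in case the libc headers predate these options (Linux uapi values). */
#ifndef TCP_MD5SIG
#define TCP_MD5SIG 14
#endif
#ifndef TCP_MD5SIG_EXT
#define TCP_MD5SIG_EXT 32
#endif

/* Hypothetical helper: parse "a.b.c.d/len" into an IPv4 address and a
 * prefix length. Returns 0 on success, -1 on malformed input. */
static int parse_md5_prefix(const char *arg, struct in_addr *addr, int *prefix_len)
{
	char buf[INET_ADDRSTRLEN + 4];
	char *slash, *end;
	long len;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	slash = strchr(buf, '/');
	if (!slash)
		return -1;
	*slash = '\0';		/* split "addr" from "len" */

	if (inet_pton(AF_INET, buf, addr) != 1)
		return -1;

	len = strtol(slash + 1, &end, 10);
	if (end == slash + 1 || *end != '\0' || len < 0 || len > 32)
		return -1;

	*prefix_len = (int)len;
	return 0;
}

/* Hypothetical helper: with MD5 auth requested, the extended option is
 * needed whenever a prefix (-m) or a device (-d) narrows the key's scope. */
static int md5_sockopt(bool have_prefix, bool have_device)
{
	return (have_prefix || have_device) ? TCP_MD5SIG_EXT : TCP_MD5SIG;
}
```

For example, parse_md5_prefix("10.1.1.0/24", &addr, &len) yields len = 24, and md5_sockopt(true, false) selects TCP_MD5SIG_EXT.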
    • nettest: Return 1 on MD5 failure for server mode · 1bfb45d8
      David Ahern authored
      On failure to set MD5 password, do_server should return 1 so that the
      program exits with 1 rather than 255. This is used for negative testing
      when adding MD5 with the device option.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
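      The 255 comes from exit-status truncation. A parent sees only the low 8 bits of the value a process passes to exit() or returns from main(), so a negative return such as -1 shows up as 255. The sketch below demonstrates this with fork() and waitpid(). observed_exit_status() is a hypothetical helper and is not part of nettest.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with `rc` and return the status its parent sees.
 * Only the low 8 bits survive, so an exit code of -1 is observed as 255.
 * Returns -1 if fork/wait fails or the child did not exit normally. */
static int observed_exit_status(int rc)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(rc);	/* child: exit immediately with the given code */

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```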
    • net: Add device index to tcp_md5sig · 6b102db5
      David Ahern authored
      Add support for userspace to specify a device index to limit the scope
      of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad
      is renamed to tcpm_ifindex and the new field is only checked if the new
      TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index
      must point to an L3 master device (e.g., VRF). The API and error
      handling are set up to allow the constraint to be relaxed in the future
      to any device index.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
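      A minimal sketch of how userspace might fill the extended request. It assumes kernel headers recent enough (Linux 5.6 or later) to provide tcpm_ifindex and TCP_MD5SIG_FLAG_IFINDEX in <linux/tcp.h>. fill_md5sig() is a hypothetical helper; only the struct fields and flags come from the uapi.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/tcp.h>	/* struct tcp_md5sig, TCP_MD5SIG_EXT, TCP_MD5SIG_FLAG_* */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a TCP_MD5SIG_EXT request for `peer`.
 * prefix_len < 0 means "no prefix"; ifindex <= 0 means "no device scope".
 * Returns 0, or -1 if the key is too long. */
static int fill_md5sig(struct tcp_md5sig *md5, const struct sockaddr_in *peer,
		       int prefix_len, int ifindex, const char *key)
{
	size_t keylen = strlen(key);

	if (keylen > TCP_MD5SIG_MAXKEYLEN)
		return -1;

	memset(md5, 0, sizeof(*md5));
	memcpy(&md5->tcpm_addr, peer, sizeof(*peer));
	md5->tcpm_keylen = (__u16)keylen;
	memcpy(md5->tcpm_key, key, keylen);

	if (prefix_len >= 0) {
		md5->tcpm_flags |= TCP_MD5SIG_FLAG_PREFIX;
		md5->tcpm_prefixlen = (__u8)prefix_len;
	}
	if (ifindex > 0) {
		/* tcpm_ifindex is only examined when this flag is set */
		md5->tcpm_flags |= TCP_MD5SIG_FLAG_IFINDEX;
		md5->tcpm_ifindex = ifindex;
	}
	return 0;
}
```

The filled struct would then be passed with setsockopt(fd, IPPROTO_TCP, TCP_MD5SIG_EXT, &md5, sizeof(md5)). The index could come from, e.g., if_nametoindex() on a VRF device. Per the commit message, an index that is not an L3 master device is expected to be rejected for now.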
    • tcp: Add l3index to tcp_md5sig_key and md5 functions · dea53bb8
      David Ahern authored
      Add l3index to tcp_md5sig_key to represent the L3 domain of a key, and
      add l3index to tcp_md5_do_add and tcp_md5_do_del to fill in the key.
      
      With the key now based on an l3index, add the new parameter to the
      lookup functions and consider the l3index when looking for a match.
      
      The l3index comes from the skb when processing ingress packets, leveraging
      the helpers created for socket lookups, tcp_v4_sdif and inet_iif (and the
      v6 variants). When the sdif index is set it means the packet ingressed a
      device that is part of an L3 domain and inet_iif points to the VRF device.
      For egress, the L3 domain is determined from the socket binding and
      sk_bound_dev_if.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
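      A simplified sketch of a key lookup that takes the l3index into account. It assumes IPv4-only keys, a preference for the longest prefix, and an exact l3index match. struct md5_key and md5_lookup() are hypothetical. The kernel's tcp_md5sig_key and lookup functions carry more state, and their exact handling of keys with l3index 0 may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4-only MD5 key. */
struct md5_key {
	uint32_t addr;		/* host byte order */
	uint8_t prefixlen;	/* 0..32 */
	int l3index;		/* 0 = default VRF, else the VRF device index */
};

static int prefix_match(uint32_t a, uint32_t b, uint8_t prefixlen)
{
	/* avoid the undefined 32-bit shift when prefixlen == 0 */
	uint32_t mask = prefixlen ? ~0u << (32 - prefixlen) : 0;

	return (a & mask) == (b & mask);
}

/* Return the longest-prefix key covering `addr` in L3 domain `l3index`,
 * or NULL if none matches. */
static const struct md5_key *md5_lookup(const struct md5_key *keys, size_t n,
					uint32_t addr, int l3index)
{
	const struct md5_key *best = NULL;

	for (size_t i = 0; i < n; i++) {
		const struct md5_key *k = &keys[i];

		if (k->l3index != l3index)	/* key belongs to another L3 domain */
			continue;
		if (!prefix_match(k->addr, addr, k->prefixlen))
			continue;
		if (!best || k->prefixlen > best->prefixlen)
			best = k;
	}
	return best;
}
```

Suppose a 10.1.1.0/24 key is configured in the default VRF (l3index 0) and also in a VRF with index 7. The same peer address then resolves to a different key in each domain. This is the overlapping configuration that the VRF tests listed above exercise.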
    • ipv4/tcp: Pass dif and sdif to tcp_v4_inbound_md5_hash · 534322ca
      David Ahern authored
      The original ingress device index is saved to the cb space of the skb
      and the cb is moved during tcp processing. Since tcp_v4_inbound_md5_hash
      can be called before and after the cb move, pass dif and sdif to it so
      the caller can save both prior to the cb move. Both are used by a later
      patch.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
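      The toy model below shows why dif and sdif are captured up front: once TCP reuses the control buffer, the ingress indexes stored there are gone. fake_skb, its union layout, and the helper names are hypothetical. The l3index rule used here (a set sdif means dif is the VRF device) is an assumption based on the l3index commit listed above, which is the later patch that consumes these values.

```c
#include <assert.h>

/* Hypothetical stand-ins: two layers share one control buffer, much as IP
 * and TCP share skb->cb in the kernel. */
struct ip_cb {
	int iif;	/* ingress (or VRF) device index */
	int sdif;	/* enslaved device index, 0 if none */
};

struct tcp_cb {
	unsigned int seq;
	unsigned int end_seq;
};

struct fake_skb {
	union {
		struct ip_cb ip;	/* valid until TCP fills in its own fields */
		struct tcp_cb tcp;	/* valid afterwards */
	} cb;
};

/* TCP takes over the buffer; the bytes that held ip.iif/ip.sdif are overwritten. */
static void fill_tcp_cb(struct fake_skb *skb, unsigned int seq)
{
	skb->cb.tcp.seq = seq;
	skb->cb.tcp.end_seq = seq;
}

/* Takes dif/sdif as arguments, so the answer is the same whether it runs
 * before or after fill_tcp_cb(). Returns the L3 domain for the key lookup. */
static int inbound_md5_l3index(const struct fake_skb *skb, int dif, int sdif)
{
	(void)skb;	/* a real check would also hash the packet */
	return sdif ? dif : 0;
}

static int tcp_rcv_sketch(struct fake_skb *skb, unsigned int seq)
{
	/* Save both indexes before the control buffer is reused. */
	int dif = skb->cb.ip.iif;
	int sdif = skb->cb.ip.sdif;

	fill_tcp_cb(skb, seq);
	return inbound_md5_l3index(skb, dif, sdif);
}
```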
    • ipv6/tcp: Pass dif and sdif to tcp_v6_inbound_md5_hash · d14c77e0
      David Ahern authored
      The original ingress device index is saved to the cb space of the skb
      and the cb is moved during tcp processing. Since tcp_v6_inbound_md5_hash
      can be called before and after the cb move, pass dif and sdif to it so
      the caller can save both prior to the cb move. Both are used by a later
      patch.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4/tcp: Use local variable for tcp_md5_addr · cea97609
      David Ahern authored
      Extract the typecast to (union tcp_md5_addr *) to a local variable
      rather than the current long, inline declaration with function calls.
      
      No functional change intended.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Fix alignment and code style of vxlan.c · 98c81476
      Niu Xilei authored
      Fix code alignment and coding style issues.
      Signed-off-by: Niu Xilei <niu_xilei@163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Allow-setting-default-port-priority' · f5e5d272
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Allow setting default port priority
      
      Petr says:
      
      When LLDP APP TLV selector 1 (EtherType) is used with PID of 0, the
      corresponding entry specifies "default application priority [...] when
      application priority is not otherwise specified."
      
      mlxsw currently supports this type of APP entry, but uses it only as a
      fallback for unspecified DSCP rules. However, non-IP traffic is prioritized
      according to port-default priority, not according to the DSCP-to-prio
      tables, and thus it's currently not possible to prioritize such traffic
      correctly.
      
      This patchset extends the use of the above-mentioned APP entry to also set
      default port priority (in patches #1 and #2) and then (in patch #3) adds a
      selftest.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add a self-test for port-default priority · c5341bcc
      Petr Machata authored
      Send non-IP traffic to a port and observe that it gets prioritized
      according to the lldptool app=$prio,1,0 rules.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_dcb: Allow setting default port priority · 379a00dd
      Petr Machata authored
      When APP TLV selector 1 (EtherType) is used with PID of 0, the
      corresponding entry specifies "default application priority [...] when
      application priority is not otherwise specified."
      
      mlxsw currently supports this type of APP entry, but uses it only as a
      fallback for unspecified DSCP rules. However, non-IP traffic is prioritized
      according to port-default priority, not according to the DSCP-to-prio
      tables, and thus it's currently not possible to prioritize such traffic
      correctly.
      
      Extend the use of the above-mentioned APP entry to also set default port
      priority.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
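      A sketch of deriving the port-default priority from the APP table. It assumes that when several EtherType/PID 0 entries exist, the highest priority wins. struct app_entry and default_port_prio() are hypothetical stand-ins, not the mlxsw or DCB API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APP_SEL_ETHERTYPE 1	/* APP TLV selector 1: EtherType */

/* Hypothetical, simplified APP table entry. */
struct app_entry {
	uint8_t selector;
	uint16_t protocol;	/* PID; 0 together with selector 1 means "default" */
	uint8_t priority;	/* 0..7 */
};

/* Collect the priorities of all EtherType/PID 0 entries into a bitmask and
 * return the highest one, or 0 when no such entry exists. */
static uint8_t default_port_prio(const struct app_entry *apps, size_t n)
{
	unsigned int mask = 0;

	for (size_t i = 0; i < n; i++) {
		if (apps[i].selector == APP_SEL_ETHERTYPE &&
		    apps[i].protocol == 0 && apps[i].priority < 8)
			mask |= 1u << apps[i].priority;
	}

	for (int prio = 7; prio >= 0; prio--) {
		if (mask & (1u << prio))
			return (uint8_t)prio;
	}
	return 0;
}
```

The EtherType entry with a non-zero PID does not count toward the default. Only PID 0 entries feed the mask.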
    • mlxsw: reg: Add QoS Port DSCP to Priority Mapping Register · d8446884
      Petr Machata authored
      Add QPDP. This register controls the port default Switch Priority and
      Color. The default Switch Priority and Color are used for frames where the
      trust state uses default values. Currently there are two cases where this
      applies: a port is in trust-PCP state, but a packet arrives untagged; and a
      port is in trust-DSCP state, but a non-IP packet arrives.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
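      A sketch of the two cases listed above in which the port-default Switch Priority and Color apply. The enum and function names are hypothetical, and the function encodes only those two cases.

```c
#include <assert.h>
#include <stdbool.h>

enum trust_state { TRUST_PCP, TRUST_DSCP };	/* hypothetical names */

/* True when a frame falls back to the port default (the values programmed
 * via QPDP) because the trusted field is absent from the frame. */
static bool uses_port_default(enum trust_state trust, bool vlan_tagged, bool is_ip)
{
	switch (trust) {
	case TRUST_PCP:
		return !vlan_tagged;	/* untagged: no PCP to trust */
	case TRUST_DSCP:
		return !is_ip;		/* non-IP: no DSCP to trust */
	}
	return true;
}
```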
    • Merge branch 'page_pool-NUMA-node-handling-fixes' · c9a2069b
      David S. Miller authored
      Jesper Dangaard Brouer says:
      
      ====================
      page_pool: NUMA node handling fixes
      
      The NUMA changes recently added to page_pool (merged for v5.5) both
      contain a bug in handling the NUMA_NO_NODE condition and add code to
      the fast path.
      
      This patchset fixes the bug and moves code out of the fast path. The
      first patch contains a fix that should be considered for 5.5. The second
      patch reduces code size and overhead when CONFIG_NUMA is disabled.
      
      Currently the NUMA_NO_NODE setting bug only affects driver 'ti_cpsw'
      (drivers/net/ethernet/ti/), but after this patchset, we plan to move
      other drivers (netsec and mvneta) to use NUMA_NO_NODE setting.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • page_pool: help compiler remove code in case CONFIG_NUMA=n · f13fc107
      Jesper Dangaard Brouer authored
      When the kernel is compiled without NUMA support, the page_pool NUMA
      config setting (pool->p.nid) doesn't make any practical sense. The
      compiler cannot see that it can remove the code paths.
      
      This patch avoids reading the pool->p.nid setting when !CONFIG_NUMA,
      in the allocation and NUMA check code, which helps the compiler see the
      optimisation potential. It leaves the update code intact to keep the API
      the same.
      
       $ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \
                                 net/core/page_pool.o-numa-disabled
       add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113)
       Function                                     old     new   delta
       page_pool_create                             401     398      -3
       __page_pool_alloc_pages_slow                 439     426     -13
       page_pool_refill_alloc_cache                 425     328     -97
       Total: Before=3611, After=3498, chg -3.13%
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
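      A userspace sketch of the technique. The read of the configured nid sits behind a compile-time switch, so when the switch is off the branch disappears from the build. SKETCH_NUMA, NO_NODE and preferred_nid() are hypothetical stand-ins for CONFIG_NUMA, NUMA_NO_NODE and the page_pool code. On a !NUMA kernel, numa_mem_id() returns 0, and the current_nid argument models that value.

```c
#include <assert.h>

/* Stand-in for the Kconfig switch; build with -DSKETCH_NUMA=1 to enable. */
#ifndef SKETCH_NUMA
#define SKETCH_NUMA 0
#endif

#define NO_NODE (-1)	/* stands in for NUMA_NO_NODE */

struct pool_params {
	int nid;	/* mirrors pool->p.nid; still settable in both builds */
};

/* With SKETCH_NUMA == 0 the configured nid is never read, so the compiler
 * has no branch to emit; the setting is simply ignored. */
static int preferred_nid(const struct pool_params *p, int current_nid)
{
#if SKETCH_NUMA
	return p->nid == NO_NODE ? current_nid : p->nid;
#else
	(void)p;
	return current_nid;	/* models numa_mem_id(), which is 0 without NUMA */
#endif
}
```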
    • page_pool: handle page recycle for NUMA_NO_NODE condition · 44768dec
      Jesper Dangaard Brouer authored
      The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is
      not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.
      
      The goal of the NUMA changes in commit d5394610 ("page_pool: Don't
      recycle non-reusable pages") was to have RX pages that belong to the
      same NUMA node as the CPU processing RX packets during softirq/NAPI, as
      illustrated by the performance measurements.
      
      This patch moves the NAPI checks out of the fast path, and at the same time
      solves the NUMA_NO_NODE issue.
      
      First, note that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE
      will look up the current CPU's nid (NUMA ID) via numa_mem_id(), which is
      used as the preferred nid. Only in rare situations, e.g. when a NUMA
      zone runs dry, does a page not get allocated from the preferred nid.
      The page_pool API allows drivers to control the nid themselves via
      pool->p.nid.
      
      This patch moves the NAPI check to the point where the alloc cache is
      refilled by dequeuing/consuming pages from the ptr_ring. Thus, pages from
      a remote NUMA node may be placed into the ptr_ring, as the
      dequeue/consume step will check the NUMA node. All current drivers using
      page_pool alloc/refill the RX ring from the same CPU that runs the
      softirq/NAPI process.
      
      Drivers that control the nid explicitly also use page_pool_update_nid
      when changing the nid at runtime. To speed up the transition to a new
      nid, the alloc cache is now flushed on nid changes. This forces pages to
      come from the ptr_ring, which does the appropriate nid check.
      
      For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA
      node, we accept that transitioning the alloc cache doesn't happen
      immediately. The preferred nid changes at runtime by consulting
      numa_mem_id() on the CPU processing RX packets.
      
      Note that, to avoid stressing the page buddy allocator and doing too
      much work under softirq with preemption disabled, the NUMA check at
      ptr_ring dequeue breaks the refill cycle when it detects a NUMA
      mismatch. This causes a slower transition, but it is done on purpose.
      
      Fixes: d5394610 ("page_pool: Don't recycle non-reusable pages")
      Reported-by: Li RongQing <lirongqing@baidu.com>
      Reported-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
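      A toy model of the refill and flush logic described above. It assumes a simple array ring in place of ptr_ring, and it uses a counter in place of returning pages to the page allocator. All names are hypothetical. The model follows the steps in the commit message: resolve pref_nid, move matching pages into the cache, stop on the first mismatch, and flush the cache when the nid changes.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)		/* stands in for NUMA_NO_NODE */
#define RING_SIZE 16
#define CACHE_SIZE 8
#define REFILL_BATCH 4		/* hypothetical batch size */

struct page { int nid; };	/* toy page: only its NUMA node matters */

struct pool {
	int nid;			/* configured node, or NO_NODE */
	struct page *ring[RING_SIZE];	/* stands in for the ptr_ring */
	size_t head, tail;
	struct page *cache[CACHE_SIZE];	/* the alloc cache */
	size_t count;
	size_t released;		/* pages handed back to the "allocator" */
};

/* Caller keeps at most RING_SIZE pages queued. */
static void ring_produce(struct pool *p, struct page *pg)
{
	p->ring[p->tail++ % RING_SIZE] = pg;
}

static struct page *ring_consume(struct pool *p)
{
	if (p->head == p->tail)
		return NULL;
	return p->ring[p->head++ % RING_SIZE];
}

/* Refill the alloc cache with pages from the preferred node. On the first
 * mismatch, release that page and stop, limiting work per call. */
static struct page *refill_alloc_cache(struct pool *p, int current_nid)
{
	int pref_nid = (p->nid == NO_NODE) ? current_nid : p->nid;
	struct page *page;

	do {
		page = ring_consume(p);
		if (!page)
			break;
		if (page->nid == pref_nid) {
			p->cache[p->count++] = page;
		} else {
			p->released++;	/* NUMA mismatch: give the page back */
			page = NULL;
			break;
		}
	} while (p->count < REFILL_BATCH);

	/* Hand out the last cached page, if any. */
	if (p->count > 0)
		page = p->cache[--p->count];
	return page;
}

/* Changing the node flushes the cache so later allocations go through
 * the NUMA-checked refill path. */
static void update_nid(struct pool *p, int new_nid)
{
	p->nid = new_nid;
	while (p->count > 0) {
		p->count--;
		p->released++;
	}
}
```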
  3. 01 Jan, 2020 1 commit
    • Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · fe23d634
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      1GbE Intel Wired LAN Driver Updates 2019-12-31
      
      This series contains updates to e1000e, igb and igc only.
      
      Robert Beckett provides an igb change to help keep packets from being
      dropped because the receive descriptor ring is full when receive flow
      control is enabled. It creates a separate function to set up SRRCTL for
      easier reuse and ensures that the drop enable bit is set only if receive
      flow control is not enabled.
      
      Sasha adds scatter-gather support to igc and simplifies the direct
      memory address mapping flow to make it clearer. Sasha also updates igc
      to use pci_release_mem_regions() instead of
      pci_release_selected_regions(), cleans up function header comments to
      align with the actual code, and adds support for 64-bit DMA access to
      help handle socket buffer fragments in high memory. Finally, Sasha adds
      legacy power management support to igc by implementing suspend, resume,
      runtime_suspend/resume, and runtime_idle callbacks, and cleans up
      references to the SerDes interface in igc, since that interface is not
      supported on i225 devices.
      
      Alex replaces the pr_info calls with netdev_info in all cases related to
      netdev link state, as suggested by Joe Perches.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
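      A simplified sketch of the SRRCTL rule in Robert Beckett's change. The DROP_EN value is assumed to match igb's E1000_SRRCTL_DROP_EN (bit 31). srrctl_value() is hypothetical, and the real driver may take other conditions into account.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match igb's E1000_SRRCTL_DROP_EN; check the driver headers. */
#define SRRCTL_DROP_EN 0x80000000u

/* Hypothetical helper: set drop-enable only when receive flow control is off,
 * so that with flow control on, a full descriptor ring does not cause drops. */
static uint32_t srrctl_value(uint32_t base, bool rx_flow_control)
{
	uint32_t srrctl = base & ~SRRCTL_DROP_EN;	/* start from a known state */

	if (!rx_flow_control)
		srrctl |= SRRCTL_DROP_EN;
	return srrctl;
}
```

Clearing the bit first makes the result independent of whatever value the register held before.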
  4. 31 Dec, 2019 10 commits