1. 21 Dec, 2018 29 commits
    • Merge tag 'mlx5-XDP-100Mpps' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 37159174
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-XDP-100Mpps
      
      This series from Tariq mainly adds support for the mlx5 Multi-Packet WQE
      (TX descriptor), available on ConnectX-5 and above, for XDP TX. It allows
      us to overcome the 70Mpps PCIe bottleneck of conventional TX queues
      (single TX descriptor per packet) and achieve the 100Mpps milestone with
      the MPWQE approach.

      In the first five patches, Tariq made minor improvements to the mlx5 TX
      path, for better debuggability and code structuring.

      The next two patches lay the foundation for the MPWQE implementation by
      storing the in-flight XDP TX information for multiple packets of one
      descriptor (WQE).
      
      Next: Support Enhanced Multi-Packet TX WQE for XDP
      
      In this patch we add support for the HW feature, which is supported
      starting from ConnectX-5.
      
      Performance:
      Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      
      XDP_TX:
      We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
      milestone.
      * Single-port HCA:
      	Before:   70 Mpps
      	After:   100 Mpps (+42.8%)
      
      * Dual-port HCA:
      	Before: 51.7 Mpps
      	After:  57.3 Mpps (+10.8%)
      
      * In both cases we tested traffic on one port. For now, on dual-port
        HCAs we see only a small gain. We are working to overcome this
        bottleneck, but for the moment dual-port HCAs reach the numbers seen
        on single-port HCAs only with experimental firmware.
      
      XDP_REDIRECT:
      Redirect from (A) ConnectX-5 to (B) ConnectX-5.
      Due to a setup limitation, (A) and (B) are on different NUMA nodes,
      so absolute performance numbers are not optimal.
      - Note:
        Below is the transmit rate of (B), not the redirect rate of (A)
        which is in some cases higher.
      
      * (B) is single-port:
      	Before:   77 Mpps
      	After:    90 Mpps (+16.8%)
      
      * (B) is dual-port:
      	Before:  61 Mpps
      	After:   72 Mpps (+18%)
      
      The last patch adds an mlx5 ethtool private flag to turn XDP TX MPWQE
      on or off.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: XDP, Add user control for XDP TX MPWQE feature · 6277053a
      Tariq Toukan authored
      Add ethtool private flag 'xdp_tx_mpwqe' to control the feature
      from userspace.
      Feature is set ON by default, if supported.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQE · 5e0d2eef
      Tariq Toukan authored
      Add support for the HW feature of multi-packet WQE in XDP
      xmit flow.
      
      The conventional TX descriptor (WQE, Work Queue Element) serves
      a single packet. Our HW has support for multi-packet WQE (MPWQE)
      in which a single descriptor serves multiple TX packets.
      
      This reduces both the PCI overhead and the CPU cycles spent on
      writing descriptors.
      
      In this patch we add support for the HW feature, which is supported
      starting from ConnectX-5.
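
To make the descriptor savings concrete, here is a rough sketch in C of how many descriptors a burst of packets needs in each mode. It is not driver code: the function names are made up for illustration, and `pkts_per_mpwqe` is a hypothetical limit standing in for however many packets one multi-packet WQE can carry.

```c
#include <stdint.h>

/* Conventional TX: every packet gets its own WQE. */
static uint32_t wqes_single(uint32_t npkts)
{
	return npkts;
}

/*
 * Multi-packet WQE: one descriptor serves up to pkts_per_mpwqe packets.
 * pkts_per_mpwqe is a hypothetical, illustrative limit and must be > 0.
 */
static uint32_t wqes_multi(uint32_t npkts, uint32_t pkts_per_mpwqe)
{
	return (npkts + pkts_per_mpwqe - 1) / pkts_per_mpwqe;
}
```

Under these assumptions, a burst of 64 packets needs 64 descriptors in the conventional mode and 13 when five packets share one descriptor.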
      
      Performance:
      Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs.
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      
      XDP_TX:
      We see a huge gain on single port ConnectX-5, and reach the 100 Mpps
      milestone.
      * Single-port HCA:
      	Before:   70 Mpps
      	After:   100 Mpps (+42.8%)
      
      * Dual-port HCA:
      	Before: 51.7 Mpps
      	After:  57.3 Mpps (+10.8%)
      
      * In both cases we tested traffic on one port. For now, on dual-port
        HCAs we see only a small gain. We are working to overcome this
        bottleneck, but for the moment dual-port HCAs reach the numbers seen
        on single-port HCAs only with experimental firmware.
      
      XDP_REDIRECT:
      Redirect from (A) ConnectX-5 to (B) ConnectX-5.
      Due to a setup limitation, (A) and (B) are on different NUMA nodes,
      so absolute performance numbers are not optimal.
      Note:
        Below is the transmit rate of (B), not the redirect rate of (A)
        which is in some cases higher.
      
      * (B) is single-port:
      	Before:   77 Mpps
      	After:    90 Mpps (+16.8%)
      
      * (B) is dual-port:
      	Before:  61 Mpps
      	After:   72 Mpps (+18%)
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Add array for WQE info descriptors · 1feeab80
      Tariq Toukan authored
      Each xdp_wqe_info instance describes the number of data-segments
      and WQEBBs of the WQE.
      This is useful for a downstream patch that adds support for
      Multi-Packet TX WQE feature.
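
A per-WQE record of this kind might look roughly like the sketch below. The struct and helper names are illustrative, not the driver's. The sizes are assumptions of this sketch: a WQE basic block (WQEBB) of 64 bytes and a data segment of 16 bytes, so four data segments fit in one WQEBB.

```c
#include <stdint.h>

#define WQEBB_SIZE   64 /* assumed bytes per WQE basic block */
#define DS_SIZE      16 /* assumed bytes per data segment */
#define DS_PER_WQEBB (WQEBB_SIZE / DS_SIZE)

/* Illustrative stand-in for a per-WQE info record. */
struct wqe_info {
	uint8_t num_wqebbs; /* basic blocks the WQE occupies */
	uint8_t num_ds;     /* data segments in the WQE */
};

/* Derive the record from the number of data segments, rounding up. */
static struct wqe_info wqe_info_from_ds(uint8_t num_ds)
{
	struct wqe_info wi;

	wi.num_ds = num_ds;
	wi.num_wqebbs = (uint8_t)((num_ds + DS_PER_WQEBB - 1) / DS_PER_WQEBB);
	return wi;
}
```

Keeping both counts per WQE lets completion handling advance the queue by the right number of basic blocks without re-parsing the descriptor.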
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Maintain a FIFO structure for xdp_info instances · fea28dd6
      Tariq Toukan authored
      This provides infrastructure to have multiple xdp_info instances
      for the same consumer index.
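
One common way to build such a FIFO is a power-of-two ring with free-running producer and consumer counters. The sketch below assumes that design; the type names, the `dma_addr` field and the size are placeholders, not taken from the driver.

```c
#include <stdint.h>

#define FIFO_SIZE 8u               /* illustrative; must be a power of two */
#define FIFO_MASK (FIFO_SIZE - 1u)

/* Illustrative per-packet record; the field is a placeholder. */
struct xdp_info {
	uint64_t dma_addr;
};

struct xdp_info_fifo {
	struct xdp_info xi[FIFO_SIZE];
	uint32_t pc; /* producer counter, free-running */
	uint32_t cc; /* consumer counter, free-running */
};

/* Append one entry; returns -1 if the FIFO is full. */
static int fifo_push(struct xdp_info_fifo *f, struct xdp_info xi)
{
	if (f->pc - f->cc == FIFO_SIZE)
		return -1;
	f->xi[f->pc++ & FIFO_MASK] = xi;
	return 0;
}

/* Remove the oldest entry; returns -1 if the FIFO is empty. */
static int fifo_pop(struct xdp_info_fifo *f, struct xdp_info *out)
{
	if (f->pc == f->cc)
		return -1;
	*out = f->xi[f->cc++ & FIFO_MASK];
	return 0;
}
```

Because entries come out in the order they went in, several packets pushed for one descriptor can be popped one after another when that descriptor completes.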
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Replace boolean doorbell indication with segment pointer · b8180392
      Tariq Toukan authored
      Instead of calculating the control segment to be used upon an
      XDP xmit doorbell, save it in SQ structure.
      Set it to NULL when no doorbell is pending.
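
The pattern can be sketched as follows, with a pointer doubling as the "doorbell pending" flag. All names here are hypothetical, and the doorbell itself is simulated by a counter rather than a hardware write.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for a WQE control segment. */
struct ctrl_seg {
	uint32_t opmod_idx_opcode;
};

struct xdp_sq {
	struct ctrl_seg *doorbell_cseg; /* NULL: no doorbell pending */
	unsigned int doorbells_rung;    /* counts simulated doorbells */
};

/* Called after posting a WQE: remember its control segment. */
static void sq_post(struct xdp_sq *sq, struct ctrl_seg *cseg)
{
	sq->doorbell_cseg = cseg;
}

/* Ring the doorbell once if one is pending, then clear the pointer. */
static void sq_flush_doorbell(struct xdp_sq *sq)
{
	if (!sq->doorbell_cseg)
		return;
	sq->doorbells_rung++; /* a real driver would notify the HW here */
	sq->doorbell_cseg = NULL;
}
```

Compared with a boolean, the saved pointer answers both "is a doorbell needed?" and "which control segment should it use?" in one field.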
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Warn upon polling an error CQE · db02a308
      Tariq Toukan authored
      Do not ignore the CQE opcode.
      This helps expose issues and debug them.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Change the XDP SQ redirect indication · feb2ff9d
      Tariq Toukan authored
      Do not maintain an SQ state bit to indicate whether an
      XDP SQ serves redirect operations.
      
      Instead, rely on the fact that such an XDP SQ doesn't reside
      in an RQ instance, while the others do.
      This info is not known to the XDP SQ functions themselves,
      and they rely on their callers to distinguish between the cases.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: XDP, Precede XDP-related operations in RQ poll by a loaded program check · 4fb2f516
      Tariq Toukan authored
      At the end of the RQ polling loop, some XDP-related operations
      might be required. Before checking them one by one, check if
      an XDP program is even loaded.
      Combine all the checks and operations in a single function in xdp files.
      
      This saves unnecessary checks for non-XDP flows.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: TX, Print opcode in error CQE warning · e05b8d4f
      Tariq Toukan authored
      The opcode indicates the reason for the error.
      Printing it helps with debugging.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • selftests: net: reuseport_addr_any: silence clang warning · fa232332
      Peter Oskolkov authored
      Clang does not recognize that calls to error() terminate execution
      and complains about uninitialized variable use that happens after calls
      to error(). This noop patchset fixes this.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: ethtool configurable LRO · a02e8964
      Willem de Bruijn authored
      Virtio-net devices negotiate LRO support with the host.
      Display the initially negotiated state with ethtool -k.
      
      Also allow configuring it with ethtool -K, reusing the existing
      virtnet_set_guest_offloads helper that configures LRO for XDP.
      This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
      
      Virtio-net negotiates TSO4 and TSO6 separately, but ethtool does not
      distinguish between the two. Display LRO as on only if any offload
      is active.
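
The "on if any offload is active" rule can be sketched as a simple mask test. The bit numbers for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 follow the virtio-net feature definitions; the mask and function names are illustrative, and the actual driver may consider additional offload bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bit numbers as defined for virtio-net. */
#define VIRTIO_NET_F_GUEST_TSO4 7
#define VIRTIO_NET_F_GUEST_TSO6 8

/* Illustrative mask: only the two offloads named in the text above. */
#define GUEST_LRO_MASK ((1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
			(1ULL << VIRTIO_NET_F_GUEST_TSO6))

/* LRO is reported as on if any of the LRO-related offloads is active. */
static bool lro_reported_on(uint64_t guest_offloads)
{
	return (guest_offloads & GUEST_LRO_MASK) != 0;
}
```

With this rule, a device that negotiated only TSO4 still shows LRO as on, which matches ethtool's single LRO flag.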
      
      RTNL is held while calling virtnet_set_features, same as on the path
      from virtnet_xdp_set.
      
      Changes v1 -> v2
        - allow ethtool config (-K) only if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
        - show LRO as enabled if any LRO variant is enabled
        - do not allow configuration while XDP is active
        - differentiate current features from the capable set, to restore
          on XDP down only those features that were active on XDP up
        - move test out of VIRTIO_NET_F_CSUM/TSO branch, which is tx only
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · c3e53369
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next:
      
      1) Support for destination MAC in ipset, from Stefano Brivio.
      
      2) Disallow all-zeroes MAC address in ipset, also from Stefano.
      
      3) Add IPSET_CMD_GET_BYNAME and IPSET_CMD_GET_BYINDEX commands,
         introduce protocol version number 7, from Jozsef Kadlecsik.
         A follow up patch to fix ip_set_byindex() is also included
         in this batch.
      
      4) Honor CTA_MARK_MASK from ctnetlink, from Andreas Jaggi.
      
      5) Statify nf_flow_table_iterate(), from Taehee Yoo.
      
      6) Use nf_flow_table_iterate() to simplify garbage collection in
         nf_flow_table logic, also from Taehee Yoo.
      
      7) Don't use _bh variants of call_rcu(), rcu_barrier() and
         synchronize_rcu_bh() in Netfilter, from Paul E. McKenney.
      
      8) Remove NFC_* cache definition from the old caching
         infrastructure.
      
      9) Remove layer 4 port rover in NAT helpers, use random port
         instead, from Florian Westphal.
      
      10) Use strscpy() in ipset, from Qian Cai.
      
      11) Remove NF_NAT_RANGE_PROTO_RANDOM_FULLY branch now that
          random port is allocated by default, from Xiaozhou Liu.
      
      12) Ignore NF_NAT_RANGE_PROTO_RANDOM too, from Florian Westphal.
      
      13) Limit port allocation selection routine in NAT to avoid
          softlockup splats when most ports are in use, from Florian.
      
      14) Remove unused parameters in nf_ct_l4proto_unregister_sysctl()
          from Yafang Shao.
      
      15) Direct call to nf_nat_l4proto_unique_tuple() instead of
          indirection, from Florian Westphal.
      
      16) Several patches to remove all layer 4 NAT indirections,
          remove nf_nat_l4proto struct, from Florian Westphal.
      
      17) Fix RTP/RTCP source port translation when SNAT is in place,
          from Alin Nastac.
      
      18) Selective rule dump per chain, from Phil Sutter.
      
      19) Revisit CLUSTERIP target, this includes a deadlock fix from
          netns path, sleep in atomic, remove bogus WARN_ON_ONCE()
          and disallow mismatching IP address and MAC address.
          Patchset from Taehee Yoo.
      
      20) Update UDP timeout to stream after 2 seconds, from Florian.
      
      21) Shrink UDP established timeout to 120 seconds like TCP timewait.
      
      22) Sysctl knobs to set GRE timeouts, from Yafang Shao.
      
      23) Move seq_print_acct() to conntrack core file, from Florian.
      
      24) Add enum for conntrack sysctl knobs, also from Florian.
      
      25) Place nf_conntrack_acct, nf_conntrack_helper, nf_conntrack_events
          and nf_conntrack_timestamp knobs in the core, from Florian Westphal.
          As a side effect, shrink netns_ct structure by removing obsolete
          sysctl anchors, also from Florian.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 339bbff2
      David S. Miller authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-12-21
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      There is a merge conflict in test_verifier.c. Result looks as follows:
      
              [...]
              },
              {
                      "calls: cross frame pruning",
                      .insns = {
                      [...]
                      .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                      .errstr_unpriv = "function calls to other bpf functions are allowed for root only",
                      .result_unpriv = REJECT,
                      .errstr = "!read_ok",
                      .result = REJECT,
              },
              {
                      "jset: functional",
                      .insns = {
              [...]
              {
                      "jset: unknown const compare not taken",
                      .insns = {
                              BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
                                           BPF_FUNC_get_prandom_u32),
                              BPF_JMP_IMM(BPF_JSET, BPF_REG_0, 1, 1),
                              BPF_LDX_MEM(BPF_B, BPF_REG_8, BPF_REG_9, 0),
                              BPF_EXIT_INSN(),
                      },
                      .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                      .errstr_unpriv = "!read_ok",
                      .result_unpriv = REJECT,
                      .errstr = "!read_ok",
                      .result = REJECT,
              },
              [...]
              {
                      "jset: range",
                      .insns = {
                      [...]
                      },
                      .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                      .result_unpriv = ACCEPT,
                      .result = ACCEPT,
              },
      
      The main changes are:
      
      1) Various BTF related improvements in order to get line info
         working. Meaning, verifier will now annotate the corresponding
         BPF C code to the error log, from Martin and Yonghong.
      
      2) Implement support for raw BPF tracepoints in modules, from Matt.
      
      3) Add several improvements to verifier state logic, namely speeding
         up stacksafe check, optimizations for stack state equivalence
         test and safety checks for liveness analysis, from Alexei.
      
      4) Teach verifier to make use of BPF_JSET instruction, add several
         test cases to kselftests and remove nfp specific JSET optimization
         now that verifier has awareness, from Jakub.
      
      5) Improve BPF verifier's slot_type marking logic in order to
         allow more stack slot sharing, from Jiong.
      
      6) Add sk_msg->size member for context access and add set of fixes
         and improvements to make sock_map with kTLS usable with openssl
         based applications, from John.
      
      7) Several cleanups and documentation updates in bpftool as well as
         auto-mount of tracefs for "bpftool prog tracelog" command,
         from Quentin.
      
      8) Include sub-program tags from now on in bpf_prog_info in order to
         have a reliable way for user space to get all tags of the program
         e.g. needed for kallsyms correlation, from Song.
      
      9) Add BTF annotations for cgroup_local_storage BPF maps and
         implement bpf fs pretty print support, from Roman.
      
      10) Fix bpftool in order to allow for cross-compilation, from Ivan.
      
      11) Update of bpftool license to GPLv2-only + BSD-2-Clause in order
          to be compatible with libbfd and allow for Debian packaging,
          from Jakub.
      
      12) Remove an obsolete prog->aux sanitation in dump and get rid of
          version check for prog load, from Daniel.
      
      13) Fix a memory leak in libbpf's line info handling, from Prashant.
      
      14) Fix cpumap's frame alignment for build_skb() so that skb_shared_info
          does not get unaligned, from Jesper.
      
      15) Fix test_progs kselftest to work with older compilers which are less
          smart in optimizing (and thus throwing build error), from Stanislav.
      
      16) Cleanup and simplify AF_XDP socket teardown, from Björn.
      
      17) Fix sk lookup in BPF kselftest's test_sock_addr with regards
          to netns_id argument, from Andrey.
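
As background for item 4: BPF_JSET jumps when the bitwise AND of its operands is non-zero. The sketch below shows one way a verifier could decide such a branch statically when it tracks a register as known bits plus unknown bits. The struct and function names are illustrative and are not claimed to match the kernel's implementation.

```c
#include <stdint.h>

/* Tristate number: bits set in mask are unknown, the rest equal value. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/*
 * Returns 1 if "reg & imm" is certainly non-zero (jump taken),
 * 0 if it is certainly zero (jump not taken), -1 if unknown.
 */
static int jset_branch_taken(struct tnum reg, uint64_t imm)
{
	if (reg.value & imm)
		return 1;  /* a known-set bit overlaps imm */
	if (!((reg.value | reg.mask) & imm))
		return 0;  /* no possibly-set bit overlaps imm */
	return -1;
}
```

When the result is 0 or 1, only one successor of the jump needs to be explored, which is what makes a test such as "jset: unknown const compare not taken" meaningful.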
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'expand-txtimestamp-selftest' · e770454f
      David S. Miller authored
      Willem de Bruijn says:
      
      ====================
      expand txtimestamp selftest
      
      Convert the existing txtimestamp test to run as part of kselftest
      and return a pass/fail.
      
      Also expand the variations of timestamping tested, including packet
      sockets, ipv6 raw and dgram and passing options using cmsg.
      
      These are enough changes to split across a few patches, even though
      all of them touch only this one test.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: add txtimestamp kselftest · cda261f4
      Willem de Bruijn authored
      Run the transmit timestamp tests as part of kselftests.
      
      Add a txtimestamp.sh test script that runs most variants:
      ipv4/ipv6, tcp/udp/raw/raw_ipproto/pf_packet, data/nodata,
      setsockopt/cmsg. The script runs tests with netem delays.
      
      Refine txtimestamp.c to validate results. Take expected
      netem delays as input and compare against real timestamps.
      
      To run without dependencies, add a listener socket to be
      able to connect in the case of TCP.
      
      Add the timestamping directory to the kselftests Makefile.
      Build all the binaries. Only run verified txtimestamp.sh.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: expand txtimestamp with ipv6 dgram + raw and pf_packet · b52354aa
      Willem de Bruijn authored
      Expand the transmit timestamp regression test with support for
      missing protocols: ipv6 datagram and raw and pf_packet.
      
      Also refine resolve_hostname to independently request AF_INET or
      AF_INET6 addresses. Else, ipv4 addresses may be returned as AF_INET6.
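
Requesting one address family at a time can be done through the `ai_family` hint of `getaddrinfo()`. The helper below is a minimal sketch, not the selftest's `resolve_hostname`: its name is made up, and it adds `AI_NUMERICHOST` (an assumption of this sketch) so that it only parses address literals and never performs a DNS lookup.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve a numeric host string for exactly one address family.
 * Returns 0 on success and fills *out (caller frees with freeaddrinfo()),
 * or a non-zero getaddrinfo() error code.
 */
static int resolve_for_family(const char *host, int family,
			      struct addrinfo **out)
{
	struct addrinfo hints;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = family;        /* AF_INET or AF_INET6, not AF_UNSPEC */
	hints.ai_socktype = SOCK_DGRAM;
	hints.ai_flags = AI_NUMERICHOST; /* assumption: literals only, no DNS */

	return getaddrinfo(host, NULL, &hints, out);
}
```

Because the family is pinned in the hints, every returned address has the requested family, so the caller never has to guess how an IPv4 address was represented.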
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: expand txtimestamp with cmsg support · 7085f47f
      Willem de Bruijn authored
      Commit 3dd17e63 ("sock: accept SO_TIMESTAMPING flags in socket
      cmsg") added support for passing tx timestamping options per-call
      in sendmsg.
      
      Expand the txtimestamp test with support for this feature.
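
Passing the options per call means attaching a control message of level SOL_SOCKET and type SO_TIMESTAMPING to the `msghdr` given to `sendmsg()`. The helper below is a sketch under that assumption; its name and the caller-supplied buffer convention are illustrative, and it only builds the control message without sending anything.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Attach an SO_TIMESTAMPING control message carrying flags to msg, using
 * the caller-provided, suitably aligned buffer ctrl of ctrl_len bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int add_tstamp_cmsg(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			   uint32_t flags)
{
	struct cmsghdr *cmsg;

	if (ctrl_len < CMSG_SPACE(sizeof(flags)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(flags));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(flags));
	memcpy(CMSG_DATA(cmsg), &flags, sizeof(flags));
	return 0;
}
```

A caller would typically pass one of the SOF_TIMESTAMPING_TX_* flags from `<linux/net_tstamp.h>` and then hand the prepared `msghdr` to `sendmsg()`.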
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: seg6.h: remove an unused #include · a6ae520d
      Peter Oskolkov authored
      A minor code cleanup.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ppp: Move PFC decompression to PPP generic layer · 7fb1b8ca
      Sam Protsenko authored
      Extract "Protocol" field decompression code from transport protocols to
      PPP generic layer, where it actually belongs. As a consequence, this
      patch fixes incorrect place of PFC decompression in L2TP driver (when
      it's not PPPOX_BOUND) and also enables this decompression for other
      protocols, like PPPoE.
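
For readers unfamiliar with Protocol-Field-Compression: a compressed Protocol field is a single byte, recognizable because its least significant bit is set, while the first byte of an uncompressed two-byte field is always even. Decompression restores the leading zero byte. The sketch below works on a plain byte buffer instead of an skb, and the function name and buffer convention are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * If the PPP Protocol field was compressed to one byte (its first byte is
 * odd), restore the two-byte form by prepending 0x00. buf holds len bytes
 * of data and has room for cap bytes in total.
 * Returns the new length, or 0 if the frame is empty or lacks room.
 */
static size_t pfc_decompress(uint8_t *buf, size_t len, size_t cap)
{
	if (len == 0)
		return 0;
	if (!(buf[0] & 0x01))
		return len;  /* already uncompressed */
	if (len + 1 > cap)
		return 0;    /* no room to grow */
	memmove(buf + 1, buf, len);
	buf[0] = 0x00;
	return len + 1;
}
```

For example, a frame starting with the compressed IPv4 protocol byte 0x21 comes out starting with 0x00 0x21, while an LCP frame starting with 0xc0 0x21 is left unchanged.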
      
      Protocol field decompression also happens in PPP Multilink Protocol
      code and in PPP compression protocols implementations (bsd, deflate,
      mppe). It looks like there is no easy way to get rid of that, so it was
      decided to leave it as is, but provide those cases with appropriate
      comments instead.
      
      Changes in v2:
        - Fix the order of checking skb data room and proto decompression
        - Remove "inline" keyword from ppp_decompress_proto()
        - Don't split line before function name
        - Prefix ppp_decompress_proto() function with "__"
        - Add ppp_decompress_proto() function with skb data room checks
        - Add description for introduced functions
        - Fix comments (as per review on mailing list)
      Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
      Reviewed-by: Guillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'wireless-drivers-next-for-davem-2018-12-20' of... · e69fbf31
      David S. Miller authored
      Merge tag 'wireless-drivers-next-for-davem-2018-12-20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      wireless-drivers-next patches for 4.21
      
      Last set of patches for 4.21. mt76 is still in very active development,
      with some refactoring as well as new features. Other drivers also got
      a few new features and fixes.
      
      Major changes:
      
      ath10k
      
      * add amsdu support for QCA6174 monitor mode
      
      * report tx rate using the new ieee80211_tx_rate_update() API
      
      * wcn3990 support is not experimental anymore
      
      iwlwifi
      
      * support for FW version 43 for 9000 and 22000 series
      
      brcmfmac
      
      * add support for CYW43012 SDIO chipset
      
      * add the raw 4354 PCIe device ID for unprogrammed Cypress boards
      
      mwifiex
      
      * add NL80211_STA_INFO_RX_BITRATE support
      
      mt76
      
      * use the same firmware for mt76x2e and mt76x2u
      
      * mt76x0e survey support
      
      * more unification between mt76x2 and mt76x0
      
      * mt76x0e AP mode support
      
      * mt76x0e DFS support
      
      * rework and fix tx status handling for mt76x0 and mt76x2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • linux/netlink.h: drop unnecessary extern prefix · aa9d6e0f
      Stephen Hemminger authored
      Don't need extern prefix before function prototypes.
      Checkpatch has complained about this for a couple of years.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv4-Prevent-user-triggerable-warning' · 7de33309
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      net: ipv4: Prevent user triggerable warning
      
      Patch #1 prevents a user triggerable warning in the flow dissector by
      setting 'skb->dev' in skbs used for IPv4 output route get requests.
      
      Patch #2 adds a test case that triggers the warning without the first
      patch.
      
      I have audited all the RTM_GETROUTE handlers and could not find any
      other callpath where an skb is passed to the flow dissector with both
      'skb->dev' and 'skb->sk' cleared.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: rtnetlink: Add a test case for multipath route get · 676f4bb1
      Ido Schimmel authored
      Without the previous patch, a warning would be generated upon multipath route
      get when FIB multipath hash policy is to use a 5-tuple for multipath
      hash calculation.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: Set skb->dev for output route resolution · 21f94775
      Ido Schimmel authored
      When a user requests to resolve an output route, the kernel synthesizes
      an skb where the relevant parameters (e.g., source address) are set. The
      skb is then passed to ip_route_output_key_hash_rcu() which might call
      into the flow dissector in case a multipath route was hit and a nexthop
      needs to be selected based on the multipath hash.
      
      Since both 'skb->dev' and 'skb->sk' are not set, a warning is triggered
      in the flow dissector [1]. The warning is there to prevent codepaths
      from silently falling back to the standard flow dissector instead of the
      BPF one.
      
      Therefore, instead of removing the warning, set 'skb->dev' to the
      loopback device, as it's not used for anything but resolving the correct
      namespace.
      
      [1]
      WARNING: CPU: 1 PID: 24819 at net/core/flow_dissector.c:764 __skb_flow_dissect+0x314/0x16b0
      ...
      RSP: 0018:ffffa0df41fdf650 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8bcded232000 RCX: 0000000000000000
      RDX: ffffa0df41fdf7e0 RSI: ffffffff98e415a0 RDI: ffff8bcded232000
      RBP: ffffa0df41fdf760 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffa0df41fdf7e8 R11: ffff8bcdf27a3000 R12: ffffffff98e415a0
      R13: ffffa0df41fdf7e0 R14: ffffffff98dd2980 R15: ffffa0df41fdf7e0
      FS:  00007f46f6897680(0000) GS:ffff8bcdf7a80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055933e95f9a0 CR3: 000000021e636000 CR4: 00000000001006e0
      Call Trace:
       fib_multipath_hash+0x28c/0x2d0
       ? fib_multipath_hash+0x28c/0x2d0
       fib_select_path+0x241/0x32f
       ? __fib_lookup+0x6a/0xb0
       ip_route_output_key_hash_rcu+0x650/0xa30
       ? __alloc_skb+0x9b/0x1d0
       inet_rtm_getroute+0x3f7/0xb80
       ? __alloc_pages_nodemask+0x11c/0x2c0
       rtnetlink_rcv_msg+0x1d9/0x2f0
       ? rtnl_calcit.isra.24+0x120/0x120
       netlink_rcv_skb+0x54/0x130
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x20a/0x2c0
       netlink_sendmsg+0x2d1/0x3d0
       sock_sendmsg+0x39/0x50
       ___sys_sendmsg+0x2a0/0x2f0
       ? filemap_map_pages+0x16b/0x360
       ? __handle_mm_fault+0x108e/0x13d0
       __sys_sendmsg+0x63/0xa0
       ? __sys_sendmsg+0x63/0xa0
       __x64_sys_sendmsg+0x1f/0x30
       do_syscall_64+0x5a/0x120
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: d0e13a14 ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: Register poll timeout should be wall time not attempts · 639c1b26
      Steen Hegelund authored
      When doing indirect access in the Ocelot chip, a command is set up,
      issued and then we need to poll until the result is ready. The polling
      timeout is specified in milliseconds in the datasheet and not in
      register access attempts.
      It is not a bug on the currently supported platform, but we observed
      that the code does not work properly on other platforms that we want to
      support as the timing requirements there are different.
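
The difference between "N attempts" and "N milliseconds" can be sketched in userspace C as a poll loop bounded by a monotonic-clock deadline. This is not the driver code: `poll_until_ready`, the callback convention and the two simulated status readers are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Poll ready(ctx) until it returns true or timeout_ms of elapsed time
 * passes, however many register reads that takes.
 * Returns 0 when ready, -1 on timeout.
 */
static int poll_until_ready(bool (*ready)(void *ctx), void *ctx,
			    unsigned int timeout_ms)
{
	uint64_t deadline = now_ns() + (uint64_t)timeout_ms * 1000000ull;

	for (;;) {
		if (ready(ctx))
			return 0;
		if (now_ns() >= deadline)
			/* One last check so a late success is not missed. */
			return ready(ctx) ? 0 : -1;
	}
}

/* Simulated status read: becomes ready after *ctx further calls. */
static bool ready_after_n(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}

/* Simulated status read that never completes. */
static bool never_ready(void *ctx)
{
	(void)ctx;
	return false;
}
```

With a time-based bound, a platform with faster register access simply performs more reads within the same window instead of giving up early.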
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour: remove stray semicolon · 463561e6
      Colin Ian King authored
      Currently the stray semicolon means that the final term in the addition
      is being missed.  Fix this by removing it. Cleans up clang warning:
      
      net/core/neighbour.c:2821:9: warning: expression result unused [-Wunused-value]
      
      Fixes: 82cbb5c6 ("neighbour: register rtnl doit handler")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: microchip: fix unicast frame leak · 962ad710
      Tristram Ha authored
      Port partitioning is done by enabling UNICAST_VLAN_BOUNDARY and changing
      the default port membership of 0x7f to other values such that there is
      no communication between ports.  In KSZ9477 the member for port 1 is
      0x41; port 2, 0x42; port 3, 0x44; port 4, 0x48; port 5, 0x50; and port 7,
      0x60.  Port 6 is the host port.
      
      Setting a zero value can be used to stop a port from receiving.
      
      However, when UNICAST_VLAN_BOUNDARY is disabled and the unicast addresses
      are already learned in the dynamic MAC table, setting zero still allows
      devices connected to those ports to communicate.  This does not apply to
      multicast and broadcast addresses though.  To prevent these leaks and
      make the function of port membership consistent UNICAST_VLAN_BOUNDARY
      should never be disabled.
      
      Note that UNICAST_VLAN_BOUNDARY is enabled by default in KSZ9477.
      
      Fixes: b987e98e ("dsa: add DSA switch driver for Microchip KSZ9477")
      Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Correct merge error. · 3a6d528a
      David S. Miller authored
      When resolving the conflict wrt. the vxlan_fdb_update call
      in vxlan_changelink() I made the last argument false instead
      of true.
      
      Fix this.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 20 Dec, 2018 11 commits