1. 25 Jan, 2022 6 commits
  2. 24 Jan, 2022 22 commits
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · caaba961
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2022-01-24
      
      We've added 80 non-merge commits during the last 14 day(s) which contain
      a total of 128 files changed, 4990 insertions(+), 895 deletions(-).
      
      The main changes are:
      
      1) Add XDP multi-buffer support and implement it for the mvneta driver,
         from Lorenzo Bianconi, Eelco Chaudron and Toke Høiland-Jørgensen.
      
      2) Add unstable conntrack lookup helpers for BPF by using the BPF kfunc
         infra, from Kumar Kartikeya Dwivedi.
      
      3) Extend BPF cgroup programs to export a custom return value to userspace via
         two helpers, bpf_get_retval() and bpf_set_retval(), from YiFei Zhu.
      
      4) Add support for AF_UNIX iterator batching, from Kuniyuki Iwashima.
      
      5) Complete missing UAPI BPF helper description and change bpf_doc.py script
         to enforce consistent & complete helper documentation, from Usama Arif.
      
      6) Deprecate libbpf's legacy BPF map definitions and streamline XDP APIs to
         follow tc-based APIs, from Andrii Nakryiko.
      
      7) Support BPF_PROG_QUERY for BPF programs attached to sockmap, from Di Zhu.
      
      8) Deprecate libbpf's bpf_map__def() API and replace users with proper getters
         and setters, from Christy Lee.
      
      9) Extend libbpf's btf__add_btf() with an additional hashmap for strings to
         reduce overhead, from Kui-Feng Lee.
      
      10) Fix bpftool and libbpf error handling related to libbpf's hashmap__new()
          utility function, from Mauricio Vásquez.
      
      11) Add support for BTF program names in bpftool's program dump, from Raman Shukhau.
      
      12) Fix resolve_btfids build to pick up host flags, from Connor O'Brien.
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (80 commits)
        selftests, bpf: Do not yet switch to new libbpf XDP APIs
        selftests, xsk: Fix rx_full stats test
        bpf: Fix flexible_array.cocci warnings
        xdp: disable XDP_REDIRECT for xdp frags
        bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
        bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
        net: xdp: introduce bpf_xdp_pointer utility routine
        bpf: generalise tail call map compatibility check
        libbpf: Add SEC name for xdp frags programs
        bpf: selftests: update xdp_adjust_tail selftest to include xdp frags
        bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
        bpf: introduce frags support to bpf_prog_test_run_xdp()
        bpf: move user_size out of bpf_test_init
        bpf: add frags support to xdp copy helpers
        bpf: add frags support to the bpf_xdp_adjust_tail() API
        bpf: introduce bpf_xdp_get_buff_len helper
        net: mvneta: enable jumbo frames if the loaded XDP program support frags
        bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program
        net: mvneta: add frags support to XDP_TX
        xdp: add frags support to xdp_return_{buff/frame}
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220124221235.18993-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests, bpf: Do not yet switch to new libbpf XDP APIs · 0bfb95f5
      Daniel Borkmann authored
      Revert commit 54435652 ("selftests/bpf: switch to new libbpf XDP APIs")
      for now given this will heavily conflict with 4b27480d ("bpf/selftests:
      convert xdp_link test to ASSERT_* macros") upon merge. Andrii agreed to redo
      the conversion cleanly once the trees are merged.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
    • Merge tag 'linux-can-fixes-for-5.17-20220124' of... · e52984be
      Jakub Kicinski authored
      Merge tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2022-01-24
      
      The first patch updates the email address of Brian Silverman from his
      former employer to his private address.
      
      The next patch fixes DT bindings information for the tcan4x5x SPI CAN
      driver.
      
      The following patch targets the m_can driver and fixes the
      introduction of FIFO bulk read support.
      
      Another patch for the tcan4x5x driver fixes the max register value
      in the regmap config.
      
      The last patch, for the flexcan driver, marks RX via mailboxes as
      supported on the MCF5441X.
      
      * tag 'linux-can-fixes-for-5.17-20220124' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: flexcan: mark RX via mailboxes as supported on MCF5441X
        can: tcan4x5x: regmap: fix max register value
        can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0
        dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
        mailmap: update email address of Brian Silverman
      ====================
      
      Link: https://lore.kernel.org/r/20220124175955.3464134-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • can: flexcan: mark RX via mailboxes as supported on MCF5441X · f04aefd4
      Marc Kleine-Budde authored
      Most flexcan IP cores support 2 RX modes:
      - FIFO
      - mailbox
      
      The flexcan IP core on the MCF5441X cannot receive CAN RTR messages
      via mailboxes. However, mailbox mode is more performant. The commit
      
      | 1c45f577 ("can: flexcan: add ethtool support to change rx-rtr setting during runtime")
      
      added support to switch from FIFO to mailbox mode on these cores.
      
      After Angelo Dureghello tested mailbox mode on the MCF5441X, this
      patch marks it (without RTR capability) as supported. Furthermore, the
      IP core overview table is updated to show that RTR reception via
      mailboxes is not supported.
      
      Link: https://lore.kernel.org/all/20220121084425.3141218-1-mkl@pengutronix.de
      Tested-by: Angelo Dureghello <angelo@kernel-space.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: tcan4x5x: regmap: fix max register value · e59986de
      Marc Kleine-Budde authored
      The MRAM of the tcan4x5x has a size of 2K and starts at 0x8000. The
      tcan4x5x has no registers beyond the MRAM, which makes 0x87fc the
      highest addressable register.
      
      This patch fixes the max register value of the regmap config from
      0x8ffc to 0x87fc.
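The new limit follows from simple address arithmetic. The sketch below assumes 32-bit (4-byte) registers, so the last addressable register starts one register width below the end of the 2K MRAM window. The constant names are illustrative and are not the driver's identifiers.

```python
# Illustrative constants; the names are hypothetical, not the driver's identifiers.
MRAM_START = 0x8000    # MRAM base address, from the commit message
MRAM_SIZE = 2 * 1024   # the tcan4x5x has 2K of MRAM
REG_WIDTH = 4          # assumption: registers are 32 bits (4 bytes) wide

# The last register starts one register width before the end of the MRAM.
max_register = MRAM_START + MRAM_SIZE - REG_WIDTH
print(hex(max_register))  # 0x87fc

# The previous limit lies beyond the end of the MRAM window (0x8800).
old_max_register = 0x8ffc
print(old_max_register >= MRAM_START + MRAM_SIZE)  # True
```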
      
      Fixes: 6e1caaf8 ("can: tcan4x5x: fix max register value")
      Link: https://lore.kernel.org/all/20220119064011.2943292-1-mkl@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: m_can: m_can_fifo_{read,write}: don't read or write from/to FIFO if length is 0 · db72589c
      Marc Kleine-Budde authored
      In order to optimize FIFO access, especially on m_can cores attached
      to slow busses like SPI, in patch
      
      | e3938177 ("can: m_can: Disable IRQs on FIFO bus errors")
      
      bulk read/write support has been added to the m_can_fifo_{read,write}
      functions.
      
      That change causes the tcan driver to call
      regmap_bulk_{read,write}() with a length of 0 (for CAN frames with 0
      data length). regmap treats this as an error:
      
      | tcan4x5x spi1.0 tcan4x5x0: FIFO write returned -22
      
      This patch fixes the problem by not calling
      cdev->ops->{read,write}_fifo() in case of a 0-length read/write.
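The shape of the fix can be modeled outside the kernel. This sketch relies on only one fact from the message: a zero-length bulk access is rejected with -22 (-EINVAL). bulk_read_model() and fifo_read() are hypothetical stand-ins, not the m_can or regmap code. The point is that the length check happens before any bus access.

```python
import errno

def bulk_read_model(addr, words):
    """Hypothetical stand-in for regmap_bulk_read(): like regmap, it rejects
    a zero-length transfer with -EINVAL."""
    if words == 0:
        return -errno.EINVAL, b""
    return 0, bytes(4 * words)  # pretend every 32-bit word reads as zero

def fifo_read(addr, words):
    """Sketch of the fix: skip the bus access entirely for a 0-length read."""
    if words == 0:
        return 0, b""
    return bulk_read_model(addr, words)

# A CAN frame with 0 data bytes needs 0 data words.
print(bulk_read_model(0x8000, 0)[0])  # -22 (-EINVAL) on Linux
print(fifo_read(0x8000, 0))           # (0, b'')
```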
      
      Fixes: e3938177 ("can: m_can: Disable IRQs on FIFO bus errors")
      Link: https://lore.kernel.org/all/20220114155751.2651888-1-mkl@pengutronix.de
      Cc: stable@vger.kernel.org
      Cc: Matt Kline <matt@bitbashing.io>
      Cc: Chandrasekar Ramakrishnan <rcsekar@samsung.com>
      Reported-by: Michael Anochin <anochin@photo-meter.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config · 17a30422
      Marc Kleine-Budde authored
      The tcan4x5x only comes with 2K of MRAM, so an RX FIFO with a depth of
      32 doesn't fit into the MRAM. Use a depth of 16 instead.
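A rough capacity check shows why. It assumes each RX FIFO element takes 72 bytes: an 8-byte header plus a 64-byte CAN FD data field. The sketch ignores the other MRAM sections (TX buffers, TX event FIFO, filters), which also need room, so it only shows why a depth of 32 cannot fit on its own.

```python
MRAM_SIZE = 2048          # bytes of MRAM on the tcan4x5x
RX_ELEMENT_SIZE = 8 + 64  # assumption: 8-byte header + 64-byte CAN FD data field

def rx_fifo_bytes(depth):
    """MRAM bytes consumed by an RX FIFO with `depth` elements."""
    return depth * RX_ELEMENT_SIZE

for depth in (32, 16):
    used = rx_fifo_bytes(depth)
    print(depth, used, used <= MRAM_SIZE)
# 32 2304 False
# 16 1152 True
```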
      
      Fixes: 4edd396a ("dt-bindings: can: tcan4x5x: Add DT bindings for TCAN4x5X driver")
      Link: https://lore.kernel.org/all/20220119062951.2939851-1-mkl@pengutronix.de
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • mailmap: update email address of Brian Silverman · 984d1eff
      Marc Kleine-Budde authored
      Brian Silverman's address at bluerivertech.com is no longer valid;
      use Brian's private email address instead.
      
      Link: https://lore.kernel.org/all/20220110082359.2019735-1-mkl@pengutronix.de
      Cc: Brian Silverman <bsilver16384@gmail.com>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • selftests, xsk: Fix rx_full stats test · b4ec6a19
      Magnus Karlsson authored
      Fix the rx_full stats test so that it correctly reports pass even when
      the fill ring is not full of buffers.
      
      Fixes: 872a1184 ("selftests: xsk: Put the same buffer only once in the fill ring")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Link: https://lore.kernel.org/bpf/20220121123508.12759-1-magnus.karlsson@gmail.com
    • bpf: Fix flexible_array.cocci warnings · ed8bb032
      kernel test robot authored
      Zero-length and one-element arrays are deprecated, see:
      Documentation/process/deprecated.rst
      
      Flexible-array members should be used instead.
      
      Generated by: scripts/coccinelle/misc/flexible_array.cocci
      
      Fixes: c1ff181f ("selftests/bpf: Extend kfunc selftests")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/bpf/alpine.DEB.2.22.394.2201221206320.12220@hadrien
    • net: stmmac: remove unused members in struct stmmac_priv · de8a820d
      Jisheng Zhang authored
      The tx_coalesce and mii_irq members are no longer used, so remove them.
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: atlantic: Use the bitmap API instead of hand-writing it · ebe0582b
      Christophe JAILLET authored
      Simplify code by using bitmap_weight() and bitmap_zero() instead of
      hand-writing these functions.
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ping: fix the sk_bound_dev_if match in ping_lookup · 2afc3b5a
      Xin Long authored
      When 'ping' is switched to use a PING socket instead of a RAW socket with:
      
         # sysctl -w net.ipv4.ping_group_range="0 100"
      
      the selftest 'router_broadcast.sh' fails, because the command
      
        # ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
      
      can't receive the response skb on the PING socket. This is caused by a
      mismatch between sk_bound_dev_if and dif in ping_rcv() when looking up
      the PING socket, as dif is vrf-h1 if the ingress device's master is set
      to vrf-h1.
      
      This patch fixes the regression by also checking sk_bound_dev_if
      against sdif, so that packets can still be received when the socket
      is bound to the real iif rather than to the VRF device.
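The device match can be modeled as follows. This is a simplified sketch of the condition described above, not the ping_lookup() code. The PingSock class and the ifindex values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PingSock:
    bound_dev_if: int  # ifindex the socket is bound to (e.g. via ping -I); 0 = unbound

def dev_matches_old(sk, dif):
    # Before the fix: only the (possibly VRF) input device was compared.
    return sk.bound_dev_if == 0 or sk.bound_dev_if == dif

def dev_matches_new(sk, dif, sdif):
    # After the fix: the real ingress device (sdif) is accepted as well.
    return sk.bound_dev_if == 0 or sk.bound_dev_if in (dif, sdif)

# Hypothetical ifindexes: veth0 is enslaved to vrf-h1.
VRF_H1, VETH0 = 10, 3
sk = PingSock(bound_dev_if=VETH0)  # ping -I veth0
print(dev_matches_old(sk, dif=VRF_H1))              # False: reply is not matched
print(dev_matches_new(sk, dif=VRF_H1, sdif=VETH0))  # True: reply is matched
```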
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Reported-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Transitional solution for clcsock race issue · c0bf3d8a
      Wen Gu authored
      We encountered a crash in smc_setsockopt() and it is caused by
      accessing smc->clcsock after clcsock was released.
      
       BUG: kernel NULL pointer dereference, address: 0000000000000020
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 1 PID: 50309 Comm: nginx Kdump: loaded Tainted: G E     5.16.0-rc4+ #53
       RIP: 0010:smc_setsockopt+0x59/0x280 [smc]
       Call Trace:
        <TASK>
        __sys_setsockopt+0xfc/0x190
        __x64_sys_setsockopt+0x20/0x30
        do_syscall_64+0x34/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
       RIP: 0033:0x7f16ba83918e
        </TASK>
      
      This patch tries to fix it by holding clcsock_release_lock and
      checking whether clcsock has already been released before access.
      
      Because a crash for the same reason could also happen in
      smc_getsockopt() or smc_switch_to_fallback(), this patch checks
      smc->clcsock in them too. Callers of smc_switch_to_fallback() now
      determine whether the fallback succeeded from its return value.
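The locking pattern can be sketched in Python, with threading.Lock standing in for clcsock_release_lock. The class and the choice of EBADF as the error code are assumptions made for this illustration. The point is that the NULL check and the access happen under the same lock that the release path takes.

```python
import errno
import threading

class SmcSockModel:
    """Toy model of an SMC socket whose clcsock may be released concurrently."""

    def __init__(self):
        self.clcsock = {}  # stands in for the underlying TCP socket's options
        self.clcsock_release_lock = threading.Lock()

    def release_clcsock(self):
        with self.clcsock_release_lock:
            self.clcsock = None

    def setsockopt(self, name, value):
        # Hold the lock and re-check clcsock before dereferencing it.
        with self.clcsock_release_lock:
            if self.clcsock is None:
                return -errno.EBADF  # error code chosen for this sketch
            self.clcsock[name] = value
            return 0

s = SmcSockModel()
print(s.setsockopt("TCP_NODELAY", 1))  # 0
s.release_clcsock()
print(s.setsockopt("TCP_NODELAY", 1))  # -9 (-EBADF) instead of a NULL dereference
```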
      
      Fixes: fd57770d ("net/smc: wait for pending work before clcsock release_sock")
      Link: https://lore.kernel.org/lkml/5dd7ffd1-28e2-24cc-9442-1defec27375e@linux.ibm.com/T/
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Acked-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: remove unused ->wait_capability · 3a5d9db7
      Sukadev Bhattiprolu authored
      With the previous bug fix, the ->wait_capability flag is no longer
      needed and can be removed.
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: don't spin in tasklet · 48079e7f
      Sukadev Bhattiprolu authored
      ibmvnic_tasklet() continuously spins waiting for responses to all
      capability requests. It does this to avoid encountering an error
      during initialization of the vnic. However, if there is a bug in the
      VIOS and we do not receive a response to one or more queries, the
      tasklet ends up spinning continuously, leading to hard lockups.
      
      If we fail to receive a message from the VIOS, it is reasonable to
      time out the login attempt rather than spin indefinitely in the tasklet.
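A bounded wait replaces the unbounded spin. The sketch uses Python's threading.Event as a stand-in for whatever completion mechanism the driver uses. wait_for_cap_responses() and the timeout values are hypothetical.

```python
import threading

def wait_for_cap_responses(all_received, timeout_s):
    """Bounded wait: True if every response arrived, False on timeout.

    all_received is a threading.Event set by whoever processes the last
    response; it stands in for the driver's completion mechanism.
    """
    return all_received.wait(timeout=timeout_s)

# A VIOS that never answers: the login attempt times out instead of hanging.
never = threading.Event()
print(wait_for_cap_responses(never, 0.05))  # False

# A VIOS that answers from another thread shortly afterwards.
answered = threading.Event()
threading.Timer(0.01, answered.set).start()
print(wait_for_cap_responses(answered, 1.0))  # True
```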
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: init ->running_cap_crqs early · 151b6a5c
      Sukadev Bhattiprolu authored
      We use ->running_cap_crqs to determine when the ibmvnic_tasklet() should
      send out the next protocol message type, i.e. when we get back responses
      to all our QUERY_CAPABILITY CRQs, we send out REQUEST_CAPABILITY CRQs.
      Similarly, when we get responses to all the REQUEST_CAPABILITY CRQs, we
      send out the QUERY_IP_OFFLOAD CRQ.
      
      We currently increment ->running_cap_crqs as we send out each CRQ and
      have the ibmvnic_tasklet() send out the next message type, when this
      running_cap_crqs count drops to 0.
      
      This assumes that all the CRQs of the current type were sent out before
      the count drops to 0. However it is possible that we send out say 6 CRQs,
      get preempted and receive all the 6 responses before we send out the
      remaining CRQs. This can result in ->running_cap_crqs count dropping to
      zero before all messages of the current type were sent and we end up
      sending the next protocol message too early.
      
      Instead initialize the ->running_cap_crqs upfront so the tasklet will
      only send the next protocol message after all responses are received.
      
      Use the cap_reqs local variable to also detect any discrepancy (either
      now or in future) in the number of capability requests we actually send.
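The race can be reproduced with a small simulation. The function below is a hypothetical model, not ibmvnic code, and the CRQ counts (10 requests, preempted after 6) are arbitrary. It compares counting CRQs as they are sent with initializing the counter upfront.

```python
def next_msg_sent_early(total, first_batch, init_upfront):
    """Simulate sending `total` CRQs, being preempted after `first_batch`
    of them, and receiving those responses before the rest are sent.

    Returns True if the counter hits zero while CRQs are still unsent,
    i.e. the next protocol message would go out too early.
    """
    running = total if init_upfront else 0
    sent = 0
    early = False

    def send(n):
        nonlocal running, sent
        for _ in range(n):
            if not init_upfront:
                running += 1  # old scheme: count each CRQ as it is sent
            sent += 1

    def receive(n):
        nonlocal running, early
        for _ in range(n):
            running -= 1
            if running == 0 and sent < total:
                early = True

    send(first_batch)
    receive(first_batch)  # preempted: these responses arrive first
    send(total - first_batch)
    receive(total - first_batch)
    return early

print(next_msg_sent_early(10, 6, init_upfront=False))  # True
print(next_msg_sent_early(10, 6, init_upfront=True))   # False
```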
      
      Currently only send_query_cap() is affected by this behavior (of sending
      the next message early), since it is called from the worker thread
      (during reset) and from the application thread (during ->ndo_open()),
      and both can be preempted. send_request_cap() is only called from the
      tasklet, which processes CRQ responses sequentially, so it is not
      affected. But to maintain the existing symmetry with
      send_query_capability(), we update send_request_capability() as well.
      
      Fixes: 249168ad ("ibmvnic: Make CRQ interrupt tasklet wait for all capabilities crqs")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ibmvnic: Allow extra failures before disabling · db9f0e8b
      Sukadev Bhattiprolu authored
      If auto-priority-failover (APF) is enabled and there are at least two
      backing devices of different priorities, some resets, like fail-over
      and change-param, can cause at least two back-to-back failovers (failover
      from the high priority backing device to a lower priority one, and then
      back to the higher priority one if that is still functional).
      
      Depending on the timing of the two failovers, it is possible to trigger
      a "hard" reset and for that hard reset to fail due to the failovers. When
      this occurs, the driver assumes that the network is unstable and disables
      the VNIC for a 60-second "settling time". This in turn can cause the
      ethtool command to fail with "No such device" while the vnic
      automatically recovers a little while later.
      
      Given that it's possible to have two back to back failures, allow for extra
      failures before disabling the vnic for the settling time.
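The policy change can be illustrated with a consecutive-failure counter. This is a toy model. The commit does not state how many extra failures are allowed, so the thresholds below are hypothetical: 0 approximates the behavior described before the fix, and 2 stands in for the more tolerant one.

```python
class HardResetPolicy:
    """Toy model: park the vnic for a settling time only after more than
    `allowed_failures` consecutive hard-reset failures. The thresholds used
    below are hypothetical, not the driver's values."""

    def __init__(self, allowed_failures):
        self.allowed_failures = allowed_failures
        self.consecutive_failures = 0

    def on_hard_reset(self, succeeded):
        if succeeded:
            self.consecutive_failures = 0
            return "up"
        self.consecutive_failures += 1
        if self.consecutive_failures > self.allowed_failures:
            return "settling"  # disable the vnic for the settling time
        return "retry"

strict = HardResetPolicy(allowed_failures=0)
print(strict.on_hard_reset(False))  # settling, after a single failure

# Two back-to-back failovers make two hard resets fail, then one succeeds.
tolerant = HardResetPolicy(allowed_failures=2)
print([tolerant.on_hard_reset(ok) for ok in [False, False, True]])  # ['retry', 'retry', 'up']
```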
      
      Fixes: f15fde9d ("ibmvnic: delay next reset if hard reset fails")
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
      Reviewed-by: Dany Madden <drt@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fix ip option filtering for locally generated fragments · 27a8caa5
      Jakub Kicinski authored
      During IP fragmentation we sanitize IP options. This means overwriting
      options which should not be copied with NOPs. Only the first fragment
      has the original, full options.
      
      ip_fraglist_prepare() copies the IP header and options from the previous
      fragment to the next one. Commit 19c3401a ("net: ipv4: place control
      buffer handling away from fragmentation iterators") moved sanitizing
      options before ip_fraglist_prepare(), which means options are sanitized
      and then overwritten again with the old values.
      
      Fixing this is not enough, however, nor did the sanitization work
      prior to the aforementioned commit.
      
      ip_options_fragment() (which does the sanitization) uses ipcb->opt.optlen
      for the length of the options. ipcb->opt of fragments is not populated
      (it's 0); only the head skb has the state properly built. So even when
      called at the right time, ip_options_fragment() does nothing. This seems
      to date back all the way to v2.5.44, when the fast path for pre-fragmented
      skbs was introduced. Prior to that, ip_options_build() would have been
      called for every fragment (in fact, ever since v2.5.44 the fragmentation
      handling in ip_options_build() has been dead code; I'll clean it up in
      -next).
      
      In the original patch (see Link) caixf mentions fixing the handling
      for fragments other than the second one, but I'm not sure how _any_
      fragment could have had their options sanitized with the code
      as it stood.
      
      Tested with python (MTU on lo lowered to 1000 to force fragmentation):
      
        import socket
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_OPTIONS,
                     bytearray([7,4,5,192, 20|0x80,4,1,0]))
        s.sendto(b'1'*2000, ('127.0.0.1', 1234))
      
      Before:
      
      IP (tos 0x0, ttl 64, id 1053, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
          localhost.36500 > localhost.search-agent: UDP, length 2000
      IP (tos 0x0, ttl 64, id 1053, offset 968, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
          localhost > localhost: udp
      IP (tos 0x0, ttl 64, id 1053, offset 1936, flags [none], proto UDP (17), length 100, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
          localhost > localhost: udp
      
      After:
      
      IP (tos 0x0, ttl 96, id 42549, offset 0, flags [+], proto UDP (17), length 996, options (RR [bad length 4] [bad ptr 5] 192.148.4.1,,RA value 256))
          localhost.51607 > localhost.search-agent: UDP, bad length 2000 > 960
      IP (tos 0x0, ttl 96, id 42549, offset 968, flags [+], proto UDP (17), length 996, options (NOP,NOP,NOP,NOP,RA value 256))
          localhost > localhost: udp
      IP (tos 0x0, ttl 96, id 42549, offset 1936, flags [none], proto UDP (17), length 100, options (NOP,NOP,NOP,NOP,RA value 256))
          localhost > localhost: udp
      
      RA (20 | 0x80) is now copied as expected, RR (7) is "NOPed out".
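The sanitization rule itself is simple: every option whose copied flag (0x80) is clear is overwritten with NOPs. The byte-level sketch below models that rule and is not the kernel's ip_options_fragment(). Applied to the options from the test above, it reproduces the option bytes seen in the "After" output for the non-first fragments.

```python
IPOPT_END = 0        # end of option list
IPOPT_NOOP = 1       # no-operation (padding)
IPOPT_COPIED = 0x80  # "copied" flag: option must appear in every fragment

def sanitize_options(opts: bytes) -> bytes:
    """Replace every option without the copied flag with NOP bytes."""
    out = bytearray(opts)
    i = 0
    while i < len(out):
        opt = out[i]
        if opt == IPOPT_END:
            break
        if opt == IPOPT_NOOP:
            i += 1
            continue
        if i + 1 >= len(out):
            break  # truncated option: leave the rest untouched
        optlen = out[i + 1]
        if optlen < 2 or i + optlen > len(out):
            break  # malformed length: stop rather than run off the end
        if not (opt & IPOPT_COPIED):
            out[i:i + optlen] = bytes([IPOPT_NOOP]) * optlen
        i += optlen
    return bytes(out)

# Options from the test above: RR (type 7, not copied), then RA (20 | 0x80, copied).
test_opts = bytes([7, 4, 5, 192, 20 | 0x80, 4, 1, 0])
print(list(sanitize_options(test_opts)))  # [1, 1, 1, 1, 148, 4, 1, 0]
```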
      
      Link: https://lore.kernel.org/netdev/20220107080559.122713-1-ooppublic@163.com/
      Fixes: 19c3401a ("net: ipv4: place control buffer handling away from fragmentation iterators")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: caixf <ooppublic@163.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-procfs: show net devices bound packet types · 1d10f8a1
      Jianguo Wu authored
      After commit 7866a621 ("dev: add per net_device packet type chains"),
      packet types that are bound to a specific net device are no longer
      shown in /proc/net/ptype. This patch fixes the regression.
      
      Running "tcpdump -i ens192 udp -nns0" before and after applying this patch:
      
      Before:
        [root@localhost ~]# cat /proc/net/ptype
        Type Device      Function
        0800          ip_rcv
        0806          arp_rcv
        86dd          ipv6_rcv
      
      After:
        [root@localhost ~]# cat /proc/net/ptype
        Type Device      Function
        ALL  ens192   tpacket_rcv
        0800          ip_rcv
        0806          arp_rcv
        86dd          ipv6_rcv
      
      v1 -> v2:
        - fix the regression rather than adding new /proc API as
          suggested by Stephen Hemminger.
      
      Fixes: 7866a621 ("dev: add per net_device packet type chains")
      Signed-off-by: Jianguo Wu <wujianguo@chinatelecom.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: use rcu_dereference_rtnl when get bonding active slave · aa603467
      Hangbin Liu authored
      bond_option_active_slave_get_rcu() should not be used under rtnl_mutex,
      as it uses rcu_dereference(). Replace it with rcu_dereference_rtnl() so
      that this function can also be used in rtnl-protected context.
      
      With this update, we can remove the rcu_read_lock/unlock in the
      bonding .ndo_eth_ioctl and .get_ts_info.
      Reported-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Fixes: 94dd016a ("bond: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to active device")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sfp: ignore disabled SFP node · 2148927e
      Marek Behún authored
      Commit ce0aa27f ("sfp: add sfp-bus to bridge between network devices
      and sfp cages") added code which finds the SFP bus DT node even if the
      node is disabled with status = "disabled". Because of this, when phylink
      is created, it ends up with a non-null .sfp_bus member, even though the
      SFP module is not probed (because the node is disabled).
      
      We need to ignore a disabled SFP bus node.
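The fix amounts to checking the node's availability before using it. In the sketch below, dicts stand in for devicetree nodes and find_sfp_bus_node() is a hypothetical helper. The availability rule (no status, "okay" or "ok") follows the usual devicetree convention.

```python
from typing import Optional

def node_is_available(node: dict) -> bool:
    # Devicetree convention: a missing status, "okay" or "ok" means enabled.
    status = node.get("status")
    return status is None or status in ("okay", "ok")

def find_sfp_bus_node(sfp_node: Optional[dict]) -> Optional[dict]:
    """Hypothetical helper: return the referenced SFP node only if it is enabled."""
    if sfp_node is None or not node_is_available(sfp_node):
        return None  # phylink then ends up with no .sfp_bus
    return sfp_node

print(find_sfp_bus_node({"compatible": "sff,sfp", "status": "disabled"}))  # None
print(find_sfp_bus_node({"compatible": "sff,sfp"}) is not None)            # True
```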
      
      Fixes: ce0aa27f ("sfp: add sfp-bus to bridge between network devices and sfp cages")
      Signed-off-by: Marek Behún <kabel@kernel.org>
      Cc: stable@vger.kernel.org # 2203cbf2 ("net: sfp: move fwnode parsing into sfp-bus layer")
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 Jan, 2022 3 commits
  4. 21 Jan, 2022 9 commits
    • Merge branch 'mvneta: introduce XDP multi-buffer support' · a9921ce1
      Alexei Starovoitov authored
      Lorenzo Bianconi says:
      
      ====================
      
      This series introduces XDP frags support. The mvneta driver is
      the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
      please focus on how these new types of xdp_{buff,frame} packets
      traverse the different layers and the layout design. It is on purpose
      that BPF-helpers are kept simple, as we don't want to expose the
      internal layout to allow later changes.
      
      The main idea for the new XDP frags layout is to reuse the same
      structure used for non-linear SKBs. This relies on the "skb_shared_info"
      struct at the end of the first buffer to link together subsequent
      buffers. Keeping the layout compatible with SKBs is also done to ease
      and speed up creating an SKB from an xdp_{buff,frame}.
      Converting an xdp_frame to an SKB and delivering it to the network stack
      is shown in patch 05/18 (e.g. cpumaps).
      
      A frags bit (XDP_FLAGS_HAS_FRAGS) has been introduced in the flags
      field of xdp_{buff,frame} structure to notify the bpf/network layer if
      this is a non-linear xdp frame (XDP_FLAGS_HAS_FRAGS set) or not
      (XDP_FLAGS_HAS_FRAGS not set).
      The frags bit will be set by a xdp frags capable driver only
      for non-linear frames maintaining the capability to receive linear frames
      without any extra cost since the skb_shared_info structure at the end
      of the first buffer will be initialized only if XDP_FLAGS_HAS_FRAGS bit
      is set. Moreover the flags field in xdp_{buff,frame} will be reused even for
      xdp rx csum offloading in future series.
      
      Typical use cases for this series are:
      - Jumbo-frames
      - Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
      - TSO/GRO for XDP_REDIRECT
      
      The following three eBPF helpers (and related selftests) have been introduced:
      - bpf_xdp_load_bytes:
        This helper is provided as an easy way to load data from a xdp buffer. It
        can be used to load len bytes from offset from the frame associated to
        xdp_md, into the buffer pointed by buf.
      - bpf_xdp_store_bytes:
        Store len bytes from buffer buf into the frame associated to xdp_md, at
        offset.
      - bpf_xdp_get_buff_len:
        Return the total frame size (linear + paged parts)
      
      bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
      account non-linear xdp frames.
      Moreover, similar to skb_header_pointer, we introduced the bpf_xdp_pointer
      utility routine to return a pointer to a given position in the xdp_buff if
      the requested area (offset + len) is contained in a contiguous memory area;
      otherwise the area must be copied into a bounce buffer provided by the
      caller, using bpf_xdp_copy_buf().
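The access pattern described above can be modeled in user space. XdpBuffModel is a hypothetical class, not the kernel's xdp_buff. pointer() plays the role of bpf_xdp_pointer() and returns a view only when the range is contiguous. load_bytes() copies across fragments into a fresh buffer, as a bounce-buffer copy would, and get_buff_len() mirrors bpf_xdp_get_buff_len().

```python
class XdpBuffModel:
    """Toy model of a non-linear xdp_buff: a linear head plus fragments.
    Not the kernel's struct xdp_buff or skb_shared_info."""

    def __init__(self, linear: bytes, frags: list):
        self.segments = [bytearray(linear)] + [bytearray(f) for f in frags]
        self.has_frags = bool(frags)  # analogous to XDP_FLAGS_HAS_FRAGS

    def get_buff_len(self) -> int:
        # Like bpf_xdp_get_buff_len(): linear + paged parts.
        return sum(len(s) for s in self.segments)

    def pointer(self, offset: int, length: int):
        """Return a view if [offset, offset + length) lies in one segment, else None."""
        for seg in self.segments:
            if offset < len(seg):
                if offset + length <= len(seg):
                    return memoryview(seg)[offset:offset + length]
                return None  # range crosses a segment boundary
            offset -= len(seg)
        return None  # offset beyond the end of the frame

    def load_bytes(self, offset: int, length: int) -> bytes:
        """Copy a range that may span segments into a fresh (bounce) buffer."""
        if offset < 0 or length < 0 or offset + length > self.get_buff_len():
            raise ValueError("range outside the frame")
        out = bytearray()
        for seg in self.segments:
            if len(out) == length:
                break
            if offset >= len(seg):
                offset -= len(seg)
                continue
            take = min(len(seg) - offset, length - len(out))
            out += seg[offset:offset + take]
            offset = 0
        return bytes(out)

buf = XdpBuffModel(b"abcd", [b"efgh", b"ij"])
print(buf.get_buff_len())        # 10
print(bytes(buf.pointer(1, 2)))  # b'bc'
print(buf.pointer(3, 2))         # None: the range spans the head and a frag
print(buf.load_bytes(6, 4))      # b'ghij'
```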
      
      The BPF_F_XDP_HAS_FRAGS flag has been introduced to notify the kernel that
      the eBPF program fully supports xdp frags.
      SEC("xdp.frags"), SEC_DEF("xdp.frags/devmap") and SEC_DEF("xdp.frags/cpumap")
      have been introduced to declare xdp frags support.
      The NIC driver is expected to reject an eBPF program if it is running in
      XDP frags mode and the program does not support XDP frags.
      In the same way it is not possible to mix XDP frags and XDP legacy
      programs in a CPUMAP/DEVMAP or tailcall a XDP frags/legacy program from
      a legacy/frags one.
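The compatibility rules above can be summarized as two predicates. XdpProg and both functions are hypothetical. In a real program, frags support is declared by loading with BPF_F_XDP_HAS_FRAGS (e.g. via SEC("xdp.frags") in libbpf), and the checks are done by the kernel and the drivers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class XdpProg:
    name: str
    has_frags: bool  # stands in for loading with BPF_F_XDP_HAS_FRAGS

def driver_accepts(prog, driver_in_frags_mode):
    # A driver running in XDP frags mode rejects programs without frags support.
    return prog.has_frags or not driver_in_frags_mode

def compatible(a, b):
    # Frags and legacy programs may not share a CPUMAP/DEVMAP
    # or tail call one another.
    return a.has_frags == b.has_frags

legacy = XdpProg("legacy", has_frags=False)
frags = XdpProg("frags", has_frags=True)
print(driver_accepts(legacy, driver_in_frags_mode=True))  # False
print(driver_accepts(frags, driver_in_frags_mode=True))   # True
print(compatible(legacy, frags))                          # False
```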
      
      More info about the main idea behind this approach can be found here [1][2].
      
      Changes since v22:
      - remove leftover CHECK macro usage
      - reintroduce SEC_XDP_FRAGS flag in sec_def_flags
      - rename xdp multi_frags to xdp frags
      - do not report xdp_frags support in fdinfo
      
      Changes since v21:
      - rename *_mb to *_frags, e.g.:
        s/xdp_buff_is_mb/xdp_buff_has_frags
      - rely on ASSERT_* and not on CHECK in
        bpf_xdp_load_bytes/bpf_xdp_store_bytes self-tests
      - change new multi.frags SEC definitions to use the following schema:
        prog_type.prog_flags/attach_place
      - get rid of unnecessary properties in new multi.frags SEC definitions
      - rebase on top of bpf-next
      
      Changes since v20:
      - rebase to current bpf-next
      
      Changes since v19:
      - do not run deprecated bpf_prog_load()
      - rely on skb_frag_size_add/skb_frag_size_sub in
        bpf_xdp_mb_increase_tail/bpf_xdp_mb_shrink_tail
      - rely on sinfo->nr_frags in bpf_xdp_mb_shrink_tail to check if the frame has
        been shrunk to a single-buffer one
      - allow XDP_REDIRECT of a xdp-mb frame into a CPUMAP
      
      Changes since v18:
      - fix bpf_xdp_copy_buf utility routine when we want to load/store data
        contained in frag<n>
      - add a selftest for bpf_xdp_load_bytes/bpf_xdp_store_bytes when the caller
        accesses data contained in frag<n> and frag<n+1>
      
      Changes since v17:
      - rework bpf_xdp_copy to squash base and frag management
      - remove unused variable in bpf_xdp_mb_shrink_tail()
      - move bpf_xdp_copy_buf() out of bpf_xdp_pointer()
      - add sanity check for len in bpf_xdp_pointer()
      - remove EXPORT_SYMBOL for __xdp_return()
      - introduce frag_size field in xdp_rxq_info to let the driver specify the max
        value for xdp fragments. Setting frag_size to 0 means that increasing the
        tail of the last fragment is not supported.
      
      Changes since v16:
      - do not allow tailcalling a xdp multi-buffer/legacy program from a
        legacy/multi-buff one.
      - do not allow mixing xdp multi-buffer and xdp legacy programs in a
        CPUMAP/DEVMAP
      - add selftests for CPUMAP/DEVMAP xdp mb compatibility
      - disable XDP_REDIRECT for xdp multi-buff for the moment
      - set max offset value to 0xffff in bpf_xdp_pointer
      - use ARG_PTR_TO_UNINIT_MEM and ARG_CONST_SIZE for arg3_type and arg4_type
        of bpf_xdp_store_bytes/bpf_xdp_load_bytes
      
      Changes since v15:
      - let the verifier check buf is not NULL in
        bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
      - return an error if offset + length is over frame boundaries in
        bpf_xdp_pointer routine
      - introduce BPF_F_XDP_MB flag for bpf_attr to notify the kernel the eBPF
        program fully supports xdp multi-buffer.
      - reject a non XDP multi-buffer program if the driver is running in
        XDP multi-buffer mode.
      
      Changes since v14:
      - introduce bpf_xdp_pointer utility routine and
        bpf_xdp_load_bytes/bpf_xdp_store_bytes helpers
      - drop bpf_xdp_adjust_data helper
      - drop xdp_frags_truesize in skb_shared_info
      - split bpf_xdp_mb_adjust_tail into bpf_xdp_mb_increase_tail and
        bpf_xdp_mb_shrink_tail
      
      Changes since v13:
      - use u32 for xdp_buff/xdp_frame flags field
      - rename xdp_frags_tsize to xdp_frags_truesize
      - fixed comments
      
      Changes since v12:
      - fix bpf_xdp_adjust_data helper for single-buffer use case
      - return -EFAULT in bpf_xdp_adjust_{head,tail} in case the data pointers are not
        properly reset
      - collect ACKs from John
      
      Changes since v11:
      - add missing static to bpf_xdp_get_buff_len_proto structure
      - fix bpf_xdp_adjust_data helper when offset is smaller than linear area length.
      
      Changes since v10:
      - move xdp->data to the requested payload offset instead of to the beginning of
        the fragment in bpf_xdp_adjust_data()
      
      Changes since v9:
      - introduce bpf_xdp_adjust_data helper and related selftest
      - add xdp_frags_size and xdp_frags_tsize fields in skb_shared_info
      - introduce xdp_update_skb_shared_info utility routine in order not to reset
        the frags array in skb_shared_info when converting from an xdp_buff/xdp_frame
        to an skb
      - simplify bpf_xdp_copy routine
      
      Changes since v8:
      - add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
      - switch back to skb_shared_info implementation from previous xdp_shared_info
        one
      - avoid using a bitfield in xdp_buff/xdp_frame since it introduces performance
        regressions. Tested now on a 10G NIC (ixgbe) to verify there are no
        performance penalties for the regular codebase
      - add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
      - add data_len field in skb_shared_info struct
      - introduce XDP_FLAGS_FRAGS_PF_MEMALLOC flag
      
      Changes since v7:
      - rebase on top of bpf-next
      - fix sparse warnings
      - improve comments for frame_length in include/net/xdp.h
      
      Changes since v6:
      - the main difference with respect to previous versions is the new approach,
        proposed by Eelco, of passing the full length of the packet to the eBPF
        layer in the XDP context
      - reintroduce multi-buff support to eBPF kselftests
      - reintroduce multi-buff support to bpf_xdp_adjust_tail helper
      - introduce multi-buffer support to bpf_xdp_copy helper
      - rebase on top of bpf-next
      
      Changes since v5:
      - rebase on top of bpf-next
      - initialize mb bit in xdp_init_buff() and drop per-driver initialization
      - drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
      - postpone introduction of frame_length field in XDP ctx to another series
      - minor changes
      
      Changes since v4:
      - rebase on top of bpf-next
      - introduce xdp_shared_info to build xdp multi-buff instead of using the
        skb_shared_info struct
      - introduce frame_length in xdp ctx
      - drop previous bpf helpers
      - fix bpf_xdp_adjust_tail for xdp multi-buff
      - introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
      - fix xdp_return_frame_bulk for xdp multi-buff
      
      Changes since v3:
      - rebase on top of bpf-next
      - add patch 10/13 to copy back paged data from a xdp multi-buff frame to
        userspace buffer for xdp multi-buff selftests
      
      Changes since v2:
      - add throughput measurements
      - drop bpf_xdp_adjust_mb_header bpf helper
      - introduce selftest for xdp multibuffer
      - addressed comments on bpf_xdp_get_frags_count
      - introduce xdp multi-buff support to cpumaps
      
      Changes since v1:
      - Fix use-after-free in xdp_return_{buff/frame}
      - Introduce bpf helpers
      - Introduce xdp_mb sample program
      - access skb_shared_info->nr_frags only on the last fragment
      
      Changes since RFC:
      - squash multi-buffer bit initialization in a single patch
      - add mvneta non-linear XDP buff support for tx side
      
      [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
      [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
      [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDP multi-buffers section)
      
      Eelco Chaudron (3):
        bpf: add frags support to the bpf_xdp_adjust_tail() API
        bpf: add frags support to xdp copy helpers
        bpf: selftests: update xdp_adjust_tail selftest to include xdp frags
      
      Lorenzo Bianconi (19):
        net: skbuff: add size metadata to skb_shared_info for xdp
        xdp: introduce flags field in xdp_buff/xdp_frame
        net: mvneta: update frags bit before passing the xdp buffer to eBPF
          layer
        net: mvneta: simplify mvneta_swbm_add_rx_fragment management
        net: xdp: add xdp_update_skb_shared_info utility routine
        net: marvell: rely on xdp_update_skb_shared_info utility routine
        xdp: add frags support to xdp_return_{buff/frame}
        net: mvneta: add frags support to XDP_TX
        bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf
          program
        net: mvneta: enable jumbo frames if the loaded XDP program support
          frags
        bpf: introduce bpf_xdp_get_buff_len helper
        bpf: move user_size out of bpf_test_init
        bpf: introduce frags support to bpf_prog_test_run_xdp()
        bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
          signature
        libbpf: Add SEC name for xdp frags programs
        net: xdp: introduce bpf_xdp_pointer utility routine
        bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest
        bpf: selftests: add CPUMAP/DEVMAP selftests for xdp frags
        xdp: disable XDP_REDIRECT for xdp frags
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Lorenzo Bianconi
      xdp: disable XDP_REDIRECT for xdp frags · ab0db463
      Lorenzo Bianconi authored
      XDP_REDIRECT is not fully supported yet for xdp frags, since not all
      XDP-capable drivers can map a non-linear xdp_frame in ndo_xdp_xmit,
      so disable it for the moment.
      Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/0da25e117d0e2673f5d0ce6503393c55c6fb1be9.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Lorenzo Bianconi
    • Lorenzo Bianconi
      bpf: selftests: introduce bpf_xdp_{load,store}_bytes selftest · 6db28e24
      Lorenzo Bianconi authored
      Introduce a kernel selftest for the new bpf_xdp_{load,store}_bytes helpers
      and the bpf_xdp_pointer/bpf_xdp_copy_buf utility routines.
      Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/2c99ae663a5dcfbd9240b1d0489ad55dea4f4601.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Lorenzo Bianconi
      net: xdp: introduce bpf_xdp_pointer utility routine · 3f364222
      Lorenzo Bianconi authored
      Similar to skb_header_pointer, introduce the bpf_xdp_pointer utility routine
      to return a pointer to a given position in the xdp_buff if the requested
      area (offset + len) is contained in a contiguous memory area; otherwise the
      area is copied into a bounce buffer provided by the caller.
      Similar to the tc counterpart, introduce the following two xdp helpers:
      - bpf_xdp_load_bytes
      - bpf_xdp_store_bytes
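      The pointer-or-bounce-buffer behaviour described above can be modelled in
      user space as follows. This is a sketch under stated assumptions, not the
      kernel code: struct model_frame, model_xdp_pointer() and
      model_xdp_load_bytes() are hypothetical names, the frame is assumed to be
      a linear area followed by up to MODEL_MAX_FRAGS frags, and an empty or
      out-of-bounds request simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FRAGS 4

/* Hypothetical segment: seg[0] is the linear area, the rest are frags. */
struct model_seg {
	unsigned char *data;
	size_t len;
};

struct model_frame {
	struct model_seg seg[1 + MODEL_MAX_FRAGS];
	unsigned int nr_segs;
};

static size_t model_frame_len(const struct model_frame *f)
{
	size_t total = 0;

	for (unsigned int i = 0; i < f->nr_segs; i++)
		total += f->seg[i].len;
	return total;
}

/*
 * Return a pointer to [off, off + len) if that range lies inside a single
 * segment; otherwise copy it into the caller's bounce buffer and return
 * the bounce buffer. Return NULL if the range is empty or out of bounds.
 */
static void *model_xdp_pointer(const struct model_frame *f, size_t off,
			       size_t len, void *bounce)
{
	unsigned char *dst = bounce;
	size_t copied = 0;

	assert(f != NULL && bounce != NULL);
	if (len == 0 || off + len > model_frame_len(f))
		return NULL;

	for (unsigned int i = 0; i < f->nr_segs && copied < len; i++) {
		const struct model_seg *s = &f->seg[i];
		size_t chunk;

		if (off >= s->len) {	/* range starts in a later segment */
			off -= s->len;
			continue;
		}
		chunk = s->len - off;
		if (copied == 0 && chunk >= len)
			return s->data + off;	/* contiguous: no copy needed */
		if (chunk > len - copied)
			chunk = len - copied;
		memcpy(dst + copied, s->data + off, chunk);
		copied += chunk;
		off = 0;	/* following segments are read from their start */
	}
	return bounce;
}

/* load_bytes-style wrapper: always leaves the requested bytes in buf. */
static int model_xdp_load_bytes(const struct model_frame *f, size_t off,
				void *buf, size_t len)
{
	void *p = model_xdp_pointer(f, off, len, buf);

	if (!p)
		return -1;
	if (p != buf)
		memcpy(buf, p, len);
	return 0;
}
```

      With a 4-byte linear area and one 4-byte frag, a 2-byte read at offset 1
      returns a pointer straight into the linear area, while a 4-byte read at
      offset 2 spans the boundary and is assembled in the bounce buffer.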
      Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
      Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/ab285c1efdd5b7a9d361348b1e7d3ef49f6382b3.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Toke Hoiland-Jorgensen
      bpf: generalise tail call map compatibility check · f45d5b6c
      Toke Hoiland-Jorgensen authored
      The check for tail call map compatibility ensures that tail calls only
      happen between programs of the same type. To ensure backwards compatibility
      for XDP frags, we need a similar type of check for cpumap and devmap
      programs, so move the state from bpf_array_aux into bpf_map, add
      xdp_has_frags to the check, and apply the same check to cpumap and devmap.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Lorenzo Bianconi
      libbpf: Add SEC name for xdp frags programs · 082c4bfb
      Lorenzo Bianconi authored
      Introduce support for the following SEC entries for XDP frags
      property:
      - SEC("xdp.frags")
      - SEC("xdp.frags/devmap")
      - SEC("xdp.frags/cpumap")
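      In a BPF C object, a program opts in by placing its function in one of
      these sections, e.g. SEC("xdp.frags"). The lookup below is a hypothetical
      user-space model of how such a section name could map to an attach point
      and a prog_flags value; the enum, the flag value and the table contents
      (including the non-frags rows) are illustrative stand-ins, not libbpf's
      internal definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins; the real constants live in the uapi headers. */
enum model_attach { MODEL_XDP, MODEL_XDP_DEVMAP, MODEL_XDP_CPUMAP };
#define MODEL_F_XDP_HAS_FRAGS (1U << 0)	/* illustrative value */

struct model_sec_def {
	const char *name;
	enum model_attach attach;
	unsigned int prog_flags;
};

/* Hypothetical table: frags sections set the flag, legacy ones do not. */
static const struct model_sec_def model_sec_defs[] = {
	{ "xdp.frags/devmap", MODEL_XDP_DEVMAP, MODEL_F_XDP_HAS_FRAGS },
	{ "xdp.frags/cpumap", MODEL_XDP_CPUMAP, MODEL_F_XDP_HAS_FRAGS },
	{ "xdp.frags",        MODEL_XDP,        MODEL_F_XDP_HAS_FRAGS },
	{ "xdp/devmap",       MODEL_XDP_DEVMAP, 0 },
	{ "xdp/cpumap",       MODEL_XDP_CPUMAP, 0 },
	{ "xdp",              MODEL_XDP,        0 },
};

/* Exact-match lookup of a program's SEC() name; NULL if unknown. */
static const struct model_sec_def *model_find_sec_def(const char *sec)
{
	assert(sec != NULL);
	for (size_t i = 0; i < sizeof(model_sec_defs) / sizeof(model_sec_defs[0]); i++)
		if (strcmp(model_sec_defs[i].name, sec) == 0)
			return &model_sec_defs[i];
	return NULL;
}
```

      The design point the sketch tries to capture is that the frags property
      travels with the section name, so the loader can set the flag without
      any extra API call from the program author.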
      Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/af23b6e4841c171ad1af01917839b77847a4bc27.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • Eelco Chaudron
    • Lorenzo Bianconi
      bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature · 7855e0db
      Lorenzo Bianconi authored
      Introduce an xdp_shared_info pointer in the bpf_test_finish signature in
      order to copy back paged data from an xdp frags frame to the userspace buffer.
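      The copy-back step can be pictured as flattening the frame into the user
      buffer: first the linear area, then each frag in order. The sketch below
      is a hypothetical user-space model (struct model_buf and
      model_copy_frame_out() are illustrative names) and simply truncates when
      the user buffer is too small; it does not claim to reproduce
      bpf_test_finish's actual error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer descriptor used for the linear area and each frag. */
struct model_buf {
	const unsigned char *data;
	size_t len;
};

/*
 * Copy the linear area and then each frag, in order, into a flat user
 * buffer of size out_size. Returns the number of bytes copied; data that
 * does not fit is truncated.
 */
static size_t model_copy_frame_out(const struct model_buf *linear,
				   const struct model_buf *frags,
				   size_t nr_frags, unsigned char *out,
				   size_t out_size)
{
	size_t copied = 0;

	assert(linear != NULL && out != NULL);
	for (size_t i = 0; i <= nr_frags && copied < out_size; i++) {
		/* Index 0 is the linear area, 1..nr_frags are the frags. */
		const struct model_buf *b = i == 0 ? linear : &frags[i - 1];
		size_t n = b->len;

		if (n > out_size - copied)
			n = out_size - copied;
		memcpy(out + copied, b->data, n);
		copied += n;
	}
	return copied;
}
```

      For a 3-byte linear area plus two 2-byte frags, an 8-byte user buffer
      receives all 7 bytes, while a 4-byte buffer receives the linear area and
      only the first byte of the first frag.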
      Acked-by: Toke Hoiland-Jorgensen <toke@redhat.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Link: https://lore.kernel.org/r/c803673798c786f915bcdd6c9338edaa9740d3d6.1642758637.git.lorenzo@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>