1. 16 Dec, 2022 7 commits
    • Jakub Kicinski's avatar
      Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 13e3c779
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2022-12-16
      
      We've added 7 non-merge commits during the last 2 day(s) which contain
      a total of 9 files changed, 119 insertions(+), 36 deletions(-).
      
      1) Fix for recent syzkaller XDP dispatcher update splat, from Jiri Olsa.
      
      2) Fix BPF program refcount leak in LSM attachment failure path,
         from Milan Landaverde.
      
      3) Fix BPF program type in map compatibility check for fext,
         from Toke Høiland-Jørgensen.
      
      4) Fix a BPF selftest compilation error under !CONFIG_SMP config,
         from Yonghong Song.
      
      5) Fix CI to enable CONFIG_FUNCTION_ERROR_INJECTION after it got changed
         to a prompt, from Song Liu.
      
      6) Various BPF documentation fixes for socket local storage,
         from Donald Hunter.
      
      * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program
        bpf: Resolve fext program type when checking map compatibility
        bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func
        bpf: prevent leak of lsm program after failed attach
        selftests/bpf: Select CONFIG_FUNCTION_ERROR_INJECTION
        selftests/bpf: Fix a selftest compilation error with CONFIG_SMP=n
        docs/bpf: Reword docs for BPF_MAP_TYPE_SK_STORAGE
      ====================
      
      Link: https://lore.kernel.org/r/20221216174540.16598-1-daniel@iogearbox.netSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      13e3c779
    • Eelco Chaudron's avatar
      openvswitch: Fix flow lookup to use unmasked key · 68bb1010
      Eelco Chaudron authored
      The commit mentioned below causes the ovs_flow_tbl_lookup() function
      to be called with the masked key. However, it's supposed to be called
      with the unmasked key. This due to the fact that the datapath supports
      installing wider flows, and OVS relies on this behavior. For example
      if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
      flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
      128.0.0.0) is allowed to be added.
      
      However, if we try to add a wildcard rule, the installation fails:
      
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
      ovs-vswitchd: updating flow table (File exists)
      
      The reason is that the key used to determine if the flow is already
      present in the system uses the original key ANDed with the mask.
      This results in the IP address not being part of the (miniflow) key,
      i.e., being substituted with an all-zero value. When doing the actual
      lookup, this results in the key wrongfully matching the first flow,
      and therefore the flow does not get installed.
      
      This change reverses the commit below, but rather than having the key
      on the stack, it's allocated.
      
      Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
      Signed-off-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      68bb1010
    • David S. Miller's avatar
      Merge branch 'devlink-fixes' · 3e31d209
      David S. Miller authored
      Jakub Kicinski says:
      
      ====================
      devlink: region snapshot locking fix and selftest adjustments
      
      Minor fix for region snapshot locking and adjustments to selftests.
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3e31d209
    • Jakub Kicinski's avatar
      selftests: devlink: add a warning for interfaces coming up · d1c4a346
      Jakub Kicinski authored
      NetworkManager (and other daemons) may bring the interface up
      and cause failures in quiescence checks. Print a helpful warning,
      and take the interface down again.
      
      I seem to forget about this every time I run these tests on a new VM.
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d1c4a346
    • Jakub Kicinski's avatar
      selftests: devlink: fix the fd redirect in dummy_reporter_test · 2fc60e2f
      Jakub Kicinski authored
      $number + > bash means redirect FD $number, e.g. commonly
      used 2> redirects stderr (fd 2). The test uses 8192> to
      write the number 8192 to a file, this results in:
      
        ./devlink.sh: line 499: 8192: Bad file descriptor
      
      Oddly the test also papers over this issue by checking
      for failure (expecting an error rather than success)
      so it passes, anyway.
      
      Fixes: ff18176a ("selftests: Add a test of large binary to devlink health test")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      2fc60e2f
    • Jakub Kicinski's avatar
      devlink: hold region lock when flushing snapshots · b4cafb3d
      Jakub Kicinski authored
      Netdevsim triggers a splat on reload, when it destroys regions
      with snapshots pending:
      
        WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
        CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d #580
        RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
        Call Trace:
         <TASK>
         devl_region_destroy+0x70/0x140
         nsim_dev_reload_down+0x2f/0x60 [netdevsim]
         devlink_reload+0x1f7/0x360
         devlink_nl_cmd_reload+0x6ce/0x860
         genl_family_rcv_msg_doit.isra.0+0x145/0x1c0
      
      This is the locking assert in devlink_region_snapshot_del(),
      we're supposed to be holding the region->snapshot_lock here.
      
      Fixes: 2dec18ad ("net: devlink: remove region snapshots list dependency on devlink->lock")
      Signed-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      Reviewed-by: default avatarJiri Pirko <jiri@nvidia.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      b4cafb3d
    • Daniel Golle's avatar
      net: dsa: mt7530: remove redundant assignment · 32f1002e
      Daniel Golle authored
      Russell King correctly pointed out that the MAC_2500FD capability is
      already added for port 5 (if not in RGMII mode) and port 6 (which only
      supports SGMII) by mt7531_mac_port_get_caps. Remove the reduntant
      setting of this capability flag which was added by a previous commit.
      
      Fixes: e19de30d ("net: dsa: mt7530: add support for in-band link status")
      Reported-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Reviewed-by: default avatarRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: default avatarDaniel Golle <daniel@makrotopia.org>
      Link: https://lore.kernel.org/r/Y5qY7x6la5TxZxzX@makrotopia.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      32f1002e
  2. 15 Dec, 2022 9 commits
    • Vladimir Oltean's avatar
      net: dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port() · a7d82367
      Vladimir Oltean authored
      In the blamed commit, it was not noticed that one implementation of
      chip->info->ops->phylink_get_caps(), called by mv88e6xxx_get_caps(),
      may access hardware registers, and in doing so, it takes the
      mv88e6xxx_reg_lock(). Namely, this is mv88e6352_phylink_get_caps().
      
      This is a problem because mv88e6xxx_get_caps(), apart from being
      a top-level function (method invoked by dsa_switch_ops), is now also
      directly called from mv88e6xxx_setup_port(), which runs under the
      mv88e6xxx_reg_lock() taken by mv88e6xxx_setup(). Therefore, when running
      on mv88e6352, the reg_lock would be acquired a second time and the
      system would deadlock on driver probe.
      
      The things that mv88e6xxx_setup() can compete with in terms of register
      access with are the IRQ handlers and MDIO bus operations registered by
      mv88e6xxx_probe(). So there is a real need to acquire the register lock.
      
      The register lock can, in principle, be dropped and re-acquired pretty
      much at will within the driver, as long as no operations that involve
      waiting for indirect access to complete (essentially, callers of
      mv88e6xxx_smi_direct_wait() and mv88e6xxx_wait_mask()) are interrupted
      with the lock released. However, I would guess that in mv88e6xxx_setup(),
      the critical section is kept open for such a long time just in order to
      optimize away multiple lock/unlock operations on the registers.
      
      We could, in principle, drop the reg_lock right before the
      mv88e6xxx_setup_port() -> mv88e6xxx_get_caps() call, and
      re-acquire it immediately afterwards. But this would look ugly, because
      mv88e6xxx_setup_port() would release a lock which it didn't acquire, but
      the caller did.
      
      A cleaner solution to this issue comes from the observation that struct
      mv88e6xxxx_ops methods generally assume they are called with the
      reg_lock already acquired. Whereas mv88e6352_phylink_get_caps() is more
      the exception rather than the norm, in that it acquires the lock itself.
      
      Let's enforce the same locking pattern/convention for
      chip->info->ops->phylink_get_caps() as well, and make
      mv88e6xxx_get_caps(), the top-level function, acquire the register lock
      explicitly, for this one implementation that will access registers for
      port 4 to work properly.
      
      This means that mv88e6xxx_setup_port() will no longer call the top-level
      function, but the low-level mv88e6xxx_ops method which expects the
      correct calling context (register lock held).
      
      Compared to chip->info->ops->phylink_get_caps(), mv88e6xxx_get_caps()
      also fixes up the supported_interfaces bitmap for internal ports, since
      that can be done generically and does not require per-switch knowledge.
      That's code which will no longer execute, however mv88e6xxx_setup_port()
      doesn't need that. It just needs to look at the mac_capabilities bitmap.
      
      Fixes: cc1049cc ("net: dsa: mv88e6xxx: fix speed setting for CPU/DSA ports")
      Reported-by: default avatarMaksim Kiselev <bigunclemax@gmail.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: default avatarMaksim Kiselev <bigunclemax@gmail.com>
      Link: https://lore.kernel.org/r/20221214110120.3368472-1-vladimir.oltean@nxp.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      a7d82367
    • Biju Das's avatar
      ravb: Fix "failed to switch device to config mode" message during unbind · c72a7e42
      Biju Das authored
      This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch
      device to config mode" during unbind.
      
      We are doing register access after pm_runtime_put_sync().
      
      We usually do cleanup in reverse order of init. Currently in
      remove(), the "pm_runtime_put_sync" is not in reverse order.
      
      Probe
      	reset_control_deassert(rstc);
      	pm_runtime_enable(&pdev->dev);
      	pm_runtime_get_sync(&pdev->dev);
      
      remove
      	pm_runtime_put_sync(&pdev->dev);
      	unregister_netdev(ndev);
      	..
      	ravb_mdio_release(priv);
      	pm_runtime_disable(&pdev->dev);
      
      Consider the call to unregister_netdev()
      unregister_netdev->unregister_netdevice_queue->rollback_registered_many
      that calls the below functions which access the registers after
      pm_runtime_put_sync()
       1) ravb_get_stats
       2) ravb_close
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBiju Das <biju.das.jz@bp.renesas.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      c72a7e42
    • Gaosheng Cui's avatar
      net: stmmac: fix errno when create_singlethread_workqueue() fails · 2cb815cf
      Gaosheng Cui authored
      We should set the return value to -ENOMEM explicitly when
      create_singlethread_workqueue() fails in stmmac_dvr_probe(),
      otherwise we'll lose the error value.
      
      Fixes: a137f3f2 ("net: stmmac: fix possible memory leak in stmmac_dvr_probe()")
      Signed-off-by: default avatarGaosheng Cui <cuigaosheng1@huawei.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221214080117.3514615-1-cuigaosheng1@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      2cb815cf
    • Li Zetao's avatar
      r6040: Fix kmemleak in probe and remove · 7e43039a
      Li Zetao authored
      There is a memory leaks reported by kmemleak:
      
        unreferenced object 0xffff888116111000 (size 2048):
          comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s)
          hex dump (first 32 bytes):
            00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff  ................
            08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60
            [<ffffffff827e20ee>] phy_device_create+0x4e/0x90
            [<ffffffff827e6072>] get_phy_device+0xd2/0x220
            [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0
            [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0
            [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040]
            ...
      
      The problem occurs in probe process as follows:
        r6040_init_one:
          mdiobus_register
            mdiobus_scan    <- alloc and register phy_device,
                               the reference count of phy_device is 3
          r6040_mii_probe
            phy_connect     <- connect to the first phy_device,
                               so the reference count of the first
                               phy_device is 4, others are 3
          register_netdev   <- fault inject succeeded, goto error handling path
      
          // error handling path
          err_out_mdio_unregister:
            mdiobus_unregister(lp->mii_bus);
          err_out_mdio:
            mdiobus_free(lp->mii_bus);    <- the reference count of the first
                                             phy_device is 1, it is not released
                                             and other phy_devices are released
        // similarly, the remove process also has the same problem
      
      The root cause is traced to the phy_device is not disconnected when
      removes one r6040 device in r6040_remove_one() or on error handling path
      after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet
      device is connected to the first PHY device of mii_bus, in order to
      notify the connected driver when the link status changes, which is the
      default behavior of the PHY infrastructure to handle everything.
      Therefore the phy_device should be disconnected when removes one r6040
      device or on error handling path.
      
      Fix it by adding phy_disconnect() when removes one r6040 device or on
      error handling path after r6040_mii probed successfully.
      
      Fixes: 3831861b ("r6040: implement phylib")
      Signed-off-by: default avatarLi Zetao <lizetao1@huawei.com>
      Reviewed-by: default avatarLeon Romanovsky <leonro@nvidia.com>
      Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      7e43039a
    • Kirill Tkhai's avatar
      unix: Fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() · 3ff8bff7
      Kirill Tkhai authored
      There is a race resulting in alive SOCK_SEQPACKET socket
      may change its state from TCP_ESTABLISHED to TCP_CLOSE:
      
      unix_release_sock(peer)                  unix_dgram_sendmsg(sk)
        sock_orphan(peer)
          sock_set_flag(peer, SOCK_DEAD)
                                                 sock_alloc_send_pskb()
                                                   if !(sk->sk_shutdown & SEND_SHUTDOWN)
                                                     OK
                                                 if sock_flag(peer, SOCK_DEAD)
                                                   sk->sk_state = TCP_CLOSE
        sk->sk_shutdown = SHUTDOWN_MASK
      
      After that socket sk remains almost normal: it is able to connect, listen, accept
      and recvmsg, while it can't sendmsg.
      
      Since this is the only possibility for alive SOCK_SEQPACKET to change
      the state in such way, we should better fix this strange and potentially
      danger corner case.
      
      Note, that we will return EPIPE here like this is normally done in sock_alloc_send_pskb().
      Originally used ECONNREFUSED looks strange, since it's strange to return
      a specific retval in dependence of race in kernel, when user can't affect on this.
      
      Also, move TCP_CLOSE assignment for SOCK_DGRAM sockets under state lock
      to fix race with unix_dgram_connect():
      
      unix_dgram_connect(other)            unix_dgram_sendmsg(sk)
                                             unix_peer(sk) = NULL
                                             unix_state_unlock(sk)
        unix_state_double_lock(sk, other)
        sk->sk_state  = TCP_ESTABLISHED
        unix_peer(sk) = other
        unix_state_double_unlock(sk, other)
                                             sk->sk_state  = TCP_CLOSED
      
      This patch fixes both of these races.
      
      Fixes: 83301b53 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too")
      Signed-off-by: default avatarKirill Tkhai <tkhai@ya.ru>
      Link: https://lore.kernel.org/r/135fda25-22d5-837a-782b-ceee50e19844@ya.ruSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      3ff8bff7
    • Toke Høiland-Jørgensen's avatar
      selftests/bpf: Add a test for using a cpumap from an freplace-to-XDP program · f506439e
      Toke Høiland-Jørgensen authored
      This adds a simple test for inserting an XDP program into a cpumap that is
      "owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp
      does). Prior to the kernel fix this would fail because the map type
      ownership would be set to PROG_TYPE_EXT instead of being resolved to
      PROG_TYPE_XDP.
      
      v5:
      - Fix a few nits from Andrii, add his ACK
      v4:
      - Use skeletons for selftest
      v3:
      - Update comment to better explain the cause
      - Add Yonghong's ACK
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      f506439e
    • Toke Høiland-Jørgensen's avatar
      bpf: Resolve fext program type when checking map compatibility · 1c123c56
      Toke Høiland-Jørgensen authored
      The bpf_prog_map_compatible() check makes sure that BPF program types are
      not mixed inside BPF map types that can contain programs (tail call maps,
      cpumaps and devmaps). It does this by setting the fields of the map->owner
      struct to the values of the first program being checked against, and
      rejecting any subsequent programs if the values don't match.
      
      One of the values being set in the map owner struct is the program type,
      and since the code did not resolve the prog type for fext programs, the map
      owner type would be set to PROG_TYPE_EXT and subsequent loading of programs
      of the target type into the map would fail.
      
      This bug is seen in particular for XDP programs that are loaded as
      PROG_TYPE_EXT using libxdp; these cannot insert programs into devmaps and
      cpumaps because the check fails as described above.
      
      Fix the bug by resolving the fext program type to its target program type
      as elsewhere in the verifier.
      
      v3:
      - Add Yonghong's ACK
      
      Fixes: f45d5b6c ("bpf: generalise tail call map compatibility check")
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20221214230254.790066-1-toke@redhat.comSigned-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      1c123c56
    • Minsuk Kang's avatar
      nfc: pn533: Clear nfc_target before being used · 9f281577
      Minsuk Kang authored
      Fix a slab-out-of-bounds read that occurs in nla_put() called from
      nfc_genl_send_target() when target->sensb_res_len, which is duplicated
      from an nfc_target in pn533, is too large as the nfc_target is not
      properly initialized and retains garbage values. Clear nfc_targets with
      memset() before they are used.
      
      Found by a modified version of syzkaller.
      
      BUG: KASAN: slab-out-of-bounds in nla_put
      Call Trace:
       memcpy
       nla_put
       nfc_genl_dump_targets
       genl_lock_dumpit
       netlink_dump
       __netlink_dump_start
       genl_family_rcv_msg_dumpit
       genl_rcv_msg
       netlink_rcv_skb
       genl_rcv
       netlink_unicast
       netlink_sendmsg
       sock_sendmsg
       ____sys_sendmsg
       ___sys_sendmsg
       __sys_sendmsg
       do_syscall_64
      
      Fixes: 673088fb ("NFC: pn533: Send ATR_REQ directly for active device detection")
      Fixes: 361f3cb7 ("NFC: DEP link hook implementation for pn533")
      Signed-off-by: default avatarMinsuk Kang <linuxlovemin@yonsei.ac.kr>
      Reviewed-by: default avatarKrzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Link: https://lore.kernel.org/r/20221214015139.119673-1-linuxlovemin@yonsei.ac.krSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      9f281577
    • Vladimir Oltean's avatar
      net: enetc: avoid buffer leaks on xdp_do_redirect() failure · 628050ec
      Vladimir Oltean authored
      Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
      BD in the RX ring between index orig_i and i can have one of 2 refcount
      values on its page.
      
      We are the owner of the current buffer that is being processed, so the
      refcount will be at least 1.
      
      If the current owner of the buffer at the diametrically opposed index
      in the RX ring (i.o.w, the other half of this page) has not yet called
      kfree(), this page's refcount could even be 2.
      
      enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
      refcount against 1, and [ if it's 2 ] does not attempt to reuse it.
      
      But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
      the page refcount can have one of 3 values. It can also be 0, if there
      is no owner of the other page half, and xdp_do_redirect() for this
      buffer ran so far that it triggered a flush of the devmap/cpumap bulk
      queue, and the consumers of those bulk queues also freed the buffer,
      all by the time xdp_do_redirect() returns the execution back to enetc.
      
      This is the reason why enetc_flip_rx_buff() is called before
      xdp_do_redirect(), but there is a big flaw with that reasoning:
      enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
      enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
      we call enetc_xdp_free(), which does not deal gracefully with that.
      
      In fact, what happens is quite special. The page refcounts start as 1.
      enetc_flip_rx_buff() figures they're reusable, transfers these
      rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
      bumps the refcount to 2. When xdp_do_redirect() later returns an error,
      we call the no-op enetc_xdp_free(), but we still haven't lost the
      reference to that page. A copy of it is still at rx_ring->next_to_alloc,
      but that has refcount 2 (and there are no concurrent owners of it in
      flight, to drop the refcount). What really kills the system is when
      we'll flip the rx_swbd->page the second time around. With an updated
      refcount of 2, the page will not be reusable and we'll really leak it.
      Then enetc_new_page() will have to allocate more pages, which will then
      eventually leak again on further errors from xdp_do_redirect().
      
      The problem, summarized, is that we zeroize rx_swbd->page before we're
      completely done with it, and this makes it impossible for the error path
      to do something with it.
      
      Since the packet is potentially multi-buffer and therefore the
      rx_swbd->page is potentially an array, manual passing of the old
      pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
      difficult.
      
      For the sake of going with a simple solution, we accept the possibility
      of racing with xdp_do_redirect(), and we move the flip procedure to
      execute only on the redirect success path. By racing, I mean that the
      page may be deemed as not reusable by enetc (having a refcount of 0),
      but there will be no leak in that case, either.
      
      Once we accept that, we have something better to do with buffers on
      XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
      we won't, either (and this way, we can avoid enetc_xdp_free()
      completely, which gives the entire page to the slab allocator).
      Instead, we'll call enetc_xdp_drop(), which will recycle this half of
      the buffer back to the RX ring.
      
      Fixes: 9d2b68cc ("net: enetc: add support for XDP_REDIRECT")
      Suggested-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      628050ec
  3. 14 Dec, 2022 17 commits
  4. 13 Dec, 2022 7 commits
    • Linus Torvalds's avatar
      Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next · 7e68dd7d
      Linus Torvalds authored
      Pull networking updates from Paolo Abeni:
       "Core:
      
         - Allow live renaming when an interface is up
      
         - Add retpoline wrappers for tc, improving considerably the
           performances of complex queue discipline configurations
      
         - Add inet drop monitor support
      
         - A few GRO performance improvements
      
         - Add infrastructure for atomic dev stats, addressing long standing
           data races
      
         - De-duplicate common code between OVS and conntrack offloading
           infrastructure
      
         - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
      
         - Netfilter: introduce packet parser for tunneled packets
      
         - Replace IPVS timer-based estimators with kthreads to scale up the
           workload with the number of available CPUs
      
         - Add the helper support for connection-tracking OVS offload
      
        BPF:
      
         - Support for user defined BPF objects: the use case is to allocate
           own objects, build own object hierarchies and use the building
           blocks to build own data structures flexibly, for example, linked
           lists in BPF
      
         - Make cgroup local storage available to non-cgroup attached BPF
           programs
      
         - Avoid unnecessary deadlock detection and failures wrt BPF task
           storage helpers
      
         - A relevant bunch of BPF verifier fixes and improvements
      
         - Veristat tool improvements to support custom filtering, sorting,
           and replay of results
      
         - Add LLVM disassembler as default library for dumping JITed code
      
         - Lots of new BPF documentation for various BPF maps
      
         - Add bpf_rcu_read_{,un}lock() support for sleepable programs
      
         - Add RCU grace period chaining to BPF to wait for the completion of
           access from both sleepable and non-sleepable BPF programs
      
         - Add support storing struct task_struct objects as kptrs in maps
      
         - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
           values
      
         - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
      
        Protocols:
      
         - TCP: implement Protective Load Balancing across switch links
      
         - TCP: allow dynamically disabling TCP-MD5 static key, reverting back
           to fast[er]-path
      
         - UDP: Introduce optional per-netns hash lookup table
      
         - IPv6: simplify and cleanup sockets disposal
      
         - Netlink: support different type policies for each generic netlink
           operation
      
         - MPTCP: add MSG_FASTOPEN and FastOpen listener side support
      
         - MPTCP: add netlink notification support for listener sockets events
      
         - SCTP: add VRF support, allowing sctp sockets binding to VRF devices
      
         - Add bridging MAC Authentication Bypass (MAB) support
      
         - Extensions for Ethernet VPN bridging implementation to better
           support multicast scenarios
      
         - More work for Wi-Fi 7 support, comprising conversion of all the
           existing drivers to internal TX queue usage
      
         - IPSec: introduce a new offload type (packet offload) allowing
           complete header processing and crypto offloading
      
         - IPSec: extended ack support for more descriptive XFRM error
           reporting
      
         - RXRPC: increase SACK table size and move processing into a
           per-local endpoint kernel thread, reducing considerably the
           required locking
      
         - IEEE 802154: synchronous send frame and extended filtering support,
           initial support for scanning available 15.4 networks
      
         - Tun: bump the link speed from 10Mbps to 10Gbps
      
         - Tun/VirtioNet: implement UDP segmentation offload support
      
        Driver API:
      
         - PHY/SFP: improve power level switching between standard level 1 and
           the higher power levels
      
         - New API for netdev <-> devlink_port linkage
      
         - PTP: convert existing drivers to new frequency adjustment
           implementation
      
         - DSA: add support for rx offloading
      
         - Autoload DSA tagging driver when dynamically changing protocol
      
         - Add new PCP and APPTRUST attributes to Data Center Bridging
      
         - Add configuration support for 800Gbps link speed
      
         - Add devlink port function attribute to enable/disable RoCE and
           migratable
      
         - Extend devlink-rate to support strict prioriry and weighted fair
           queuing
      
         - Add devlink support to directly reading from region memory
      
         - New device tree helper to fetch MAC address from nvmem
      
         - New big TCP helper to simplify temporary header stripping
      
        New hardware / drivers:
      
         - Ethernet:
            - Marvel Octeon CNF95N and CN10KB Ethernet Switches
            - Marvel Prestera AC5X Ethernet Switch
            - WangXun 10 Gigabit NIC
            - Motorcomm yt8521 Gigabit Ethernet
            - Microchip ksz9563 Gigabit Ethernet Switch
            - Microsoft Azure Network Adapter
            - Linux Automation 10Base-T1L adapter
      
         - PHY:
            - Aquantia AQR112 and AQR412
            - Motorcomm YT8531S
      
         - PTP:
            - Orolia ART-CARD
      
         - WiFi:
            - MediaTek Wi-Fi 7 (802.11be) devices
            - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
              devices
      
         - Bluetooth:
            - Broadcom BCM4377/4378/4387 Bluetooth chipsets
            - Realtek RTL8852BE and RTL8723DS
            - Cypress.CYW4373A0 WiFi + Bluetooth combo device
      
        Drivers:
      
         - CAN:
            - gs_usb: bus error reporting support
            - kvaser_usb: listen only and bus error reporting support
      
         - Ethernet NICs:
            - Intel (100G):
               - extend action skbedit to RX queue mapping
               - implement devlink-rate support
               - support direct read from memory
            - nVidia/Mellanox (mlx5):
               - SW steering improvements, increasing rules update rate
               - Support for enhanced events compression
               - extend H/W offload packet manipulation capabilities
               - implement IPSec packet offload mode
            - nVidia/Mellanox (mlx4):
               - better big TCP support
            - Netronome Ethernet NICs (nfp):
               - IPsec offload support
               - add support for multicast filter
            - Broadcom:
               - RSS and PTP support improvements
            - AMD/SolarFlare:
               - netlink extened ack improvements
               - add basic flower matches to offload, and related stats
            - Virtual NICs:
               - ibmvnic: introduce affinity hint support
            - small / embedded:
               - FreeScale fec: add initial XDP support
               - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
               - TI am65-cpsw: add suspend/resume support
               - Mediatek MT7986: add RX wireless wthernet dispatch support
               - Realtek 8169: enable GRO software interrupt coalescing per
                 default
      
         - Ethernet high-speed switches:
            - Microchip (sparx5):
               - add support for Sparx5 TC/flower H/W offload via VCAP
            - Mellanox mlxsw:
               - add 802.1X and MAC Authentication Bypass offload support
               - add ip6gre support
      
         - Embedded Ethernet switches:
            - Mediatek (mtk_eth_soc):
               - improve PCS implementation, add DSA untag support
               - enable flow offload support
            - Renesas:
               - add rswitch R-Car Gen4 gPTP support
            - Microchip (lan966x):
               - add full XDP support
               - add TC H/W offload via VCAP
               - enable PTP on bridge interfaces
            - Microchip (ksz8):
               - add MTU support for KSZ8 series
      
         - Qualcomm 802.11ax WiFi (ath11k):
            - support configuring channel dwell time during scan
      
         - MediaTek WiFi (mt76):
            - enable Wireless Ethernet Dispatch (WED) offload support
            - add ack signal support
            - enable coredump support
            - remain_on_channel support
      
         - Intel WiFi (iwlwifi):
            - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
            - 320 MHz channels support
      
         - RealTek WiFi (rtw89):
            - new dynamic header firmware format support
            - wake-over-WLAN support"
      
      * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
        ipvs: fix type warning in do_div() on 32 bit
        net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
        net: ipa: add IPA v4.7 support
        dt-bindings: net: qcom,ipa: Add SM6350 compatible
        bnxt: Use generic HBH removal helper in tx path
        IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
        selftests: forwarding: Add bridge MDB test
        selftests: forwarding: Rename bridge_mdb test
        bridge: mcast: Support replacement of MDB port group entries
        bridge: mcast: Allow user space to specify MDB entry routing protocol
        bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
        bridge: mcast: Add support for (*, G) with a source list and filter mode
        bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
        bridge: mcast: Add a flag for user installed source entries
        bridge: mcast: Expose __br_multicast_del_group_src()
        bridge: mcast: Expose br_multicast_new_group_src()
        bridge: mcast: Add a centralized error path
        bridge: mcast: Place netlink policy before validation functions
        bridge: mcast: Split (*, G) and (S, G) addition into different functions
        bridge: mcast: Do not derive entry type from its filter mode
        ...
      7e68dd7d
    • Linus Torvalds's avatar
      Merge tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa · 1ca06f1c
      Linus Torvalds authored
      Pull Xtensa updates from Max Filippov:
      
       - fix kernel build with gcc-13
      
       - various minor fixes
      
      * tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa:
        xtensa: add __umulsidi3 helper
        xtensa: update config files
        MAINTAINERS: update the 'T:' entry for xtensa
      1ca06f1c
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 4cb1fc6f
      Linus Torvalds authored
      Pull ARM updates from Russell King:
      
       - update unwinder to cope with module PLTs
      
       - enable UBSAN on ARM
      
       - improve kernel fault message
      
       - update UEFI runtime page tables dump
      
       - avoid clang's __aeabi_uldivmod generated in NWFPE code
      
       - disable FIQs on CPU shutdown paths
      
       - update XOR register usage
      
       - a number of build updates (using .arch, thread pointer, removal of
         lazy evaluation in Makefile)
      
       - conversion of stacktrace code to stackwalk
      
       - findbit assembly updates
      
       - hwcap feature updates for ARMv8 CPUs
      
       - instruction dump updates for big-endian platforms
      
       - support for function error injection
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (31 commits)
        ARM: 9279/1: support function error injection
        ARM: 9277/1: Make the dumped instructions are consistent with the disassembled ones
        ARM: 9276/1: Refactor dump_instr()
        ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA
        ARM: 9274/1: Add hwcap for Speculative Store Bypassing Safe
        ARM: 9273/1: Add hwcap for Speculation Barrier(SB)
        ARM: 9272/1: vfp: Add hwcap for FEAT_AA32I8MM
        ARM: 9271/1: vfp: Add hwcap for FEAT_AA32BF16
        ARM: 9270/1: vfp: Add hwcap for FEAT_FHM
        ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
        ARM: 9268/1: vfp: Add hwcap FPHP and ASIMDHP for FEAT_FP16
        ARM: 9267/1: Define Armv8 registers in AArch32 state
        ARM: findbit: add unwinder information
        ARM: findbit: operate by words
        ARM: findbit: convert to macros
        ARM: findbit: provide more efficient ARMv7 implementation
        ARM: findbit: document ARMv5 bit offset calculation
        ARM: 9259/1: stacktrace: Convert stacktrace to generic ARCH_STACKWALK
        ARM: 9258/1: stacktrace: Make stack walk callback consistent with generic code
        ARM: 9265/1: pass -march= only to compiler
        ...
      4cb1fc6f
    • Linus Torvalds's avatar
      Merge tag 'x86_sev_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 740afa4d
      Linus Torvalds authored
      Pull x86 sev updates from Borislav Petkov:
      
       - Two minor fixes to the sev-guest driver
      
      * tag 'x86_sev_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        virt/sev-guest: Add a MODULE_ALIAS
        virt/sev-guest: Remove unnecessary free in init_crypto()
      740afa4d
    • Linus Torvalds's avatar
      Merge tag 'x86_paravirt_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82c72902
      Linus Torvalds authored
      Pull x86 paravirt update from Borislav Petkov:
      
       - Simplify paravirt patching machinery by removing the now unused
         clobber mask
      
      * tag 'x86_paravirt_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/paravirt: Remove clobber bitmask from .parainstructions
      82c72902
    • Linus Torvalds's avatar
      Merge tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a70210f4
      Linus Torvalds authored
      Pull x86 microcode and IFS updates from Borislav Petkov:
       "The IFS (In-Field Scan) stuff goes through tip because the IFS driver
        uses the same structures and similar functionality as the microcode
        loader and it made sense to route it all through this branch so that
        there are no conflicts.
      
         - Add support for multiple testing sequences to the Intel In-Field
           Scan driver in order to be able to run multiple different test
           patterns. Rework things and remove the BROKEN dependency so that
           the driver can be enabled (Jithu Joseph)
      
         - Remove the subsys interface usage in the microcode loader because
           it is not really needed
      
         - A couple of smaller fixes and cleanups"
      
      * tag 'x86_microcode_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
        x86/microcode/intel: Do not retry microcode reloading on the APs
        x86/microcode/intel: Do not print microcode revision and processor flags
        platform/x86/intel/ifs: Add missing kernel-doc entry
        Revert "platform/x86/intel/ifs: Mark as BROKEN"
        Documentation/ABI: Update IFS ABI doc
        platform/x86/intel/ifs: Add current_batch sysfs entry
        platform/x86/intel/ifs: Remove reload sysfs entry
        platform/x86/intel/ifs: Add metadata validation
        platform/x86/intel/ifs: Use generic microcode headers and functions
        platform/x86/intel/ifs: Add metadata support
        x86/microcode/intel: Use a reserved field for metasize
        x86/microcode/intel: Add hdr_type to intel_microcode_sanity_check()
        x86/microcode/intel: Reuse microcode_sanity_check()
        x86/microcode/intel: Use appropriate type in microcode_sanity_check()
        x86/microcode/intel: Reuse find_matching_signature()
        platform/x86/intel/ifs: Remove memory allocation from load path
        platform/x86/intel/ifs: Remove image loading during init
        platform/x86/intel/ifs: Return a more appropriate error code
        platform/x86/intel/ifs: Remove unused selection
        x86/microcode: Drop struct ucode_cpu_info.valid
        ...
      a70210f4
    • Linus Torvalds's avatar
      Merge tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3ef3ace4
      Linus Torvalds authored
      Pull x86 cpu updates from Borislav Petkov:
      
       - Split MTRR and PAT init code to accomodate at least Xen PV and TDX
         guests which do not get MTRRs exposed but only PAT. (TDX guests do
         not support the cache disabling dance when setting up MTRRs so they
         fall under the same category)
      
         This is a cleanup work to remove all the ugly workarounds for such
         guests and init things separately (Juergen Gross)
      
       - Add two new Intel CPUs to the list of CPUs with "normal" Energy
         Performance Bias, leading to power savings
      
       - Do not do bus master arbitration in C3 (ARB_DISABLE) on modern
         Centaur CPUs
      
      * tag 'x86_cpu_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
        x86/mtrr: Make message for disabled MTRRs more descriptive
        x86/pat: Handle TDX guest PAT initialization
        x86/cpuid: Carve out all CPUID functionality
        x86/cpu: Switch to cpu_feature_enabled() for X86_FEATURE_XENPV
        x86/cpu: Remove X86_FEATURE_XENPV usage in setup_cpu_entry_area()
        x86/cpu: Drop 32-bit Xen PV guest code in update_task_stack()
        x86/cpu: Remove unneeded 64-bit dependency in arch_enter_from_user_mode()
        x86/cpufeatures: Add X86_FEATURE_XENPV to disabled-features.h
        x86/acpi/cstate: Optimize ARB_DISABLE on Centaur CPUs
        x86/mtrr: Simplify mtrr_ops initialization
        x86/cacheinfo: Switch cache_ap_init() to hotplug callback
        x86: Decouple PAT and MTRR handling
        x86/mtrr: Add a stop_machine() handler calling only cache_cpu_init()
        x86/mtrr: Let cache_aps_delayed_init replace mtrr_aps_delayed_init
        x86/mtrr: Get rid of __mtrr_enabled bool
        x86/mtrr: Simplify mtrr_bp_init()
        x86/mtrr: Remove set_all callback from struct mtrr_ops
        x86/mtrr: Disentangle MTRR init from PAT init
        x86/mtrr: Move cache control code to cacheinfo.c
        x86/mtrr: Split MTRR-specific handling from cache dis/enabling
        ...
      3ef3ace4