1. 09 Feb, 2024 1 commit
    • dpll: fix possible deadlock during netlink dump operation · 53c0441d
      Jiri Pirko authored
      Recently, I've been hitting following deadlock warning during dpll pin
      dump:
      
      [52804.637962] ======================================================
      [52804.638536] WARNING: possible circular locking dependency detected
      [52804.639111] 6.8.0-rc2jiri+ #1 Not tainted
      [52804.639529] ------------------------------------------------------
      [52804.640104] python3/2984 is trying to acquire lock:
      [52804.640581] ffff88810e642678 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}, at: netlink_dump+0xb3/0x780
      [52804.641417]
                     but task is already holding lock:
      [52804.642010] ffffffff83bde4c8 (dpll_lock){+.+.}-{3:3}, at: dpll_lock_dumpit+0x13/0x20
      [52804.642747]
                     which lock already depends on the new lock.
      
      [52804.643551]
                     the existing dependency chain (in reverse order) is:
      [52804.644259]
                     -> #1 (dpll_lock){+.+.}-{3:3}:
      [52804.644836]        lock_acquire+0x174/0x3e0
      [52804.645271]        __mutex_lock+0x119/0x1150
      [52804.645723]        dpll_lock_dumpit+0x13/0x20
      [52804.646169]        genl_start+0x266/0x320
      [52804.646578]        __netlink_dump_start+0x321/0x450
      [52804.647056]        genl_family_rcv_msg_dumpit+0x155/0x1e0
      [52804.647575]        genl_rcv_msg+0x1ed/0x3b0
      [52804.648001]        netlink_rcv_skb+0xdc/0x210
      [52804.648440]        genl_rcv+0x24/0x40
      [52804.648831]        netlink_unicast+0x2f1/0x490
      [52804.649290]        netlink_sendmsg+0x36d/0x660
      [52804.649742]        __sock_sendmsg+0x73/0xc0
      [52804.650165]        __sys_sendto+0x184/0x210
      [52804.650597]        __x64_sys_sendto+0x72/0x80
      [52804.651045]        do_syscall_64+0x6f/0x140
      [52804.651474]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
      [52804.652001]
                     -> #0 (nlk_cb_mutex-GENERIC){+.+.}-{3:3}:
      [52804.652650]        check_prev_add+0x1ae/0x1280
      [52804.653107]        __lock_acquire+0x1ed3/0x29a0
      [52804.653559]        lock_acquire+0x174/0x3e0
      [52804.653984]        __mutex_lock+0x119/0x1150
      [52804.654423]        netlink_dump+0xb3/0x780
      [52804.654845]        __netlink_dump_start+0x389/0x450
      [52804.655321]        genl_family_rcv_msg_dumpit+0x155/0x1e0
      [52804.655842]        genl_rcv_msg+0x1ed/0x3b0
      [52804.656272]        netlink_rcv_skb+0xdc/0x210
      [52804.656721]        genl_rcv+0x24/0x40
      [52804.657119]        netlink_unicast+0x2f1/0x490
      [52804.657570]        netlink_sendmsg+0x36d/0x660
      [52804.658022]        __sock_sendmsg+0x73/0xc0
      [52804.658450]        __sys_sendto+0x184/0x210
      [52804.658877]        __x64_sys_sendto+0x72/0x80
      [52804.659322]        do_syscall_64+0x6f/0x140
      [52804.659752]        entry_SYSCALL_64_after_hwframe+0x46/0x4e
      [52804.660281]
                     other info that might help us debug this:
      
      [52804.661077]  Possible unsafe locking scenario:
      
      [52804.661671]        CPU0                    CPU1
      [52804.662129]        ----                    ----
      [52804.662577]   lock(dpll_lock);
      [52804.662924]                                lock(nlk_cb_mutex-GENERIC);
      [52804.663538]                                lock(dpll_lock);
      [52804.664073]   lock(nlk_cb_mutex-GENERIC);
      [52804.664490]
      
      The issue is as follows: __netlink_dump_start() calls control->start(cb)
      with nlk->cb_mutex held. In control->start(cb) the dpll_lock is taken.
      Then nlk->cb_mutex is released and taken again in netlink_dump(), while
      dpll_lock is still being held. That leads to an ABBA deadlock when another
      CPU races with the same operation.
      
      Fix this by moving dpll_lock taking into dumpit() callback which ensures
      correct lock taking order.
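
The lock ordering described above can be illustrated with a small user-space model. This is only a sketch, not kernel code: `LockOrderChecker`, `dump_old` and `dump_fixed` are hypothetical names, and the checker records "held, then acquired" pairs in the spirit of lockdep without reproducing its actual algorithm.

```python
from collections import defaultdict


class LockOrderChecker:
    """Records 'held -> acquired' edges and flags an inversion (A->B plus B->A)."""

    def __init__(self):
        self.edges = defaultdict(set)  # lock name -> locks acquired while holding it
        self.held = []                 # locks currently held, in acquisition order
        self.inversions = []           # (held, acquired) pairs that reverse an earlier edge

    def acquire(self, name):
        for h in self.held:
            # An earlier 'name -> h' edge combined with the new 'h -> name' is ABBA.
            if h in self.edges[name]:
                self.inversions.append((h, name))
            self.edges[h].add(name)
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def dump_old(chk):
    """Model of the old flow: dpll_lock is taken in start() and kept."""
    chk.acquire("cb_mutex")    # __netlink_dump_start() holds cb_mutex
    chk.acquire("dpll_lock")   # start() takes dpll_lock
    chk.release("cb_mutex")
    chk.acquire("cb_mutex")    # netlink_dump() retakes it, dpll_lock still held
    chk.release("cb_mutex")
    chk.release("dpll_lock")


def dump_fixed(chk):
    """Model of the fixed flow: dpll_lock is only taken inside dumpit()."""
    chk.acquire("cb_mutex")    # start() no longer takes dpll_lock
    chk.release("cb_mutex")
    chk.acquire("cb_mutex")    # netlink_dump() holds cb_mutex
    chk.acquire("dpll_lock")   # dumpit() takes dpll_lock
    chk.release("dpll_lock")
    chk.release("cb_mutex")
```

Running `dump_old` on a fresh checker records one inversion between `dpll_lock` and `cb_mutex`; `dump_fixed` records none, because `dpll_lock` is always taken after `cb_mutex`.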
      
      Fixes: 9d71b54b ("dpll: netlink: Add DPLL framework base functions")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
      Link: https://lore.kernel.org/r/20240207115902.371649-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 08 Feb, 2024 20 commits
    • Merge tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 1f719a2f
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from WiFi and netfilter.
      
        Current release - regressions:
      
         - nic: intel: fix old compiler regressions
      
         - netfilter: ipset: missing gc cancellations fixed
      
        Current release - new code bugs:
      
         - netfilter: ctnetlink: fix filtering for zone 0
      
        Previous releases - regressions:
      
         - core: fix from address in memcpy_to_iter_csum()
      
         - netfilter: nfnetlink_queue: un-break NF_REPEAT
      
         - af_unix: fix memory leak for dead unix_(sk)->oob_skb in GC.
      
         - devlink: avoid potential loop in devlink_rel_nested_in_notify_work()
      
         - iwlwifi:
             - mvm: fix a battery life regression
             - fix double-free bug
      
         - mac80211: fix waiting for beacons logic
      
         - nic: nfp: flower: prevent re-adding mac index for bonded port
      
        Previous releases - always broken:
      
         - rxrpc: fix generation of serial numbers to skip zero
      
         - tipc: check the bearer type before calling tipc_udp_nl_bearer_add()
      
         - tunnels: fix out of bounds access when building IPv6 PMTU error
      
         - nic: hv_netvsc: register VF in netvsc_probe if NET_DEVICE_REGISTER
           missed
      
         - nic: atlantic: fix DMA mapping for PTP hwts ring
      
        Misc:
      
         - selftests: more fixes to deal with very slow hosts"
      
      * tag 'net-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits)
        netfilter: nft_set_pipapo: remove scratch_aligned pointer
        netfilter: nft_set_pipapo: add helper to release pcpu scratch area
        netfilter: nft_set_pipapo: store index in scratch maps
        netfilter: nft_set_rbtree: skip end interval element from gc
        netfilter: nfnetlink_queue: un-break NF_REPEAT
        netfilter: nf_tables: use timestamp to check for set element timeout
        netfilter: nft_ct: reject direction for ct id
        netfilter: ctnetlink: fix filtering for zone 0
        s390/qeth: Fix potential loss of L3-IP@ in case of network issues
        netfilter: ipset: Missing gc cancellations fixed
        octeontx2-af: Initialize maps.
        net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio
        net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio
        netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
        netfilter: nft_compat: restrict match/target protocol to u16
        netfilter: nft_compat: reject unused compat flag
        netfilter: nft_compat: narrow down revision to unsigned 8-bits
        net: intel: fix old compiler regressions
        MAINTAINERS: Maintainer change for rds
        selftests: cmsg_ipv6: repeat the exact packet
        ...
    • Merge tag 'pinctrl-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · b0d5d0f7
      Linus Torvalds authored
      Pull pinctrl fix from Linus Walleij:
       "A single fix for the AMD driver which affects developer laptops, the
        pinctrl/GPIO driver won't probe on some systems"
      
      * tag 'pinctrl-v6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: amd: Add IRQF_ONESHOT to the interrupt request
    • Merge tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 63e4b9d6
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Narrow down target/match revision to u8 in nft_compat.
      
      2) Bail out with unused flags in nft_compat.
      
      3) Restrict layer 4 protocol to u16 in nft_compat.
      
      4) Remove static in pipapo get command that slipped through when
         reducing set memory footprint.
      
      5) Follow up incremental fix for the ipset performance regression,
         this includes the missing gc cancellation, from Jozsef Kadlecsik.
      
      6) Allow to filter by zone 0 in ctnetlink, do not interpret zone 0
         as no filtering, from Felix Huettner.
      
      7) Reject direction for NFT_CT_ID.
      
      8) Use timestamp to check for set element expiration while transaction
         is handled to prevent garbage collection from removing set elements
         that were just added by this transaction. Packet path and netlink
         dump/get path still use current time to check for expiration.
      
      9) Restore NF_REPEAT in nfnetlink_queue, from Florian Westphal.
      
      10) map_index needs to be percpu and per-set, not just percpu.
          At this time it's possible for a pipapo set to fill the all-zero part
          with ones and take the 'might have bits set' as 'start-from-zero' area.
          From Florian Westphal. This includes three patches:
      
          - Change scratchpad area to a structure that provides space for a
            per-set-and-cpu toggle and uses it instead of the percpu one.
      
          - Add a new free helper to prepare for the next patch.
      
          - Remove the scratch_aligned pointer and make the AVX2 implementation
            use the exact same memory addresses for read/store of the matching
            state.
      
      netfilter pull request 24-02-08
      
      * tag 'nf-24-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nft_set_pipapo: remove scratch_aligned pointer
        netfilter: nft_set_pipapo: add helper to release pcpu scratch area
        netfilter: nft_set_pipapo: store index in scratch maps
        netfilter: nft_set_rbtree: skip end interval element from gc
        netfilter: nfnetlink_queue: un-break NF_REPEAT
        netfilter: nf_tables: use timestamp to check for set element timeout
        netfilter: nft_ct: reject direction for ct id
        netfilter: ctnetlink: fix filtering for zone 0
        netfilter: ipset: Missing gc cancellations fixed
        netfilter: nft_set_pipapo: remove static in nft_pipapo_get()
        netfilter: nft_compat: restrict match/target protocol to u16
        netfilter: nft_compat: reject unused compat flag
        netfilter: nft_compat: narrow down revision to unsigned 8-bits
      ====================
      
      Link: https://lore.kernel.org/r/20240208112834.1433-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nft_set_pipapo: remove scratch_aligned pointer · 5a8cdf6f
      Florian Westphal authored
      use ->scratch for both avx2 and the generic implementation.
      
      After previous change the scratch->map member is always aligned properly
      for AVX2, so we can just use scratch->map in AVX2 too.
      
      The alignoff delta is stored in the scratchpad so we can reconstruct
      the correct address to free the area again.
      
      Fixes: 7400b063 ("nft_set_pipapo: Introduce AVX2-based lookup implementation")
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_pipapo: add helper to release pcpu scratch area · 47b1c03c
      Florian Westphal authored
      After the next patch a simple kfree() is no longer enough, so add
      a helper for it.
      
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_pipapo: store index in scratch maps · 76313d1a
      Florian Westphal authored
      Pipapo needs a scratchpad area to keep state during matching.
      This state can be large and thus cannot reside on stack.
      
      Each set preallocates percpu areas for this.
      
      On each match stage, one scratchpad half starts with all-zero and the other
      is inited to all-ones.
      
      At the end of each stage, the half that starts with all-ones is
      always zero.  Before next field is tested, pointers to the two halves
      are swapped, i.e.  resmap pointer turns into fill pointer and vice versa.
      
      After the last field has been processed, pipapo stashes the
      index toggle in a percpu variable, with assumption that next packet
      will start with the all-zero half and sets all bits in the other to 1.
      
      This isn't reliable.
      
      There can be multiple sets and we can't be sure that the upper
      and lower halves of all set scratch maps are always in sync (lookups
      can be conditional), so one set might have swapped, but another might
      not have been queried.
      
      Thus we need to keep the index per-set-and-cpu, just like the
      scratchpad.
      
      Note that this bug fix is incomplete, there is a related issue.
      
      avx2 and normal implementation might use slightly different areas of the
      map array space due to the avx2 alignment requirements, so
      m->scratch (generic/fallback implementation) and ->scratch_aligned
      (avx) may partially overlap. scratch and scratch_aligned are not distinct
      objects, the latter is just the aligned address of the former.
      
      After this change, write to scratch_align->map_index may write to
      scratch->map, so this issue becomes more prominent, we can set to 1
      a bit in the supposedly-all-zero area of scratch->map[].
      
      A followup patch will remove scratch_aligned and make the generic and
      avx code use the same (aligned) area.
      
      It's done in a separate change to ease review.
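
The desynchronisation described above can be sketched with a deliberately simplified bookkeeping model for a single CPU. It does not model the real scratch maps: `SetScratch`, `lookup`, `run_shared` and `run_per_set` are hypothetical names, and the only state tracked is which half of a set's scratchpad is currently all-zero, assuming one swap between consecutive fields.

```python
class SetScratch:
    """Ground truth for one set: which scratch half (0 or 1) is all-zero."""

    def __init__(self):
        self.zero_half = 0


def lookup(scratch, n_fields, assumed_zero_half):
    """Run one lookup; return (assumption_was_correct, zero_half_afterwards)."""
    ok = assumed_zero_half == scratch.zero_half
    swaps = n_fields - 1          # halves are swapped before each next field
    if swaps % 2:
        scratch.zero_half ^= 1    # an odd number of swaps flips the zero half
    return ok, scratch.zero_half


def run_shared(lookups):
    """One toggle shared by every set (model of the old behaviour)."""
    sets, toggle, results = {}, 0, []
    for name, n_fields in lookups:
        scratch = sets.setdefault(name, SetScratch())
        ok, toggle = lookup(scratch, n_fields, toggle)
        results.append(ok)
    return results


def run_per_set(lookups):
    """One toggle per set (model of the fixed behaviour)."""
    sets, toggles, results = {}, {}, []
    for name, n_fields in lookups:
        scratch = sets.setdefault(name, SetScratch())
        ok, toggles[name] = lookup(scratch, n_fields, toggles.get(name, 0))
        results.append(ok)
    return results
```

With the sequence `[("A", 2), ("B", 2)]`, the shared toggle left behind by set A makes the first lookup in set B start from the wrong half, while the per-set toggles stay consistent.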
      
      Fixes: 3c4287f6 ("nf_tables: Add set type for arbitrary concatenation of ranges")
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_set_rbtree: skip end interval element from gc · 60c0c230
      Pablo Neira Ayuso authored
      rbtree lazy gc on insert might collect an end interval element that has
      just been added in this transaction; skip end interval elements that
      are not yet active.
      
      Fixes: f718863a ("netfilter: nft_set_rbtree: fix overlap expiration walk")
      Cc: stable@vger.kernel.org
      Reported-by: lonial con <kongln9170@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink_queue: un-break NF_REPEAT · f82777e8
      Florian Westphal authored
      Only override userspace verdict if the ct hook returns something
      other than ACCEPT.
      
      Else, this replaces NF_REPEAT (run all hooks again) with NF_ACCEPT
      (move to next hook).
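
The verdict handling described above can be modelled as follows. This is a sketch rather than the kernel code: the function names are hypothetical, the "before" variant assumes the ct hook result always replaced the userspace verdict, and the numeric values are those of the netfilter verdict constants.

```python
# Netfilter verdict values as defined in the uapi header.
NF_DROP, NF_ACCEPT, NF_STOLEN, NF_QUEUE, NF_REPEAT = 0, 1, 2, 3, 4


def verdict_before_fix(user_verdict, ct_verdict):
    """Assumed old behaviour: the ct hook result always wins."""
    return ct_verdict


def verdict_after_fix(user_verdict, ct_verdict):
    """Only override the userspace verdict if the ct hook did not ACCEPT."""
    if ct_verdict != NF_ACCEPT:
        return ct_verdict
    return user_verdict
```

In this model a userspace `NF_REPEAT` combined with a ct hook result of `NF_ACCEPT` stays `NF_REPEAT` after the fix, whereas before it was turned into `NF_ACCEPT`.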
      
      Fixes: 6291b3a6 ("netfilter: conntrack: convert nf_conntrack_update to netfilter verdicts")
      Reported-by: l.6diay@passmail.com
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use timestamp to check for set element timeout · 7395dfac
      Pablo Neira Ayuso authored
      Add a timestamp field at the beginning of the transaction, store it
      in the nftables per-netns area.
      
      Update the set backend .insert, .deactivate and sync gc paths to use the
      timestamp; this prevents an element from expiring while the control plane
      transaction is still unfinished.
      
      .lookup and .update, which are used from the packet path, still use the
      current time to check whether the element has expired. So do the .get
      path and dump, since they run lockless under the rcu read side lock.
      Async gc also needs to check the current time, since it runs
      asynchronously from a workqueue.
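
The difference between the two expiration checks can be sketched in a few lines. The names below (`Element`, `Transaction`, `expired_control_plane`, `expired_packet_path`) are hypothetical and times are plain integers; the real code works on kernel time values.

```python
from dataclasses import dataclass


@dataclass
class Element:
    expiration: int  # absolute time at which the element expires


class Transaction:
    """Captures the time once, when the transaction begins."""

    def __init__(self, now):
        self.tstamp = now


def expired_control_plane(elem, txn):
    # Control plane paths compare against the transaction timestamp,
    # so the answer cannot change while the transaction is in progress.
    return txn.tstamp >= elem.expiration


def expired_packet_path(elem, now):
    # Packet path (and other lockless readers) compare against the current time.
    return now >= elem.expiration
```

For an element expiring at time 100 and a transaction started at time 95, the control plane check keeps returning false even once the clock reaches 105, while the packet path check at 105 returns true.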
      
      Fixes: c3e1b005 ("netfilter: nf_tables: add set element timeout support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_ct: reject direction for ct id · 38ed1c70
      Pablo Neira Ayuso authored
      The direction attribute is ignored; reject it in case this ever needs
      to be supported.
      
      Fixes: 3087c3f7 ("netfilter: nft_ct: Add ct id support")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: fix filtering for zone 0 · fa173a1b
      Felix Huettner authored
      Previously, filtering for the default zone would actually skip the zone
      filter and flush all zones.
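
The underlying problem is a value that doubles as "no filter". A minimal sketch, assuming hypothetical helper names and using `None` to stand for "no zone filter requested" (the kernel represents this differently):

```python
from typing import Optional


def matches_before_fix(entry_zone: int, filter_zone: int) -> bool:
    """Assumed old behaviour: zone 0 is indistinguishable from 'no filter'."""
    if filter_zone == 0:
        return True
    return entry_zone == filter_zone


def matches_after_fix(entry_zone: int, filter_zone: Optional[int]) -> bool:
    """'No filter' is tracked separately, so zone 0 is a real filter value."""
    if filter_zone is None:
        return True
    return entry_zone == filter_zone


def select(zones, predicate, filter_zone):
    """Return the entry zones that the given predicate would act on."""
    return [z for z in zones if predicate(z, filter_zone)]
```

With entries in zones 0, 5 and 7, filtering for zone 0 selects all three entries in the old model and only the zone 0 entry in the fixed one.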
      
      Fixes: eff3c558 ("netfilter: ctnetlink: support filtering by zone")
      Reported-by: Ilya Maximets <i.maximets@ovn.org>
      Closes: https://lore.kernel.org/netdev/2032238f-31ac-4106-8f22-522e76df5a12@ovn.org/
      Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • s390/qeth: Fix potential loss of L3-IP@ in case of network issues · 2fe8a236
      Alexandra Winter authored
      Symptom:
      In case of a bad cable connection (e.g. dirty optics) a fast sequence of
      network DOWN-UP-DOWN-UP could happen. UP triggers recovery of the qeth
      interface. In case of a second DOWN while recovery is still ongoing, it
      can happen that the IP@ of a Layer3 qeth interface is lost and will not
      be recovered by the second UP.
      
      Problem:
      When registration of IP addresses with Layer 3 qeth devices fails (e.g.
      because of bad address format), the respective IP address is deleted from
      its hash table in the driver. If registration fails because of an ENETDOWN
      condition, the address should stay in the hash table, so a subsequent
      recovery can restore it.
      
      3caa4af8 ("qeth: keep ip-address after LAN_OFFLINE failure")
      fixes this for registration failures during normal operation, but not
      during recovery.
      
      Solution:
      Keep L3-IP address in case of ENETDOWN in qeth_l3_recover_ip(). For
      consistency with qeth_l3_add_ip() we also keep it in case of EADDRINUSE,
      i.e. for some reason the card already/still has this address registered.
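
The recovery policy described in the solution can be sketched as a user-space model. `recover_ips` and the `register` callback are hypothetical stand-ins and do not mirror the signature of qeth_l3_recover_ip(); the callback is assumed to return 0 on success or a negative errno value.

```python
import errno

# Failures after which the address must stay in the table, per the description.
KEEP_ON = {errno.ENETDOWN, errno.EADDRINUSE}


def recover_ips(table, register):
    """Re-register every address; drop only those failing for other reasons.

    table:    set of address strings (stands in for the driver's hash table)
    register: callable returning 0 on success or a negative errno value
    """
    for addr in list(table):          # copy, since the set is modified in the loop
        rc = register(addr)
        if rc != 0 and -rc not in KEEP_ON:
            table.discard(addr)
    return table
```

An address whose registration fails with ENETDOWN or EADDRINUSE remains in the table, so a later recovery pass can try it again; an address failing with, say, EINVAL is removed.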
      
      Fixes: 4a71df50 ("qeth: new qeth device driver")
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Link: https://lore.kernel.org/r/20240206085849.2902775-1-wintera@linux.ibm.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: ipset: Missing gc cancellations fixed · 27c5a095
      Jozsef Kadlecsik authored
      The patch fdb8e12cc2cc ("netfilter: ipset: fix performance regression
      in swap operation") missed adding the calls to gc cancellations
      in the error path of create operations and at module unload. Also,
      because half of the destroy operations are now executed by a
      function registered by call_rcu(), neither the NFNL_SUBSYS_IPSET mutex
      nor the rcu read lock is held, and therefore checking them results
      in false warnings.
      
      Fixes: 97f7cf1c ("netfilter: ipset: fix performance regression in swap operation")
      Reported-by: syzbot+52bbc0ad036f6f0d4a25@syzkaller.appspotmail.com
      Reported-by: Brad Spengler <spender@grsecurity.net>
      Reported-by: Стас Ничипорович <stasn77@gmail.com>
      Tested-by: Brad Spengler <spender@grsecurity.net>
      Tested-by: Стас Ничипорович <stasn77@gmail.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • octeontx2-af: Initialize maps. · db010ff6
      Ratheesh Kannoth authored
      kmalloc_array() without __GFP_ZERO flag does not initialize
      memory to zero. This causes issues. Use kcalloc() for maps and
      bitmap_zalloc() for bitmaps.
      
      Fixes: dd784287 ("octeontx2-af: Add new devlink param to configure maximum usable NIX block LFs")
      Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
      Reviewed-by: Brett Creeley <bcreeley@amd.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240206024000.1070260-1-rkannoth@marvell.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: cpsw: enable mac_managed_pm to fix mdio · bc4ce46b
      Sinthu Raja authored
      The commit below introduced a WARN when the phy state is not one of
      PHY_HALTED, PHY_READY and PHY_UP:
      commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      
      When cpsw resumes, some ports are in the PHY_NOLINK state, so the warning
      below is triggered. Set mac_managed_pm to true to tell mdio that the phy
      resume/suspend is managed by the mac, which fixes the following warning:
      
      WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
      CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
      Hardware name: Generic AM33XX (Flattened Device Tree)
       unwind_backtrace from show_stack+0x18/0x1c
       show_stack from dump_stack_lvl+0x24/0x2c
       dump_stack_lvl from __warn+0x84/0x15c
       __warn from warn_slowpath_fmt+0x1a8/0x1c8
       warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
       mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
       dpm_run_callback from device_resume+0xb8/0x2b8
       device_resume from dpm_resume+0x144/0x314
       dpm_resume from dpm_resume_end+0x14/0x20
       dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
       suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
       pm_suspend from state_store+0x74/0xd0
       state_store from kernfs_fop_write_iter+0x104/0x1ec
       kernfs_fop_write_iter from vfs_write+0x1b8/0x358
       vfs_write from ksys_write+0x78/0xf8
       ksys_write from ret_fast_syscall+0x0/0x54
      Exception stack(0xe094dfa8 to 0xe094dff0)
      dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
      dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
      dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
      
      Cc: <stable@vger.kernel.org> # v6.0+
      Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
      Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: ethernet: ti: cpsw_new: enable mac_managed_pm to fix mdio · 9def04e7
      Sinthu Raja authored
      The commit below introduced a WARN when the phy state is not one of
      PHY_HALTED, PHY_READY and PHY_UP:
      commit 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      
      When cpsw_new resumes, some ports are in the PHY_NOLINK state, so the
      warning below is triggered. Set mac_managed_pm to true to tell mdio that
      the phy resume/suspend is managed by the mac, which fixes the following warning:
      
      WARNING: CPU: 0 PID: 965 at drivers/net/phy/phy_device.c:326 mdio_bus_phy_resume+0x140/0x144
      CPU: 0 PID: 965 Comm: sh Tainted: G           O       6.1.46-g247b2535b2 #1
      Hardware name: Generic AM33XX (Flattened Device Tree)
       unwind_backtrace from show_stack+0x18/0x1c
       show_stack from dump_stack_lvl+0x24/0x2c
       dump_stack_lvl from __warn+0x84/0x15c
       __warn from warn_slowpath_fmt+0x1a8/0x1c8
       warn_slowpath_fmt from mdio_bus_phy_resume+0x140/0x144
       mdio_bus_phy_resume from dpm_run_callback+0x3c/0x140
       dpm_run_callback from device_resume+0xb8/0x2b8
       device_resume from dpm_resume+0x144/0x314
       dpm_resume from dpm_resume_end+0x14/0x20
       dpm_resume_end from suspend_devices_and_enter+0xd0/0x924
       suspend_devices_and_enter from pm_suspend+0x2e0/0x33c
       pm_suspend from state_store+0x74/0xd0
       state_store from kernfs_fop_write_iter+0x104/0x1ec
       kernfs_fop_write_iter from vfs_write+0x1b8/0x358
       vfs_write from ksys_write+0x78/0xf8
       ksys_write from ret_fast_syscall+0x0/0x54
      Exception stack(0xe094dfa8 to 0xe094dff0)
      dfa0:                   00000004 005c3fb8 00000001 005c3fb8 00000004 00000001
      dfc0: 00000004 005c3fb8 b6f6bba0 00000004 00000004 0059edb8 00000000 00000000
      dfe0: 00000004 bed918f0 b6f09bd3 b6e89a66
      
      Cc: <stable@vger.kernel.org> # v6.0+
      Fixes: 744d23c7 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")
      Fixes: fba863b8 ("net: phy: make PHY PM ops a no-op if MAC driver manages PHY PM")
      Signed-off-by: Sinthu Raja <sinthu.raja@ti.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • netfilter: nft_set_pipapo: remove static in nft_pipapo_get() · ab0beafd
      Pablo Neira Ayuso authored
      This has slipped through when reducing memory footprint for set
      elements, remove it.
      
      Fixes: 9dad402b ("netfilter: nf_tables: expose opaque set element as struct nft_elem_priv")
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 04737196
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
       "Fix regressions in cbc and algif_hash, as well as an older
        NULL-pointer dereference in ccp"
      
      * tag 'v6.8-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: algif_hash - Remove bogus SGL free on zero-length error path
        crypto: cbc - Ensure statesize is zero
        crypto: ccp - Fix null pointer dereference in __sev_platform_shutdown_locked
    • Merge tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu · 860d7dcb
      Linus Torvalds authored
      Pull percpu fix from Dennis Zhou:
      
       - fix riscv wrong size passed to local_flush_tlb_range_asid()
      
      * tag 'percpu-for-6.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
        riscv: Fix wrong size passed to local_flush_tlb_range_asid()
  3. 07 Feb, 2024 15 commits
    • netfilter: nft_compat: restrict match/target protocol to u16 · d694b754
      Pablo Neira Ayuso authored
      xt_check_{match,target} expects u16, but NFTA_RULE_COMPAT_PROTO is u32.
      
      NLA_POLICY_MAX(NLA_BE32, 65535) cannot be used because .max in
      nla_policy is s16, see 3e48be05 ("netlink: add attribute range
      validation to policy").
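
The range argument above comes down to simple arithmetic: a signed 16-bit maximum cannot express 65535, so the bound has to be checked by hand. A minimal sketch, assuming a hypothetical helper `parse_compat_proto` that receives the raw 4-byte big-endian attribute payload and returns the value or a negative errno:

```python
import errno

S16_MAX = 2**15 - 1   # largest value a signed 16-bit .max field can hold
U16_MAX = 2**16 - 1   # largest protocol value the u16 consumers accept


def parse_compat_proto(raw: bytes) -> int:
    """Decode a big-endian u32 and reject anything that does not fit in u16."""
    if len(raw) != 4:
        return -errno.EINVAL
    value = int.from_bytes(raw, "big")
    if value > U16_MAX:
        return -errno.EINVAL
    return value
```

Since `S16_MAX` is 32767, a policy maximum stored in a signed 16-bit field cannot encode the 65535 bound, which is why an explicit comparison is needed.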
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_compat: reject unused compat flag · 292781c3
      Pablo Neira Ayuso authored
      Flag (1 << 0) is ignored if set and never used; reject it with EINVAL
      instead.
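
Rejecting unknown or unused flag bits is usually done with a mask of the bits that are actually supported. The sketch below uses hypothetical constant names; that bit (1 << 1) is the only supported flag is an assumption made for illustration, while the unused bit (1 << 0) comes from the description above.

```python
import errno

COMPAT_F_UNUSED = 1 << 0            # ignored so far, now rejected
COMPAT_F_INV = 1 << 1               # assumed to be the only supported flag
COMPAT_F_MASK = COMPAT_F_INV        # every bit outside this mask is invalid


def check_compat_flags(flags: int) -> int:
    """Return 0 if only supported bits are set, otherwise -EINVAL."""
    if flags & ~COMPAT_F_MASK:
        return -errno.EINVAL
    return 0
```

Because the unused bit is left out of the mask, the same test rejects it together with any other undefined bit.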
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_compat: narrow down revision to unsigned 8-bits · 36fa8d69
      Pablo Neira Ayuso authored
      xt_find_revision() expects u8, restrict it to this datatype.
      
      Fixes: 0ca743a5 ("netfilter: nf_tables: add compatibility layer for x_tables")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · 335bac1d
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.8-rc4
      
      This time we have an unusually large wireless pull request. Several
      functionality fixes to both the stack and iwlwifi. Lots of fixes to
      warnings, especially to MODULE_DESCRIPTION().
      
      * tag 'wireless-2024-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: (31 commits)
        wifi: mt76: mt7996: fix fortify warning
        wifi: brcmfmac: Adjust n_channels usage for __counted_by
        wifi: iwlwifi: do not announce EPCS support
        wifi: iwlwifi: exit eSR only after the FW does
        wifi: iwlwifi: mvm: fix a battery life regression
        wifi: mac80211: accept broadcast probe responses on 6 GHz
        wifi: mac80211: adding missing drv_mgd_complete_tx() call
        wifi: mac80211: fix waiting for beacons logic
        wifi: mac80211: fix unsolicited broadcast probe config
        wifi: mac80211: initialize SMPS mode correctly
        wifi: mac80211: fix driver debugfs for vif type change
        wifi: mac80211: set station RX-NSS on reconfig
        wifi: mac80211: fix RCU use in TDLS fast-xmit
        wifi: mac80211: improve CSA/ECSA connection refusal
        wifi: cfg80211: detect stuck ECSA element in probe resp
        wifi: iwlwifi: remove extra kernel-doc
        wifi: fill in MODULE_DESCRIPTION()s for mt76 drivers
        wifi: fill in MODULE_DESCRIPTION()s for wilc1000
        wifi: fill in MODULE_DESCRIPTION()s for wl18xx
        wifi: fill in MODULE_DESCRIPTION()s for p54spi
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20240206095722.CD9D2C433F1@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge tag 'loongarch-fixes-6.8-2' of... · 547ab8fc
      Linus Torvalds authored
      Merge tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
      
      Pull LoongArch fixes from Huacai Chen:
       "Fix acpi_core_pic[] array overflow, fix earlycon parameter if KASAN
        enabled, disable UBSAN instrumentation for vDSO build, and two Kconfig
        cleanups"
      
      * tag 'loongarch-fixes-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
        LoongArch: vDSO: Disable UBSAN instrumentation
        LoongArch: Fix earlycon parameter if KASAN enabled
        LoongArch: Change acpi_core_pic[NR_CPUS] to acpi_core_pic[MAX_CORE_PIC]
        LoongArch: Select HAVE_ARCH_SECCOMP to use the common SECCOMP menu
        LoongArch: Select ARCH_ENABLE_THP_MIGRATION instead of redefining it
      547ab8fc
    • Linus Torvalds's avatar
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 5c24ba20
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "x86 guest:
      
         - Avoid false positive for check that only matters on AMD processors
      
        x86:
      
         - Give a hint when Win2016 might fail to boot due to XSAVES &&
           !XSAVEC configuration
      
         - Do not allow creating an in-kernel PIT unless an IOAPIC already
           exists
      
        RISC-V:
      
         - Allow ISA extensions that were enabled for bare metal in 6.8 (Zbc,
           scalar and vector crypto, Zfh[min], Zihintntl, Zvfh[min], Zfa)
      
        S390:
      
         - fix CC for successful PQAP instruction
      
         - fix a race when creating a shadow page"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM
        x86/kvm: Fix SEV check in sev_map_percpu_data()
        KVM: x86: Give a hint when Win2016 might fail to boot due to XSAVES erratum
        KVM: x86: Check irqchip mode before create PIT
        KVM: riscv: selftests: Add Zfa extension to get-reg-list test
        RISC-V: KVM: Allow Zfa extension for Guest/VM
        KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
        RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
        KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
        RISC-V: KVM: Allow Zihintntl extension for Guest/VM
        KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
        RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
        KVM: riscv: selftests: Add vector crypto extensions to get-reg-list test
        RISC-V: KVM: Allow vector crypto extensions for Guest/VM
        KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list test
        RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
        KVM: riscv: selftests: Add Zbc extension to get-reg-list test
        RISC-V: KVM: Allow Zbc extension for Guest/VM
        KVM: s390: fix cc for successful PQAP
        KVM: s390: vsie: fix race during shadow creation
      5c24ba20
    • Linus Torvalds's avatar
      Merge tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux · c8d80f83
      Linus Torvalds authored
      Pull nfsd fix from Chuck Lever:
      
       - Address a deadlock regression in RELEASE_LOCKOWNER
      
      * tag 'nfsd-6.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
        nfsd: don't take fi_lock in nfsd_break_deleg_cb()
      c8d80f83
    • Jesse Brandeburg's avatar
      net: intel: fix old compiler regressions · 75428f53
      Jesse Brandeburg authored
      The kernel build regressions/improvements email contained a couple of
      issues with old compilers (in fact all the reports were on different
      architectures, but all gcc 5.5) and the FIELD_PREP() and FIELD_GET()
      conversions. They all occur because an integer #define that should
      have been declared as unsigned was shifted to the point that it could
      set the sign bit.
      
      The fix just involves making sure the defines use the "U" suffix on
      the constants to make sure they're unsigned. Should make the checkers
      happier.
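
A minimal user-space sketch of this class of fix. The DEMO_* names and the demo_field_get() helper are hypothetical stand-ins, not the real i40e/ice defines or the kernel's FIELD_GET(); the problematic signed form is shown only in a comment.

```c
#include <stdint.h>

/* Hypothetical field position, for illustration only. */
#define DEMO_SHIFT 24

/*
 * Problematic form (kept in a comment on purpose): 0xFF has type int,
 * so shifting it left by 24 reaches the sign bit of a 32-bit int.
 *
 *     #define DEMO_MASK (0xFF << DEMO_SHIFT)
 *
 * Fixed form: the U suffix makes the constant unsigned before the shift.
 */
#define DEMO_MASK (0xFFU << DEMO_SHIFT)

/* Simplified stand-in for FIELD_GET(): mask the register, then shift down. */
static inline uint32_t demo_field_get(uint32_t mask, unsigned int shift,
				      uint32_t reg)
{
	return (reg & mask) >> shift;
}
```

With the unsigned mask, extracting the top byte of 0xAB123456 yields 0xAB.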
      
      Confirmed with objdump before/after that there is no change to the
      binaries.
      
      Issues were reported as follows:
      ./drivers/net/ethernet/intel/ice/ice_base.c:238:7: note: in expansion of macro 'FIELD_GET'
            (FIELD_GET(GLINT_CTL_ITR_GRAN_25_M, regval) == ICE_ITR_GRAN_US))
             ^
      ./include/linux/compiler_types.h:435:38: error: call to '__compiletime_assert_1093' declared with attribute error: FIELD_GET: mask is not constant
      drivers/net/ethernet/intel/ice/ice_nvm.c:709:16: note: in expansion of macro ‘FIELD_GET’
        orom->major = FIELD_GET(ICE_OROM_VER_MASK, combo_ver);
                      ^
      ./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_796’ declared with attribute error: FIELD_GET: mask is not constant
      drivers/net/ethernet/intel/ice/ice_common.c:945:18: note: in expansion of macro ‘FIELD_GET’
        u8 max_agg_bw = FIELD_GET(GL_PWR_MODE_CTL_CAR_MAX_BW_M,
                        ^
      ./include/linux/compiler_types.h:435:38: error: call to ‘__compiletime_assert_420’ declared with attribute error: FIELD_GET: mask is not constant
      drivers/net/ethernet/intel/i40e/i40e_dcb.c:458:8: note: in expansion of macro ‘FIELD_GET’
        oui = FIELD_GET(I40E_LLDP_TLV_OUI_MASK, ouisubtype);
              ^
      Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Closes: https://lore.kernel.org/lkml/d03e90ca-8485-4d1b-5ec1-c3398e0e8da@linux-m68k.org/ #i40e #ice
      Fixes: 62589808 ("i40e: field get conversion")
      Fixes: 5a259f8e ("ice: field get conversion")
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Link: https://lore.kernel.org/r/20240206022906.2194214-1-jesse.brandeburg@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      75428f53
    • Allison Henderson's avatar
      MAINTAINERS: Maintainer change for rds · 5001bfe9
      Allison Henderson authored
      At this point, Santosh has moved on to other things and I am happy
      to take over the role of rds maintainer. Update the MAINTAINERS
      accordingly.
      Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
      Link: https://lore.kernel.org/r/20240205190343.112436-1-allison.henderson@oracle.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5001bfe9
    • Jakub Kicinski's avatar
      selftests: cmsg_ipv6: repeat the exact packet · 4b00d0c5
      Jakub Kicinski authored
      cmsg_ipv6 test requests tcpdump to capture 4 packets,
      and sends until tcpdump quits. Only the first packet
      is "real", however, and the rest are basic UDP packets.
      So if tcpdump doesn't start in time it will miss
      the real packet and only capture the UDP ones.
      
      This makes the test fail on slow machines (no KVM or with
      debug enabled) 100% of the time, while it passes in fast
      environments.
      
      Repeat the "real" / expected packet.
      
      Fixes: 9657ad09 ("selftests: net: test IPV6_TCLASS")
      Fixes: 05ae83d5 ("selftests: net: test IPV6_HOPLIMIT")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b00d0c5
    • Petr Tesarik's avatar
      net: stmmac: protect updates of 64-bit statistics counters · 38cc3c6d
      Petr Tesarik authored
      As explained by a comment in <linux/u64_stats_sync.h>, write side of struct
      u64_stats_sync must ensure mutual exclusion, or one seqcount update could
      be lost on 32-bit platforms, thus blocking readers forever. Such lockups
      have been observed in real world after stmmac_xmit() on one CPU raced with
      stmmac_napi_poll_tx() on another CPU.
      
      To fix the issue without introducing a new lock, split the statistics
      into three parts:
      
      1. fields updated only under the tx queue lock,
      2. fields updated only during NAPI poll,
      3. fields updated only from interrupt context,
      
      Updates to fields in the first two groups are already serialized through
      other locks. It is sufficient to split the existing struct u64_stats_sync
      so that each group has its own.
      
      Note that tx_set_ic_bit is updated from both contexts. Split this counter
      so that each context gets its own, and calculate their sum to get the total
      value in stmmac_get_ethtool_stats().
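
A rough sketch of the counter split described above, assuming a simplified layout. The demo_* struct and function names are hypothetical, and the struct u64_stats_sync that each group carries in the driver is omitted so the example stays plain user-space C.

```c
#include <stdint.h>

/* Hypothetical copy of the counter updated only under the tx queue lock. */
struct demo_txq_stats {
	uint64_t tx_set_ic_bit;
};

/* Hypothetical copy of the counter updated only during NAPI poll. */
struct demo_napi_tx_stats {
	uint64_t tx_set_ic_bit;
};

/* Each update context owns its own group, so writers never share a field. */
struct demo_queue_stats {
	struct demo_txq_stats q;
	struct demo_napi_tx_stats napi;
};

/* The reader reports the sum of both per-context copies as the total. */
static inline uint64_t demo_total_tx_set_ic_bit(const struct demo_queue_stats *s)
{
	return s->q.tx_set_ic_bit + s->napi.tx_set_ic_bit;
}
```

Keeping one copy per context means neither writer needs the other's lock; only the reporting path has to combine them.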
      
      For the third group, multiple interrupts may be processed by different CPUs
      at the same time, but interrupts on the same CPU will not nest. Move fields
      from this group to a newly created per-cpu struct stmmac_pcpu_stats.
      
      Fixes: 133466c3 ("net: stmmac: use per-queue 64 bit statistics where necessary")
      Link: https://lore.kernel.org/netdev/Za173PhviYg-1qIn@torres.zugschlus.de/t/
      Cc: stable@vger.kernel.org
      Signed-off-by: Petr Tesarik <petr@tesarici.cz>
      Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38cc3c6d
    • Linus Torvalds's avatar
      Merge tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 6d280f4d
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - two fixes preventing deletion and manual creation of subvolume qgroup
      
       - unify error code returned for unknown send flags
      
       - fix assertion during subvolume creation when anonymous device could
         be allocated by other thread (e.g. due to backref walk)
      
      * tag 'for-6.8-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: do not ASSERT() if the newly created subvolume already got read
        btrfs: forbid deleting live subvol qgroup
        btrfs: forbid creating subvol qgroups
        btrfs: send: return EOPNOTSUPP on unknown flags
      6d280f4d
    • Eric Dumazet's avatar
      ppp_async: limit MRU to 64K · cb88cb53
      Eric Dumazet authored
      syzbot triggered a warning [1] in __alloc_pages():
      
      WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)
      
      Willem fixed a similar issue in commit c0a2a1b0 ("ppp: limit MRU to 64K")
      
      Adopt the same sanity check for ppp_async_ioctl(PPPIOCSMRU)
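
A minimal sketch of such a sanity check in user-space C. The function name, the DEMO_* constants, and the exact bounds are assumptions for illustration (the upper bound follows the "64K" in the subject line); this is not the ioctl handler itself.

```c
#include <errno.h>

/* Assumed bounds for illustration only. */
#define DEMO_MRU_MAX 65535U	/* largest value that fits in 16 bits */
#define DEMO_PPP_MRU 1500U	/* assumed default/minimum MRU */

/*
 * Hypothetical helper: validate a user-supplied MRU before it is stored
 * and later used to size receive buffers.
 */
static inline int demo_set_mru(unsigned int *mru, unsigned int val)
{
	if (val > DEMO_MRU_MAX)
		return -EINVAL;		/* reject oversized requests up front */
	if (val < DEMO_PPP_MRU)
		val = DEMO_PPP_MRU;	/* raise small values to the minimum */
	*mru = val;
	return 0;
}
```

Rejecting the value when it is set keeps the later buffer allocation from ever being asked for an order above what the page allocator accepts.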
      
      [1]:
      
       WARNING: CPU: 1 PID: 11 at mm/page_alloc.c:4543 __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
      Modules linked in:
      CPU: 1 PID: 11 Comm: kworker/u4:0 Not tainted 6.8.0-rc2-syzkaller-g41bccc98 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      Workqueue: events_unbound flush_to_ldisc
      pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
       pc : __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
       lr : __alloc_pages+0xc8/0x698 mm/page_alloc.c:4537
      sp : ffff800093967580
      x29: ffff800093967660 x28: ffff8000939675a0 x27: dfff800000000000
      x26: ffff70001272ceb4 x25: 0000000000000000 x24: ffff8000939675c0
      x23: 0000000000000000 x22: 0000000000060820 x21: 1ffff0001272ceb8
      x20: ffff8000939675e0 x19: 0000000000000010 x18: ffff800093967120
      x17: ffff800083bded5c x16: ffff80008ac97500 x15: 0000000000000005
      x14: 1ffff0001272cebc x13: 0000000000000000 x12: 0000000000000000
      x11: ffff70001272cec1 x10: 1ffff0001272cec0 x9 : 0000000000000001
      x8 : ffff800091c91000 x7 : 0000000000000000 x6 : 000000000000003f
      x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000020
      x2 : 0000000000000008 x1 : 0000000000000000 x0 : ffff8000939675e0
      Call trace:
        __alloc_pages+0x308/0x698 mm/page_alloc.c:4543
        __alloc_pages_node include/linux/gfp.h:238 [inline]
        alloc_pages_node include/linux/gfp.h:261 [inline]
        __kmalloc_large_node+0xbc/0x1fc mm/slub.c:3926
        __do_kmalloc_node mm/slub.c:3969 [inline]
        __kmalloc_node_track_caller+0x418/0x620 mm/slub.c:4001
        kmalloc_reserve+0x17c/0x23c net/core/skbuff.c:590
        __alloc_skb+0x1c8/0x3d8 net/core/skbuff.c:651
        __netdev_alloc_skb+0xb8/0x3e8 net/core/skbuff.c:715
        netdev_alloc_skb include/linux/skbuff.h:3235 [inline]
        dev_alloc_skb include/linux/skbuff.h:3248 [inline]
        ppp_async_input drivers/net/ppp/ppp_async.c:863 [inline]
        ppp_asynctty_receive+0x588/0x186c drivers/net/ppp/ppp_async.c:341
        tty_ldisc_receive_buf+0x12c/0x15c drivers/tty/tty_buffer.c:390
        tty_port_default_receive_buf+0x74/0xac drivers/tty/tty_port.c:37
        receive_buf drivers/tty/tty_buffer.c:444 [inline]
        flush_to_ldisc+0x284/0x6e4 drivers/tty/tty_buffer.c:494
        process_one_work+0x694/0x1204 kernel/workqueue.c:2633
        process_scheduled_works kernel/workqueue.c:2706 [inline]
        worker_thread+0x938/0xef4 kernel/workqueue.c:2787
        kthread+0x288/0x310 kernel/kthread.c:388
        ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:860
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-and-tested-by: syzbot+c5da1f087c9e4ec6c933@syzkaller.appspotmail.com
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240205171004.1059724-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cb88cb53
    • Jiri Pirko's avatar
      devlink: avoid potential loop in devlink_rel_nested_in_notify_work() · 58086721
      Jiri Pirko authored
      devlink_rel_nested_in_notify_work() reschedules itself when it cannot
      take the devlink lock mutex. Convert the work to delayed work and, in
      case of a reschedule, run it one jiffy later to avoid potential looping.
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Fixes: c137743b ("devlink: introduce object and nested devlink relationship infra")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240205171114.338679-1-jiri@resnulli.us
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      58086721
    • Kuniyuki Iwashima's avatar
      af_unix: Call kfree_skb() for dead unix_(sk)->oob_skb in GC. · 1279f9d9
      Kuniyuki Iwashima authored
      syzbot reported a warning [0] in __unix_gc() with a repro, which
      creates a socketpair and sends one socket's fd to itself using the
      peer.
      
        socketpair(AF_UNIX, SOCK_STREAM, 0, [3, 4]) = 0
        sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\360", iov_len=1}],
                msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                            cmsg_type=SCM_RIGHTS, cmsg_data=[3]}],
                msg_controllen=24, msg_flags=0}, MSG_OOB|MSG_PROBE|MSG_DONTWAIT|MSG_ZEROCOPY) = 1
      
      This forms a self-cyclic reference that GC should finally untangle
      but does not due to lack of MSG_OOB handling, resulting in memory
      leak.
      
      Recently, commit 11498715 ("af_unix: Remove io_uring code for
      GC.") removed io_uring's dead code in GC and revealed the problem.
      
      The code was executed at the final stage of GC and unconditionally
      moved all GC candidates from gc_candidates to gc_inflight_list.
      That papered over the reported problem by always making the following
      WARN_ON_ONCE(!list_empty(&gc_candidates)) false.
      
      The problem has been there since commit 2aab4b96 ("af_unix: fix
      struct pid leaks in OOB support") added full scm support for MSG_OOB
      while fixing another bug.
      
      To fix this problem, we must call kfree_skb() for unix_sk(sk)->oob_skb
      if the socket still exists in gc_candidates after purging collected skb.
      
      Then, we need to set oob_skb to NULL before calling kfree_skb() because
      it calls the last fput() and triggers unix_release_sock(), where we
      would otherwise call kfree_skb(u->oob_skb) a second time if it is not NULL.
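
The ordering requirement can be sketched in user-space C as follows. All demo_* names are hypothetical stand-ins: demo_release_sock() plays the role of unix_release_sock(), demo_free_skb() plays the role of kfree_skb() dropping the last reference, and a plain heap buffer stands in for the skb.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a socket that may hold an OOB buffer. */
struct demo_sock {
	char *oob_skb;	/* stand-in for unix_sk(sk)->oob_skb */
	int frees;	/* counts how many times the buffer was freed */
};

/* Stand-in for the release path: frees oob_skb if it is still set. */
static inline void demo_release_sock(struct demo_sock *sk)
{
	if (sk->oob_skb) {
		free(sk->oob_skb);
		sk->oob_skb = NULL;
		sk->frees++;
	}
}

/* Stand-in for the final free, which triggers the owner's release path. */
static inline void demo_free_skb(struct demo_sock *owner, char *skb)
{
	free(skb);
	owner->frees++;
	demo_release_sock(owner);
}

/* GC side: detach the pointer first, then free, so release sees NULL. */
static inline void demo_gc_purge_oob(struct demo_sock *sk)
{
	char *skb = sk->oob_skb;

	if (!skb)
		return;
	sk->oob_skb = NULL;
	demo_free_skb(sk, skb);
}
```

If the pointer were cleared after the free instead, demo_release_sock() would still see it set and free the same buffer a second time.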
      
      Note that the leaked socket remained linked to a global list, so
      kmemleak also could not detect it.  We need to check /proc/net/protocols
      to notice the unfreed socket.
      
      [0]:
      WARNING: CPU: 0 PID: 2863 at net/unix/garbage.c:345 __unix_gc+0xc74/0xe80 net/unix/garbage.c:345
      Modules linked in:
      CPU: 0 PID: 2863 Comm: kworker/u4:11 Not tainted 6.8.0-rc1-syzkaller-00583-g1701940b #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
      Workqueue: events_unbound __unix_gc
      RIP: 0010:__unix_gc+0xc74/0xe80 net/unix/garbage.c:345
      Code: 8b 5c 24 50 e9 86 f8 ff ff e8 f8 e4 22 f8 31 d2 48 c7 c6 30 6a 69 89 4c 89 ef e8 97 ef ff ff e9 80 f9 ff ff e8 dd e4 22 f8 90 <0f> 0b 90 e9 7b fd ff ff 48 89 df e8 5c e7 7c f8 e9 d3 f8 ff ff e8
      RSP: 0018:ffffc9000b03fba0 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: ffffc9000b03fc10 RCX: ffffffff816c493e
      RDX: ffff88802c02d940 RSI: ffffffff896982f3 RDI: ffffc9000b03fb30
      RBP: ffffc9000b03fce0 R08: 0000000000000001 R09: fffff52001607f66
      R10: 0000000000000003 R11: 0000000000000002 R12: dffffc0000000000
      R13: ffffc9000b03fc10 R14: ffffc9000b03fc10 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff8880b9400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00005559c8677a60 CR3: 000000000d57a000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       process_one_work+0x889/0x15e0 kernel/workqueue.c:2633
       process_scheduled_works kernel/workqueue.c:2706 [inline]
       worker_thread+0x8b9/0x12a0 kernel/workqueue.c:2787
       kthread+0x2c6/0x3b0 kernel/kthread.c:388
       ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1b/0x30 arch/x86/entry/entry_64.S:242
       </TASK>
      
      Reported-by: syzbot+fa3ef895554bdbfd1183@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=fa3ef895554bdbfd1183
      Fixes: 2aab4b96 ("af_unix: fix struct pid leaks in OOB support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240203183149.63573-1-kuniyu@amazon.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1279f9d9
  4. 06 Feb, 2024 4 commits
    • Furong Xu's avatar
      net: stmmac: xgmac: fix a typo of register name in DPP safety handling · 1ce2654d
      Furong Xu authored
      DDPP is copied from Synopsys Data book:
      
      DDPP: Disable Data path Parity Protection.
          When it is 0x0, Data path Parity Protection is enabled.
          When it is 0x1, Data path Parity Protection is disabled.
      
      The macro name should be XGMAC_DPP_DISABLE.
      
      Fixes: 46eba193 ("net: stmmac: xgmac: fix handling of DPP safety error for DMA channels")
      Signed-off-by: Furong Xu <0x1207@gmail.com>
      Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
      Link: https://lore.kernel.org/r/20240203053133.1129236-1-0x1207@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1ce2654d
    • Dmitry Safonov's avatar
      selftests/net: Amend per-netns counter checks · b083d24f
      Dmitry Safonov authored
      Selftests here check not only that connect()/accept() for
      TCP-AO/TCP-MD5/non-signed-TCP combinations do/don't establish
      connections, but also counters: those are per-AO-key, per-socket and
      per-netns.
      
      The counters are checked on the server's side, as the server listener
      has TCP-AO/TCP-MD5/no keys for different peers. All tests run in
      the same namespaces with the same veth pair, created in test_init().
      
      After close() in both client and server, the sides go through
      the regular FIN/ACK + FIN/ACK sequence, which proceeds in the
      background. If the selftest has already started a new testing scenario
      and read the per-netns counters, it may fail at the end if and only if
      it doesn't expect the TCPAOGood per-netns counter to go up during the test.
      
      Let's just kill both TCP-AO sides - that will avoid any asynchronous
      background TCP-AO segments going to either side.
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Closes: https://lore.kernel.org/all/20240201132153.4d68f45e@kernel.org/T/#u
      Fixes: 6f0c472a ("selftests/net: Add TCP-AO + TCP-MD5 + no sign listen socket tests")
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Link: https://lore.kernel.org/r/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b083d24f
    • Nathan Chancellor's avatar
      x86/coco: Define cc_vendor without CONFIG_ARCH_HAS_CC_PLATFORM · e4596477
      Nathan Chancellor authored
      After commit a9ef2774 ("x86/kvm: Fix SEV check in
      sev_map_percpu_data()"), there is a build error when building
      x86_64_defconfig with GCOV using LLVM:
      
        ld.lld: error: undefined symbol: cc_vendor
        >>> referenced by kvm.c
        >>>               arch/x86/kernel/kvm.o:(kvm_smp_prepare_boot_cpu) in archive vmlinux.a
      
      which corresponds to
      
        if (cc_vendor != CC_VENDOR_AMD ||
            !cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
                  return;
      
      Without GCOV, clang is able to eliminate the use of cc_vendor because
      cc_platform_has() evaluates to false when CONFIG_ARCH_HAS_CC_PLATFORM is
      not set, meaning that if statement will be true no matter what value
      cc_vendor has.
      
      With GCOV, the instrumentation keeps the use of cc_vendor around for
      code coverage purposes but cc_vendor is only declared, not defined,
      without CONFIG_ARCH_HAS_CC_PLATFORM, leading to the build error above.
      
      Provide a macro definition of cc_vendor when CONFIG_ARCH_HAS_CC_PLATFORM
      is not set with a value of CC_VENDOR_NONE, so that the first condition
      can always be evaluated/eliminated at compile time, avoiding the build
      error altogether. This is very similar to the situation prior to
      commit da86eb96 ("x86/coco: Get rid of accessor functions").
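
A minimal sketch of the macro fallback described above, in standalone C. DEMO_HAS_CC_PLATFORM is a hypothetical stand-in for CONFIG_ARCH_HAS_CC_PLATFORM, and all demo_* names are illustrative rather than the kernel's own declarations.

```c
/* Hypothetical stand-in for the Kconfig symbol; left undefined here. */
/* #define DEMO_HAS_CC_PLATFORM 1 */

enum demo_cc_vendor_id {
	DEMO_CC_VENDOR_NONE,
	DEMO_CC_VENDOR_AMD,
	DEMO_CC_VENDOR_INTEL,
};

#ifdef DEMO_HAS_CC_PLATFORM
/* Real variable and function, defined in another translation unit. */
extern enum demo_cc_vendor_id demo_cc_vendor;
int demo_cc_platform_has_mem_encrypt(void);
#else
/* Fallback: a constant macro, so no symbol reference survives to link time. */
#define demo_cc_vendor (DEMO_CC_VENDOR_NONE)
static inline int demo_cc_platform_has_mem_encrypt(void)
{
	return 0;
}
#endif

/* Returns 0 on the early-return path, 1 if the mapping would proceed. */
static inline int demo_sev_map_percpu_data(void)
{
	if (demo_cc_vendor != DEMO_CC_VENDOR_AMD ||
	    !demo_cc_platform_has_mem_encrypt())
		return 0;
	return 1;
}
```

Because the fallback is a compile-time constant, the first comparison can be folded away even when instrumentation prevents the compiler from discarding it based on the second condition alone.
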
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
      Message-Id: <20240202-provide-cc_vendor-without-arch_has_cc_platform-v1-1-09ad5f2a3099@kernel.org>
      Fixes: a9ef2774 ("x86/kvm: Fix SEV check in sev_map_percpu_data()", 2024-01-31)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e4596477
    • Shigeru Yoshida's avatar
      tipc: Check the bearer type before calling tipc_udp_nl_bearer_add() · 3871aa01
      Shigeru Yoshida authored
      syzbot reported the following general protection fault [1]:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000010: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000080-0x0000000000000087]
      ...
      RIP: 0010:tipc_udp_is_known_peer+0x9c/0x250 net/tipc/udp_media.c:291
      ...
      Call Trace:
       <TASK>
       tipc_udp_nl_bearer_add+0x212/0x2f0 net/tipc/udp_media.c:646
       tipc_nl_bearer_add+0x21e/0x360 net/tipc/bearer.c:1089
       genl_family_rcv_msg_doit+0x1fc/0x2e0 net/netlink/genetlink.c:972
       genl_family_rcv_msg net/netlink/genetlink.c:1052 [inline]
       genl_rcv_msg+0x561/0x800 net/netlink/genetlink.c:1067
       netlink_rcv_skb+0x16b/0x440 net/netlink/af_netlink.c:2544
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1076
       netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
       netlink_unicast+0x53b/0x810 net/netlink/af_netlink.c:1367
       netlink_sendmsg+0x8b7/0xd70 net/netlink/af_netlink.c:1909
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0xd5/0x180 net/socket.c:745
       ____sys_sendmsg+0x6ac/0x940 net/socket.c:2584
       ___sys_sendmsg+0x135/0x1d0 net/socket.c:2638
       __sys_sendmsg+0x117/0x1e0 net/socket.c:2667
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      The cause of this issue is that when tipc_nl_bearer_add() is called with
      the TIPC_NLA_BEARER_UDP_OPTS attribute, tipc_udp_nl_bearer_add() is called
      even if the bearer is not UDP.
      
      tipc_udp_is_known_peer() called by tipc_udp_nl_bearer_add() assumes that
      the media_ptr field of the tipc_bearer has a udp_bearer type object, so
      the function dereferences invalid memory for non-UDP bearers.
      
      This patch fixes the issue by checking the bearer type before calling
      tipc_udp_nl_bearer_add() in tipc_nl_bearer_add().
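
A simplified user-space sketch of such a guard. The demo_* types and functions are hypothetical stand-ins for the TIPC structures, and the -EINVAL return value is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

enum demo_media_type { DEMO_MEDIA_ETH, DEMO_MEDIA_UDP };

/* Hypothetical UDP-specific state hanging off a bearer. */
struct demo_udp_bearer {
	int peer_count;
};

struct demo_bearer {
	enum demo_media_type type_id;
	void *media_ptr;	/* a demo_udp_bearer only for UDP bearers */
};

/* UDP-only helper: blindly treats media_ptr as a demo_udp_bearer. */
static inline int demo_udp_bearer_add(struct demo_bearer *b)
{
	struct demo_udp_bearer *ub = b->media_ptr;

	ub->peer_count++;
	return 0;
}

/* Caller: check the bearer type before handing off to the UDP helper. */
static inline int demo_bearer_add(struct demo_bearer *b, int has_udp_opts)
{
	if (!has_udp_opts)
		return 0;
	if (b->type_id != DEMO_MEDIA_UDP)
		return -EINVAL;	/* UDP options make no sense on this bearer */
	return demo_udp_bearer_add(b);
}
```

The check sits in the caller because only the caller knows the bearer's media type; the helper has no way to validate an untyped pointer.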
      
      Fixes: ef20cd4d ("tipc: introduce UDP replicast")
      Reported-and-tested-by: syzbot+5142b87a9abc510e14fa@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=5142b87a9abc510e14fa [1]
      Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
      Reviewed-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
      Link: https://lore.kernel.org/r/20240131152310.4089541-1-syoshida@redhat.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      3871aa01