1. 16 Aug, 2024 7 commits
  2. 15 Aug, 2024 18 commits
    • Merge tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · a4a35f6c
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Including fixes from wireless and netfilter
      
        Current release - regressions:
      
         - udp: fall back to software USO if IPv6 extension headers are
           present
      
         - wifi: iwlwifi: correctly lookup DMA address in SG table
      
        Current release - new code bugs:
      
         - eth: mlx5e: fix queue stats access to non-existing channels splat
      
        Previous releases - regressions:
      
         - eth: mlx5e: take state lock during tx timeout reporter
      
         - eth: mlxbf_gige: disable RX filters until RX path initialized
      
         - eth: igc: fix reset adapter logics when tx mode change
      
        Previous releases - always broken:
      
         - tcp: update window clamping condition
      
         - netfilter:
            - nf_queue: drop packets with cloned unconfirmed conntracks
            - nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
      
         - vsock: fix recursive ->recvmsg calls
      
         - dsa: vsc73xx: fix MDIO bus access and PHY opera
      
         - eth: gtp: pull network headers in gtp_dev_xmit()
      
         - eth: igc: fix packet still tx after gate close by reducing i226 MAC
           retry buffer
      
         - eth: mana: fix RX buf alloc_size alignment and atomic op panic
      
         - eth: hns3: fix a deadlock problem when config TC during resetting"
      
      * tag 'net-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (58 commits)
        net: hns3: use correct release function during uninitialization
        net: hns3: void array out of bound when loop tnl_num
        net: hns3: fix a deadlock problem when config TC during resetting
        net: hns3: use the user's cfg after reset
        net: hns3: fix wrong use of semaphore up
        selftests: net: lib: kill PIDs before del netns
        pse-core: Conditionally set current limit during PI regulator registration
        net: thunder_bgx: Fix netdev structure allocation
        net: ethtool: Allow write mechanism of LPL and both LPL and EPL
        vsock: fix recursive ->recvmsg calls
        selftest: af_unix: Fix kselftest compilation warnings
        netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
        netfilter: nf_tables: Introduce nf_tables_getobj_single
        netfilter: nf_tables: Audit log dump reset after the fact
        selftests: netfilter: add test for br_netfilter+conntrack+queue combination
        netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
        netfilter: flowtable: initialise extack before use
        netfilter: nfnetlink: Initialise extack before use in ACKs
        netfilter: allow ipv6 fragments to arrive on different devices
        tcp: Update window clamping condition
        ...
    • Merge tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 20573d8e
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Two regression fixes:
      
         - fix atomisp support for ISP2400
      
         - fix dvb-usb regression for TeVii s480 dual DVB-S2 S660 board"
      
      * tag 'media/v6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
        media: Revert "media: dvb-usb: Fix unexpected infinite loop in dvb_usb_read_remote_control()"
    • Merge tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux · 6e80a1fd
      Linus Torvalds authored
      Pull ata fix from Niklas Cassel:
      
       - Revert a recent change to sense data generation.
      
         Sense data can be in either fixed format or descriptor format.
      
         The D_SENSE bit in the Control mode page controls which format to
         generate. All places but one respected the D_SENSE bit.
      
         The recent change fixed the one place that didn't respect the D_SENSE
         bit. However, it turns out that hdparm, hddtemp and udisks
         (incorrectly) assume sense data in descriptor format.
      
         Therefore, even though the change was technically correct, revert it:
         even if these user space programs are fixed to (correctly) look at
         the format type before parsing the data, older versions of these
         tools will be around roughly forever.
      
      * tag 'ata-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
        Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error"
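The pull text above notes that user space should check the sense data format before parsing it. A minimal userspace sketch of that check, assuming the SPC response codes (0x70/0x71 for fixed format, with the sense key in byte 2; 0x72/0x73 for descriptor format, with the sense key in byte 1). `sense_key()` is a hypothetical helper, not code from hdparm, hddtemp or udisks:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the sense key from a SCSI sense buffer,
 * or -1 if the buffer is too short or the format is not recognised. */
int sense_key(const uint8_t *buf, size_t len)
{
    if (len < 1)
        return -1;

    /* Mask off bit 7 (the VALID bit in fixed format) to get the
     * response code. */
    switch (buf[0] & 0x7f) {
    case 0x70: /* fixed format, current error */
    case 0x71: /* fixed format, deferred error */
        return len >= 3 ? (buf[2] & 0x0f) : -1;
    case 0x72: /* descriptor format, current error */
    case 0x73: /* descriptor format, deferred error */
        return len >= 2 ? (buf[1] & 0x0f) : -1;
    default:
        return -1;
    }
}
```

A parser that assumes descriptor format reads byte 1 unconditionally, which in fixed format does not hold the sense key.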
    • Merge tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf · 9c5af2d7
      Paolo Abeni authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Ignore ifindex for types other than mcast/linklocal in ipv6 frag
         reasm, from Tom Hughes.
      
      2) Initialize extack for begin/end netlink message marker in batch,
         from Donald Hunter.
      
      3) Initialize extack for flowtable offload support, also from Donald.
      
      4) Drop packets with cloned unconfirmed conntracks in nfqueue.
         Later it should be possible to explore lookup after reinject, but
         Florian prefers this approach at this stage. From Florian Westphal.
      
      5) Add selftest for cloned unconfirmed conntracks in nfqueue for
         previous update.
      
      6) Audit after filling netlink header successfully in object dump,
         from Phil Sutter.
      
      7-8) Fix concurrent dump and reset, which could result in underflow
           of counter / quota objects.
      
      netfilter pull request 24-08-15
      
      * tag 'nf-24-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
        netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
        netfilter: nf_tables: Introduce nf_tables_getobj_single
        netfilter: nf_tables: Audit log dump reset after the fact
        selftests: netfilter: add test for br_netfilter+conntrack+queue combination
        netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
        netfilter: flowtable: initialise extack before use
        netfilter: nfnetlink: Initialise extack before use in ACKs
        netfilter: allow ipv6 fragments to arrive on different devices
      ====================
      
      Link: https://patch.msgid.link/20240814222042.150590-1-pablo@netfilter.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver' · 34dfdf21
      Paolo Abeni authored
      Jijie Shao says:
      
      ====================
      There are some bugfixes for the HNS3 ethernet driver
      ====================
      
      Link: https://patch.msgid.link/20240813141024.1707252-1-shaojijie@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: hns3: use correct release function during uninitialization · 7660833d
      Peiyang Wang authored
      pci_request_regions is called to apply for PCI I/O and memory resources
      when the driver is initialized. Therefore, when the driver is uninstalled,
      pci_release_regions should be used to release both the PCI I/O and memory
      resources, instead of pci_release_mem_regions, which releases only the
      memory resources.
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: hns3: void array out of bound when loop tnl_num · 86db7bfb
      Peiyang Wang authored
      When querying the SSU register info, the driver loops tnl_num times.
      However, tnl_num comes from hardware while the array length is fixed.
      To avoid an out-of-bounds array access, make sure the loop count is not
      greater than the array length.
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
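A minimal sketch of the bounded loop, assuming a hypothetical fixed array length `SSU_REG_ARRAY_LEN` and a hypothetical helper `copy_tnl_regs()`; kernel code would typically clamp with the `min()` macro rather than the ternary used here:

```c
#include <stddef.h>

/* Hypothetical fixed size of the register array filled per tunnel. */
#define SSU_REG_ARRAY_LEN 8

/* Copy one value per tunnel, but never more than the array can hold,
 * even if the hardware-reported tnl_num is larger.
 * Returns the number of entries actually copied. */
size_t copy_tnl_regs(unsigned int dst[SSU_REG_ARRAY_LEN],
                     const unsigned int *hw_vals, size_t tnl_num)
{
    size_t n = tnl_num < SSU_REG_ARRAY_LEN ? tnl_num : SSU_REG_ARRAY_LEN;

    for (size_t i = 0; i < n; i++)
        dst[i] = hw_vals[i];
    return n;
}
```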
    • net: hns3: fix a deadlock problem when config TC during resetting · be5e816d
      Jie Wang authored
      Configuring TC during the reset process may cause a deadlock. The flow
      is as below:
                                   pf reset start
                                       │
                                       ▼
                                    ......
      setup tc                         │
          │                            ▼
          ▼                      DOWN: napi_disable()
      napi_disable()(skip)             │
          │                            │
          ▼                            ▼
        ......                      ......
          │                            │
          ▼                            │
      napi_enable()                    │
                                       ▼
                                 UINIT: netif_napi_del()
                                       │
                                       ▼
                                    ......
                                       │
                                       ▼
                                 INIT: netif_napi_add()
                                       │
                                       ▼
                                    ......                 global reset start
                                       │                      │
                                       ▼                      ▼
                                 UP: napi_enable()(skip)    ......
                                       │                      │
                                       ▼                      ▼
                                    ......                 napi_disable()
      
      During reset, the driver DOWNs the port and then UINITs it. In this
      case, the setup tc process UPs the port before UINIT, which causes the
      problem. Add a DOWN step in UINIT to fix it.
      
      Fixes: bb6b94a8 ("net: hns3: Add reset interface implementation in client")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
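The interleaving in the diagram can be replayed with a small userspace model. Everything here is hypothetical: `drv_enabled` stands in for the driver's own up/down bookkeeping (which is what makes napi_enable()/napi_disable() get skipped) and `napi_live` for the real NAPI state. Without the extra DOWN in UINIT, the final global-reset DOWN meets a NAPI instance that was never enabled, which in the kernel would block forever:

```c
#include <stdbool.h>

/* Hypothetical port model. drv_enabled mirrors the driver's own up/down
 * bookkeeping; napi_live mirrors whether NAPI is really enabled. */
struct port {
    bool drv_enabled;
    bool napi_live;
};

/* DOWN: skipped if the driver thinks NAPI is already disabled.
 * Returns false where the real napi_disable() would wait forever. */
static bool port_down(struct port *p)
{
    if (!p->drv_enabled)
        return true;
    if (!p->napi_live)
        return false;
    p->napi_live = false;
    p->drv_enabled = false;
    return true;
}

/* UP: skipped if the driver thinks NAPI is already enabled. */
static void port_up(struct port *p)
{
    if (p->drv_enabled)
        return;
    p->napi_live = true;
    p->drv_enabled = true;
}

/* UINIT: the fix adds a DOWN before netif_napi_del(). */
static void port_uninit(struct port *p, bool with_fix)
{
    if (with_fix)
        port_down(p);
    p->napi_live = false;       /* netif_napi_del() */
}

/* INIT: netif_napi_add() creates a NAPI instance that starts disabled. */
static void port_init(struct port *p)
{
    p->napi_live = false;
}

/* Replay the interleaving from the diagram. Returns true if the final
 * global-reset DOWN completes, false if it would deadlock. */
bool replay(bool with_fix)
{
    struct port p = { .drv_enabled = true, .napi_live = true };

    port_down(&p);              /* PF reset: DOWN */
    port_down(&p);              /* setup tc: napi_disable() skipped */
    port_up(&p);                /* setup tc: napi_enable() */
    port_uninit(&p, with_fix);  /* PF reset: UINIT */
    port_init(&p);              /* PF reset: INIT */
    port_up(&p);                /* PF reset: UP, skipped without the fix */
    return port_down(&p);       /* global reset: DOWN */
}
```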
    • net: hns3: use the user's cfg after reset · 30545e17
      Peiyang Wang authored
      Consider the following case: the user changes the speed and then resets
      the net interface. Before the hardware finishes changing the speed, the
      driver's timer task reads the old speed from hardware. After the reset,
      that previous speed is configured to hardware again. As a result, the new
      speed is configured successfully but lost after the PF reset. The
      following picture shows this more directly.
      
      +------+              +----+                 +----+
      | USER |              | PF |                 | HW |
      +---+--+              +-+--+                 +-+--+
          |  ethtool -s 100G  |                      |
          +------------------>|   set speed 100G     |
          |                   +--------------------->|
          |                   |  set successfully    |
          |                   |<---------------------+---+
          |                   |query cfg (timer task)|   |
          |                   +--------------------->|   | handle speed
          |                   |     return 200G      |   | changing event
          |  ethtool --reset  |<---------------------+   | (100G)
          +------------------>|  cfg previous speed  |<--+
          |                   |  after reset (200G)  |
          |                   +--------------------->|
          |                   |                      +---+
          |                   |query cfg (timer task)|   |
          |                   +--------------------->|   | handle speed
          |                   |     return 100G      |   | changing event
          |                   |<---------------------+   | (200G)
          |                   |                      |<--+
          |                   |query cfg (timer task)|
          |                   +--------------------->|
          |                   |     return 200G      |
          |                   |<---------------------+
          |                   |                      |
          v                   v                      v
      
      This patch saves the new speed once the hardware has changed speed
      successfully, and that saved speed is used after the reset completes.
      
      Fixes: 2d03eacc ("net: hns3: Only update mac configuation when necessary")
      Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
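A hypothetical model of the sequence in the picture, with speeds in Mb/s as ethtool reports them. Before the fix, the value re-applied after reset is whatever the timer task last read (the stale 200G); after it, the speed the user set successfully is saved and used instead. Field and function names are illustrative only:

```c
#include <stdbool.h>

/* Hypothetical MAC config; speeds in Mb/s. */
struct mac_cfg {
    unsigned int speed;       /* last value read back by the timer task */
    unsigned int user_speed;  /* last speed the user set successfully */
};

/* ethtool -s: hardware accepted the request. The fix records it. */
static void user_set_speed(struct mac_cfg *c, unsigned int mbps, bool save)
{
    if (save)
        c->user_speed = mbps;
}

/* Timer task: may still observe the old speed while hw is switching. */
static void timer_query(struct mac_cfg *c, unsigned int hw_mbps)
{
    c->speed = hw_mbps;
}

/* Speed configured to hardware after the PF reset. */
static unsigned int cfg_after_reset(const struct mac_cfg *c, bool save)
{
    return save ? c->user_speed : c->speed;
}

/* Replay: set 100G, timer task still reads 200G, then reset.
 * Returns the speed re-applied after the reset. */
unsigned int replay_speed(bool with_fix)
{
    struct mac_cfg c = { .speed = 200000, .user_speed = 200000 };

    user_set_speed(&c, 100000, with_fix);
    timer_query(&c, 200000);
    return cfg_after_reset(&c, with_fix);
}
```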
    • net: hns3: fix wrong use of semaphore up · 8445d9d3
      Jie Wang authored
      Currently, if an hns3 PF or VF FLR reset still fails after five retries,
      the reset done process directly releases the semaphore, which has
      already been released in hclge_reset_prepare_general.
      This causes the down operation to fail.
      
      So this patch fixes it by adding a reset state check: the up operation
      is only called after a successful PF FLR reset.
      
      Fixes: 8627bded ("net: hns3: refactor the precedure of PF FLR")
      Fixes: f28368bb ("net: hns3: refactor the procedure of VF FLR")
      Signed-off-by: Jie Wang <wangjie125@huawei.com>
      Signed-off-by: Jijie Shao <shaojijie@huawei.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
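A hypothetical model of the double release, based only on the commit message above: the prepare step has already given the semaphore back on the failure path, so an unconditional release in the done step leaves a binary semaphore with a count of 2. None of these names are the driver's:

```c
#include <stdbool.h>

/* Hypothetical semaphore model: count is the number of free slots. */
struct sem_model {
    int count;
};

static void sem_down(struct sem_model *s) { s->count--; }
static void sem_up(struct sem_model *s)   { s->count++; }

/* Prepare: take the semaphore; per the commit message, the failure
 * path after the retries already gives it back. */
static void reset_prepare(struct sem_model *s, bool reset_ok)
{
    sem_down(s);
    if (!reset_ok)
        sem_up(s);
}

/* Done: before the fix it always released; after it, only on success. */
static void reset_done(struct sem_model *s, bool reset_ok, bool with_fix)
{
    if (!with_fix || reset_ok)
        sem_up(s);
}

/* Count after one failed FLR reset, starting from a free binary
 * semaphore: 1 means balanced, 2 means released twice. */
int sem_after_failed_reset(bool with_fix)
{
    struct sem_model s = { .count = 1 };

    reset_prepare(&s, false);
    reset_done(&s, false, with_fix);
    return s.count;
}
```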
    • selftests: net: lib: kill PIDs before del netns · 7965a7f3
      Matthieu Baerts (NGI0) authored
      When deleting a netns, some tasks may still be running, e.g. tcpdump
      started in the background and never stopped because the test was
      interrupted.
      
      Before deleting the netns, it is then safer to kill all attached PIDs,
      if any. That should reduce noise after the end of some tests and help
      with debugging some issues. That's why this modification is seen as a
      "fix".
      
      Fixes: 25ae948b ("selftests/net: add lib.sh")
      Acked-by: Mat Martineau <martineau@kernel.org>
      Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
      Acked-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://patch.msgid.link/20240813-upstream-net-20240813-selftests-net-lib-kill-v1-1-27b689b248b8@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • pse-core: Conditionally set current limit during PI regulator registration · cdc90f75
      Oleksij Rempel authored
      Fix an issue where `devm_regulator_register()` would fail for PSE
      controllers that do not support current limit control, such as simple
      GPIO-based controllers like the podl-pse-regulator. The
      `REGULATOR_CHANGE_CURRENT` flag and `max_uA` constraint are now
      conditionally set only if the `pi_set_current_limit` operation is
      supported. This change prevents the regulator registration routine from
      attempting to call `pse_pi_set_current_limit()`, which would return
      `-EOPNOTSUPP` and cause the registration to fail.
      
      Fixes: 4a83abce ("net: pse-pd: Add new power limit get and set c33 features")
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Kory Maincent <kory.maincent@bootlin.com>
      Tested-by: Kyle Swenson <kyle.swenson@est.tech>
      Link: https://patch.msgid.link/20240813073719.2304633-1-o.rempel@pengutronix.de
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
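A userspace sketch of the conditional constraint setup. The flag values, `struct pi_ops`, `struct constraints` and `fill_constraints()` are hypothetical stand-ins for the regulator framework's `REGULATOR_CHANGE_*` flags and constraints; only the shape of the check follows the commit message:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the regulator framework's change flags. */
#define CHANGE_STATUS  0x1u
#define CHANGE_CURRENT 0x2u

/* Hypothetical subset of a PSE controller's PI operations. */
struct pi_ops {
    int (*pi_set_current_limit)(int id, int max_uA);
};

struct constraints {
    unsigned int valid_ops_mask;
    int max_uA;
};

/* Advertise current control, and a max_uA bound, only when the
 * controller implements pi_set_current_limit. */
void fill_constraints(struct constraints *c, const struct pi_ops *ops,
                      int max_uA)
{
    c->valid_ops_mask = CHANGE_STATUS;
    c->max_uA = 0;
    if (ops->pi_set_current_limit) {
        c->valid_ops_mask |= CHANGE_CURRENT;
        c->max_uA = max_uA;
    }
}

/* Example controllers: one with current control, one GPIO-style. */
static int example_set_limit(int id, int max_uA)
{
    (void)id;
    (void)max_uA;
    return 0;
}

const struct pi_ops limit_capable_ops = {
    .pi_set_current_limit = example_set_limit,
};
const struct pi_ops gpio_only_ops = { .pi_set_current_limit = NULL };
```

With `gpio_only_ops` (the podl-pse-regulator case), current control is never advertised, so registration has no reason to call a missing operation.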
    • net: thunder_bgx: Fix netdev structure allocation · 1f1b1942
      Marc Zyngier authored
      Commit 94833add ("net: thunderx: Unembed netdev structure") had
      a go at dynamically allocating the netdev structures for the thunderx_bgx
      driver.  This change results in my ThunderX box catching fire (to be fair,
      it is what it does best).
      
      The issues with this change are that:
      
      - bgx_lmac_enable() is called *after* bgx_acpi_register_phy() and
        bgx_init_of_phy(), both expecting netdev to be a valid pointer.
      
      - bgx_init_of_phy() populates the MAC addresses for *all* LMACs
        attached to a given BGX instance, and thus needs netdev for each of
        them to have been allocated.
      
      There are a few things to be said about how the driver mixes LMAC and
      BGX states, which leads to this sorry state, but that's beside the point.
      
      To address this, go back to a situation where all netdev structures
      are allocated before the driver starts relying on them, and move the
      freeing of these structures to driver removal. Someone brave enough
      can always go and restructure the driver if they want.
      
      Fixes: 94833add ("net: thunderx: Unembed netdev structure")
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Cc: Breno Leitao <leitao@debian.org>
      Cc: Sunil Goutham <sgoutham@marvell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Breno Leitao <leitao@debian.org>
      Link: https://patch.msgid.link/20240812141322.1742918-1-maz@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
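A hypothetical userspace model of the ordering the fix restores: allocate every LMAC's netdev up front, only then populate state for all of them, and free everything at removal time. `MAX_LMAC`, the structs and the functions are illustrative, not the thunderx driver's:

```c
#include <stdlib.h>

#define MAX_LMAC 4  /* hypothetical number of LMACs per BGX */

struct netdev_model {
    unsigned char mac[6];
};

struct bgx_model {
    struct netdev_model *netdev[MAX_LMAC];
};

/* Probe step 1: allocate every LMAC's netdev before anything uses it.
 * On failure, bgx_free_all() releases whatever was allocated. */
int bgx_alloc_all(struct bgx_model *bgx)
{
    for (int i = 0; i < MAX_LMAC; i++) {
        bgx->netdev[i] = calloc(1, sizeof(*bgx->netdev[i]));
        if (bgx->netdev[i] == NULL)
            return -1;
    }
    return 0;
}

/* Probe step 2, like bgx_init_of_phy(): populate a MAC address for
 * *all* LMACs. Returns -1 instead of dereferencing a missing netdev. */
int bgx_init_macs(struct bgx_model *bgx)
{
    for (int i = 0; i < MAX_LMAC; i++) {
        if (bgx->netdev[i] == NULL)
            return -1;
        bgx->netdev[i]->mac[5] = (unsigned char)i;
    }
    return 0;
}

/* Driver removal: free all netdevs in one place. */
void bgx_free_all(struct bgx_model *bgx)
{
    for (int i = 0; i < MAX_LMAC; i++) {
        free(bgx->netdev[i]);
        bgx->netdev[i] = NULL;
    }
}
```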
    • net: ethtool: Allow write mechanism of LPL and both LPL and EPL · fde25c20
      Danielle Ratson authored
      CMIS 5.2 standard section 9.4.2 defines four types of supported firmware
      update mechanism: none, LPL only, EPL only, and both LPL and EPL.
      
      Currently, only the LPL (Local Payload) type of firmware block write is
      supported. However, if the module supports both LPL and EPL, the flashing
      process wrongly fails as if LPL were not supported.
      
      Fix that by allowing the write mechanism to be LPL or both LPL and
      EPL.
      
      Fixes: c4f78134 ("ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB")
      Reported-by: Vladyslav Mykhaliuk <vmykhaliuk@nvidia.com>
      Signed-off-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Link: https://patch.msgid.link/20240812140824.3718826-1-danieller@nvidia.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
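A sketch of the check, assuming the CMIS write-mechanism field encodes LPL in bit 0 and EPL in bit 4 (so 0x11 means both); the macro and function names are hypothetical. Testing the LPL bit, rather than comparing against the exact "LPL only" value, accepts both LPL-capable encodings:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding of the CMIS firmware write-mechanism field. */
#define FW_WRITE_LPL  0x01  /* LPL supported */
#define FW_WRITE_EPL  0x10  /* EPL supported */
#define FW_WRITE_BOTH (FW_WRITE_LPL | FW_WRITE_EPL)

/* Exact-match check: rejects modules that support both mechanisms. */
bool lpl_usable_exact(uint8_t mech)
{
    return mech == FW_WRITE_LPL;
}

/* Bit check: accepts "LPL only" and "both LPL and EPL". */
bool lpl_usable(uint8_t mech)
{
    return (mech & FW_WRITE_LPL) != 0;
}
```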
    • vsock: fix recursive ->recvmsg calls · 69139d29
      Cong Wang authored
      After a vsock socket has been added to a BPF sockmap, its prot->recvmsg
      is replaced with vsock_bpf_recvmsg(). Thus the following recursion
      could happen:
      
      vsock_bpf_recvmsg()
       -> __vsock_recvmsg()
        -> vsock_connectible_recvmsg()
         -> prot->recvmsg()
          -> vsock_bpf_recvmsg() again
      
      We need to fix it by calling the original ->recvmsg() without any BPF
      sockmap logic in __vsock_recvmsg().
      
      Fixes: 634f1a71 ("vsock: support sockmap")
      Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
      Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com
      Cc: Bobby Eshleman <bobby.eshleman@bytedance.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Stefano Garzarella <sgarzare@redhat.com>
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
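A hypothetical userspace model of the call chain above, collapsing __vsock_recvmsg() and vsock_connectible_recvmsg() into one function and using a depth guard in place of the stack overflow the real recursion would cause. With `fixed` set, the core receive path calls the plain handler directly instead of going back through the replaced `recvmsg` pointer:

```c
/* Hypothetical socket model: recvmsg is the (possibly replaced) proto
 * hook; fixed selects the patched core receive path. */
struct sock_model {
    int (*recvmsg)(struct sock_model *sk, int depth);
    int fixed;
};

static int core_recvmsg(struct sock_model *sk, int depth);

/* Installed as sk->recvmsg once the socket is in a sockmap. The depth
 * guard stands in for the stack overflow real recursion would hit. */
static int bpf_recvmsg(struct sock_model *sk, int depth)
{
    if (depth > 8)
        return -1;
    return core_recvmsg(sk, depth + 1);
}

/* The transport's plain receive path: reports how deep the call got. */
static int plain_recvmsg(struct sock_model *sk, int depth)
{
    (void)sk;
    return depth;
}

/* Models __vsock_recvmsg(): before the fix it went back through
 * sk->recvmsg (bpf_recvmsg() again); after it, it calls the plain
 * handler directly. */
static int core_recvmsg(struct sock_model *sk, int depth)
{
    if (sk->fixed)
        return plain_recvmsg(sk, depth + 1);
    return sk->recvmsg(sk, depth + 1);
}

/* Start a receive through the BPF hook; returns the final depth,
 * or -1 if the guard tripped. */
int recv_depth(int fixed)
{
    struct sock_model sk = { .recvmsg = bpf_recvmsg, .fixed = fixed };

    return sk.recvmsg(&sk, 0);
}
```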
    • Merge tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless · b2ca1661
      Jakub Kicinski authored
      Kalle Valo says:
      
      ====================
      wireless fixes for v6.11
      
      We have a few fixes for drivers. The most important here is a fix for
      an iwlwifi issue which caused major slowdowns for several users.
      
      * tag 'wireless-2024-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
        wifi: iwlwifi: correctly lookup DMA address in SG table
        wifi: mt76: mt7921: fix NULL pointer access in mt7921_ipv6_addr_change
        wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
        wifi: rtlwifi: rtl8192du: Initialise value32 in _rtl92du_init_queue_reserved_page
        wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
      ====================
      
      Link: https://patch.msgid.link/20240814171606.E14A0C116B1@smtp.kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftest: af_unix: Fix kselftest compilation warnings · 6c569b77
      Abhinav Jain authored
      Change expected_buf from (const void *) to (const char *)
      in function __recvpair().
      This change fixes the below warnings during test compilation:
      
      ```
      In file included from msg_oob.c:14:
      msg_oob.c: In function ‘__recvpair’:
      
      ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
      of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
      
      ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
      msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
      
      ../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
      of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
      
      ../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
      msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
      ```
      
      Fixes: d098d772 ("selftest: af_unix: Add msg_oob.c.")
      Signed-off-by: Abhinav Jain <jain.abhinav177@gmail.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://patch.msgid.link/20240814080743.1156166-1-jain.abhinav177@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
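A minimal illustration of why the type change silences `-Wformat`: `%s` expects a `char *`, so a `const char *` argument matches while a `const void *` one triggers the warning quoted above. `describe_mismatch()` and its message format are hypothetical, not the selftest's actual `TH_LOG` text:

```c
#include <stdio.h>

/* Hypothetical log helper. Because expected_buf is const char *, the
 * "%s" conversion matches and -Wformat has nothing to report; declaring
 * it const void * would reproduce the warning quoted above. */
int describe_mismatch(char *out, size_t outlen,
                      const char *expected_buf, const char *recv_buf)
{
    return snprintf(out, outlen, "Expected:%s != Actual:%s",
                    expected_buf, recv_buf);
}
```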
    • Merge tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1fb91896
      Linus Torvalds authored
      Pull btrfs fixes from David Sterba:
      
       - extend tree-checker verification of directory item type
      
       - fix regression in page/folio and extent state tracking in xarray, the
         dirty status can get out of sync and can cause problems e.g. a hang
      
       - in send, detect the last extent and allow cloning it instead of
         sending it as a write, which reduces the amount of data transferred
         in the stream
      
       - fix checking extent references when cleaning deleted subvolumes
      
       - fix one more case in the extent map shrinker, let it run only in the
         kswapd context so it does not cause latency spikes during other
         operations
      
      * tag 'for-6.11-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix invalid mapping of extent xarray state
        btrfs: send: allow cloning non-aligned extent if it ends at i_size
        btrfs: only run the extent map shrinker from kswapd tasks
        btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
        btrfs: check delayed refs when we're checking if a ref exists
  3. 14 Aug, 2024 15 commits
    • netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests · bd662c42
      Phil Sutter authored
      Objects' dump callbacks are not concurrency-safe per se with the reset
      bit set. If two CPUs perform a reset at the same time, at least counter
      and quota objects suffer from value underrun.
      
      Prevent this by introducing dedicated locking callbacks for nfnetlink
      and the asynchronous dump handling to serialize access.
      
      Fixes: 43da04a5 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
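A single-threaded replay of the underrun under stated assumptions: `struct counter_obj` is hypothetical, and the unserialized reset is modelled as read-then-subtract, so two interleaved resets subtract the same value twice and wrap the unsigned counter. The serialized variant swaps in a C11 atomic exchange as a stand-in for the dedicated locking the patch adds:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical counter object. */
struct counter_obj {
    _Atomic uint64_t packets;
};

/* Unserialized reset, split into its two steps so an interleaving of
 * two CPUs can be replayed: read the value to dump ... */
uint64_t reset_read(struct counter_obj *c)
{
    return atomic_load(&c->packets);
}

/* ... then subtract what was dumped. */
void reset_sub(struct counter_obj *c, uint64_t dumped)
{
    atomic_fetch_sub(&c->packets, dumped);
}

/* Serialized dump-and-reset: read and zero in one indivisible step.
 * The patch serializes with locking; an atomic exchange is swapped in
 * here to keep the sketch self-contained. */
uint64_t dump_reset(struct counter_obj *c)
{
    return atomic_exchange(&c->packets, 0);
}
```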
    • netfilter: nf_tables: Introduce nf_tables_getobj_single · 69fc3e9e
      Phil Sutter authored
      Outsource the reply skb preparation for non-dump getrule requests into a
      distinct function. Prep work for object reset locking.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Audit log dump reset after the fact · e0b6648b
      Phil Sutter authored
      In theory, dumpreset may fail and invalidate the preceding log message.
      Fix this and use the occasion to prepare for object reset locking, which
      benefits from a few unrelated changes:
      
      * Add an early call to nfnetlink_unicast if not resetting, which
        effectively skips the audit logging but also unindents it.
      * Extract the table's name from the netlink attribute (which is verified
        via the earlier table lookup) so as not to rely upon the validity of
        the looked-up table pointer.
      * Do not use the local variable family; it will vanish.
      
      Fixes: 8e6cf365 ("audit: log nftables configuration change events")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
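A hypothetical sketch of the ordering this commit moves to: fill and send the reply first, return early when not resetting, and only emit the audit record once the dump has succeeded. The function names are illustrative, not nf_tables' own:

```c
#include <stdbool.h>

/* Hypothetical count of emitted audit records for resets. */
static int audit_records;

static void audit_log_reset(void)
{
    audit_records++;
}

/* Hypothetical getobj handler: fill and send the reply first. A plain
 * get returns early without auditing; a reset is only logged after
 * the dump succeeded. */
int getobj(bool reset, bool fill_ok)
{
    if (!fill_ok)
        return -1;
    if (!reset)
        return 0;
    audit_log_reset();
    return 0;
}

int audit_count(void)
{
    return audit_records;
}
```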
    • selftests: netfilter: add test for br_netfilter+conntrack+queue combination · ea2306f0
      Florian Westphal authored
      Trigger cloned skbs leaving softirq protection.
      Without the preceding change
      ("netfilter: nf_queue: drop packets with cloned unconfirmed
       conntracks"), this triggers a splat:
      
      WARNING: at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm..
      
      because local delivery and forwarding will race for confirmation.
      
      Based on a reproducer script from Yi Chen.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_queue: drop packets with cloned unconfirmed conntracks · 7d8dc1c7
      Florian Westphal authored
      Conntrack assumes an unconfirmed entry (not yet committed to global hash
      table) has a refcount of 1 and is not visible to other cores.
      
      With multicast forwarding this assumption breaks down because such
      skbs get cloned after being picked up, i.e.  ct->use refcount is > 1.
      
      Likewise, bridge netfilter will clone broad/mutlicast frames and
      all frames in case they need to be flood-forwarded during learning
      phase.
      
      For ip multicast forwarding or plain bridge flood-forward this will
      "work" because packets don't leave softirq and are implicitly
      serialized.
      
      With nfqueue this no longer holds true, the packets get queued
      and can be reinjected in arbitrary ways.
      
      Disable this feature, I see no other solution.
      
      After this patch, nfqueue cannot queue packets except the last
      multicast/broadcast packet.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
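A sketch of the drop condition described above, using a hypothetical `struct ct_model` in place of `struct nf_conn`: an unconfirmed entry may be queued only while the skb holds the sole reference, since clones could later race to confirm the same entry:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical conntrack entry with only the fields the check needs. */
struct ct_model {
    bool confirmed;  /* already committed to the global hash table */
    int use;         /* reference count, as ct->use */
};

/* An unconfirmed entry may be queued only while this skb holds the
 * sole reference; clones raise the refcount and could later race to
 * confirm the same entry, so such packets are dropped instead. */
bool nfqueue_may_queue(const struct ct_model *ct)
{
    if (ct == NULL || ct->confirmed)
        return true;
    return ct->use == 1;
}
```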
    • netfilter: flowtable: initialise extack before use · e9767137
      Donald Hunter authored
      Fix missing initialisation of extack in flow offload.
      
      Fixes: c29f74e0 ("netfilter: nf_flow_table: hardware offload support")
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nfnetlink: Initialise extack before use in ACKs · d1a7b382
      Donald Hunter authored
      Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END.
      
      Fixes: bf2ac490 ("netfilter: nfnetlink: Handle ACK flags for batch messages")
      Signed-off-by: Donald Hunter <donald.hunter@gmail.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d07b4328
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "s390:
      
         - Fix failure to start guests with kvm.use_gisa=0
      
         - Panic if (un)share fails to maintain security.
      
        ARM:
      
         - Use kvfree() for the kvmalloc'd nested MMUs array
      
         - Set of fixes to address warnings in W=1 builds
      
         - Make KVM depend on assembler support for ARMv8.4
      
         - Fix for vgic-debug interface for VMs without LPIs
      
         - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest
      
         - Minor code / comment cleanups for configuring PAuth traps
      
         - Take kvm->arch.config_lock to prevent destruction / initialization
           race for a vCPU's CPUIF which may lead to a UAF
      
        x86:
      
         - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
      
         - Fix smatch issues
      
         - Small cleanups
      
         - Make x2APIC ID 100% readonly
      
         - Fix typo in uapi constant
      
        Generic:
      
         - Use synchronize_srcu_expedited() on irqfd shutdown"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
        KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG
        KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX)
        KVM: eventfd: Use synchronize_srcu_expedited() on shutdown
        KVM: selftests: Add a testcase to verify x2APIC is fully readonly
        KVM: x86: Make x2APIC ID 100% readonly
        KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id())
        KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page()
        KVM: SVM: Fix an error code in sev_gmem_post_populate()
        KVM: SVM: Fix uninitialized variable bug
        KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface
        KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list
        KVM: arm64: Tidying up PAuth code in KVM
        KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI
        KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain
        s390/uv: Panic for set and remove shared access UVC errors
        KVM: s390: fix validity interception issue when gisa is switched off
        docs: KVM: Fix register ID of SPSR_FIQ
        KVM: arm64: vgic: fix unexpected unlock sparse warnings
        KVM: arm64: fix kdoc warnings in W=1 builds
        KVM: arm64: fix override-init warnings in W=1 builds
        ...
    • netfilter: allow ipv6 fragments to arrive on different devices · 3cd740b9
      Tom Hughes authored
      Commit 264640fc ("ipv6: distinguish frag queues by device
      for multicast and link-local packets") modified the ipv6 fragment
      reassembly logic to distinguish frag queues by device for multicast
      and link-local packets, but in fact only the main reassembly code
      limits the use of the device to those address types; the netfilter
      reassembly code uses the device for all packets.
      
      This means that if fragments of a packet arrive on different interfaces
      then netfilter will fail to reassemble them and the fragments will be
      expired without going any further through the filters.
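A minimal userspace sketch of the lookup-key rule this fix brings to the netfilter path: the ingress device only becomes part of the fragment-queue key for multicast and link-local destinations. All names are hypothetical (this is not the kernel's reassembly code), and the kernel's address-type classification is approximated here by prefix checks for ff00::/8 and fe80::/10.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified fragment-queue lookup key. */
struct demo_frag_key {
	uint8_t  saddr[16];
	uint8_t  daddr[16];
	uint32_t id;
	int      iif;	/* ingress ifindex; 0 means "not tied to a device" */
};

/* ff00::/8 */
static bool demo_addr_is_multicast(const uint8_t a[16])
{
	return a[0] == 0xff;
}

/* fe80::/10 */
static bool demo_addr_is_linklocal(const uint8_t a[16])
{
	return a[0] == 0xfe && (a[1] & 0xc0) == 0x80;
}

static void demo_frag_key_init(struct demo_frag_key *k, const uint8_t saddr[16],
			       const uint8_t daddr[16], uint32_t id, int iif)
{
	memcpy(k->saddr, saddr, 16);
	memcpy(k->daddr, daddr, 16);
	k->id = id;
	/* Only scoped destinations are distinguished by arrival device. */
	k->iif = (demo_addr_is_multicast(daddr) || demo_addr_is_linklocal(daddr)) ? iif : 0;
}

/* Field-by-field comparison (avoids comparing struct padding). */
static bool demo_frag_key_equal(const struct demo_frag_key *a, const struct demo_frag_key *b)
{
	return a->id == b->id && a->iif == b->iif &&
	       memcmp(a->saddr, b->saddr, 16) == 0 &&
	       memcmp(a->daddr, b->daddr, 16) == 0;
}
```

With this rule, fragments of a global-unicast packet arriving on two different interfaces map to the same queue, while link-local or multicast fragments stay separated per device.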
      
      Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: Tom Hughes <tom@compton.nu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG · 1c0e5881
      Amit Shah authored
      "INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of
      the UAPI, keep the current definition and add a new one with the fix.
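Keeping a misspelt UAPI name working usually means turning it into an alias of the corrected one. A sketch with hypothetical DEMO_ names; the enumerator list and values are illustrative only and are not copied from the kernel header.

```c
#include <assert.h>

/* Hypothetical status-code enum modelled on a UAPI header. */
enum demo_sev_ret_code {
	DEMO_SEV_RET_SUCCESS = 0,
	DEMO_SEV_RET_INVALID_PLATFORM_STATE,
	DEMO_SEV_RET_INVALID_GUEST_STATE,
	DEMO_SEV_RET_INVALID_CONFIG,				/* corrected spelling */
	DEMO_SEV_RET_INAVLID_CONFIG = DEMO_SEV_RET_INVALID_CONFIG, /* old name kept for existing users */
	DEMO_SEV_RET_INVALID_LEN,				/* still one past INVALID_CONFIG */
};

/* The alias must not shift the values of later enumerators. */
_Static_assert(DEMO_SEV_RET_INVALID_LEN == DEMO_SEV_RET_INVALID_CONFIG + 1,
	       "alias changed the numbering");
```

Placing the alias directly after the corrected name keeps the implicit numbering of the following enumerators unchanged, because an enumerator without an initializer is always one more than the previous one.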
      Fix-suggested-by: Marc Zyngier <maz@kernel.org>
      Signed-off-by: Amit Shah <amit.shah@amd.com>
      Message-ID: <20240814083113.21622-1-amit@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) · 66155de9
      Sean Christopherson authored
      Disallow read-only memslots for SEV-{ES,SNP} VM types, as KVM can't
      directly emulate instructions for ES/SNP, and instead the guest must
      explicitly request emulation.  Unless the guest explicitly requests
      emulation without accessing memory, ES/SNP relies on KVM creating an MMIO
      SPTE, with the subsequent #NPF being reflected into the guest as a #VC.
      
      But for read-only memslots, KVM deliberately doesn't create MMIO SPTEs,
      because except for ES/SNP, doing so requires setting reserved bits in the
      SPTE, i.e. the SPTE can't be readable while also generating a #VC on
      writes.  Because KVM never creates MMIO SPTEs and jumps directly to
      emulation, the guest never gets a #VC.  And since KVM simply resumes the
      guest if ES/SNP guests trigger emulation, KVM effectively puts the vCPU
      into an infinite #NPF loop if the vCPU attempts to write read-only memory.
      
      Disallow read-only memory for all VMs with protected state, i.e. for
      upcoming TDX VMs as well as ES/SNP VMs.  For TDX, it's actually possible
      to support read-only memory, as TDX uses EPT Violation #VE to reflect the
      fault into the guest, e.g. KVM could configure read-only SPTEs with RX
      protections and SUPPRESS_VE=0.  But there is no strong use case for
      supporting read-only memslots on TDX, e.g. the main historical usage is
      to emulate option ROMs, but TDX disallows executing from shared memory.
      And if someone comes along with a legitimate, strong use case, the
      restriction can always be lifted for TDX.
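The resulting rule can be modelled in a few lines: a VM reports whether its state is protected, and memslot-flag validation only accepts the read-only flag when it is not. This is a userspace sketch, not KVM code; the struct and function names are hypothetical, and only the two flag bit positions mirror KVM's UAPI KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY. */
#define DEMO_MEM_LOG_DIRTY_PAGES	(1u << 0)
#define DEMO_MEM_READONLY		(1u << 1)

/* Hypothetical, simplified VM descriptor. */
struct demo_vm {
	bool has_protected_state;	/* e.g. SEV-ES, SEV-SNP, TDX */
};

/* Read-only memslots are only offered when guest state is not protected. */
static bool demo_has_readonly_mem(const struct demo_vm *vm)
{
	return !vm->has_protected_state;
}

/* Return 0 if every requested flag is valid for this VM, -EINVAL otherwise. */
static int demo_check_memslot_flags(const struct demo_vm *vm, uint32_t flags)
{
	uint32_t valid = DEMO_MEM_LOG_DIRTY_PAGES;

	if (demo_has_readonly_mem(vm))
		valid |= DEMO_MEM_READONLY;

	/* Fail the memslot request up front rather than hanging vCPUs later. */
	return (flags & ~valid) ? -EINVAL : 0;
}
```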
      
      Don't bother trying to retroactively apply the restriction to SEV-ES
      VMs that are created as type KVM_X86_DEFAULT_VM.  Read-only memslots can't
      possibly work for SEV-ES, i.e. disallowing such memslots really just
      means reporting an error to userspace instead of silently hanging vCPUs.
      Trying to deal with the ordering between KVM_SEV_INIT and memslot creation
      isn't worth the marginal benefit it would provide userspace.
      
      Fixes: 26c44aa9 ("KVM: SEV: define VM types for SEV and SEV-ES")
      Fixes: 1dfe571c ("KVM: SEV: Add initial SEV-SNP support")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Michael Roth <michael.roth@amd.com>
      Cc: Vishal Annapurve <vannapurve@google.com>
      Cc: Ackerly Tng <ackerleytng@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-ID: <20240809190319.1710470-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 9d590679
      Linus Torvalds authored
      Pull selinux fixes from Paul Moore:
      
       - Fix an xperms counting problem where we were adding to the xperms
         count even if we failed to add the xperm.
      
       - Propagate errors from avc_add_xperms_decision() back to the caller so
         that we can trigger the proper cleanup and error handling.
      
       - Revert our use of vma_is_initial_heap() in favor of our older logic
         as vma_is_initial_heap() doesn't correctly handle the no-heap case
         and it is causing issues with the SELinux process/execheap access
         control. While the older SELinux logic may not be perfect, it
         restores the expected user visible behavior.
      
         Hopefully we will be able to resolve the problem with the
         vma_is_initial_heap() macro with the mm folks, but we need to fix
         this in the meantime.
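The first two fixes follow a common pattern: bump the counter only after the element is really on the list, and hand allocation failures back to the caller. A minimal sketch of that pattern; all type and function names are hypothetical and this is not the SELinux AVC code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list of extended-permission decisions and its length counter. */
struct demo_xperm_node {
	unsigned char driver;
	struct demo_xperm_node *next;
};

struct demo_xperms {
	struct demo_xperm_node *head;
	unsigned int len;	/* must equal the number of nodes on the list */
};

/* Allocation hook so that failure can be simulated; defaults to malloc(). */
static void *(*demo_alloc)(size_t) = malloc;

static void *demo_fail_alloc(size_t size)
{
	(void)size;
	return NULL;
}

static int demo_add_xperm(struct demo_xperms *x, unsigned char driver)
{
	struct demo_xperm_node *node = demo_alloc(sizeof(*node));

	if (!node)
		return -ENOMEM;	/* nothing was added, so leave len alone */

	node->driver = driver;
	node->next = x->head;
	x->head = node;
	x->len++;		/* count only after the node is on the list */
	return 0;
}

/* Add several entries, propagating the first failure to the caller. */
static int demo_add_all(struct demo_xperms *x, const unsigned char *drivers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int rc = demo_add_xperm(x, drivers[i]);

		if (rc)
			return rc;	/* caller can now clean up and report the error */
	}
	return 0;
}
```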
      
      * tag 'selinux-pr-20240814' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: revert our use of vma_is_initial_heap()
        selinux: add the processing of the failure of avc_add_xperms_decision()
        selinux: fix potential counting error in avc_add_xperms_decision()
    • Merge tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · 4ac0f08f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
       "VFS:
      
         - Fix the name of the file lease slab cache. When file leases were
           split out of file locks, the name of the file lock slab cache was
           used for the file lease slab cache as well.
      
         - Fix a typo in the take_fd() helper's comment.
      
         - Fix infinite directory iteration for stable offsets in tmpfs.
      
         - When the icache is pruned all reclaimable inodes are marked with
           I_FREEING and other processes that try to lookup such inodes will
           block.
      
           But some filesystems like ext4 can trigger lookups in their inode
           evict callback causing deadlocks. Ext4 does such lookups if the
           ea_inode feature is used whereby a separate inode may be used to
           store xattrs.
      
           Introduce I_LRU_ISOLATING, which pins the inode while its pages
           are reclaimed. This prevents the inode from being deleted during
           inode_lru_isolate(), which avoids the deadlock, and evict is made
           to wait until I_LRU_ISOLATING is cleared.
      
        netfs:
      
         - Fault in smaller chunks for non-large folio mappings for
           filesystems that haven't been converted to large folios yet.
      
         - Fix the CONFIG_NETFS_DEBUG config option. The config option was
           renamed a short while ago and that introduced two minor issues.
           First, it depended on CONFIG_NETFS whereas it wants to depend on
           CONFIG_NETFS_SUPPORT. The former doesn't exist, while the latter
           does. Second, the documentation for the config option wasn't fixed
           up.
      
         - Revert the removal of the PG_private_2 writeback flag as ceph is
           using it and fix how that flag is handled in netfs.
      
         - Fix DIO reads on 9p. A program watching a file on a 9p mount
           wouldn't see any changes in the size of the file being exported by
           the server if the file was changed directly in the source
           filesystem. Fix this by attempting to read the full size specified
           when a DIO read is requested.
      
         - Fix a NULL pointer dereference bug due to a data race where a
           cachefiles cookie was retired even though it was still in use.
           Check the cookie's n_accesses counter before discarding it.
      
        nsfs:
      
         - Fix ioctl declaration for NS_GET_MNTNS_ID from _IO() to _IOR() as
           the kernel is writing to userspace.
      
        pidfs:
      
         - Prevent the creation of pidfds for kthreads until we have a
           use-case for it and we know the semantics we want. It is also
           confusing to userspace that it can get pidfds for kthreads.
      
        squashfs:
      
         - Fix an uninitialized value bug reported by KMSAN caused by a
           corrupted symbolic link size read from disk. Check that the
           symbolic link size is not larger than expected"
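The nsfs change is about ioctl encoding: _IO() declares a request with no data transfer, while _IOR() encodes that the kernel writes a value of the given type back to userspace (the "read" is from userspace's point of view). A small sketch using the standard Linux <linux/ioctl.h> decoding macros; the request type byte and number are hypothetical placeholders, not copied from <linux/nsfs.h>.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical request numbers, for illustration only. */
#define DEMO_IOC_TYPE	0xb7
#define DEMO_GET_ID_OLD	_IO(DEMO_IOC_TYPE, 0x5)			/* declares no data transfer */
#define DEMO_GET_ID_NEW	_IOR(DEMO_IOC_TYPE, 0x5, uint64_t)	/* kernel writes a u64 to userspace */
```

Because the direction and size are encoded into the request value, the two declarations produce different numbers, and tools that decode ioctl requests only see the data transfer with the _IOR() form.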
      
      * tag 'vfs-6.11-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        Squashfs: sanity check symbolic link size
        9p: Fix DIO read through netfs
        vfs: Don't evict inode under the inode lru traversing context
        netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
        netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
        file: fix typo in take_fd() comment
        pidfd: prevent creation of pidfds for kthreads
        netfs: clean up after renaming FSCACHE_DEBUG config
        libfs: fix infinite directory reads for offset dir
        nsfs: fix ioctl declaration
        fs/netfs/fscache_cookie: add missing "n_accesses" check
        filelock: fix name of file_lease slab cache
        netfs: Fault in smaller chunks for non-large folio mappings
    • Merge tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 02f8ca3d
      Linus Torvalds authored
      Pull bpf fixes from Alexei Starovoitov:
      
       - Fix bpftrace regression from Kyle Huey.
      
         The tracing bpf prog was called with perf_event input arguments,
         causing bpftrace to produce garbage output.
      
       - Fix verifier crash in stacksafe() from Yonghong Song.
      
         Daniel Hodges reported a verifier crash when playing with sched-ext.
         The stack depth in the known verifier state was larger than the stack
         depth in the state being explored, causing an out-of-bounds access.
      
       - Fix update of freplace prog in prog_array from Leon Hwang.
      
         freplace prog type wasn't recognized correctly.
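The bug class behind the stacksafe() fix can be illustrated with a simplified exact-match comparison of two stacks of different depth. This is not the verifier's code: the types and names are hypothetical, and treating a slot missing from the explored state as "not equal" is the conservative choice made in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a stack: one byte per slot. */
struct demo_stack {
	const unsigned char *slots;
	size_t depth;		/* number of valid entries in slots[] */
};

/*
 * Return true if every slot recorded in the known state 'old' has the same
 * contents in the state being explored, 'cur'. 'old' may be deeper than
 * 'cur', so cur must be bounds-checked before it is indexed.
 */
static bool demo_stack_equal(const struct demo_stack *old, const struct demo_stack *cur)
{
	for (size_t i = 0; i < old->depth; i++) {
		if (i >= cur->depth)
			return false;	/* without this, cur->slots[i] reads out of bounds */
		if (old->slots[i] != cur->slots[i])
			return false;
	}
	return true;
}
```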
      
      * tag 'bpf-6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
        perf/bpf: Don't call bpf_overflow_handler() for tracing events
        selftests/bpf: Add a test to verify previous stacksafe() fix
        bpf: Fix a kernel verifier crash in stacksafe()
        bpf: Fix updating attached freplace prog in prog_array map
    • Revert "ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error" · fa0db8e5
      Niklas Cassel authored
      This reverts commit 28ab9769.
      
      Sense data can be in either fixed format or descriptor format.
      
      SAT-6 revision 1, "10.4.6 Control mode page", defines the D_SENSE bit:
      "The SATL shall support this bit as defined in SPC-5 with the following
      exception: if the D_SENSE bit is set to zero (i.e., fixed format sense
      data), then the SATL should return fixed format sense data for ATA
      PASS-THROUGH commands."
      
      The libata SATL has always kept D_SENSE set to zero by default. (It is
      however possible to change the value using a MODE SELECT SG_IO command.)
      
      Failed ATA PASS-THROUGH commands correctly respected the D_SENSE bit,
      however, successful ATA PASS-THROUGH commands incorrectly returned the
      sense data in descriptor format (regardless of the D_SENSE bit).
      
      Commit 28ab9769 ("ata: libata-scsi: Honor the D_SENSE bit for
      CK_COND=1 and no error") fixed this bug for successful ATA PASS-THROUGH
      commands.
      
      However, after commit 28ab9769 ("ata: libata-scsi: Honor the D_SENSE
      bit for CK_COND=1 and no error"), there were bug reports that hdparm,
      hddtemp, and udisks were no longer working as expected.
      
      These applications incorrectly assume the returned sense data is in
      descriptor format, without even looking at the RESPONSE CODE field in the
      returned sense data (to see which format the returned sense data is in).
      
      Considering that there will be broken versions of these applications around
      roughly forever, we are stuck with being bug compatible with older kernels.
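For reference, honoring the RESPONSE CODE field takes only a few lines. A sketch of a format-aware decoder, assuming the SPC sense layouts: response codes 0x70/0x71 select fixed format (sense key in byte 2, ASC/ASCQ in bytes 12/13) and 0x72/0x73 select descriptor format (sense key in byte 1, ASC/ASCQ in bytes 2/3). The function and type names are hypothetical and not taken from hdparm, hddtemp, or udisks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum demo_sense_format { DEMO_SENSE_INVALID, DEMO_SENSE_FIXED, DEMO_SENSE_DESCRIPTOR };

struct demo_sense_info {
	uint8_t key;	/* SENSE KEY */
	uint8_t asc;	/* ADDITIONAL SENSE CODE */
	uint8_t ascq;	/* ADDITIONAL SENSE CODE QUALIFIER */
};

/*
 * Decode the sense key, ASC and ASCQ from either format by checking the
 * RESPONSE CODE (byte 0, bits 6..0) first instead of assuming one layout.
 */
static enum demo_sense_format demo_parse_sense(const uint8_t *buf, size_t len,
					       struct demo_sense_info *out)
{
	if (len < 1)
		return DEMO_SENSE_INVALID;

	switch (buf[0] & 0x7f) {
	case 0x70:	/* current error, fixed format */
	case 0x71:	/* deferred error, fixed format */
		if (len < 14)
			return DEMO_SENSE_INVALID;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return DEMO_SENSE_FIXED;
	case 0x72:	/* current error, descriptor format */
	case 0x73:	/* deferred error, descriptor format */
		if (len < 4)
			return DEMO_SENSE_INVALID;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return DEMO_SENSE_DESCRIPTOR;
	default:
		return DEMO_SENSE_INVALID;
	}
}
```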
      
      Cc: stable@vger.kernel.org # 4.19+
      Reported-by: Stephan Eisvogel <eisvogel@seitics.de>
      Reported-by: Christian Heusel <christian@heusel.eu>
      Closes: https://lore.kernel.org/linux-ide/0bf3f2f0-0fc6-4ba5-a420-c0874ef82d64@heusel.eu/
      Fixes: 28ab9769 ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Link: https://lore.kernel.org/r/20240813131900.1285842-2-cassel@kernel.org
      Signed-off-by: Niklas Cassel <cassel@kernel.org>