1. 18 Oct, 2024 1 commit
    • Linus Torvalds's avatar
      Merge tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ade8ff3b
      Linus Torvalds authored
      Pull x86 IBPB fixes from Borislav Petkov:
       "This fixes the IBPB implementation of older AMDs (< gen4) that do not
        flush the RSB (Return Address Stack) so you can still do some leaking
        when using a "=ibpb" mitigation for Retbleed or SRSO. Fix it by doing
        the flushing in software on those generations.
      
        IBPB is not the default setting so this is not likely to affect
        anybody in practice"
      
      * tag 'x86_bugs_post_ibpb' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/bugs: Do not use UNTRAIN_RET with IBPB on entry
        x86/bugs: Skip RSB fill at VMEXIT
        x86/entry: Have entry_ibpb() invalidate return predictions
        x86/cpufeatures: Add a IBPB_NO_RET BUG flag
        x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET
      ade8ff3b
  2. 17 Oct, 2024 39 commits
    • Linus Torvalds's avatar
      Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of... · 4d939780
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "28 hotfixes. 13 are cc:stable. 23 are MM.
      
        It is the usual shower of unrelated singletons - please see the
        individual changelogs for details"
      
      * tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
        maple_tree: add regression test for spanning store bug
        maple_tree: correct tree corruption on spanning store
        mm/mglru: only clear kswapd_failures if reclaimable
        mm/swapfile: skip HugeTLB pages for unuse_vma
        selftests: mm: fix the incorrect usage() info of khugepaged
        MAINTAINERS: add Jann as memory mapping/VMA reviewer
        mm: swap: prevent possible data-race in __try_to_reclaim_swap
        mm: khugepaged: fix the incorrect statistics when collapsing large file folios
        MAINTAINERS: kasan, kcov: add bugzilla links
        mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
        mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
        Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
        Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
        maple_tree: check for MA_STATE_BULK on setting wr_rebalance
        mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
        mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
        mm: remove unused stub for can_swapin_thp()
        mailmap: add an entry for Andy Chiu
        MAINTAINERS: add memory mapping/VMA co-maintainers
        fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
        ...
      4d939780
    • Linus Torvalds's avatar
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · d4b82e58
      Linus Torvalds authored
      Pull clk fixes from Stephen Boyd:
       "Two clk driver fixes and a unit test fix:
      
         - Terminate the of_device_id table in the Samsung exynosautov920 clk
           driver so that device matching logic doesn't run off the end of the
           array into other memory and break matching for any kernel with this
           driver loaded
      
         - Properly limit the max clk ID in the Rockchip clk driver
      
         - Use clk kunit helpers in the clk tests so that memory isn't leaked
           after the test concludes"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: test: Fix some memory leaks
        clk: rockchip: fix finding of maximum clock ID
        clk: samsung: Fix out-of-bound access of of_match_node()
      d4b82e58
    • Linus Torvalds's avatar
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 6efbea77
      Linus Torvalds authored
      Pull arm64 fixes from Will Deacon:
      
       - Disable software tag-based KASAN when compiling with GCC, as
         functions are incorrectly instrumented leading to a crash early
         during boot
      
       - Fix pkey configuration for kernel threads when POE is enabled
      
       - Fix invalid memory accesses in uprobes when targetting load-literal
         instructions
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        kasan: Disable Software Tag-Based KASAN with GCC
        Documentation/protection-keys: add AArch64 to documentation
        arm64: set POR_EL0 for kernel threads
        arm64: probes: Fix uprobes for big-endian kernels
        arm64: probes: Fix simulate_ldr*_literal()
        arm64: probes: Remove broken LDR (literal) uprobe support
      6efbea77
    • Linus Torvalds's avatar
      Merge tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · c16e5c94
      Linus Torvalds authored
      Pull SoC fixes from Arnd Bergmann:
       "Most of the fixes this time are for platform specific drivers,
        addressing issues found through build testing on freescale, ep93xx,
        starfive, and npcm platforms, as as well as the ffa firmware.
      
        The fixes for the scmi firmware driver address compatibility problems
        found on broadcom machines.
      
        There are only two devicetree fixes, addressing incorrect in
        configuration on broadcom and marvell machines.
      
        The changes to the Documentation and MAINTAINERS files are for
        clarification only"
      
      * tag 'arm-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
        firmware: arm_ffa: Avoid string-fortify warning caused by memcpy()
        firmware: arm_scmi: Queue in scmi layer for mailbox implementation
        firmware: arm_ffa: Avoid string-fortify warning in export_uuid()
        firmware: arm_scmi: Give SMC transport precedence over mailbox
        firmware: arm_scmi: Fix the double free in scmi_debugfs_common_setup()
        Documentation/process: maintainer-soc: clarify submitting patches
        dmaengine: cirrus: check that output may be truncated
        dmaengine: cirrus: ERR_CAST() ioremap error
        MAINTAINERS: use the canonical soc mailing list address and mark it as L:
        ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin
        arm64: dts: marvell: cn9130-sr-som: fix cp0 mdio pin numbers
        soc: fsl: cpm1: qmc: Fix unused data compilation warning
        soc: fsl: cpm1: qmc: Do not use IS_ERR_VALUE() on error pointers
        reset: starfive: jh71x0: Fix accessing the empty member on JH7110 SoC
        reset: npcm: convert comma to semicolon
      c16e5c94
    • Linus Torvalds's avatar
      Merge tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 5c94bdab
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, nothing really stands out:
      
         - Usual HD-audio quirks / device-specific fixes
      
         - Kconfig dependency fix for UM
      
         - A series of minor fixes for SoundWire
      
         - Updates of USB-audio LINE6 contact address"
      
      * tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
        ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
        ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
        ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
        ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
        ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects
        ALSA: scarlett2: Add error check after retrieving PEQ filter values
        ALSA: hda/cs8409: Fix possible NULL dereference
        sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
        ALSA: line6: update contact information
        ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set()
        ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2
        ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024
      5c94bdab
    • Linus Torvalds's avatar
      Merge tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 07d6bf63
      Linus Torvalds authored
      Pull networking fixes from Paolo Abeni:
       "Current release - new code bugs:
      
         - eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
      
        Previous releases - regressions:
      
         - ipv4: give an IPv4 dev to blackhole_netdev
      
         - udp: compute L4 checksum as usual when not segmenting the skb
      
         - tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
      
         - eth: mlx5e: don't call cleanup on profile rollback failure
      
         - eth: microchip: vcap api: fix memory leaks in
           vcap_api_encode_rule_test()
      
         - eth: enetc: disable Tx BD rings after they are empty
      
         - eth: macb: avoid 20s boot delay by skipping MDIO bus registration
           for fixed-link PHY
      
        Previous releases - always broken:
      
         - posix-clock: fix missing timespec64 check in pc_clock_settime()
      
         - genetlink: hold RCU in genlmsg_mcast()
      
         - mptcp: prevent MPC handshake on port-based signal endpoints
      
         - eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
      
         - eth: stmmac: dwmac-tegra: fix link bring-up sequence
      
         - eth: bcmasp: fix potential memory leak in bcmasp_xmit()
      
        Misc:
      
         - add Andrew Lunn as a co-maintainer of all networking drivers"
      
      * tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
        net/mlx5e: Don't call cleanup on profile rollback failure
        net/mlx5: Unregister notifier on eswitch init failure
        net/mlx5: Fix command bitmask initialization
        net/mlx5: Check for invalid vector index on EQ creation
        net/mlx5: HWS, use lock classes for bwc locks
        net/mlx5: HWS, don't destroy more bwc queue locks than allocated
        net/mlx5: HWS, fixed double free in error flow of definer layout
        net/mlx5: HWS, removed wrong access to a number of rules variable
        mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
        net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
        vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
        net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
        net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
        net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
        net: phy: mdio-bcm-unimac: Add BCM6846 support
        dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
        udp: Compute L4 checksum as usual when not segmenting the skb
        genetlink: hold RCU in genlmsg_mcast()
        net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
        tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
        ...
      07d6bf63
    • Lorenzo Stoakes's avatar
      maple_tree: add regression test for spanning store bug · e993457d
      Lorenzo Stoakes authored
      Add a regression test to assert that, when performing a spanning store
      which consumes the entirety of the rightmost right leaf node does not
      result in maple tree corruption when doing so.
      
      This achieves this by building a test tree of 3 levels and establishing a
      store which ultimately results in a spanned store of this nature.
      
      Link: https://lkml.kernel.org/r/30cdc101a700d16e03ba2f9aa5d83f2efa894168.1728314403.git.lorenzo.stoakes@oracle.comSigned-off-by: default avatarLorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Reviewed-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Cc: Bert Karwatzki <spasswolf@web.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      e993457d
    • Lorenzo Stoakes's avatar
      maple_tree: correct tree corruption on spanning store · bea07fd6
      Lorenzo Stoakes authored
      Patch series "maple_tree: correct tree corruption on spanning store", v3.
      
      There has been a nasty yet subtle maple tree corruption bug that appears
      to have been in existence since the inception of the algorithm.
      
      This bug seems far more likely to happen since commit f8d112a4
      ("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point
      at which reports started to be submitted concerning this bug.
      
      We were made definitely aware of the bug thanks to the kind efforts of
      Bert Karwatzki who helped enormously in my being able to track this down
      and identify the cause of it.
      
      The bug arises when an attempt is made to perform a spanning store across
      two leaf nodes, where the right leaf node is the rightmost child of the
      shared parent, AND the store completely consumes the right-mode node.
      
      This results in mas_wr_spanning_store() mitakenly duplicating the new and
      existing entries at the maximum pivot within the range, and thus maple
      tree corruption.
      
      The fix patch corrects this by detecting this scenario and disallowing the
      mistaken duplicate copy.
      
      The fix patch commit message goes into great detail as to how this occurs.
      
      This series also includes a test which reliably reproduces the issue, and
      asserts that the fix works correctly.
      
      Bert has kindly tested the fix and confirmed it resolved his issues.  Also
      Mikhail Gavrilov kindly reported what appears to be precisely the same
      bug, which this fix should also resolve.
      
      
      This patch (of 2):
      
      There has been a subtle bug present in the maple tree implementation from
      its inception.
      
      This arises from how stores are performed - when a store occurs, it will
      overwrite overlapping ranges and adjust the tree as necessary to
      accommodate this.
      
      A range may always ultimately span two leaf nodes.  In this instance we
      walk the two leaf nodes, determine which elements are not overwritten to
      the left and to the right of the start and end of the ranges respectively
      and then rebalance the tree to contain these entries and the newly
      inserted one.
      
      This kind of store is dubbed a 'spanning store' and is implemented by
      mas_wr_spanning_store().
      
      In order to reach this stage, mas_store_gfp() invokes
      mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to
      walk the tree and update the object (mas) to traverse to the location
      where the write should be performed, determining its store type.
      
      When a spanning store is required, this function returns false stopping at
      the parent node which contains the target range, and mas_wr_store_type()
      marks the mas->store_type as wr_spanning_store to denote this fact.
      
      When we go to perform the store in mas_wr_spanning_store(), we first
      determine the elements AFTER the END of the range we wish to store (that
      is, to the right of the entry to be inserted) - we do this by walking to
      the NEXT pivot in the tree (i.e.  r_mas.last + 1), starting at the node we
      have just determined contains the range over which we intend to write.
      
      We then turn our attention to the entries to the left of the entry we are
      inserting, whose state is represented by l_mas, and copy these into a 'big
      node', which is a special node which contains enough slots to contain two
      leaf node's worth of data.
      
      We then copy the entry we wish to store immediately after this - the copy
      and the insertion of the new entry is performed by mas_store_b_node().
      
      After this we copy the elements to the right of the end of the range which
      we are inserting, if we have not exceeded the length of the node (i.e. 
      r_mas.offset <= r_mas.end).
      
      Herein lies the bug - under very specific circumstances, this logic can
      break and corrupt the maple tree.
      
      Consider the following tree:
      
      Height
        0                             Root Node
                                       /      \
                       pivot = 0xffff /        \ pivot = ULONG_MAX
                                     /          \
        1                       A [-----]       ...
                                   /   \
                   pivot = 0x4fff /     \ pivot = 0xffff
                                 /       \
        2 (LEAVES)          B [-----]  [-----] C
                                            ^--- Last pivot 0xffff.
      
      Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note
      that all ranges expressed in maple tree code are inclusive):
      
      1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then
         determines that this is a spanning store across nodes B and C. The mas
         state is set such that the current node from which we traverse further
         is node A.
      
      2. In mas_wr_spanning_store() we try to find elements to the right of pivot
         0xffff by searching for an index of 0x10000:
      
          - mas_wr_walk_index() invokes mas_wr_walk_descend() and
            mas_wr_node_walk() in turn.
      
              - mas_wr_node_walk() loops over entries in node A until EITHER it
                finds an entry whose pivot equals or exceeds 0x10000 OR it
                reaches the final entry.
      
              - Since no entry has a pivot equal to or exceeding 0x10000, pivot
                0xffff is selected, leading to node C.
      
          - mas_wr_walk_traverse() resets the mas state to traverse node C. We
            loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk()
            in turn once again.
      
               - Again, we reach the last entry in node C, which has a pivot of
                 0xffff.
      
      3. We then copy the elements to the left of 0x4000 in node B to the big
         node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry
         too.
      
      4. We determine whether we have any entries to copy from the right of the
         end of the range via - and with r_mas set up at the entry at pivot
         0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at
         pivot 0xffff.
      
      5. BUG! The maple tree is corrupted with a duplicate entry.
      
      This requires a very specific set of circumstances - we must be spanning
      the last element in a leaf node, which is the last element in the parent
      node.
      
      spanning store across two leaf nodes with a range that ends at that shared
      pivot.
      
      A potential solution to this problem would simply be to reset the walk
      each time we traverse r_mas, however given the rarity of this situation it
      seems that would be rather inefficient.
      
      Instead, this patch detects if the right hand node is populated, i.e.  has
      anything we need to copy.
      
      We do so by only copying elements from the right of the entry being
      inserted when the maximum value present exceeds the last, rather than
      basing this on offset position.
      
      The patch also updates some comments and eliminates the unused bool return
      value in mas_wr_walk_index().
      
      The work performed in commit f8d112a4 ("mm/mmap: avoid zeroing vma
      tree in mmap_region()") seems to have made the probability of this event
      much more likely, which is the point at which reports started to be
      submitted concerning this bug.
      
      The motivation for this change arose from Bert Karwatzki's report of
      encountering mm instability after the release of kernel v6.12-rc1 which,
      after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration
      options, was identified as maple tree corruption.
      
      After Bert very generously provided his time and ability to reproduce this
      event consistently, I was able to finally identify that the issue
      discussed in this commit message was occurring for him.
      
      Link: https://lkml.kernel.org/r/cover.1728314402.git.lorenzo.stoakes@oracle.com
      Link: https://lkml.kernel.org/r/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com
      Fixes: 54a611b6 ("Maple Tree: add new data structure")
      Signed-off-by: default avatarLorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Reported-by: default avatarBert Karwatzki <spasswolf@web.de>
      Closes: https://lore.kernel.org/all/20241001023402.3374-1-spasswolf@web.de/Tested-by: default avatarBert Karwatzki <spasswolf@web.de>
      Reported-by: default avatarMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Closes: https://lore.kernel.org/all/CABXGCsOPwuoNOqSMmAvWO2Fz4TEmPnjFj-b7iF+XFRu1h7-+Dg@mail.gmail.com/Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Reviewed-by: default avatarLiam R. Howlett <Liam.Howlett@Oracle.com>
      Tested-by: default avatarMikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Reviewed-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      bea07fd6
    • Paolo Abeni's avatar
      Merge branch 'mlx5-misc-fixes-2024-10-15' · cb560795
      Paolo Abeni authored
      Tariq Toukan says:
      
      ====================
      mlx5 misc fixes 2024-10-15
      
      This patchset provides misc bug fixes from the team to the mlx5 core and
      Eth drivers.
      
      Series generated against:
      commit 174714f0 ("selftests: drivers: net: fix name not defined")
      ====================
      
      Link: https://patch.msgid.link/20241015093208.197603-1-tariqt@nvidia.comSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      cb560795
    • Cosmin Ratiu's avatar
      net/mlx5e: Don't call cleanup on profile rollback failure · 4dbc1d1a
      Cosmin Ratiu authored
      When profile rollback fails in mlx5e_netdev_change_profile, the netdev
      profile var is left set to NULL. Avoid a crash when unloading the driver
      by not calling profile->cleanup in such a case.
      
      This was encountered while testing, with the original trigger that
      the wq rescuer thread creation got interrupted (presumably due to
      Ctrl+C-ing modprobe), which gets converted to ENOMEM (-12) by
      mlx5e_priv_init, the profile rollback also fails for the same reason
      (signal still active) so the profile is left as NULL, leading to a crash
      later in _mlx5e_remove.
      
       [  732.473932] mlx5_core 0000:08:00.1: E-Switch: Unload vfs: mode(OFFLOADS), nvfs(2), necvfs(0), active vports(2)
       [  734.525513] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
       [  734.557372] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
       [  734.559187] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: new profile init failed, -12
       [  734.560153] workqueue: Failed to create a rescuer kthread for wq "mlx5e": -EINTR
       [  734.589378] mlx5_core 0000:08:00.1: mlx5e_netdev_init_profile:6235:(pid 6086): mlx5e_priv_init failed, err=-12
       [  734.591136] mlx5_core 0000:08:00.1 eth3: mlx5e_netdev_change_profile: failed to rollback to orig profile, -12
       [  745.537492] BUG: kernel NULL pointer dereference, address: 0000000000000008
       [  745.538222] #PF: supervisor read access in kernel mode
      <snipped>
       [  745.551290] Call Trace:
       [  745.551590]  <TASK>
       [  745.551866]  ? __die+0x20/0x60
       [  745.552218]  ? page_fault_oops+0x150/0x400
       [  745.555307]  ? exc_page_fault+0x79/0x240
       [  745.555729]  ? asm_exc_page_fault+0x22/0x30
       [  745.556166]  ? mlx5e_remove+0x6b/0xb0 [mlx5_core]
       [  745.556698]  auxiliary_bus_remove+0x18/0x30
       [  745.557134]  device_release_driver_internal+0x1df/0x240
       [  745.557654]  bus_remove_device+0xd7/0x140
       [  745.558075]  device_del+0x15b/0x3c0
       [  745.558456]  mlx5_rescan_drivers_locked.part.0+0xb1/0x2f0 [mlx5_core]
       [  745.559112]  mlx5_unregister_device+0x34/0x50 [mlx5_core]
       [  745.559686]  mlx5_uninit_one+0x46/0xf0 [mlx5_core]
       [  745.560203]  remove_one+0x4e/0xd0 [mlx5_core]
       [  745.560694]  pci_device_remove+0x39/0xa0
       [  745.561112]  device_release_driver_internal+0x1df/0x240
       [  745.561631]  driver_detach+0x47/0x90
       [  745.562022]  bus_remove_driver+0x84/0x100
       [  745.562444]  pci_unregister_driver+0x3b/0x90
       [  745.562890]  mlx5_cleanup+0xc/0x1b [mlx5_core]
       [  745.563415]  __x64_sys_delete_module+0x14d/0x2f0
       [  745.563886]  ? kmem_cache_free+0x1b0/0x460
       [  745.564313]  ? lockdep_hardirqs_on_prepare+0xe2/0x190
       [  745.564825]  do_syscall_64+0x6d/0x140
       [  745.565223]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
       [  745.565725] RIP: 0033:0x7f1579b1288b
      
      Fixes: 3ef14e46 ("net/mlx5e: Separate between netdev objects and mlx5e profiles initialization")
      Signed-off-by: default avatarCosmin Ratiu <cratiu@nvidia.com>
      Reviewed-by: default avatarDragos Tatulea <dtatulea@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      4dbc1d1a
    • Cosmin Ratiu's avatar
      net/mlx5: Unregister notifier on eswitch init failure · 1da9cfd6
      Cosmin Ratiu authored
      It otherwise remains registered and a subsequent attempt at eswitch
      enabling might trigger warnings of the sort:
      
      [  682.589148] ------------[ cut here ]------------
      [  682.590204] notifier callback eswitch_vport_event [mlx5_core] already registered
      [  682.590256] WARNING: CPU: 13 PID: 2660 at kernel/notifier.c:31 notifier_chain_register+0x3e/0x90
      [...snipped]
      [  682.610052] Call Trace:
      [  682.610369]  <TASK>
      [  682.610663]  ? __warn+0x7c/0x110
      [  682.611050]  ? notifier_chain_register+0x3e/0x90
      [  682.611556]  ? report_bug+0x148/0x170
      [  682.611977]  ? handle_bug+0x36/0x70
      [  682.612384]  ? exc_invalid_op+0x13/0x60
      [  682.612817]  ? asm_exc_invalid_op+0x16/0x20
      [  682.613284]  ? notifier_chain_register+0x3e/0x90
      [  682.613789]  atomic_notifier_chain_register+0x25/0x40
      [  682.614322]  mlx5_eswitch_enable_locked+0x1d4/0x3b0 [mlx5_core]
      [  682.614965]  mlx5_eswitch_enable+0xc9/0x100 [mlx5_core]
      [  682.615551]  mlx5_device_enable_sriov+0x25/0x340 [mlx5_core]
      [  682.616170]  mlx5_core_sriov_configure+0x50/0x170 [mlx5_core]
      [  682.616789]  sriov_numvfs_store+0xb0/0x1b0
      [  682.617248]  kernfs_fop_write_iter+0x117/0x1a0
      [  682.617734]  vfs_write+0x231/0x3f0
      [  682.618138]  ksys_write+0x63/0xe0
      [  682.618536]  do_syscall_64+0x4c/0x100
      [  682.618958]  entry_SYSCALL_64_after_hwframe+0x4b/0x53
      
      Fixes: 7624e58a ("net/mlx5: E-switch, register event handler before arming the event")
      Signed-off-by: default avatarCosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      1da9cfd6
    • Shay Drory's avatar
      net/mlx5: Fix command bitmask initialization · d62b1404
      Shay Drory authored
      Command bitmask have a dedicated bit for MANAGE_PAGES command, this bit
      isn't Initialize during command bitmask Initialization, only during
      MANAGE_PAGES.
      
      In addition, mlx5_cmd_trigger_completions() is trying to trigger
      completion for MANAGE_PAGES command as well.
      
      Hence, in case health error occurred before any MANAGE_PAGES command
      have been invoke (for example, during mlx5_enable_hca()),
      mlx5_cmd_trigger_completions() will try to trigger completion for
      MANAGE_PAGES command, which will result in null-ptr-deref error.[1]
      
      Fix it by Initialize command bitmask correctly.
      
      While at it, re-write the code for better understanding.
      
      [1]
      BUG: KASAN: null-ptr-deref in mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
      Write of size 4 at addr 0000000000000214 by task kworker/u96:2/12078
      CPU: 10 PID: 12078 Comm: kworker/u96:2 Not tainted 6.9.0-rc2_for_upstream_debug_2024_04_07_19_01 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: mlx5_health0000:08:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
      Call Trace:
       <TASK>
       dump_stack_lvl+0x7e/0xc0
       kasan_report+0xb9/0xf0
       kasan_check_range+0xec/0x190
       mlx5_cmd_trigger_completions+0x1db/0x600 [mlx5_core]
       mlx5_cmd_flush+0x94/0x240 [mlx5_core]
       enter_error_state+0x6c/0xd0 [mlx5_core]
       mlx5_fw_fatal_reporter_err_work+0xf3/0x480 [mlx5_core]
       process_one_work+0x787/0x1490
       ? lockdep_hardirqs_on_prepare+0x400/0x400
       ? pwq_dec_nr_in_flight+0xda0/0xda0
       ? assign_work+0x168/0x240
       worker_thread+0x586/0xd30
       ? rescuer_thread+0xae0/0xae0
       kthread+0x2df/0x3b0
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x2d/0x70
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork_asm+0x11/0x20
       </TASK>
      
      Fixes: 9b98d395 ("net/mlx5: Start health poll at earlier stage of driver load")
      Signed-off-by: default avatarShay Drory <shayd@nvidia.com>
      Reviewed-by: default avatarMoshe Shemesh <moshe@nvidia.com>
      Reviewed-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      d62b1404
    • Maher Sanalla's avatar
      net/mlx5: Check for invalid vector index on EQ creation · d4f25be2
      Maher Sanalla authored
      Currently, mlx5 driver does not enforce vector index to be lower than
      the maximum number of supported completion vectors when requesting a
      new completion EQ. Thus, mlx5_comp_eqn_get() fails when trying to
      acquire an IRQ with an improper vector index.
      
      To prevent the case above, enforce that vector index value is
      valid and lower than maximum in mlx5_comp_eqn_get() before handling the
      request.
      
      Fixes: f14c1a14 ("net/mlx5: Allocate completion EQs dynamically")
      Signed-off-by: default avatarMaher Sanalla <msanalla@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      d4f25be2
    • Cosmin Ratiu's avatar
      net/mlx5: HWS, use lock classes for bwc locks · 9addffa3
      Cosmin Ratiu authored
      The HWS BWC API uses one lock per queue and usually acquires one of
      them, except when doing changes which require locking all queues in
      order. Naturally, lockdep isn't too happy about acquiring the same lock
      class multiple times, so inform it that each queue lock is a different
      class to avoid false positives.
      
      Fixes: 2ca62599 ("net/mlx5: HWS, added send engine and context handling")
      Signed-off-by: default avatarCosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      9addffa3
    • Cosmin Ratiu's avatar
      net/mlx5: HWS, don't destroy more bwc queue locks than allocated · 45bcbd49
      Cosmin Ratiu authored
      hws_send_queues_bwc_locks_destroy destroyed more queue locks than
      allocated, leading to memory corruption (occasionally) and warnings such
      as DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock)) in __mutex_destroy because
      sometimes, the 'mutex' being destroyed was random memory.
      The severity of this problem is proportional to the number of queues
      configured because the code overreaches beyond the end of the
      bwc_send_queue_locks array by 2x its length.
      
      Fix that by using the correct number of bwc queues.
      
      Fixes: 2ca62599 ("net/mlx5: HWS, added send engine and context handling")
      Signed-off-by: default avatarCosmin Ratiu <cratiu@nvidia.com>
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      45bcbd49
    • Yevgeny Kliteynik's avatar
      net/mlx5: HWS, fixed double free in error flow of definer layout · 5aa2184e
      Yevgeny Kliteynik authored
      Fix error flow bug that could lead to double free of a buffer
      during a failure to calculate a suitable definer layout.
      
      Fixes: 74a778b4 ("net/mlx5: HWS, added definers handling")
      Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarItamar Gozlan <igozlan@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      5aa2184e
    • Yevgeny Kliteynik's avatar
      net/mlx5: HWS, removed wrong access to a number of rules variable · 65b4eb9f
      Yevgeny Kliteynik authored
      Removed wrong access to the num_of_rules field of the matcher.
      This is a usual u32 variable, but the access was as if it was atomic.
      
      This fixes the following CI warnings:
        mlx5hws_bwc.c:708:17: warning: large atomic operation may incur significant performance penalty;
        the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]
      
      Fixes: 510f9f61 ("net/mlx5: HWS, added API and enabled HWS support")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202409291101.6NdtMFVC-lkp@intel.com/Signed-off-by: default avatarYevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: default avatarItamar Gozlan <igozlan@nvidia.com>
      Signed-off-by: default avatarTariq Toukan <tariqt@nvidia.com>
      Signed-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      65b4eb9f
    • Matthieu Baerts (NGI0)'s avatar
      mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow · 7decd1f5
      Matthieu Baerts (NGI0) authored
      Syzkaller reported this splat:
      
        ==================================================================
        BUG: KASAN: slab-use-after-free in mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
        Read of size 4 at addr ffff8880569ac858 by task syz.1.2799/14662
      
        CPU: 0 UID: 0 PID: 14662 Comm: syz.1.2799 Not tainted 6.12.0-rc2-syzkaller-00307-g36c25451 #0
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
        Call Trace:
         <TASK>
         __dump_stack lib/dump_stack.c:94 [inline]
         dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
         print_address_description mm/kasan/report.c:377 [inline]
         print_report+0xc3/0x620 mm/kasan/report.c:488
         kasan_report+0xd9/0x110 mm/kasan/report.c:601
         mptcp_pm_nl_rm_addr_or_subflow+0xb44/0xcc0 net/mptcp/pm_netlink.c:881
         mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
         mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
         mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
         genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
         genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
         genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
         netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
         genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
         netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
         netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
         netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
         sock_sendmsg_nosec net/socket.c:729 [inline]
         __sock_sendmsg net/socket.c:744 [inline]
         ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
         ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
         __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
         do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
         __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
         do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
         entry_SYSENTER_compat_after_hwframe+0x84/0x8e
        RIP: 0023:0xf7fe4579
        Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
        RSP: 002b:00000000f574556c EFLAGS: 00000296 ORIG_RAX: 0000000000000172
        RAX: ffffffffffffffda RBX: 000000000000000b RCX: 0000000020000140
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
        R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
         </TASK>
      
        Allocated by task 5387:
         kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
         kasan_save_track+0x14/0x30 mm/kasan/common.c:68
         poison_kmalloc_redzone mm/kasan/common.c:377 [inline]
         __kasan_kmalloc+0xaa/0xb0 mm/kasan/common.c:394
         kmalloc_noprof include/linux/slab.h:878 [inline]
         kzalloc_noprof include/linux/slab.h:1014 [inline]
         subflow_create_ctx+0x87/0x2a0 net/mptcp/subflow.c:1803
         subflow_ulp_init+0xc3/0x4d0 net/mptcp/subflow.c:1956
         __tcp_set_ulp net/ipv4/tcp_ulp.c:146 [inline]
         tcp_set_ulp+0x326/0x7f0 net/ipv4/tcp_ulp.c:167
         mptcp_subflow_create_socket+0x4ae/0x10a0 net/mptcp/subflow.c:1764
         __mptcp_subflow_connect+0x3cc/0x1490 net/mptcp/subflow.c:1592
         mptcp_pm_create_subflow_or_signal_addr+0xbda/0x23a0 net/mptcp/pm_netlink.c:642
         mptcp_pm_nl_fully_established net/mptcp/pm_netlink.c:650 [inline]
         mptcp_pm_nl_work+0x3a1/0x4f0 net/mptcp/pm_netlink.c:943
         mptcp_worker+0x15a/0x1240 net/mptcp/protocol.c:2777
         process_one_work+0x958/0x1b30 kernel/workqueue.c:3229
         process_scheduled_works kernel/workqueue.c:3310 [inline]
         worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
         kthread+0x2c1/0x3a0 kernel/kthread.c:389
         ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
         ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
        Freed by task 113:
         kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
         kasan_save_track+0x14/0x30 mm/kasan/common.c:68
         kasan_save_free_info+0x3b/0x60 mm/kasan/generic.c:579
         poison_slab_object mm/kasan/common.c:247 [inline]
         __kasan_slab_free+0x51/0x70 mm/kasan/common.c:264
         kasan_slab_free include/linux/kasan.h:230 [inline]
         slab_free_hook mm/slub.c:2342 [inline]
         slab_free mm/slub.c:4579 [inline]
         kfree+0x14f/0x4b0 mm/slub.c:4727
         kvfree+0x47/0x50 mm/util.c:701
         kvfree_rcu_list+0xf5/0x2c0 kernel/rcu/tree.c:3423
         kvfree_rcu_drain_ready kernel/rcu/tree.c:3563 [inline]
         kfree_rcu_monitor+0x503/0x8b0 kernel/rcu/tree.c:3632
         kfree_rcu_shrink_scan+0x245/0x3a0 kernel/rcu/tree.c:3966
         do_shrink_slab+0x44f/0x11c0 mm/shrinker.c:435
         shrink_slab+0x32b/0x12a0 mm/shrinker.c:662
         shrink_one+0x47e/0x7b0 mm/vmscan.c:4818
         shrink_many mm/vmscan.c:4879 [inline]
         lru_gen_shrink_node mm/vmscan.c:4957 [inline]
         shrink_node+0x2452/0x39d0 mm/vmscan.c:5937
         kswapd_shrink_node mm/vmscan.c:6765 [inline]
         balance_pgdat+0xc19/0x18f0 mm/vmscan.c:6957
         kswapd+0x5ea/0xbf0 mm/vmscan.c:7226
         kthread+0x2c1/0x3a0 kernel/kthread.c:389
         ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
         ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
        Last potentially related work creation:
         kasan_save_stack+0x33/0x60 mm/kasan/common.c:47
         __kasan_record_aux_stack+0xba/0xd0 mm/kasan/generic.c:541
         kvfree_call_rcu+0x74/0xbe0 kernel/rcu/tree.c:3810
         subflow_ulp_release+0x2ae/0x350 net/mptcp/subflow.c:2009
         tcp_cleanup_ulp+0x7c/0x130 net/ipv4/tcp_ulp.c:124
         tcp_v4_destroy_sock+0x1c5/0x6a0 net/ipv4/tcp_ipv4.c:2541
         inet_csk_destroy_sock+0x1a3/0x440 net/ipv4/inet_connection_sock.c:1293
         tcp_done+0x252/0x350 net/ipv4/tcp.c:4870
         tcp_rcv_state_process+0x379b/0x4f30 net/ipv4/tcp_input.c:6933
         tcp_v4_do_rcv+0x1ad/0xa90 net/ipv4/tcp_ipv4.c:1938
         sk_backlog_rcv include/net/sock.h:1115 [inline]
         __release_sock+0x31b/0x400 net/core/sock.c:3072
         __tcp_close+0x4f3/0xff0 net/ipv4/tcp.c:3142
         __mptcp_close_ssk+0x331/0x14d0 net/mptcp/protocol.c:2489
         mptcp_close_ssk net/mptcp/protocol.c:2543 [inline]
         mptcp_close_ssk+0x150/0x220 net/mptcp/protocol.c:2526
         mptcp_pm_nl_rm_addr_or_subflow+0x2be/0xcc0 net/mptcp/pm_netlink.c:878
         mptcp_pm_nl_rm_subflow_received net/mptcp/pm_netlink.c:914 [inline]
         mptcp_nl_remove_id_zero_address+0x305/0x4a0 net/mptcp/pm_netlink.c:1572
         mptcp_pm_nl_del_addr_doit+0x5c9/0x770 net/mptcp/pm_netlink.c:1603
         genl_family_rcv_msg_doit+0x202/0x2f0 net/netlink/genetlink.c:1115
         genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
         genl_rcv_msg+0x565/0x800 net/netlink/genetlink.c:1210
         netlink_rcv_skb+0x165/0x410 net/netlink/af_netlink.c:2551
         genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
         netlink_unicast_kernel net/netlink/af_netlink.c:1331 [inline]
         netlink_unicast+0x53c/0x7f0 net/netlink/af_netlink.c:1357
         netlink_sendmsg+0x8b8/0xd70 net/netlink/af_netlink.c:1901
         sock_sendmsg_nosec net/socket.c:729 [inline]
         __sock_sendmsg net/socket.c:744 [inline]
         ____sys_sendmsg+0x9ae/0xb40 net/socket.c:2607
         ___sys_sendmsg+0x135/0x1e0 net/socket.c:2661
         __sys_sendmsg+0x117/0x1f0 net/socket.c:2690
         do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline]
         __do_fast_syscall_32+0x73/0x120 arch/x86/entry/common.c:386
         do_fast_syscall_32+0x32/0x80 arch/x86/entry/common.c:411
         entry_SYSENTER_compat_after_hwframe+0x84/0x8e
      
        The buggy address belongs to the object at ffff8880569ac800
         which belongs to the cache kmalloc-512 of size 512
        The buggy address is located 88 bytes inside of
         freed 512-byte region [ffff8880569ac800, ffff8880569aca00)
      
        The buggy address belongs to the physical page:
        page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x569ac
        head: order:2 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
        flags: 0x4fff00000000040(head|node=1|zone=1|lastcpupid=0x7ff)
        page_type: f5(slab)
        raw: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
        raw: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
        head: 04fff00000000040 ffff88801ac42c80 dead000000000100 dead000000000122
        head: 0000000000000000 0000000080100010 00000001f5000000 0000000000000000
        head: 04fff00000000002 ffffea00015a6b01 ffffffffffffffff 0000000000000000
        head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
        page dumped because: kasan: bad access detected
        page_owner tracks the page as allocated
        page last allocated via order 2, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 10238, tgid 10238 (kworker/u32:6), ts 597403252405, free_ts 597177952947
         set_page_owner include/linux/page_owner.h:32 [inline]
         post_alloc_hook+0x2d1/0x350 mm/page_alloc.c:1537
         prep_new_page mm/page_alloc.c:1545 [inline]
         get_page_from_freelist+0x101e/0x3070 mm/page_alloc.c:3457
         __alloc_pages_noprof+0x223/0x25a0 mm/page_alloc.c:4733
         alloc_pages_mpol_noprof+0x2c9/0x610 mm/mempolicy.c:2265
         alloc_slab_page mm/slub.c:2412 [inline]
         allocate_slab mm/slub.c:2578 [inline]
         new_slab+0x2ba/0x3f0 mm/slub.c:2631
         ___slab_alloc+0xd1d/0x16f0 mm/slub.c:3818
         __slab_alloc.constprop.0+0x56/0xb0 mm/slub.c:3908
         __slab_alloc_node mm/slub.c:3961 [inline]
         slab_alloc_node mm/slub.c:4122 [inline]
         __kmalloc_cache_noprof+0x2c5/0x310 mm/slub.c:4290
         kmalloc_noprof include/linux/slab.h:878 [inline]
         kzalloc_noprof include/linux/slab.h:1014 [inline]
         mld_add_delrec net/ipv6/mcast.c:743 [inline]
         igmp6_leave_group net/ipv6/mcast.c:2625 [inline]
         igmp6_group_dropped+0x4ab/0xe40 net/ipv6/mcast.c:723
         __ipv6_dev_mc_dec+0x281/0x360 net/ipv6/mcast.c:979
         addrconf_leave_solict net/ipv6/addrconf.c:2253 [inline]
         __ipv6_ifa_notify+0x3f6/0xc30 net/ipv6/addrconf.c:6283
         addrconf_ifdown.isra.0+0xef9/0x1a20 net/ipv6/addrconf.c:3982
         addrconf_notify+0x220/0x19c0 net/ipv6/addrconf.c:3781
         notifier_call_chain+0xb9/0x410 kernel/notifier.c:93
         call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1996
         call_netdevice_notifiers_extack net/core/dev.c:2034 [inline]
         call_netdevice_notifiers net/core/dev.c:2048 [inline]
         dev_close_many+0x333/0x6a0 net/core/dev.c:1589
        page last free pid 13136 tgid 13136 stack trace:
         reset_page_owner include/linux/page_owner.h:25 [inline]
         free_pages_prepare mm/page_alloc.c:1108 [inline]
         free_unref_page+0x5f4/0xdc0 mm/page_alloc.c:2638
         stack_depot_save_flags+0x2da/0x900 lib/stackdepot.c:666
         kasan_save_stack+0x42/0x60 mm/kasan/common.c:48
         kasan_save_track+0x14/0x30 mm/kasan/common.c:68
         unpoison_slab_object mm/kasan/common.c:319 [inline]
         __kasan_slab_alloc+0x89/0x90 mm/kasan/common.c:345
         kasan_slab_alloc include/linux/kasan.h:247 [inline]
         slab_post_alloc_hook mm/slub.c:4085 [inline]
         slab_alloc_node mm/slub.c:4134 [inline]
         kmem_cache_alloc_noprof+0x121/0x2f0 mm/slub.c:4141
         skb_clone+0x190/0x3f0 net/core/skbuff.c:2084
         do_one_broadcast net/netlink/af_netlink.c:1462 [inline]
         netlink_broadcast_filtered+0xb11/0xef0 net/netlink/af_netlink.c:1540
         netlink_broadcast+0x39/0x50 net/netlink/af_netlink.c:1564
         uevent_net_broadcast_untagged lib/kobject_uevent.c:331 [inline]
         kobject_uevent_net_broadcast lib/kobject_uevent.c:410 [inline]
         kobject_uevent_env+0xacd/0x1670 lib/kobject_uevent.c:608
         device_del+0x623/0x9f0 drivers/base/core.c:3882
         snd_card_disconnect.part.0+0x58a/0x7c0 sound/core/init.c:546
         snd_card_disconnect+0x1f/0x30 sound/core/init.c:495
         snd_usx2y_disconnect+0xe9/0x1f0 sound/usb/usx2y/usbusx2y.c:417
         usb_unbind_interface+0x1e8/0x970 drivers/usb/core/driver.c:461
         device_remove drivers/base/dd.c:569 [inline]
         device_remove+0x122/0x170 drivers/base/dd.c:561
      
      That's because 'subflow' is used just after 'mptcp_close_ssk(subflow)',
      which will initiate the release of its memory. Even if it is very likely
      the release and the re-utilisation will be done later on, it is of
      course better to avoid any issues and read the content of 'subflow'
      before closing it.
      
      Fixes: 1c1f7213 ("mptcp: pm: only decrement add_addr_accepted for MPJ req")
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+3c8b7a8e7df6a2a226ca@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/670d7337.050a0220.4cbc0.004f.GAE@google.comSigned-off-by: default avatarMatthieu Baerts (NGI0) <matttbe@kernel.org>
      Acked-by: default avatarPaolo Abeni <pabeni@redhat.com>
      Link: https://patch.msgid.link/20241015-net-mptcp-uaf-pm-rm-v1-1-c4ee5d987a64@kernel.orgSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      7decd1f5
    • Felix Fietkau's avatar
      net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init · 88806efc
      Felix Fietkau authored
      The loop responsible for allocating up to MTK_FQ_DMA_LENGTH buffers must
      only touch as many descriptors, otherwise it ends up corrupting unrelated
      memory. Fix the loop iteration count accordingly.
      
      Fixes: c57e5581 ("net: ethernet: mtk_eth_soc: handle dma buffer size soc specific")
      Signed-off-by: default avatarFelix Fietkau <nbd@nbd.name>
      Reviewed-by: default avatarSimon Horman <horms@kernel.org>
      Link: https://patch.msgid.link/20241015081755.31060-1-nbd@nbd.nameSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      88806efc
    • Daniel Borkmann's avatar
      vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame · 4678adf9
      Daniel Borkmann authored
      Andrew and Nikolay reported connectivity issues with Cilium's service
      load-balancing in case of vmxnet3.
      
      If a BPF program for native XDP adds an encapsulation header such as
      IPIP and transmits the packet out the same interface, then in case
      of vmxnet3 a corrupted packet is being sent and subsequently dropped
      on the path.
      
      vmxnet3_xdp_xmit_frame() which is called e.g. via vmxnet3_run_xdp()
      through vmxnet3_xdp_xmit_back() calculates an incorrect DMA address:
      
        page = virt_to_page(xdpf->data);
        tbi->dma_addr = page_pool_get_dma_addr(page) +
                        VMXNET3_XDP_HEADROOM;
        dma_sync_single_for_device(&adapter->pdev->dev,
                                   tbi->dma_addr, buf_size,
                                   DMA_TO_DEVICE);
      
      The above assumes a fixed offset (VMXNET3_XDP_HEADROOM), but the XDP
      BPF program could have moved xdp->data. While the passed buf_size is
      correct (xdpf->len), the dma_addr needs to have a dynamic offset which
      can be calculated as xdpf->data - (void *)xdpf, that is, xdp->data -
      xdp->data_hard_start.
      
      Fixes: 54f00cce ("vmxnet3: Add XDP support.")
      Reported-by: default avatarAndrew Sauber <andrew.sauber@isovalent.com>
      Reported-by: default avatarNikolay Nikolaev <nikolay.nikolaev@isovalent.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: default avatarNikolay Nikolaev <nikolay.nikolaev@isovalent.com>
      Acked-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Cc: William Tu <witu@nvidia.com>
      Cc: Ronak Doshi <ronak.doshi@broadcom.com>
      Link: https://patch.msgid.link/a0888656d7f09028f9984498cc698bb5364d89fc.1728931137.git.daniel@iogearbox.netSigned-off-by: default avatarPaolo Abeni <pabeni@redhat.com>
      4678adf9
    • Wei Xu's avatar
      mm/mglru: only clear kswapd_failures if reclaimable · b130ba4a
      Wei Xu authored
      lru_gen_shrink_node() unconditionally clears kswapd_failures, which can
      prevent kswapd from sleeping and cause 100% kswapd cpu usage even when
      kswapd repeatedly fails to make progress in reclaim.
      
      Only clear kswap_failures in lru_gen_shrink_node() if reclaim makes some
      progress, similar to shrink_node().
      
      I happened to run into this problem in one of my tests recently.  It
      requires a combination of several conditions: The allocator needs to
      allocate a right amount of pages such that it can wake up kswapd
      without itself being OOM killed; there is no memory for kswapd to
      reclaim (My test disables swap and cleans page cache first); no other
      process frees enough memory at the same time.
      
      Link: https://lkml.kernel.org/r/20241014221211.832591-1-weixugc@google.com
      Fixes: e4dde56c ("mm: multi-gen LRU: per-node lru_gen_folio lists")
      Signed-off-by: default avatarWei Xu <weixugc@google.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Brian Geffon <bgeffon@google.com>
      Cc: Jan Alexander Steffens <heftig@archlinux.org>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b130ba4a
    • Liu Shixin's avatar
      mm/swapfile: skip HugeTLB pages for unuse_vma · 7528c4fb
      Liu Shixin authored
      I got a bad pud error and lost a 1GB HugeTLB when calling swapoff.  The
      problem can be reproduced by the following steps:
      
       1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
       2. Swapout the above anonymous memory.
       3. run swapoff and we will get a bad pud error in kernel message:
      
        mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
      
      We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
      unuse_pud_range() by ftrace.  And therefore the HugeTLB pages will never
      be freed because we lost it from page table.  We can skip HugeTLB pages
      for unuse_vma to fix it.
      
      Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
      Fixes: 0fe6e20b ("hugetlb, rmap: add reverse mapping for hugepage")
      Signed-off-by: default avatarLiu Shixin <liushixin2@huawei.com>
      Acked-by: default avatarMuchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7528c4fb
    • Nanyong Sun's avatar
      selftests: mm: fix the incorrect usage() info of khugepaged · 3e822bed
      Nanyong Sun authored
      The mount option of tmpfs should be huge=advise, not madvise which is not
      supported and may mislead the users.
      
      Link: https://lkml.kernel.org/r/20241015020257.139235-1-sunnanyong@huawei.com
      Fixes: 1b03d0d5 ("selftests/vm: add thp collapse file and tmpfs testing")
      Signed-off-by: default avatarNanyong Sun <sunnanyong@huawei.com>
      Reviewed-by: default avatarBaolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: default avatarAnshuman Khandual <anshuman.khandual@arm.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Zach O'Keefe <zokeefe@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3e822bed
    • Jann Horn's avatar
      MAINTAINERS: add Jann as memory mapping/VMA reviewer · cb2bb9c5
      Jann Horn authored
      Add myself as a reviewer for memory mapping / VMA code.  I will probably
      only reply to patches sporadically, but hopefully this will help me keep
      up with changes that look interesting security-wise.
      
      Link: https://lkml.kernel.org/r/20241014-maintainers-mmap-reviewer-v1-1-50dce0514752@google.comSigned-off-by: default avatarJann Horn <jannh@google.com>
      Acked-by: default avatarLiam R. Howlett <Liam.Howlett@oracle.com>
      Acked-by: default avatarLorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      cb2bb9c5
    • Jeongjun Park's avatar
      mm: swap: prevent possible data-race in __try_to_reclaim_swap · 818f916e
      Jeongjun Park authored
      A report [1] was uploaded from syzbot.
      
      In the previous commit 862590ac ("mm: swap: allow cache reclaim to
      skip slot cache"), the __try_to_reclaim_swap() function reads offset and
      folio->entry from folio without folio_lock protection.
      
      In the currently reported KCSAN log, it is assumed that the actual
      data-race will not occur because the calltrace that does WRITE already
      obtains the folio_lock and then writes.
      
      However, the existing __try_to_reclaim_swap() function was already
      implemented to perform reads under folio_lock protection [1], and there is
      a risk of a data-race occurring through a function other than the one
      shown in the KCSAN log.
      
      Therefore, I think it is appropriate to change
      read operations for folio to be performed under folio_lock.
      
      [1]
      
      ==================================================================
      BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
      
      write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
       __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
       delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
       folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
       free_swap_cache mm/swap_state.c:293 [inline]
       free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
       __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
       tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
       tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
       tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
       zap_pte_range mm/memory.c:1700 [inline]
       zap_pmd_range mm/memory.c:1739 [inline]
       zap_pud_range mm/memory.c:1768 [inline]
       zap_p4d_range mm/memory.c:1789 [inline]
       unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
       unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
       unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
       exit_mmap+0x18a/0x690 mm/mmap.c:1864
       __mmput+0x28/0x1b0 kernel/fork.c:1347
       mmput+0x4c/0x60 kernel/fork.c:1369
       exit_mm+0xe4/0x190 kernel/exit.c:571
       do_exit+0x55e/0x17f0 kernel/exit.c:926
       do_group_exit+0x102/0x150 kernel/exit.c:1088
       get_signal+0xf2a/0x1070 kernel/signal.c:2917
       arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
       exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
       exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
       __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
       syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
       do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
       __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
       free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
       zap_pte_range mm/memory.c:1656 [inline]
       zap_pmd_range mm/memory.c:1739 [inline]
       zap_pud_range mm/memory.c:1768 [inline]
       zap_p4d_range mm/memory.c:1789 [inline]
       unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
       unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
       unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
       exit_mmap+0x18a/0x690 mm/mmap.c:1864
       __mmput+0x28/0x1b0 kernel/fork.c:1347
       mmput+0x4c/0x60 kernel/fork.c:1369
       exit_mm+0xe4/0x190 kernel/exit.c:571
       do_exit+0x55e/0x17f0 kernel/exit.c:926
       __do_sys_exit kernel/exit.c:1055 [inline]
       __se_sys_exit kernel/exit.c:1053 [inline]
       __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
       x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      value changed: 0x0000000000000242 -> 0x0000000000000000
      
      Link: https://lkml.kernel.org/r/20241007070623.23340-1-aha310510@gmail.com
      Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
      Fixes: 862590ac ("mm: swap: allow cache reclaim to skip slot cache")
      Signed-off-by: default avatarJeongjun Park <aha310510@gmail.com>
      Acked-by: default avatarChris Li <chrisl@kernel.org>
      Reviewed-by: default avatarKairui Song <kasong@tencent.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      818f916e
    • Baolin Wang's avatar
      mm: khugepaged: fix the incorrect statistics when collapsing large file folios · d60fcaf0
      Baolin Wang authored
      Khugepaged already supports collapsing file large folios (including shmem
      mTHP) by commit 7de856ff ("mm: khugepaged: support shmem mTHP
      collapse"), and the control parameters in khugepaged:
      'khugepaged_max_ptes_swap' and 'khugepaged_max_ptes_none', still compare
      based on PTE granularity to determine whether a file collapse is needed. 
      However, the statistics for 'present' and 'swap' in
      hpage_collapse_scan_file() do not take into account the large folios,
      which may lead to incorrect judgments regarding the
      khugepaged_max_ptes_swap/none parameters, resulting in unnecessary file
      collapses.
      
      To fix this issue, take into account the large folios' statistics for
      'present' and 'swap' variables in the hpage_collapse_scan_file().
      
      Link: https://lkml.kernel.org/r/c76305d96d12d030a1a346b50503d148364246d2.1728901391.git.baolin.wang@linux.alibaba.com
      Fixes: 7de856ff ("mm: khugepaged: support shmem mTHP collapse")
      Signed-off-by: default avatarBaolin Wang <baolin.wang@linux.alibaba.com>
      Acked-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reviewed-by: default avatarBarry Song <baohua@kernel.org>
      Reviewed-by: default avatarZi Yan <ziy@nvidia.com>
      Reviewed-by: default avatarYang Shi <shy828301@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      d60fcaf0
    • Andrey Konovalov's avatar
      MAINTAINERS: kasan, kcov: add bugzilla links · 22ff9b0f
      Andrey Konovalov authored
      Add links to the Bugzilla component that's used to track KASAN and KCOV
      issues.
      
      Link: https://lkml.kernel.org/r/20241012225524.117871-1-andrey.konovalov@linux.devSigned-off-by: default avatarAndrey Konovalov <andreyknvl@gmail.com>
      Acked-by: default avatarDmitry Vyukov <dvyukov@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      22ff9b0f
    • David Hildenbrand's avatar
      mm: don't install PMD mappings when THPs are disabled by the hw/process/vma · 2b0f9223
      David Hildenbrand authored
      We (or rather, readahead logic :) ) might be allocating a THP in the
      pagecache and then try mapping it into a process that explicitly disabled
      THP: we might end up installing PMD mappings.
      
      This is a problem for s390x KVM, which explicitly remaps all PMD-mapped
      THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before
      starting the VM.
      
      For example, starting a VM backed on a file system with large folios
      supported makes the VM crash when the VM tries accessing such a mapping
      using KVM.
      
      Is it also a problem when the HW disabled THP using
      TRANSPARENT_HUGEPAGE_UNSUPPORTED?  At least on x86 this would be the case
      without X86_FEATURE_PSE.
      
      In the future, we might be able to do better on s390x and only disallow
      PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED
      really wants.  For now, fix it by essentially performing the same check as
      would be done in __thp_vma_allowable_orders() or in shmem code, where this
      works as expected, and disallow PMD mappings, making us fallback to PTE
      mappings.
      
      Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com
      Fixes: 793917d9 ("mm/readahead: Add large folio readahead")
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reported-by: default avatarLeo Fu <bfu@redhat.com>
      Tested-by: default avatarThomas Huth <thuth@redhat.com>
      Cc: Thomas Huth <thuth@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2b0f9223
    • Kefeng Wang's avatar
      mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw() · 963756aa
      Kefeng Wang authored
      Patch series "mm: don't install PMD mappings when THPs are disabled by the
      hw/process/vma".
      
      During testing, it was found that we can get PMD mappings in processes
      where THP (and more precisely, PMD mappings) are supposed to be disabled. 
      While it works as expected for anon+shmem, the pagecache is the
      problematic bit.
      
      For s390 KVM this currently means that a VM backed by a file located on
      filesystem with large folio support can crash when KVM tries accessing the
      problematic page, because the readahead logic might decide to use a
      PMD-sized THP and faulting it into the page tables will install a PMD
      mapping, something that s390 KVM cannot tolerate.
      
      This might also be a problem with HW that does not support PMD mappings,
      but I did not try reproducing it.
      
      Fix it by respecting the ways to disable THPs when deciding whether we can
      install a PMD mapping.  khugepaged should already be taking care of not
      collapsing if THPs are effectively disabled for the hw/process/vma.
      
      
      This patch (of 2):
      
      Add vma_thp_disabled() and thp_disabled_by_hw() helpers to be shared by
      shmem_allowable_huge_orders() and __thp_vma_allowable_orders().
      
      [david@redhat.com: rename to vma_thp_disabled(), split out thp_disabled_by_hw() ]
      Link: https://lkml.kernel.org/r/20241011102445.934409-2-david@redhat.com
      Fixes: 793917d9 ("mm/readahead: Add large folio readahead")
      Signed-off-by: default avatarKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: default avatarDavid Hildenbrand <david@redhat.com>
      Reported-by: default avatarLeo Fu <bfu@redhat.com>
      Tested-by: default avatarThomas Huth <thuth@redhat.com>
      Reviewed-by: default avatarRyan Roberts <ryan.roberts@arm.com>
      Cc: Boqiao Fu <bfu@redhat.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      963756aa
    • SeongJae Park's avatar
      Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs · f4050cca
      SeongJae Park authored
      DAMON GitHub repos have moved from awslabs GitHub org to damonitor org[1].
      Following the change, URLs on documents are also updated[2].  However,
      commit 2e9b3d6e ("Docs/damon/maintainer-profile: add links in place"),
      which was added just after the update, was using the deprecated GitHub
      URLs.  Update those to use damonitor GitHub URLs instead.
      
      [1] https://lore.kernel.org/20240813232158.83903-1-sj@kernel.org
      [2] https://lore.kernel.org/20240826015741.80707-2-sj@kernel.org
      
      Link: https://lkml.kernel.org/r/20241011170154.70651-3-sj@kernel.org
      Fixes: 2e9b3d6e ("Docs/damon/maintainer-profile: add links in place")
      Signed-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f4050cca
    • SeongJae Park's avatar
      Docs/damon/maintainer-profile: add missing '_' suffixes for external web links · 46e10f64
      SeongJae Park authored
      Patch series "Docs/damon/maintainer-profile: a couple of minor hotfixes".
      
      DAMON maintainer-profile.rst file patches[1] that were merged into the
      v6.12-rc1 have a couple of minor mistakes.  Fix those.
      
      [1] https://lore.kernel.org/20240826015741.80707-1-sj@kernel.org
      
      
      This patch (of 2):
      
      Links to external web pages on DAMON's maintainer-profile.rst are missing
      '_' suffixes.  As a result, rendered document is having only verbose URLs
      that cannot be clicked.  Fix those.
      
      Also, update the link texts for git trees to contain the names of the
      trees, for better readability and avoiding below Sphinx warning.
      
          maintainer-profile.rst:4: WARNING: Duplicate explicit target name: "tree".
      
      Link: https://lkml.kernel.org/r/20241011170154.70651-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20241011170154.70651-2-sj@kernel.org
      Fixes: 2e9b3d6e ("Docs/damon/maintainer-profile: add links in place")
      Signed-off-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      46e10f64
    • Sidhartha Kumar's avatar
      maple_tree: check for MA_STATE_BULK on setting wr_rebalance · a6e0ceb7
      Sidhartha Kumar authored
      It is possible for a bulk operation (MA_STATE_BULK is set) to enter the
      new_end < mt_min_slots[type] case and set wr_rebalance as a store type. 
      This is incorrect as bulk stores do not rebalance per write, but rather
      after the all of the writes are done through the mas_bulk_rebalance()
      path.  Therefore, add a check to make sure MA_STATE_BULK is not set before
      we return wr_rebalance as the store type.
      
      Also add a test to make sure wr_rebalance is never the store type when
      doing bulk operations via mas_expected_entries()
      
      This is a hotfix for this rc however it has no userspace effects as there
      are no users of the bulk insertion mode.
      
      Link: https://lkml.kernel.org/r/20241011214451.7286-1-sidhartha.kumar@oracle.com
      Fixes: 5d659bbb ("maple_tree: introduce mas_wr_store_type()")
      Suggested-by: default avatarLiam Howlett <liam.howlett@oracle.com>
      Signed-off-by: default avatarSidhartha <sidhartha.kumar@oracle.com>
      Reviewed-by: default avatarWei Yang <richard.weiyang@gmail.com>
      Reviewed-by: default avatarLiam Howlett <liam.howlett@oracle.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a6e0ceb7
    • Yang Shi's avatar
      mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point · 37f0b47c
      Yang Shi authored
      The "addr" and "is_shmem" arguments have different order in TP_PROTO and
      TP_ARGS.  This resulted in the incorrect trace result:
      
      text-hugepage-644429 [276] 392092.878683: mm_khugepaged_collapse_file:
      mm=0xffff20025d52c440, hpage_pfn=0x200678c00, index=512, addr=1, is_shmem=0,
      filename=text-hugepage, nr=512, result=failed
      
      The value of "addr" is wrong because it was treated as bool value, the
      type of is_shmem.
      
      Fix the order in TP_PROTO to keep "addr" is before "is_shmem" since the
      original patch review suggested this order to achieve best packing.
      
      And use "lx" for "addr" instead of "ld" in TP_printk because address is
      typically shown in hex.
      
      After the fix, the trace result looks correct:
      
      text-hugepage-7291  [004]   128.627251: mm_khugepaged_collapse_file:
      mm=0xffff0001328f9500, hpage_pfn=0x20016ea00, index=512, addr=0x400000,
      is_shmem=0, filename=text-hugepage, nr=512, result=failed
      
      Link: https://lkml.kernel.org/r/20241012011702.1084846-1-yang@os.amperecomputing.com
      Fixes: 4c9473e8 ("mm/khugepaged: add tracepoint to collapse_file()")
      Signed-off-by: default avatarYang Shi <yang@os.amperecomputing.com>
      Cc: Gautam Menghani <gautammenghani201@gmail.com>
      Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>    [6.2+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      37f0b47c
    • Jinjie Ruan's avatar
      mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets() · 2d6a1c83
      Jinjie Ruan authored
      The sysfs_target->regions allocated in damon_sysfs_regions_alloc() is not
      freed in damon_sysfs_test_add_targets(), which cause the following memory
      leak, free it to fix it.
      
      	unreferenced object 0xffffff80c2a8db80 (size 96):
      	  comm "kunit_try_catch", pid 187, jiffies 4294894363
      	  hex dump (first 32 bytes):
      	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      	    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      	  backtrace (crc 0):
      	    [<0000000001e3714d>] kmemleak_alloc+0x34/0x40
      	    [<000000008e6835c1>] __kmalloc_cache_noprof+0x26c/0x2f4
      	    [<000000001286d9f8>] damon_sysfs_test_add_targets+0x1cc/0x738
      	    [<0000000032ef8f77>] kunit_try_run_case+0x13c/0x3ac
      	    [<00000000f3edea23>] kunit_generic_run_threadfn_adapter+0x80/0xec
      	    [<00000000adf936cf>] kthread+0x2e8/0x374
      	    [<0000000041bb1628>] ret_from_fork+0x10/0x20
      
      Link: https://lkml.kernel.org/r/20241010125323.3127187-1-ruanjinjie@huawei.com
      Fixes: b8ee5575 ("mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()")
      Signed-off-by: default avatarJinjie Ruan <ruanjinjie@huawei.com>
      Reviewed-by: default avatarSeongJae Park <sj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      2d6a1c83
    • Andy Shevchenko's avatar
      mm: remove unused stub for can_swapin_thp() · a5e8eb25
      Andy Shevchenko authored
      When can_swapin_thp() is unused, it prevents kernel builds with clang,
      `make W=1` and CONFIG_WERROR=y:
      
      mm/memory.c:4184:20: error: unused function 'can_swapin_thp' [-Werror,-Wunused-function]
      
      Fix this by removing the unused stub.
      
      See also commit 6863f564 ("kbuild: allow Clang to find unused static
      inline functions for W=1 build").
      
      Link: https://lkml.kernel.org/r/20241008191329.2332346-1-andriy.shevchenko@linux.intel.com
      Fixes: 242d12c9 ("mm: support large folios swap-in for sync io devices")
      Signed-off-by: default avatarAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: default avatarBarry Song <baohua@kernel.org>
      Cc: Bill Wendling <morbo@google.com>
      Cc: Chuanhua Han <hanchuanhua@oppo.com>
      Cc: Justin Stitt <justinstitt@google.com>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      a5e8eb25
    • Andy Chiu's avatar
      mailmap: add an entry for Andy Chiu · 3f4e74cb
      Andy Chiu authored
      Map my outdated addresses within mailmap.
      
      Link: https://lkml.kernel.org/r/20241009144934.43027-1-andybnac@gmail.comSigned-off-by: default avatarAndy Chiu <andybnac@gmail.com>
      Cc: Greentime Hu <greentime.hu@sifive.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Leon Chien <leonchien@synology.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      3f4e74cb
    • Lorenzo Stoakes's avatar
      MAINTAINERS: add memory mapping/VMA co-maintainers · f8dc524e
      Lorenzo Stoakes authored
      Add myself and Liam as co-maintainers of the memory mapping and VMA code
      alongside Andrew as we are heavily involved in its implementation and
      maintenance.
      
      Link: https://lkml.kernel.org/r/20241009201032.6130-1-lorenzo.stoakes@oracle.comSigned-off-by: default avatarLorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarLiam R. Howlett <Liam.Howlett@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f8dc524e
    • Brahmajit Das's avatar
      fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization · 5778ace0
      Brahmajit Das authored
      show show_smap_vma_flags() has been a using misspelled initializer in
      mnemonics[] - it needed to initialize 2 element array of char and it used
      NUL-padded 2 character string literals (i.e.  3-element initializer).
      
      This has been spotted by gcc-15[*]; prior to that gcc quietly dropped the
      3rd eleemnt of initializers.  To fix this we are increasing the size of
      mnemonics[] (from mnemonics[BITS_PER_LONG][2] to
      mnemonics[BITS_PER_LONG][3]) to accomodate the NUL-padded string literals.
      
      This also helps us in simplyfying the logic for printing of the flags as
      instead of printing each character from the mnemonics[], we can just print
      the mnemonics[] using seq_printf.
      
      [*]: fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
        917 |                 [0 ... (BITS_PER_LONG-1)] = "??",
            |                                                 ^~~~
      fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
      fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
      fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
      fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
      fs/proc/task_mmu.c:917:49: error: initializer-string for array of `char' is too long [-Werror=unterminate d-string-initialization]
      ...
      
      
      Stephen pointed out:
      
      : The C standard explicitly allows for a string initializer to be too long
      : due to the NUL byte at the end ...  so this warning may be overzealous.
      
      but let's make the warning go away anwyay.
      
      Link: https://lkml.kernel.org/r/20241005063700.2241027-1-brahmajit.xyz@gmail.com
      Link: https://lkml.kernel.org/r/20241003093040.47c08382@canb.auug.org.auSigned-off-by: default avatarBrahmajit Das <brahmajit.xyz@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      5778ace0
    • Florian Westphal's avatar
      lib: alloc_tag_module_unload must wait for pending kfree_rcu calls · dc783ba4
      Florian Westphal authored
      Ben Greear reports following splat:
       ------------[ cut here ]------------
       net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload
       WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0
       Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat
      ...
       Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
       RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0
        codetag_unload_module+0x19b/0x2a0
        ? codetag_load_module+0x80/0x80
      
      nf_nat module exit calls kfree_rcu on those addresses, but the free
      operation is likely still pending by the time alloc_tag checks for leaks.
      
      Wait for outstanding kfree_rcu operations to complete before checking
      resolves this warning.
      
      Reproducer:
      unshare -n iptables-nft -t nat -A PREROUTING -p tcp
      grep nf_nat /proc/allocinfo # will list 4 allocations
      rmmod nft_chain_nat
      rmmod nf_nat                # will WARN.
      
      [akpm@linux-foundation.org: add comment]
      Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de
      Fixes: a4735739 ("lib: code tagging module support")
      Signed-off-by: default avatarFlorian Westphal <fw@strlen.de>
      Reported-by: default avatarBen Greear <greearb@candelatech.com>
      Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candelatech.com/
      Cc: Uladzislau Rezki <urezki@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      dc783ba4