  1. 25 May, 2024 12 commits
    • Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of... · 9b62e02e
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull misc fixes from Andrew Morton:
       "16 hotfixes, 11 of which are cc:stable.
      
        A few nilfs2 fixes, the remainder are for MM: a couple of selftests
        fixes, various singletons fixing various issues in various parts"
      
      * tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        mm/ksm: fix possible UAF of stable_node
        mm/memory-failure: fix handling of dissolved but not taken off from buddy pages
        mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again
        nilfs2: fix potential hang in nilfs_detach_log_writer()
        nilfs2: fix unexpected freezing of nilfs_segctor_sync()
        nilfs2: fix use-after-free of timer for log writer thread
        selftests/mm: fix build warnings on ppc64
        arm64: patching: fix handling of execmem addresses
        selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
        selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
        selftests/mm: compaction_test: fix bogus test success on Aarch64
        mailmap: update email address for Satya Priya
        mm/huge_memory: don't unpoison huge_zero_folio
        kasan, fortify: properly rename memintrinsics
        lib: add version into /proc/allocinfo output
        mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL
      9b62e02e
    • Merge tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a0db36ed
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
      
       - Fix x86 IRQ vector leak caused by a CPU offlining race
      
       - Fix build failure in the riscv-imsic irqchip driver
         caused by an API-change semantic conflict
      
       - Fix use-after-free in irq_find_at_or_after()
      
      * tag 'irq-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/irqdesc: Prevent use-after-free in irq_find_at_or_after()
        genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline
        irqchip/riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
      a0db36ed
    • Merge tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3a390f24
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
      
       - Fix regressions of the new x86 CPU VFM (vendor/family/model)
         enumeration/matching code
      
       - Fix crash kernel detection on buggy firmware with
         non-compliant ACPI MADT tables
      
       - Address Kconfig warning
      
      * tag 'x86-urgent-2024-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
        crypto: x86/aes-xts - switch to new Intel CPU model defines
        x86/topology: Handle bogus ACPI tables correctly
        x86/kconfig: Select ARCH_WANT_FRAME_POINTERS again when UNWINDER_FRAME_POINTER=y
      3a390f24
    • Merge tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi · 56676c4c
      Linus Torvalds authored
      Pull ipmi updates from Corey Minyard:
       "Mostly updates for deprecated interfaces, platform.remove and
        converting from a tasklet to a BH workqueue.
      
        Also use HAS_IOPORT for disabling inb()/outb()"
      
      * tag 'for-linus-6.10-1' of https://github.com/cminyard/linux-ipmi:
        ipmi: kcs_bmc_npcm7xx: Convert to platform remove callback returning void
        ipmi: kcs_bmc_aspeed: Convert to platform remove callback returning void
        ipmi: ipmi_ssif: Convert to platform remove callback returning void
        ipmi: ipmi_si_platform: Convert to platform remove callback returning void
        ipmi: ipmi_powernv: Convert to platform remove callback returning void
        ipmi: bt-bmc: Convert to platform remove callback returning void
        char: ipmi: handle HAS_IOPORT dependencies
        ipmi: Convert from tasklet to BH workqueue
      56676c4c
    • Merge tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client · 74eca356
      Linus Torvalds authored
      Pull ceph updates from Ilya Dryomov:
       "A series from Xiubo that adds support for additional access checks
        based on MDS auth caps which were recently made available to clients.
      
        This is needed to prevent scenarios where the MDS quietly discards
        updates that a UID-restricted client previously (wrongfully) acked to
        the user.
      
        Other than that, just a documentation fixup"
      
      * tag 'ceph-for-6.10-rc1' of https://github.com/ceph/ceph-client:
        doc: ceph: update userspace command to get CephFS metadata
        ceph: add CEPHFS_FEATURE_MDS_AUTH_CAPS_CHECK feature bit
        ceph: check the cephx mds auth access for async dirop
        ceph: check the cephx mds auth access for open
        ceph: check the cephx mds auth access for setattr
        ceph: add ceph_mds_check_access() helper
        ceph: save cap_auths in MDS client when session is opened
      74eca356
    • Merge tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3 · 89b61ca4
      Linus Torvalds authored
      Pull ntfs3 updates from Konstantin Komarov:
       "Fixes:
         - reusing of the file index (could cause the file to be trimmed)
         - infinite dir enumeration
         - taking DOS names into account during link counting
         - le32_to_cpu conversion, 32 bit overflow, NULL check
         - some code was refactored
      
        Changes:
         - removed max link count info display during driver init
      
        Remove:
         - atomic_open has been removed for lack of use"
      
      * tag 'ntfs3_for_6.10' of https://github.com/Paragon-Software-Group/linux-ntfs3:
        fs/ntfs3: Break dir enumeration if directory contents error
        fs/ntfs3: Fix case when index is reused during tree transformation
        fs/ntfs3: Mark volume as dirty if xattr is broken
        fs/ntfs3: Always make file nonresident on fallocate call
        fs/ntfs3: Redesign ntfs_create_inode to return error code instead of inode
        fs/ntfs3: Use variable length array instead of fixed size
        fs/ntfs3: Use 64 bit variable to avoid 32 bit overflow
        fs/ntfs3: Check 'folio' pointer for NULL
        fs/ntfs3: Missed le32_to_cpu conversion
        fs/ntfs3: Remove max link count info display during driver init
        fs/ntfs3: Taking DOS names into account during link counting
        fs/ntfs3: remove atomic_open
        fs/ntfs3: use kcalloc() instead of kzalloc()
      89b61ca4
    • Merge tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd · 6c8b1a2d
      Linus Torvalds authored
      Pull smb server fixes from Steve French:
       "Two ksmbd server fixes, both for stable"
      
      * tag '6.10-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
        ksmbd: ignore trailing slashes in share paths
        ksmbd: avoid to send duplicate oplock break notifications
      6c8b1a2d
    • Merge tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 54f71b03
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "There is one new driver and then most of the changes are the device
        tree bindings conversions to yaml.
      
        New driver:
         - Epson RX8111
      
        Drivers:
         - Many Device Tree bindings conversions to dtschema
         - pcf8563: wakeup-source support"
      
      * tag 'rtc-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        pcf8563: add wakeup-source support
        rtc: rx8111: handle VLOW flag
        rtc: rx8111: demote warnings to debug level
        rtc: rx6110: Constify struct regmap_config
        dt-bindings: rtc: convert trivial devices into dtschema
        dt-bindings: rtc: stmp3xxx-rtc: convert to dtschema
        dt-bindings: rtc: pxa-rtc: convert to dtschema
        rtc: Add driver for Epson RX8111
        dt-bindings: rtc: Add Epson RX8111
        rtc: mcp795: drop unneeded MODULE_ALIAS
        rtc: nuvoton: Modify part number value
        rtc: test: Split rtc unit test into slow and normal speed test
        dt-bindings: rtc: nxp,lpc1788-rtc: convert to dtschema
        dt-bindings: rtc: digicolor-rtc: move to trivial-rtc
        dt-bindings: rtc: alphascale,asm9260-rtc: convert to dtschema
        dt-bindings: rtc: armada-380-rtc: convert to dtschema
        rtc: cros-ec: provide ID table for avoiding fallback match
      54f71b03
    • Merge tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux · 4286e1fc
      Linus Torvalds authored
      Pull i3c updates from Alexandre Belloni:
       "Runtime PM (power management) is improved and hot-join support has
        been added to the dw controller driver.
      
        Core:
         - Allow device driver to trigger controller runtime PM
      
        Drivers:
         - dw: hot-join support
         - svc: better IBI handling"
      
      * tag 'i3c/for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/i3c/linux:
        i3c: dw: Add hot-join support.
        i3c: master: Enable runtime PM for master controller
        i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
        i3c: master: svc: change ENXIO to EAGAIN when IBI occurs during start frame
        i3c: Add comment for -EAGAIN in i3c_device_do_priv_xfers()
      4286e1fc
    • Merge tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs · 6951abe8
      Linus Torvalds authored
      Pull jffs2 updates from Richard Weinberger:
      
       - Fix illegal memory access in jffs2_free_inode()
      
       - Kernel-doc fixes
      
       - print symbolic error names
      
      * tag 'jffs2-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
        jffs2: Fix potential illegal address access in jffs2_free_inode
        jffs2: Simplify the allocation of slab caches
        jffs2: nodemgmt: fix kernel-doc comments
        jffs2: print symbolic error name instead of error code
      6951abe8
    • Merge tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux · 2313022e
      Linus Torvalds authored
      Pull UML updates from Richard Weinberger:
      
       - Fixes for -Wmissing-prototypes warnings and further cleanup
      
       - Remove callback returning void from rtc and virtio drivers
      
       - Fix bash location
      
      * tag 'uml-for-linus-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (26 commits)
        um: virtio_uml: Convert to platform remove callback returning void
        um: rtc: Convert to platform remove callback returning void
        um: Remove unused do_get_thread_area function
        um: Fix -Wmissing-prototypes warnings for __vdso_*
        um: Add an internal header shared among the user code
        um: Fix the declaration of kasan_map_memory
        um: Fix the -Wmissing-prototypes warning for get_thread_reg
        um: Fix the -Wmissing-prototypes warning for __switch_mm
        um: Fix -Wmissing-prototypes warnings for (rt_)sigreturn
        um: Stop tracking host PID in cpu_tasks
        um: process: remove unused 'n' variable
        um: vector: remove unused len variable/calculation
        um: vector: fix bpfflash parameter evaluation
        um: slirp: remove set but unused variable 'pid'
        um: signal: move pid variable where needed
        um: Makefile: use bash from the environment
        um: Add winch to winch_handlers before registering winch IRQ
        um: Fix -Wmissing-prototypes warnings for __warp_* and foo
        um: Fix -Wmissing-prototypes warnings for text_poke*
        um: Move declarations to proper headers
        ...
      2313022e
    • Merge tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel · 56fb6f92
      Linus Torvalds authored
      Pull drm fixes from Dave Airlie:
       "Some fixes for the end of the merge window, mostly amdgpu and panthor,
        with one nouveau uAPI change that fixes a bad decision we made a few
        months back.
      
        nouveau:
         - fix bo metadata uAPI for vm bind
      
        panthor:
         - Fixes for panthor's heap logical block.
         - Reset on unrecoverable fault
         - Fix VM references.
         - Reset fix.
      
        xlnx:
         - xlnx compile and doc fixes.
      
        amdgpu:
         - Handle vbios table integrated info v2.3
      
        amdkfd:
         - Handle duplicate BOs in reserve_bo_and_cond_vms
         - Handle memory limitations on small APUs
      
        dp/mst:
         - MST null deref fix.
      
        bridge:
         - Don't let next bridge create connector in adv7511 to make probe
           work"
      
      * tag 'drm-next-2024-05-25' of https://gitlab.freedesktop.org/drm/kernel:
        drm/amdgpu/atomfirmware: add intergrated info v2.3 table
        drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
        drm/amdkfd: Let VRAM allocations go to GTT domain on small APUs
        drm/amdkfd: handle duplicate BOs in reserve_bo_and_cond_vms
        drm/bridge: adv7511: Attach next bridge without creating connector
        drm/buddy: Fix the warn on's during force merge
        drm/nouveau: use tile_mode and pte_kind for VM_BIND bo allocations
        drm/panthor: Call panthor_sched_post_reset() even if the reset failed
        drm/panthor: Reset the FW VM to NULL on unplug
        drm/panthor: Keep a ref to the VM at the panthor_kernel_bo level
        drm/panthor: Force an immediate reset on unrecoverable faults
        drm/panthor: Document drm_panthor_tiler_heap_destroy::handle validity constraints
        drm/panthor: Fix an off-by-one in the heap context retrieval logic
        drm/panthor: Relax the constraints on the tiler chunk size
        drm/panthor: Make sure the tiler initial/max chunks are consistent
        drm/panthor: Fix tiler OOM handling to allow incremental rendering
        drm: xlnx: zynqmp_dpsub: Fix compilation error
        drm: xlnx: zynqmp_dpsub: Fix few function comments
      56fb6f92
  2. 24 May, 2024 28 commits
    • Merge tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 0b32d436
      Linus Torvalds authored
      Pull more mm updates from Andrew Morton:
       "Jeff Xu's implementation of the mseal() syscall"
      
      * tag 'mm-stable-2024-05-24-11-49' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        selftest mm/mseal read-only elf memory segment
        mseal: add documentation
        selftest mm/mseal memory sealing
        mseal: add mseal syscall
        mseal: wire up mseal syscall
      0b32d436
    • mm/ksm: fix possible UAF of stable_node · 90e82349
      Chengming Zhou authored
      Commit 2c653d0e ("ksm: introduce ksm_max_page_sharing per page
      deduplication limit") introduced a possible failure case in
      stable_tree_insert(), where we may free the newly allocated
      stable_node_dup if we fail to prepare the missing chain node.
      
      The kfolio is then returned and unlocked with a freed stable_node still
      set, and any MM activity can come in and access kfolio->mapping, hence
      the UAF.
      
      Fix it by moving folio_set_stable_node() to the end after stable_node
      is inserted successfully.
      
      Link: https://lkml.kernel.org/r/20240513-b4-ksm-stable-node-uaf-v1-1-f687de76f452@linux.dev
      Fixes: 2c653d0e ("ksm: introduce ksm_max_page_sharing per page deduplication limit")
      Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Stefan Roesch <shr@devkernel.io>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      90e82349
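The ordering idea behind this fix can be illustrated outside the kernel. The following is a minimal user-space sketch, not the KSM code: `struct node`, `struct folio_like`, `prepare_chain()` and `insert_node()` are hypothetical stand-ins. It only shows why the back-pointer should be set after the last step that can fail and free the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel objects; not the real KSM types. */
struct node { int id; };
struct folio_like { struct node *stable_node; };

/* Simulated step that can fail, like preparing the missing chain node. */
static bool prepare_chain(bool should_fail)
{
    return !should_fail;
}

/*
 * Fixed ordering: the folio only learns about the node once nothing
 * after that point can fail and free it.
 */
static struct node *insert_node(struct folio_like *folio, bool chain_fails)
{
    struct node *dup;

    assert(folio != NULL);

    dup = malloc(sizeof(*dup));
    if (!dup)
        return NULL;
    dup->id = 1;

    if (!prepare_chain(chain_fails)) {
        /* folio->stable_node was never set, so nothing dangles. */
        free(dup);
        return NULL;
    }

    /* Publish last, after the insertion has succeeded. */
    folio->stable_node = dup;
    return dup;
}
```

With the opposite ordering (publish first, then free on failure), `folio->stable_node` would keep pointing at freed memory on the failure path, which is the shape of the bug described above.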
    • mm/memory-failure: fix handling of dissolved but not taken off from buddy pages · 8cf360b9
      Miaohe Lin authored
      While running memory failure tests recently, I hit the following panic:
      
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
      page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
      ------------[ cut here ]------------
      kernel BUG at include/linux/page-flags.h:1009!
      invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      RIP: 0010:__del_page_from_free_list+0x151/0x180
      RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
      RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
      RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
      RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
      R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
      R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
      FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
      Call Trace:
       <TASK>
       __rmqueue_pcplist+0x23b/0x520
       get_page_from_freelist+0x26b/0xe40
       __alloc_pages_noprof+0x113/0x1120
       __folio_alloc_noprof+0x11/0xb0
       alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
       __alloc_fresh_hugetlb_folio+0xe7/0x140
       alloc_pool_huge_folio+0x68/0x100
       set_max_huge_pages+0x13d/0x340
       hugetlb_sysctl_handler_common+0xe8/0x110
       proc_sys_call_handler+0x194/0x280
       vfs_write+0x387/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7ff916114887
      RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
      RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
      RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
      R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
      R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
       </TASK>
      Modules linked in: mce_inject hwpoison_inject
      ---[ end trace 0000000000000000 ]---
      
      Before the panic, there was a warning about bad page state:
      
      BUG: Bad page state in process page-types  pfn:8cee00
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
      flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
      page_type: 0xffffff7f(buddy)
      raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
      raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
      page dumped because: nonzero mapcount
      Modules linked in: mce_inject hwpoison_inject
      CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
      Call Trace:
       <TASK>
       dump_stack_lvl+0x83/0xa0
       bad_page+0x63/0xf0
       free_unref_page+0x36e/0x5c0
       unpoison_memory+0x50b/0x630
       simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
       debugfs_attr_write+0x42/0x60
       full_proxy_write+0x5b/0x80
       vfs_write+0xcd/0x550
       ksys_write+0x64/0xe0
       do_syscall_64+0xc2/0x1d0
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f189a514887
      RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
      RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
      RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
      R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
       </TASK>
      
      The root cause appears to be the following race:
      
       memory_failure
        try_memory_failure_hugetlb
         me_huge_page
          __page_handle_poison
           dissolve_free_hugetlb_folio
           drain_all_pages -- Buddy page can be isolated e.g. for compaction.
           take_page_off_buddy -- Failed as page is not in the buddy list.
                               -- Page can be put back into buddy after compaction.
          page_ref_inc -- Leads to buddy page with refcnt = 1.
      
      Then unpoison_memory() can unpoison the page and send the buddy page back
      into buddy list again leading to the above bad page state warning.  And
      bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
      page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
      allocate this page.
      
      Fix this issue by only treating __page_handle_poison() as successful when
      it returns 1.
      
      Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
      Fixes: ceaf8fbe ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8cf360b9
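The difference between "did not fail" and "returned 1" can be shown with a small model. This is a sketch under stated assumptions, not the kernel code: `handle_poison()` is a hypothetical tri-state helper modelled on the commit text, and the looser `>= 0` check is an assumption about what a pre-fix caller might have looked like.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical tri-state result, modelled on the commit text:
 *   < 0  dissolving the huge page failed
 *     0  dissolved, but the page could not be taken off the buddy list
 *     1  dissolved and taken off the buddy list
 */
static int handle_poison(bool dissolve_ok, bool taken_off_buddy)
{
    if (!dissolve_ok)
        return -1;
    return taken_off_buddy ? 1 : 0;
}

/* Loose check (assumed for illustration): also accepts the 0 case. */
static bool recovered_loose(int ret)
{
    return ret >= 0;
}

/* Check described by the fix: only a return value of 1 counts. */
static bool recovered_strict(int ret)
{
    return ret == 1;
}

/* Small self-check showing where the two predicates disagree. */
static void check_examples(void)
{
    int ret = handle_poison(true, false); /* dissolved, still on buddy list */

    assert(recovered_loose(ret));   /* loose check reports success */
    assert(!recovered_strict(ret)); /* strict check reports failure */
    assert(recovered_strict(handle_poison(true, true)));
}
```

The two predicates only disagree in the raced case, where the page was dissolved but never removed from the buddy list. That is the case the commit stops treating as a successful recovery.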
    • mm: /proc/pid/smaps_rollup: avoid skipping vma after getting mmap_lock again · 6d065f50
      Yuanyuan Zhong authored
      After switching smaps_rollup to use VMA iterator, searching for next entry
      is part of the condition expression of the do-while loop.  So the current
      VMA needs to be addressed before the continue statement.
      
      Otherwise some VMAs are skipped, and the memory consumption that
      userspace observes in /proc/pid/smaps_rollup will be smaller than the
      sum of the corresponding fields in /proc/pid/smaps.
      
      Link: https://lkml.kernel.org/r/20240523183531.2535436-1-yzhong@purestorage.com
      Fixes: c4c84f06 ("fs/proc/task_mmu: stop using linked list and highest_vm_end")
      Signed-off-by: Yuanyuan Zhong <yzhong@purestorage.com>
      Reviewed-by: Mohamed Khalfella <mkhalfella@purestorage.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6d065f50
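The pitfall described in this commit follows from plain C semantics: in a do-while loop, `continue` jumps to the loop condition, so an iterator advanced in that condition moves on before the current element has been handled. The sketch below is a user-space illustration only. `struct iter`, `iter_next()` and `rollup()` are hypothetical and stand in for the VMA iterator and the accounting loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical iterator over an array of sizes; not the kernel's VMA iterator. */
struct iter {
    const int *items;
    size_t n;
    size_t pos;
};

/* Returns the next item, or NULL at the end, and advances the iterator. */
static const int *iter_next(struct iter *it)
{
    return it->pos < it->n ? &it->items[it->pos++] : NULL;
}

/*
 * Sum all items. `relock_at` is the index at which we pretend the lock was
 * dropped and retaken. If `account_before_continue` is 0, the item that is
 * current at that point is lost, because `continue` goes straight to the
 * while() condition, which already fetches the next item.
 */
static int rollup(const int *items, size_t n, size_t relock_at,
                  int account_before_continue)
{
    struct iter it = { items, n, 0 };
    const int *cur = iter_next(&it);
    size_t idx = 0;
    int total = 0;

    assert(items != NULL || n == 0);
    if (!cur)
        return 0;

    do {
        if (idx == relock_at) {
            idx++;
            if (account_before_continue)
                total += *cur; /* handle the current item first */
            continue;          /* evaluates the condition, advancing cur */
        }
        total += *cur;
        idx++;
    } while ((cur = iter_next(&it)) != NULL);

    return total;
}
```

For the items `{1, 2, 4}` with `relock_at = 1`, the variant that does not account before `continue` returns 5, while the corrected variant returns 7. The gap mirrors smaps_rollup reporting less than the sum of the smaps fields.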
    • nilfs2: fix potential hang in nilfs_detach_log_writer() · eb85dace
      Ryusuke Konishi authored
      Syzbot has reported a potential hang in nilfs_detach_log_writer() called
      during nilfs2 unmount.
      
      Analysis revealed that this is because nilfs_segctor_sync(), which
      synchronizes with the log writer thread, can be called after
      nilfs_segctor_destroy() terminates that thread, as shown in the call trace
      below:
      
      nilfs_detach_log_writer
        nilfs_segctor_destroy
          nilfs_segctor_kill_thread  --> Shut down log writer thread
          flush_work
            nilfs_iput_work_func
              nilfs_dispose_list
                iput
                  nilfs_evict_inode
                    nilfs_transaction_commit
                      nilfs_construct_segment (if inode needs sync)
                        nilfs_segctor_sync  --> Attempt to synchronize with
                                                log writer thread
                                 *** DEADLOCK ***
      
      Fix this issue by changing nilfs_segctor_sync() so that it returns
      normally without synchronizing once the log writer thread has
      terminated, and by forcing tasks that are already waiting to complete
      once the thread terminates.
      
      The skipped inode metadata flushout will then be processed together in the
      subsequent cleanup work in nilfs_segctor_destroy().
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      eb85dace
    • nilfs2: fix unexpected freezing of nilfs_segctor_sync() · 936184ea
      Ryusuke Konishi authored
      A potential and reproducible race issue has been identified where
      nilfs_segctor_sync() would block even after the log writer thread writes a
      checkpoint, unless there is an interrupt or other trigger to resume log
      writing.
      
      This turned out to be because, depending on the execution timing of the
      log writer thread running in parallel, the log writer thread may skip
      responding to nilfs_segctor_sync(), which causes a call to schedule()
      waiting for completion within nilfs_segctor_sync() to lose the opportunity
      to wake up.
      
      The reason why waking up the task waiting in nilfs_segctor_sync() may be
      skipped is that updating the request generation issued using a shared
      sequence counter and adding a wait queue entry to the log writer's
      request wait queue are not done atomically.  There is a possibility that
      log writing and request completion notification by nilfs_segctor_wakeup()
      may occur between the two operations, and in that case, the wait queue
      entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
      nilfs_segctor_sync() will be carried over until the next request occurs.
      
      Fix this issue by performing these two operations together within
      the lock section of sc_state_lock.  Also, following the memory barrier
      guidelines for event waiting loops, move the call to set_current_state()
      in the same location into the event waiting loop to ensure that a memory
      barrier is inserted just before the event condition determination.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
      Fixes: 9ff05123 ("nilfs2: segment constructor")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Cc: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      936184ea
    • nilfs2: fix use-after-free of timer for log writer thread · f5d4e046
      Ryusuke Konishi authored
      Patch series "nilfs2: fix log writer related issues".
      
      This bug fix series covers three nilfs2 log writer-related issues,
      including a timer use-after-free issue and potential deadlock issue on
      unmount, and a potential freeze issue in event synchronization found
      during their analysis.  Details are described in each commit log.
      
      
      This patch (of 3):
      
      A use-after-free issue has been reported regarding the timer sc_timer on
      the nilfs_sc_info structure.
      
      The problem is that even though it is used to wake up a sleeping log
      writer thread, sc_timer is not shut down until the nilfs_sc_info structure
      is about to be freed, and is used regardless of the thread's lifetime.
      
      Fix this issue by limiting the use of sc_timer only while the log writer
      thread is alive.
      
      Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
      Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
      Fixes: fdce895e ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: "Bai, Shuangpeng" <sjb7183@psu.edu>
      Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      f5d4e046
    • selftests/mm: fix build warnings on ppc64 · 1901472f
      Michael Ellerman authored
      Fix warnings like:
      
        In file included from uffd-unit-tests.c:8:
        uffd-unit-tests.c: In function `uffd_poison_handle_fault':
        uffd-common.h:45:33: warning: format `%llu' expects argument of type
        `long long unsigned int', but argument 3 has type `__u64' {aka `long
        unsigned int'} [-Wformat=]
      
      By switching to unsigned long long for u64 for ppc64 builds.
      
      Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      1901472f
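The warning quoted in this commit arises because a 64-bit unsigned type may be `unsigned long` on one ABI and `unsigned long long` on another, so a bare `%llu` is not portable. The commit itself changes which type u64 maps to for ppc64 builds. The sketch below shows a different, general user-space technique (an explicit cast, or the `PRIu64` macro from `<inttypes.h>`) and is not the selftest code. `format_u64()`, `format_u64_inttypes()` and `check_formats()` are hypothetical helper names.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * uint64_t may be `unsigned long` or `unsigned long long` depending on
 * the ABI, so "%llu" is only correct with an explicit cast.
 */
static int format_u64(char *buf, size_t len, uint64_t v)
{
    return snprintf(buf, len, "%llu", (unsigned long long)v);
}

/* Alternative: let <inttypes.h> pick the matching conversion specifier. */
static int format_u64_inttypes(char *buf, size_t len, uint64_t v)
{
    return snprintf(buf, len, "%" PRIu64, v);
}

/* Both helpers must produce the same text for the same value. */
static void check_formats(void)
{
    char a[32];
    char b[32];

    format_u64(a, sizeof(a), UINT64_MAX);
    format_u64_inttypes(b, sizeof(b), UINT64_MAX);
    assert(strcmp(a, b) == 0);
    assert(strcmp(a, "18446744073709551615") == 0);
}
```

Either form should compile without a `-Wformat` diagnostic, regardless of which underlying type the platform chooses for its 64-bit integers.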
    • arm64: patching: fix handling of execmem addresses · b1480ed2
      Will Deacon authored
      Klara Modin reported warnings for a kernel configured with BPF_JIT but
      without MODULES:
      
      [   44.131296] Trying to vfree() bad address (000000004a17c299)
      [   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a #25
      [   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
      [   44.164433] Workqueue: events bpf_prog_free_deferred
      [   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      [   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.188772] sp : ffff800082a13c70
      [   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
      [   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
      [   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
      [   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
      [   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
      [   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
      [   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
      [   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
      [   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
      [   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
      [   44.275703] Call trace:
      [   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1))
      [   44.283858] vfree (mm/vmalloc.c:3322)
      [   44.287835] execmem_free (mm/execmem.c:70)
      [   44.292347] bpf_jit_free_exec+0x10/0x1c
      [   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006)
      [   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195)
      [   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474)
      [   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785)
      [   44.317785] process_one_work (kernel/workqueue.c:3273)
      [   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2))
      [   44.327292] kthread (kernel/kthread.c:388)
      [   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861)
      
      The problem is that bpf_arch_text_copy() silently fails to write to the
      read-only area: patch_map() faults and the resulting -EFAULT is
      discarded.
      
      Update patch_map() to use CONFIG_EXECMEM instead of
      CONFIG_STRICT_MODULE_RWX to check for vmalloc addresses.
      
      Link: https://lkml.kernel.org/r/20240521213813.703309-1-rppt@kernel.org
      Fixes: 2c9e5d4a ("bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of")
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
      Reported-by: Klara Modin <klarasmodin@gmail.com>
      Closes: https://lore.kernel.org/all/7983fbbf-0127-457c-9394-8d6e4299c685@gmail.com
      Tested-by: Klara Modin <klarasmodin@gmail.com>
      Cc: Björn Töpel <bjorn@kernel.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      b1480ed2
    • Dev Jain's avatar
      selftests/mm: compaction_test: fix bogus test success and reduce probability... · fb9293b6
      Dev Jain authored
      selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
      
      Reset nr_hugepages to zero before the start of the test.
      
      If a non-zero number of hugepages is already set before the start of the
      test, the following problems arise:
      
       - The probability of the test getting OOM-killed increases.  Proof:
         The test wants to run on 80% of available memory to prevent OOM-killing
         (see original code comments).  Let the value of mem_free at the start
         of the test, when nr_hugepages = 0, be x.  In the other case, when
         nr_hugepages > 0, let the memory consumed by hugepages be y.  In the
         former case, the test operates on 0.8 * x of memory.  In the latter,
         the test operates on 0.8 * (x - y) of memory, with y already filled,
         hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
         x.  Q.E.D
      
       - The probability of a bogus test success increases.  Proof: Let the
         memory consumed by hugepages be greater than 25% of x, with x and y
         defined as above.  The definition of compaction_index is c_index = (x -
         y)/z where z is the memory consumed by hugepages after trying to
         increase them again.  In check_compaction(), we set the number of
         hugepages to zero, and then increase them back; the probability that
         they will be set back to consume at least y amount of memory again is
         very high (since there is not much delay between the two attempts of
         changing nr_hugepages).  Hence, z >= y > (x/4) (by the 25% assumption).
         Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3.
         Hence c_index can always be forced below 3, so the test always
         succeeds.  Q.E.D
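
The two inequalities above can be checked with concrete numbers. What follows is a minimal sketch in C, not code from the selftest: the helper names `memory_in_use()` and `c_index_upper_bound()` are hypothetical, and whole-number units (for example MiB) are assumed so that integer arithmetic is enough.

```c
#include <assert.h>

/*
 * Hypothetical helper: memory in use while the test runs on 80% of what is
 * free, when y units are already held by preset hugepages.  x and y are
 * defined as in the proof above.
 */
static long memory_in_use(long x, long y)
{
	assert(y >= 0 && y <= x);
	return y + (8 * (x - y)) / 10;	/* y + 0.8 * (x - y) */
}

/*
 * Hypothetical helper: upper bound on c_index = (x - y) / z, using the
 * assumption z >= y from the proof above.
 */
static long c_index_upper_bound(long x, long y)
{
	assert(y > 0 && y <= x);
	return (x - y) / y;		/* (x - y) / z <= (x - y) / y */
}
```

With x = 1000 and y = 300 (more than 25% of x), the memory in use rises from 800 to 860 and the bound on c_index is 2, which is below the pass threshold of 3.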
      
      Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
      Fixes: bd67d5c1 ("Test compaction of mlocked memory")
      Signed-off-by: Dev Jain <dev.jain@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sri Jayaramappa <sjayaram@akamai.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fb9293b6
    • Dev Jain's avatar
      selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages · 9ad665ef
      Dev Jain authored
      Currently, the test tries to set nr_hugepages to zero, but that is not
      actually done because the file offset is not reset after read().  Fix that
      using lseek().
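
The read-then-write pattern described above can be sketched in userspace C. This is an illustration under stated assumptions, not the selftest's code: `open_scratch()` and `read_then_write()` are hypothetical helpers, and a scratch file in /tmp stands in for /proc/sys/vm/nr_hugepages.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a scratch file holding `contents`, rewound to offset 0.
 * It stands in for /proc/sys/vm/nr_hugepages.  Returns an fd or -1.
 */
static int open_scratch(const char *contents)
{
	char path[] = "/tmp/nr_hugepages_demoXXXXXX";
	size_t len = strlen(contents);
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file goes away once fd is closed */
	if (write(fd, contents, len) != (ssize_t)len ||
	    lseek(fd, 0, SEEK_SET) == (off_t)-1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Read the old value into `old`, then write `new_val` starting at offset 0. */
static int read_then_write(int fd, char *old, size_t old_len, const char *new_val)
{
	size_t len = strlen(new_val);
	ssize_t n;

	assert(old_len > 0);
	n = read(fd, old, old_len - 1);
	if (n < 0)
		return -1;
	old[n] = '\0';

	/*
	 * read() advanced the file offset; without this rewind the write
	 * would land after the old value instead of at the start.
	 */
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return -1;

	return write(fd, new_val, len) == (ssize_t)len ? 0 : -1;
}
```

On a regular file the missing rewind shows up as data appended after the old value; the commit message above reports that on nr_hugepages the write of zero simply did not take effect.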
      
      Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
      Fixes: bd67d5c1 ("Test compaction of mlocked memory")
      Signed-off-by: Dev Jain <dev.jain@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sri Jayaramappa <sjayaram@akamai.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      9ad665ef
    • Dev Jain's avatar
      selftests/mm: compaction_test: fix bogus test success on Aarch64 · d4202e66
      Dev Jain authored
      Patch series "Fixes for compaction_test", v2.
      
      The compaction_test memory selftest introduces fragmentation in memory
      and then tries to allocate as many hugepages as possible. This series
      addresses some problems.
      
      On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
      compaction_index becomes 0, which is less than 3, due to no division by
      zero exception being raised. We fix that by checking for division by
      zero.
      
      Secondly, correctly set the number of hugepages to zero before trying
      to set a large number of them.
      
      Now, consider a situation in which, at the start of the test, a non-zero
      number of hugepages have been already set (while running the entire
      selftests/mm suite, or manually by the admin). The test operates on 80%
      of memory to avoid OOM-killer invocation, and because some memory is
      already blocked by hugepages, it would increase the chance of OOM-killing.
      Also, since mem_free used in check_compaction() is the value before we
      set nr_hugepages to zero, the chance that the compaction_index will
      be small is very high if the preset nr_hugepages was high, leading to a
      bogus test success.
      
      
      This patch (of 3):
      
      Currently, if at runtime we are not able to allocate a huge page, the test
      will trivially pass on Aarch64 due to no exception being raised on
      division by zero while computing compaction_index.  Fix that by checking
      for nr_hugepages == 0.  More generally, avoid a division by zero by
      exiting the program beforehand.  While at it, fix a typo, and handle the
      case where the number of hugepages may overflow an integer.
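
The guard described above can be sketched as follows. This is a hedged illustration, not the patch itself: `parse_hugepages()` and `compaction_index()` are hypothetical helpers, and the units (kB for `mem_free` and `hugepage_size`) are an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a hugepage count from text.  Returns -1 on garbage, on a negative
 * value, or on a value that does not fit in an int.
 */
static int parse_hugepages(const char *text, int *out)
{
	char *end;
	long v;

	assert(out != NULL);
	errno = 0;
	v = strtol(text, &end, 10);
	if (end == text || errno == ERANGE || v < 0 || v > INT_MAX)
		return -1;
	*out = (int)v;
	return 0;
}

/*
 * Hypothetical helper: compaction index = free memory / memory in hugepages.
 * Returns -1 instead of dividing when no hugepage could be allocated, so the
 * result never depends on how the CPU treats integer division by zero.
 */
static int compaction_index(unsigned long mem_free, int nr_hugepages,
			    unsigned long hugepage_size, unsigned long *index)
{
	if (nr_hugepages <= 0 || hugepage_size == 0)
		return -1;
	*index = mem_free / ((unsigned long)nr_hugepages * hugepage_size);
	return 0;
}
```

A caller would treat a -1 return from either helper as a test failure and exit before computing the index.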
      
      Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
      Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
      Fixes: bd67d5c1 ("Test compaction of mlocked memory")
      Signed-off-by: Dev Jain <dev.jain@arm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sri Jayaramappa <sjayaram@akamai.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      d4202e66
    • Satya Priya Kakitapalli's avatar
      mailmap: update email address for Satya Priya · c17d39f5
      Satya Priya Kakitapalli authored
      Update mailmap with my latest email ID; quic_c_skakit@quicinc.com
      is no longer active.
      
      Link: https://lkml.kernel.org/r/20240515-mailmap-update-v1-1-df4853f757a3@quicinc.com
      Signed-off-by: Satya Priya Kakitapalli <quic_skakitap@quicinc.com>
      Cc: Ajit Pandey <quic_ajipan@quicinc.com>
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Imran Shaik <quic_imrashai@quicinc.com>
      Cc: Jagadeesh Kona <quic_jkona@quicinc.com>
      Cc: Konrad Dybcio <konrad.dybcio@linaro.org>
      Cc: Taniya Das <quic_tdas@quicinc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      c17d39f5
    • Miaohe Lin's avatar
      mm/huge_memory: don't unpoison huge_zero_folio · fe6f86f4
      Miaohe Lin authored
      When I ran memory failure tests recently, the panic below occurred:
      
       kernel BUG at include/linux/mm.h:1135!
       invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
       RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
       RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
       RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
       RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
       RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
       R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
       FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
       Call Trace:
        <TASK>
        do_shrink_slab+0x14f/0x6a0
        shrink_slab+0xca/0x8c0
        shrink_node+0x2d0/0x7d0
        balance_pgdat+0x33a/0x720
        kswapd+0x1f3/0x410
        kthread+0xd5/0x100
        ret_from_fork+0x2f/0x50
        ret_from_fork_asm+0x1a/0x30
        </TASK>
       Modules linked in: mce_inject hwpoison_inject
       ---[ end trace 0000000000000000 ]---
       RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
       RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
       RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
       RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
       RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
       R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
       R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
       FS:  0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
      
      The root cause is that the HWPoison flag is set for huge_zero_folio
      without increasing the folio refcnt.  unpoison_memory() then decreases
      the folio refcnt unexpectedly, because the folio looks like a
      successfully hwpoisoned one, which triggers
      VM_BUG_ON_PAGE(page_ref_count(page) == 0) when huge_zero_folio is
      released.
      
      Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
      We're not prepared to unpoison huge_zero_folio yet.
      
      Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
      Fixes: 478d134e ("mm/huge_memory: do not overkill when splitting huge_zero_page")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Xu Yu <xuyu@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fe6f86f4
    • Andrey Konovalov's avatar
      kasan, fortify: properly rename memintrinsics · 2e577732
      Andrey Konovalov authored
      After commit 69d4c0d3 ("entry, kasan, x86: Disallow overriding mem*()
      functions") and the follow-up fixes, with CONFIG_FORTIFY_SOURCE enabled,
      even though the compiler instruments memintrinsics by generating calls to
      __asan/__hwasan_ prefixed functions, FORTIFY_SOURCE still uses
      uninstrumented memset/memmove/memcpy as the underlying functions.
      
      As a result, KASAN cannot detect bad accesses in memset/memmove/memcpy. 
      This also makes KASAN tests corrupt kernel memory and cause crashes.
      
      To fix this, use __asan_/__hwasan_memset/memmove/memcpy as the underlying
      functions whenever appropriate.  Do this only for the instrumented code
      (as indicated by __SANITIZE_ADDRESS__).
      
      Link: https://lkml.kernel.org/r/20240517130118.759301-1-andrey.konovalov@linux.dev
      Fixes: 69d4c0d3 ("entry, kasan, x86: Disallow overriding mem*() functions")
      Fixes: 51287dcb ("kasan: emit different calls for instrumentable memintrinsics")
      Fixes: 36be5cba ("kasan: treat meminstrinsic as builtins in uninstrumented files")
      Signed-off-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reported-by: Erhard Furtner <erhard_f@mailbox.org>
      Reported-by: Nico Pache <npache@redhat.com>
      Closes: https://lore.kernel.org/all/20240501144156.17e65021@outsider.home/
      Reviewed-by: Marco Elver <elver@google.com>
      Tested-by: Nico Pache <npache@redhat.com>
      Acked-by: Nico Pache <npache@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      2e577732
    • Suren Baghdasaryan's avatar
      lib: add version into /proc/allocinfo output · a38568a0
      Suren Baghdasaryan authored
      Add version string and a header at the beginning of /proc/allocinfo to
      allow later format changes.  Example output:
      
      > head /proc/allocinfo
      allocinfo - version: 1.0
      #     <size>  <calls> <tag info>
                 0        0 init/main.c:1314 func:do_initcalls
                 0        0 init/do_mounts.c:353 func:mount_nodev_root
                 0        0 init/do_mounts.c:187 func:mount_root_generic
                 0        0 init/do_mounts.c:158 func:do_mount_root
                 0        0 init/initramfs.c:493 func:unpack_to_rootfs
                 0        0 init/initramfs.c:492 func:unpack_to_rootfs
                 0        0 init/initramfs.c:491 func:unpack_to_rootfs
               512        1 arch/x86/events/rapl.c:681 func:init_rapl_pmus
               128        1 arch/x86/events/rapl.c:571 func:rapl_cpu_online
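
A consumer of this file could use the version line to decide how to read the rest. The sketch below is an assumption about how such a reader might look, based only on the example output above; `parse_allocinfo_version()` and `parse_allocinfo_row()` are hypothetical helpers, not kernel or library functions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse the first line of /proc/allocinfo, e.g. "allocinfo - version: 1.0".
 * Returns 0 and fills major/minor on success, -1 if the line is not a
 * version header.
 */
static int parse_allocinfo_version(const char *line, int *major, int *minor)
{
	assert(line && major && minor);
	return sscanf(line, "allocinfo - version: %d.%d", major, minor) == 2 ? 0 : -1;
}

/*
 * Parse the numeric columns of one data row: "<size> <calls> <tag info>".
 * Returns -1 for lines that do not start with two numbers, such as the
 * "#" column header.
 */
static int parse_allocinfo_row(const char *line, unsigned long long *size,
			       unsigned long long *calls)
{
	assert(line && size && calls);
	return sscanf(line, "%llu %llu", size, calls) == 2 ? 0 : -1;
}
```

A reader would call `parse_allocinfo_version()` on the first line, refuse or adapt if the major version is not one it understands, and only then apply `parse_allocinfo_row()` to the remaining lines.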
      
      [akpm@linux-foundation.org: remove stray newline from struct allocinfo_private]
      Link: https://lkml.kernel.org/r/20240514163128.3662251-1-surenb@google.com
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      a38568a0
    • Hailong.Liu's avatar
      mm/vmalloc: fix vmalloc which may return null if called with __GFP_NOFAIL · 8e0545c8
      Hailong.Liu authored
      commit a421ef30 ("mm: allow !GFP_KERNEL allocations for kvmalloc")
      includes support for __GFP_NOFAIL, but it presents a conflict with commit
      dd544141 ("vmalloc: back off when the current task is OOM-killed").  A
      possible scenario is as follows:
      
      process-a
      __vmalloc_node_range(GFP_KERNEL | __GFP_NOFAIL)
          __vmalloc_area_node()
              vm_area_alloc_pages()
      		--> oom-killer sends SIGKILL to process-a
              if (fatal_signal_pending(current)) break;
      --> return NULL;
      
      To fix this, do not check fatal_signal_pending() in vm_area_alloc_pages()
      if __GFP_NOFAIL is set.
      
      This issue occurred during OPLUS KASAN TEST.  Below is part of the log:
      -> oom-killer sends signal to process
      [65731.222840] [ T1308] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/apps/uid_10198,task=gs.intelligence,pid=32454,uid=10198
      
      [65731.259685] [T32454] Call trace:
      [65731.259698] [T32454]  dump_backtrace+0xf4/0x118
      [65731.259734] [T32454]  show_stack+0x18/0x24
      [65731.259756] [T32454]  dump_stack_lvl+0x60/0x7c
      [65731.259781] [T32454]  dump_stack+0x18/0x38
      [65731.259800] [T32454]  mrdump_common_die+0x250/0x39c [mrdump]
      [65731.259936] [T32454]  ipanic_die+0x20/0x34 [mrdump]
      [65731.260019] [T32454]  atomic_notifier_call_chain+0xb4/0xfc
      [65731.260047] [T32454]  notify_die+0x114/0x198
      [65731.260073] [T32454]  die+0xf4/0x5b4
      [65731.260098] [T32454]  die_kernel_fault+0x80/0x98
      [65731.260124] [T32454]  __do_kernel_fault+0x160/0x2a8
      [65731.260146] [T32454]  do_bad_area+0x68/0x148
      [65731.260174] [T32454]  do_mem_abort+0x151c/0x1b34
      [65731.260204] [T32454]  el1_abort+0x3c/0x5c
      [65731.260227] [T32454]  el1h_64_sync_handler+0x54/0x90
      [65731.260248] [T32454]  el1h_64_sync+0x68/0x6c
      
      [65731.260269] [T32454]  z_erofs_decompress_queue+0x7f0/0x2258
      --> be->decompressed_pages = kvcalloc(be->nr_pages, sizeof(struct page *), GFP_KERNEL | __GFP_NOFAIL);
      	kernel panic caused by a NULL pointer dereference.
      	erofs assumes kvmalloc with __GFP_NOFAIL never returns NULL.
      [65731.260293] [T32454]  z_erofs_runqueue+0xf30/0x104c
      [65731.260314] [T32454]  z_erofs_readahead+0x4f0/0x968
      [65731.260339] [T32454]  read_pages+0x170/0xadc
      [65731.260364] [T32454]  page_cache_ra_unbounded+0x874/0xf30
      [65731.260388] [T32454]  page_cache_ra_order+0x24c/0x714
      [65731.260411] [T32454]  filemap_fault+0xbf0/0x1a74
      [65731.260437] [T32454]  __do_fault+0xd0/0x33c
      [65731.260462] [T32454]  handle_mm_fault+0xf74/0x3fe0
      [65731.260486] [T32454]  do_mem_abort+0x54c/0x1b34
      [65731.260509] [T32454]  el0_da+0x44/0x94
      [65731.260531] [T32454]  el0t_64_sync_handler+0x98/0xb4
      [65731.260553] [T32454]  el0t_64_sync+0x198/0x19c
      
      Link: https://lkml.kernel.org/r/20240510100131.1865-1-hailong.liu@oppo.com
      Fixes: 9376130c ("mm/vmalloc: add support for __GFP_NOFAIL")
      Signed-off-by: Hailong.Liu <hailong.liu@oppo.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Suggested-by: Barry Song <21cnbao@gmail.com>
      Reported-by: Oven <liyangouwen1@oppo.com>
      Reviewed-by: Barry Song <baohua@kernel.org>
      Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Gao Xiang <xiang@kernel.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      8e0545c8
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · f1f9984f
      Linus Torvalds authored
      Pull more RISC-V updates from Palmer Dabbelt:
      
       - The compression format used for boot images is now configurable at
         build time, and these formats are shown in `make help`
      
       - access_ok() has been optimized
      
       - A pair of performance bugs have been fixed in the uaccess handlers
      
       - Various fixes and cleanups, including one for the IMSIC build failure
         and one for the early-boot ftrace illegal NOPs bug
      
      * tag 'riscv-for-linus-6.10-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: Fix early ftrace nop patching
        irqchip: riscv-imsic: Fixup riscv_ipi_set_virq_range() conflict
        riscv: selftests: Add signal handling vector tests
        riscv: mm: accelerate pagefault when badaccess
        riscv: uaccess: Relax the threshold for fast path
        riscv: uaccess: Allow the last potential unrolled copy
        riscv: typo in comment for get_f64_reg
        Use bool value in set_cpu_online()
        riscv: selftests: Add hwprobe binaries to .gitignore
        riscv: stacktrace: fixed walk_stackframe()
        ftrace: riscv: move from REGS to ARGS
        riscv: do not select MODULE_SECTIONS by default
        riscv: show help string for riscv-specific targets
        riscv: make image compression configurable
        riscv: cpufeature: Fix extension subset checking
        riscv: cpufeature: Fix thead vector hwcap removal
        riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
        riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
        riscv: Define TASK_SIZE_MAX for __access_ok()
        riscv: Remove PGDIR_SIZE_L3 and TASK_SIZE_MIN
      f1f9984f
    • Linus Torvalds's avatar
      Merge tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 9351f138
      Linus Torvalds authored
      Pull xen updates from Juergen Gross:
      
       - a small cleanup in the drivers/xen/xenbus Makefile
      
       - a fix of the Xen xenstore driver to improve connecting to a late
         started Xenstore
      
       - an enhancement for better support of ballooning in PVH guests
      
       - a cleanup using try_cmpxchg() instead of open coding it
      
      * tag 'for-linus-6.10a-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        drivers/xen: Improve the late XenStore init protocol
        xen/xenbus: Use *-y instead of *-objs in Makefile
        xen/x86: add extra pages to unpopulated-alloc if available
        locking/x86/xen: Use try_cmpxchg() in xen_alloc_p2m_entry()
      9351f138
    • Linus Torvalds's avatar
      Merge tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 02c438bb
      Linus Torvalds authored
      Pull more btrfs updates from David Sterba:
       "A few more updates, mostly stability fixes or user visible changes:
      
         - fix race in zoned mode during device replace that can lead to
           use-after-free
      
         - update return codes and lower message levels for quota rescan where
           it's causing false alerts
      
         - fix unexpected qgroup id reuse under some conditions
      
         - fix condition when looking up extent refs
      
         - add option norecovery (removed in 6.8), the intended replacements
           haven't been used and some applications still rely on the old one
      
         - build warning fixes"
      
      * tag 'for-6.10-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: re-introduce 'norecovery' mount option
        btrfs: fix end of tree detection when searching for data extent ref
        btrfs: scrub: initialize ret in scrub_simple_mirror() to fix compilation warning
        btrfs: zoned: fix use-after-free due to race with dev replace
        btrfs: qgroup: fix qgroup id collision across mounts
        btrfs: qgroup: update rescan message levels and error codes
      02c438bb
    • Linus Torvalds's avatar
      Merge tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · dcb9f486
      Linus Torvalds authored
      Pull more erofs updates from Gao Xiang:
       "The main ones are metadata API conversion to byte offsets by Al Viro.
      
        Another patch gets rid of unnecessary memory allocation out of DEFLATE
        decompressor. The remaining one is a trivial cleanup.
      
         - Convert metadata APIs to byte offsets
      
         - Avoid allocating DEFLATE streams unnecessarily
      
         - Some erofs_show_options() cleanup"
      
      * tag 'erofs-for-6.10-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: avoid allocating DEFLATE streams before mounting
        z_erofs_pcluster_begin(): don't bother with rounding position down
        erofs: don't round offset down for erofs_read_metabuf()
        erofs: don't align offset for erofs_read_metabuf() (simple cases)
        erofs: mechanically convert erofs_read_metabuf() to offsets
        erofs: clean up erofs_show_options()
      dcb9f486
    • Linus Torvalds's avatar
      Merge tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs · c40b1994
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Nothing exciting, just syzbot fixes (except for the one
        FMODE_CAN_ODIRECT patch).
      
        Looks like syzbot reports have slowed down; this is all catch up from
        two weeks of conferences.
      
        Next hardening project is using Thomas's error injection tooling to
        torture test repair"
      
      * tag 'bcachefs-2024-05-24' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: Fix race path in bch2_inode_insert()
        bcachefs: Ensure we're RW before journalling
        bcachefs: Fix shutdown ordering
        bcachefs: Fix unsafety in bch2_dirent_name_bytes()
        bcachefs: Fix stack oob in __bch2_encrypt_bio()
        bcachefs: Fix btree_trans leak in bch2_readahead()
        bcachefs: Fix bogus verify_replicas_entry() assert
        bcachefs: Check for subvolues with bogus snapshot/inode fields
        bcachefs: bch2_checksum() returns 0 for unknown checksum type
        bcachefs: Fix bch2_alloc_ciphers()
        bcachefs: Add missing guard in bch2_snapshot_has_children()
        bcachefs: Fix missing parens in drop_locks_do()
        bcachefs: Improve bch2_assert_pos_locked()
        bcachefs: Fix shift overflows in replicas.c
        bcachefs: Fix shift overflow in btree_lost_data()
        bcachefs: Fix ref in trans_mark_dev_sbs() error path
        bcachefs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
        bcachefs: Fix rcu splat in check_fix_ptrs()
      c40b1994
    • Linus Torvalds's avatar
      Merge tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 9ea370f3
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - a change to input core to trim amount of keys data in modalias string
         in case when a device declares too many keys and they do not fit in
         uevent buffer instead of reporting an error which results in uevent
         not being generated at all
      
       - support for Machenike G5 Pro Controller added to xpad driver
      
       - support for FocalTech FT5452 and FT8719 added to edt-ft5x06
      
       - support for new SPMI vibrator added to pm8xxx-vibrator driver
      
       - missing locking added to cyapa touchpad driver
      
       - removal of unused fields in various driver structures
      
       - explicit initialization of i2c_device_id::driver_data to 0 dropped
         from input drivers
      
       - other assorted fixes and cleanups.
      
      * tag 'input-for-v6.10-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (24 commits)
        Input: edt-ft5x06 - add support for FocalTech FT5452 and FT8719
        dt-bindings: input: touchscreen: edt-ft5x06: Document FT5452 and FT8719 support
        Input: xpad - add support for Machenike G5 Pro Controller
        Input: try trimming too long modalias strings
        Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
        Input: zet6223 - remove an unused field in struct zet6223_ts
        Input: chipone_icn8505 - remove an unused field in struct icn8505_data
        Input: cros_ec_keyb - remove an unused field in struct cros_ec_keyb
        Input: lpc32xx-keys - remove an unused field in struct lpc32xx_kscan_drv
        Input: matrix_keypad - remove an unused field in struct matrix_keypad
        Input: tca6416-keypad - remove unused struct tca6416_drv_data
        Input: tca6416-keypad - remove an unused field in struct tca6416_keypad_chip
        Input: da7280 - remove an unused field in struct da7280_haptic
        Input: ff-core - prefer struct_size over open coded arithmetic
        Input: cyapa - add missing input core locking to suspend/resume functions
        input: pm8xxx-vibrator: add new SPMI vibrator support
        dt-bindings: input: qcom,pm8xxx-vib: add new SPMI vibrator module
        input: pm8xxx-vibrator: refactor to support new SPMI vibrator
        Input: pm8xxx-vibrator - correct VIB_MAX_LEVELS calculation
        Input: sur40 - convert le16 to cpu before use
        ...
      9ea370f3
    • Linus Torvalds's avatar
      Merge tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 041c9f71
      Linus Torvalds authored
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes for 6.10-rc1. Most of the changes are various
        device-specific fixes and quirks, while there are a few small changes
        in ALSA core timer and module / built-in fixes"
      
      * tag 'sound-fix-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek: fix mute/micmute LEDs don't work for ProBook 440/460 G11.
        ALSA: core: Enable proc module when CONFIG_MODULES=y
        ALSA: core: Fix NULL module pointer assignment at card init
        ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897
        ASoC: dt-bindings: stm32: Ensure compatible pattern matches whole string
        ASoC: tas2781: Fix wrong loading calibrated data sequence
        ASoC: tas2552: Add TX path for capturing AUDIO-OUT data
        ALSA: usb-audio: Fix for sampling rates support for Mbox3
        Documentation: sound: Fix trailing whitespaces
        ALSA: timer: Set lower bound of start tick time
        ASoC: codecs: ES8326: solve hp and button detect issue
        ASoC: rt5645: mic-in detection threshold modification
        ASoC: Intel: sof_sdw_rt_sdca_jack_common: Use name_prefix for `-sdca` detection
      041c9f71
    • Linus Torvalds's avatar
      Merge tag 'char-misc-6.10-rc1-fix' of... · e292ead0
      Linus Torvalds authored
      Merge tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
      
      Pull char/misc fix from Greg KH:
       "Here is one remaining bugfix for 6.10-rc1 that missed the 6.9-final
        merge window, and has been sitting in my tree and linux-next for quite
        a while now, but wasn't sent to you (my fault, travels...)
      
        It is a bugfix to resolve an error in the speakup code that could
        overflow a buffer.
      
        It has been in linux-next for a while with no reported problems"
      
      * tag 'char-misc-6.10-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        speakup: Fix sizeof() vs ARRAY_SIZE() bug
      e292ead0
    • Linus Torvalds's avatar
      Merge tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · f6d199c7
      Linus Torvalds authored
      Pull tty/serial fixes from Greg KH:
       "Here are some small TTY and Serial driver fixes that missed the
        6.9-final merge window, but have been in my tree for weeks (my fault,
        travel caused me to miss this)
      
        These fixes include:
      
         - more n_gsm fixes for reported problems
      
         - 8250_mtk driver fix
      
         - 8250_bcm7271 driver fix
      
         - sc16is7xx driver fix
      
        All of these have been in linux-next for weeks without any reported
        problems"
      
      * tag 'tty-6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: sc16is7xx: fix bug in sc16is7xx_set_baud() when using prescaler
        serial: 8250_bcm7271: use default_mux_rate if possible
        serial: 8520_mtk: Set RTS on shutdown for Rx in-band wakeup
        tty: n_gsm: fix missing receive state reset after mode switch
        tty: n_gsm: fix possible out-of-bounds in gsm0_receive()
      f6d199c7
    • Merge tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · b0a9ba13
      Linus Torvalds authored
      Pull hardening fixes from Kees Cook:
      
       - loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module
         decompression (Stephen Boyd)
      
       - ubsan: Restore dependency on ARCH_HAS_UBSAN
      
       - kunit/fortify: Fix memcmp() test to be amplitude agnostic
      
      * tag 'hardening-v6.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        kunit/fortify: Fix memcmp() test to be amplitude agnostic
        ubsan: Restore dependency on ARCH_HAS_UBSAN
        loadpin: Prevent SECURITY_LOADPIN_ENFORCE=y without module decompression
      b0a9ba13
    • Merge tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 0eb03c7e
      Linus Torvalds authored
      Pull tracefs/eventfs updates from Steven Rostedt:
       "Bug fixes:
      
         - The eventfs directories need to have unique inode numbers. Make
           sure that they do not get the default file inode number.
      
         - Update the inode uid and gid fields on remount.
      
           When a remount happens where a uid and/or gid is specified, all the
           tracefs files and directories should get the specified uid and/or
           gid. But this can be sporadic when some uids were assigned already.
           There's already a list of inodes that are allocated. Just update
           their uid and gid fields at the time of remount.
      
         - Update the eventfs_inodes on remount from the top level "events"
           descriptor.
      
           There was a bug where not all the eventfs files or directories
           were getting updated on remount. One fix was to clear the
           SAVED_UID/GID flags from the inode list during the iteration of the
           inodes during the remount. But because the eventfs inodes can be
           freed when the last reference is released, not all the
           eventfs_inodes were being updated. This led to the ownership
           selftest failing if it was run a second time (the first run would
           leave eventfs_inodes with no corresponding tracefs_inode).
      
           Instead, for eventfs_inodes, only process the "events"
           eventfs_inode from the list iteration, as it is guaranteed to have
           a tracefs_inode (it's never freed while the "events" directory
           exists). As it has a list of its children, and the children have a
           list of their children, just iterate all the eventfs_inodes from
           the "events" descriptor and it is guaranteed to get all of them.
      
         - Clear the EVENT_INODE flag from the tracefs_drop_inode() callback.
      
           Currently the EVENT_INODE flag is cleared in the tracefs_d_iput()
           callback. But this is the wrong location. The iput() callback is
           called when the last reference to the dentry inode is dropped. There
           could be a case where two dentries have the same inode, and the
           flag will be cleared prematurely. The flag needs to be cleared when
           the last reference of the inode is dropped and that happens in the
           inode's drop_inode() callback handler.
      
        Cleanups:
      
         - Consolidate the creation of a tracefs_inode for an eventfs_inode
      
           A tracefs_inode is created for both files and directories of the
           eventfs system. It is open coded. Instead, consolidate it into a
           single eventfs_get_inode() function call.
      
         - Remove the eventfs getattr and permission callbacks.
      
           The permissions for the eventfs files and directories are updated
           when the inodes are created, on remount, and when the user sets
           them (via setattr). The inodes hold the current permissions so
           there is no need to have custom getattr or permissions callbacks as
           they will more likely cause them to be incorrect. The inode's
           permissions are updated when they should be updated. Remove the
           getattr and permissions inode callbacks.
      
         - Do not update eventfs_inode attributes on creation of inodes.
      
           The eventfs_inodes attribute field is used to store the permissions
           of the directories and files for when their corresponding inodes
           are freed and are created again. But when the creation of the
           inodes happens, the eventfs_inode attributes are recalculated. The
           recalculation should only happen when the permissions change for a
           given file or directory. Currently, the attributes are just
           being set to their current values so this is not a bug, but it's
           unnecessary and error prone. Stop doing that.
      
         - The events directory inode is created once when the events
           directory is created and deleted when it is deleted. It is now
           updated on remount and when the user changes the permissions.
           There's no need to use the eventfs_inode of the events directory to
           store the events directory permissions. But using it to store the
           default permissions for the files within the directory that have
           not been updated by the user can simplify the code"
      
      * tag 'trace-tracefs-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        eventfs: Do not use attributes for events directory
        eventfs: Cleanup permissions in creation of inodes
        eventfs: Remove getattr and permission callbacks
        eventfs: Consolidate the eventfs_inode update in eventfs_get_inode()
        tracefs: Clear EVENT_INODE flag in tracefs_drop_inode()
        eventfs: Update all the eventfs_inodes from the events descriptor
        tracefs: Update inode permissions on remount
        eventfs: Keep the directories from having the same inode number as files
      0eb03c7e
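The remount fix described in the message above (walk every eventfs_inode starting from the "events" descriptor, relying on each entry holding a list of its children) can be sketched as a plain recursive tree walk. This is a user-space illustration only: `struct node`, its fields, and `update_tree()` are hypothetical names, not the kernel's eventfs structures or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for an eventfs_inode; field names are illustrative. */
struct node {
    unsigned int uid;
    unsigned int gid;
    struct node *first_child;   /* head of this node's child list */
    struct node *next_sibling;  /* next entry in the parent's child list */
};

/*
 * Apply uid and gid to a node and to everything beneath it.
 * Returns the number of nodes updated, so a caller can check that
 * the whole tree was reached from the root.
 */
size_t update_tree(struct node *n, unsigned int uid, unsigned int gid)
{
    struct node *child;
    size_t count;

    if (!n)
        return 0;

    n->uid = uid;
    n->gid = gid;
    count = 1;

    /* Each child is itself the root of a subtree. */
    for (child = n->first_child; child; child = child->next_sibling)
        count += update_tree(child, uid, gid);

    return count;
}
```

Starting from a root that is guaranteed to exist avoids depending on a separate list of live inodes, which is the property the message relies on: entries whose inode has already been freed are still reachable through their parent.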