1. 23 Aug, 2022 4 commits
    • Merge tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · df0219d1
      Linus Torvalds authored
      Pull parisc fixes from Helge Deller:
       "Some interesting background to the current patchset:
      
        It turned out that the fldw instruction (which loads a 32-bit word
        from memory into one half of a FP register) failed on unaligned
        addresses and even trashed some other random FP register instead. It's
        a trivial one-liner fix in the exception handler, but this failure
        dates back to the very beginnings of the parisc port. It's strange
        that it was never noticed before.
      
        Another patch fixes an annoyance noticed by Randy Dunlap. Running
        "make ARCH=parisc64 randconfig" always returned a 32-bit config,
        although one would expect a 64-bit config. Masahiro Yamada suggested
        mimicking the sparc Kconfig code, which fixed the issue nicely. This
        also allowed dropping some compiler build checks.
      
        Third, it's possible to build an optimized 32-bit kernel for PA8X00
        (64-bit) CPUs, which then wouldn't start on 32-bit-only (PA1.x)
        machines. I've added a bootup check which prevents that and which
        prints a message to the console. This can be tested with qemu, which
        currently only supports 32-bit emulation.
      
        The other patches are usual clean-up stuff like added return value
        checks and typo fixes in comments.
      
        Summary:
      
         - Fix emulation of fldw instruction on unaligned addresses
      
         - Fix "make ARCH=parisc64 randconfig" to return a 64-bit config
      
         - Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00
           CPUs on 32-bit only machines
      
         - ccio-dma: Handle kmalloc failure in ccio_init_resources()"
      
      * tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
        parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
        parisc: led: Move from strlcpy with unused retval to strscpy
        parisc: ccio-dma: Fix typo in comment
        Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
        parisc: Make CONFIG_64BIT available for ARCH=parisc64 only
        parisc: Fix exception handler for fldw and fstw instructions
    • Merge tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm · 95607ad9
      Linus Torvalds authored
      Pull misc fixes from Andrew Morton:
       "Thirteen fixes, almost all for MM.
      
        Seven of these are cc:stable and the remainder fix up the changes
        which went into this -rc cycle"
      
      * tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
        kprobes: don't call disarm_kprobe() for disabled kprobes
        mm/shmem: shmem_replace_page() remember NR_SHMEM
        mm/shmem: tmpfs fallocate use file_modified()
        mm/shmem: fix chattr fsflags support in tmpfs
        mm/hugetlb: support write-faults in shared mappings
        mm/hugetlb: fix hugetlb not supporting softdirty tracking
        mm/uffd: reset write protection when unregister with wp-mode
        mm/smaps: don't access young/dirty bit if pte unpresent
        mm: add DEVICE_ZONE to FOR_ALL_ZONES
        kernel/sys_ni: add compat entry for fadvise64_64
        mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
        Revert "zram: remove double compression logic"
        get_maintainer: add Alan to .get_maintainer.ignore
    • Merge tag 'linux-kselftest-kunit-fixes-6.0-rc3' of... · 6234806f
      Linus Torvalds authored
      Merge tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull KUnit fixes from Shuah Khan:
       "Fix an mmc test and load the .kunit_test_suites section when
        CONFIG_KUNIT=m, not just when KUnit is built-in"
      
      * tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m
        mmc: sdhci-of-aspeed: test: Fix dependencies when KUNIT=m
    • Merge tag 'linux-kselftest-fixes-6.0-rc3' of... · 3ee3d984
      Linus Torvalds authored
      Merge tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
      
      Pull Kselftest fixes from Shuah Khan:
       "Fixes to vm and sgx test builds"
      
      * tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
        selftests/vm: fix inability to build any vm tests
        selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
  2. 22 Aug, 2022 12 commits
    • Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 072e5135
      Linus Torvalds authored
      Pull NFS client fixes from Trond Myklebust:
      "Stable fixes:
         - NFS: Fix another fsync() issue after a server reboot
      
        Bugfixes:
         - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
         - NFS: Fix missing unlock in nfs_unlink()
         - Add sanity checking of the file type used by __nfs42_ssc_open
         - Fix a case where we're failing to set task->tk_rpc_status
      
        Cleanups:
         - Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the
           fsync() fix"
      
      * tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        SUNRPC: RPC level errors should set task->tk_rpc_status
        NFSv4.2 fix problems with __nfs42_ssc_open
        NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
        NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES
        NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds
        NFS: Fix another fsync() issue after a server reboot
        NFS: Fix missing unlock in nfs_unlink()
    • Merge tag 'fs.idmapped.fixes.v6.0-rc3' of... · d3cd67d6
      Linus Torvalds authored
      Merge tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
      
      Pull idmapping fixes from Christian Brauner:
      
       - Since Seth joined as co-maintainer for idmapped mounts we decided to
         use a shared git tree. Konstantin suggested we use vfs/idmapping.git
         on kernel.org under the vfs/ namespace. So this updates the tree in
         the maintainers file.
      
       - Ensure that POSIX ACLs checking, getting, and setting works correctly
         for filesystems mountable with a filesystem idmapping that want to
         support idmapped mounts.
      
         Since no filesystems mountable with an fs_idmapping support
         idmapped mounts yet, this is not a problem today, but it could
         become one in the future.
      
       - Check that caller is privileged over the idmapping that will be
         attached to a mount.
      
         Currently no FS_USERNS_MOUNT filesystems support idmapped mounts,
         thus this is not a problem as only CAP_SYS_ADMIN in init_user_ns is
         allowed to set up idmapped mounts. But this could change in the
         future, so add a check to refuse to create idmapped mounts when the
         mounter is not privileged over the mount's idmapping.
      
       - Fix POSIX ACLs for ntfs3. While looking at our current POSIX ACL
         handling in the context of some overlayfs work I went through a range
         of other filesystems checking how they handle them currently and
         encountered a few bugs in ntfs3.
      
         I sent this some time ago, and the fixes haven't been picked up
         even though the pull request for other ntfs3 fixes was sent later.
         This should really be fixed as right now POSIX ACLs are broken in
         certain circumstances for ntfs3.
      
      * tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
        ntfs: fix acl handling
        fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts
        MAINTAINERS: update idmapping tree
        acl: handle idmapped mounts for idmapped filesystems
    • Merge tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux · b20ee481
      Linus Torvalds authored
      Pull file locking fix from Jeff Layton:
       "Just a single patch for a bugfix in the flock() codepath, introduced
        by a patch that went in recently"
      
      * tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
        locks: Fix dropped call to ->fl_release_private()
    • perf tools: Fix compile error for x86 · cfd2b5c1
      Yang Jihong authored
      Commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates
      CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to
      use asm_volatile_goto when compiling __GEN_RMWcc.
      
      However, asm_volatile_goto is not declared in the perf tool headers,
      which causes a compilation error:
      
        In file included from tools/arch/x86/include/asm/atomic.h:7,
                         from tools/include/asm/atomic.h:6,
                         from tools/include/linux/atomic.h:5,
                         from tools/include/linux/refcount.h:41,
                         from tools/lib/perf/include/internal/cpumap.h:5,
                         from tools/perf/util/cpumap.h:7,
                         from tools/perf/util/env.h:7,
                         from tools/perf/util/header.h:12,
                         from pmu-events/pmu-events.c:9:
        tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’:
        tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration]
          asm_volatile_goto (fullop "; j" cc " %l[cc_label]"  \
          ^~~~~~~~~~~~~~~~~
      
      Define asm_volatile_goto in compiler_types.h if not declared, like the
      main kernel header files do.
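The guard pattern described above can be sketched roughly as follows, assuming GCC or Clang with `asm goto` support. The macro body follows the general shape of the fallback in the main kernel's compiler_types.h, but spells the keyword `__asm__` so the sketch also builds in strict ISO C modes; `asm_goto_fallthrough()` is a hypothetical demo function, not part of perf.

```c
/*
 * Fallback in the style described above: only define asm_volatile_goto
 * if no earlier header did. The (x...) named variadic parameter is a
 * GNU C extension, as used throughout the kernel headers.
 */
#ifndef asm_volatile_goto
#define asm_volatile_goto(x...) __asm__ goto(x)
#endif

/*
 * Hypothetical demo: the empty template never jumps to "taken", so
 * control falls through and the function returns 0.
 */
static int asm_goto_fallthrough(void)
{
	asm_volatile_goto("" : : : : taken);
	return 0;
taken:
	return 1;
}
```

Because of the #ifndef, a header included earlier can still supply its own definition and this fallback simply steps aside.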
      
      Fixes: a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ntfs: fix acl handling · 0c3bc789
      Christian Brauner authored
      While looking at our current POSIX ACL handling in the context of some
      overlayfs work I went through a range of other filesystems checking how they
      handle them currently and encountered ntfs3.
      
      The posix_acl_{from,to}_xattr() helpers always need to operate on the
      filesystem idmapping. Since ntfs3 can only be mounted in the initial user
      namespace the relevant idmapping is init_user_ns.
      
      The posix_acl_{from,to}_xattr() helpers are concerned with translating between
      the kernel internal struct posix_acl{_entry} and the uapi struct
      posix_acl_xattr_{header,entry} and the kernel internal data structure is cached
      filesystem wide.
      
      Additional idmappings such as the caller's idmapping or the mount's idmapping
      are handled higher up in the VFS. Individual filesystems usually do not need to
      concern themselves with these.
      
      The posix_acl_valid() helper is concerned with checking whether the values in
      the kernel internal struct posix_acl can be represented in the filesystem's
      idmapping. IOW, if they can be written to disk. So this helper too needs to
      take the filesystem's idmapping.
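To illustrate why the choice of idmapping matters, here is a simplified, hypothetical model of a single-extent idmapping, loosely shaped like one uid_map line. It is not the kernel's user namespace code or its make_kuid()/from_kuid() API. With the identity mapping that init_user_ns provides, translation is a no-op, which is why ntfs3 can simply pass init_user_ns; any other mapping shifts ids and changes which ids are representable on disk.

```c
#include <stdint.h>

#define SKETCH_INVALID_ID ((uint32_t)-1)

/*
 * Hypothetical single-extent idmapping: ids [first, first + count) inside
 * the namespace correspond to kernel ids [lower_first, lower_first + count).
 */
struct sketch_idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Identity mapping, as in the initial user namespace. */
static const struct sketch_idmap sketch_init_idmap = { 0, 0, UINT32_MAX };

/* Namespace id -> kernel id; unsigned wraparound rejects id < first. */
static uint32_t sketch_map_down(const struct sketch_idmap *m, uint32_t id)
{
	if (id - m->first < m->count)
		return id - m->first + m->lower_first;
	return SKETCH_INVALID_ID;
}

/* Kernel id -> namespace id (the direction used when writing to disk). */
static uint32_t sketch_map_up(const struct sketch_idmap *m, uint32_t kid)
{
	if (kid - m->lower_first < m->count)
		return kid - m->lower_first + m->first;
	return SKETCH_INVALID_ID;
}

/*
 * Loosely mirrors the posix_acl_valid() idea: an ACL entry can be written
 * only if its kernel id is representable in the filesystem idmapping.
 */
static int sketch_acl_id_representable(const struct sketch_idmap *fs_map,
				       uint32_t kid)
{
	return sketch_map_up(fs_map, kid) != SKETCH_INVALID_ID;
}
```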
      
      Fixes: be71b5cb ("fs/ntfs3: Add attrib operations")
      Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
      Cc: ntfs3@lists.linux.dev
      Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
    • parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines · 591d2108
      Helge Deller authored
      If a 32-bit kernel was compiled for PA2.0 CPUs, it won't be able to run
      on machines with PA1.x CPUs. Add a check and bail out early if a PA1.x
      machine is detected.
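A boot-time check of this kind might look like the sketch below. The enum, the SKETCH_KERNEL_BUILT_FOR_PA20 switch (standing in for a kernel configured for PA8X00 CPUs), the function name and the message text are all hypothetical; the actual parisc check is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical CPU architecture levels, ordered oldest to newest. */
enum sketch_pa_level { SKETCH_PA_1_0, SKETCH_PA_1_1, SKETCH_PA_2_0 };

/* Hypothetical stand-in for "this kernel was built for PA8X00 CPUs". */
#define SKETCH_KERNEL_BUILT_FOR_PA20 1

/*
 * Returns true if booting may continue. Prints a console-style message
 * and returns false when a PA2.0-built kernel finds an older CPU.
 */
static bool sketch_boot_cpu_check(enum sketch_pa_level cpu)
{
	if (SKETCH_KERNEL_BUILT_FOR_PA20 && cpu < SKETCH_PA_2_0) {
		printf("This kernel was built for PA2.0 CPUs and cannot run "
		       "on this PA1.x machine.\n");
		return false;
	}
	return true;
}
```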
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources() · d46c742f
      Li Qiong authored
      Since kmalloc() can fail, fix this error path: check the returned
      pointer and return the '-ENOMEM' error code.
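The error-path pattern described above can be sketched in user space as follows; calloc() stands in for kmalloc(), and struct sketch_resources / sketch_init_resources() are hypothetical, not the ccio-dma driver's real types or function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resource table; not the real ccio-dma structures. */
struct sketch_resources {
	size_t count;
	unsigned long *entries;
};

/*
 * Allocate the table and report failure as -ENOMEM, so callers never
 * dereference a NULL pointer later on.
 */
static int sketch_init_resources(struct sketch_resources *res, size_t count)
{
	res->entries = calloc(count, sizeof(*res->entries));
	if (!res->entries)
		return -ENOMEM;
	res->count = count;
	return 0;
}
```

Callers then propagate the negative error code instead of continuing with a half-initialized structure.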
      Signed-off-by: Li Qiong <liqiong@nfschina.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: led: Move from strlcpy with unused retval to strscpy · 4cb26436
      Wolfram Sang authored
      Follow the advice of the below link and prefer 'strscpy' in this
      subsystem. Conversion is 1:1 because the return value is not used.
      Generated by a coccinelle script.
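For readers unfamiliar with the difference, the following user-space sketch models strscpy()'s documented contract; it is not the kernel's optimized implementation, and sketch_strscpy() is a hypothetical name. It copies at most size - 1 bytes, always NUL-terminates when size > 0, and returns the copied length or -E2BIG on truncation. strlcpy() instead returns strlen(src), so it must read the whole source string even when it truncates.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Sketch of strscpy() semantics: reads src only up to the copy limit
 * (plus one byte to detect truncation).
 */
static ssize_t sketch_strscpy(char *dest, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dest[i] = src[i];
	dest[i] = '\0';
	/* src[i] is NUL only if the whole string fit. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

Since the converted call site ignores the return value, both functions leave the same NUL-terminated, possibly truncated copy in the destination, which is why the conversion is 1:1.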
      
      Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
      Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: ccio-dma: Fix typo in comment · db4538ad
      Jason Wang authored
      The word `was' is duplicated in the comment; remove one.
      Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • Revert "parisc: Show error if wrong 32/64-bit compiler is being used" · b4b18f47
      Helge Deller authored
      This reverts commit b160628e.
      
      This sanity check is no longer needed, because the
      previous commit ("parisc: Make CONFIG_64BIT available for ARCH=parisc64
      only") prevents CONFIG_64BIT from being set if ARCH==parisc.
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Make CONFIG_64BIT available for ARCH=parisc64 only · 3dcfb729
      Helge Deller authored
      With this patch, the ARCH= parameter decides whether the
      CONFIG_64BIT option is set. This means the
      ARCH= parameter gives:
      
      	ARCH=parisc	-> 32-bit kernel
      	ARCH=parisc64	-> 64-bit kernel
      
      This simplifies the usage of the other config options like
      randconfig, allmodconfig and allyesconfig a lot and produces
      the output which is expected for parisc64 (64-bit) vs. parisc (32-bit).
      Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Tested-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: <stable@vger.kernel.org> # 5.15+
    • Linux 6.0-rc2 · 1c23f9e6
      Linus Torvalds authored
  3. 21 Aug, 2022 18 commits
  4. 20 Aug, 2022 6 commits
    • kprobes: don't call disarm_kprobe() for disabled kprobes · 9c80e799
      Kuniyuki Iwashima authored
      The assumption in __disable_kprobe() is wrong, and it could try to disarm
      an already disarmed kprobe and fire the WARN_ONCE() below. [0]  We can
      easily reproduce this issue.
      
      1. Write 0 to /sys/kernel/debug/kprobes/enabled.
      
        # echo 0 > /sys/kernel/debug/kprobes/enabled
      
      2. Run execsnoop.  At this time, one kprobe is disabled.
      
        # /usr/share/bcc/tools/execsnoop &
        [1] 2460
        PCOMM            PID    PPID   RET ARGS
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
         kprobes_all_disarmed to false but does not arm the disabled kprobe.
      
        # echo 1 > /sys/kernel/debug/kprobes/enabled
      
        # cat /sys/kernel/debug/kprobes/list
        ffffffff91345650  r  __x64_sys_execve+0x0    [FTRACE]
        ffffffff91345650  k  __x64_sys_execve+0x0    [DISABLED][FTRACE]
      
      4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
         disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().
      
        # fg
        /usr/share/bcc/tools/execsnoop
        ^C
      
      Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
      some cleanups and leaves the aggregated kprobe in the hash table.  Then,
      __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
      infinite loop like this.
      
        aggregated kprobe.list -> kprobe.list -.
                                           ^    |
                                           '.__.'
      
      In this situation, these commands fall into the infinite loop and result
      in RCU stall or soft lockup.
      
        cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
                                             infinite loop with RCU.
      
        /usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
                                         and __get_valid_kprobe() is stuck in
      				   the loop.
      
      To avoid the issue, make sure we don't call disarm_kprobe() for disabled
      kprobes.
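The guard described above can be illustrated with a heavily simplified, hypothetical model; sketch_kprobe, sketch_disable() and the warning counter are illustrative names, not kernel code, and the real __disable_kprobe() also handles aggregated probes.

```c
#include <stdbool.h>

/* Hypothetical, simplified probe state; not the kernel's struct kprobe. */
struct sketch_kprobe {
	bool disabled;  /* like KPROBE_FLAG_DISABLED */
	bool armed;     /* whether the probe is actually installed */
};

static bool sketch_all_disarmed; /* like kprobes_all_disarmed */
static int sketch_warnings;      /* counts would-be WARN_ONCE() hits */

static void sketch_disarm(struct sketch_kprobe *p)
{
	if (!p->armed) {
		sketch_warnings++; /* disarming an unarmed probe: the bug */
		return;
	}
	p->armed = false;
}

/*
 * Guarded disable path: only disarm when probes are globally armed and
 * this probe is not already disabled.
 */
static void sketch_disable(struct sketch_kprobe *p)
{
	if (!sketch_all_disarmed && !p->disabled)
		sketch_disarm(p);
	p->disabled = true;
}
```

This mirrors the reproducer: after step 3 the global flag is false but the disabled probe was never armed, so without the `!p->disabled` test step 4 would warn.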
      
      [0]
      Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
      WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Modules linked in: ena
      CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
      Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
      RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
      Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
      RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
      RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
      RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
      R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
      R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
      FS:  00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
      <TASK>
       __disable_kprobe (kernel/kprobes.c:1716)
       disable_kprobe (kernel/kprobes.c:2392)
       __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
       disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
       perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
       perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
       _free_event (kernel/events/core.c:4971)
       perf_event_release_kernel (kernel/events/core.c:5176)
       perf_release (kernel/events/core.c:5186)
       __fput (fs/file_table.c:321)
       task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
       exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
       syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
       do_syscall_64 (arch/x86/entry/common.c:87)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
      RIP: 0033:0x7fe7ff210654
      Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
      RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
      RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
      RBP: 0000000000000000 R08: 94ae31d6fda838a4 R09: 00007fe8001c9d30
      R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
      R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
      </TASK>
      
      Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com
      Fixes: 69d54b91 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reported-by: Ayushman Dutta <ayudutta@amazon.com>
      Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
      Cc: Ayushman Dutta <ayudutta@amazon.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/shmem: shmem_replace_page() remember NR_SHMEM · 76d36dea
      Hugh Dickins authored
      Elsewhere, NR_SHMEM is updated at the same time as shmem NR_FILE_PAGES;
      but shmem_replace_page() was forgetting to do that - so NR_SHMEM stats
      could grow too big or too small, in those unusual cases when it's used.
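Assuming the old and new pages are accounted to different nodes, the rule can be modelled as below; struct sketch_node and its counters are hypothetical stand-ins for the kernel's per-node statistics, not its API.

```c
/* Hypothetical per-node counters standing in for NR_FILE_PAGES/NR_SHMEM. */
enum sketch_stat { SKETCH_NR_FILE_PAGES, SKETCH_NR_SHMEM, SKETCH_NR_STATS };

struct sketch_node {
	long stat[SKETCH_NR_STATS];
};

/*
 * Move one shmem page's accounting from old_node to new_node. Both
 * counters must move together: updating only NR_FILE_PAGES would leave
 * the old node's NR_SHMEM too big and the new node's too small.
 */
static void sketch_replace_page(struct sketch_node *old_node,
				struct sketch_node *new_node)
{
	old_node->stat[SKETCH_NR_FILE_PAGES]--;
	old_node->stat[SKETCH_NR_SHMEM]--;
	new_node->stat[SKETCH_NR_FILE_PAGES]++;
	new_node->stat[SKETCH_NR_SHMEM]++;
}
```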
      
      Link: https://lkml.kernel.org/r/cec7c09d-5874-e160-ada6-6e10ee48784@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Radoslaw Burny <rburny@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/shmem: tmpfs fallocate use file_modified() · 15f242bb
      Hugh Dickins authored
      5.18 fixed the btrfs and ext4 fallocates to use file_modified(), as xfs
      was already doing, to drop privileges: and fstests generic/{683,684,688}
      expect this.  There's no need to argue over keep-size allocation (which
      could just update ctime): fix shmem_fallocate() to behave the same way.
      
      Link: https://lkml.kernel.org/r/39c5e62-4896-7795-c0a0-f79c50d4909@google.com
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Radoslaw Burny <rburny@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/shmem: fix chattr fsflags support in tmpfs · cb241339
      Hugh Dickins authored
      ext[234] have always allowed unimplemented chattr flags to be set, but
      other filesystems have tended to be stricter.  Follow the stricter
      approach for tmpfs: I don't want to have to explain why csu attributes
      don't actually work, and we won't need to update the chattr(1) manpage;
      and it's never wrong to start off strict, relaxing later if persuaded. 
      Allow only a (append only) i (immutable) A (no atime) and d (no dump).
      
      Although lsattr showed 'A' inherited, the NOATIME behavior was not being
      inherited: because nothing sync'ed FS_NOATIME_FL to S_NOATIME.  Add
      shmem_set_inode_flags() to sync the flags, using inode_set_flags() to
      avoid that instant of lost immutability during fileattr_set().
      
      But that change switched generic/079 from passing to failing: because
      FS_IMMUTABLE_FL and FS_APPEND_FL had been unconventionally included in the
      INHERITED fsflags: remove them and generic/079 is back to passing.
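A sketch of the stricter policy described above, assuming -EOPNOTSUPP as the rejection code. The FS_*_FL values mirror the uapi constants from <linux/fs.h>; the SKETCH_S_* inode bits and sketch_set_flags() are hypothetical stand-ins for S_APPEND and friends and for the shmem code. Translating and storing all inode flags in one assignment reflects the commit's two points: FS_NOATIME_FL must actually reach the in-memory NOATIME flag, and the update should not briefly drop immutability (the commit uses inode_set_flags() for that).

```c
#include <errno.h>

/* Values mirror the uapi FS_*_FL constants; redefined to avoid kernel headers. */
#ifndef FS_IMMUTABLE_FL
#define FS_IMMUTABLE_FL 0x00000010 /* i */
#define FS_APPEND_FL    0x00000020 /* a */
#define FS_NODUMP_FL    0x00000040 /* d */
#define FS_NOATIME_FL   0x00000080 /* A */
#endif

/* Only a, i, A and d are accepted. */
#define SKETCH_ALLOWED_FL \
	(FS_APPEND_FL | FS_IMMUTABLE_FL | FS_NOATIME_FL | FS_NODUMP_FL)

/* Hypothetical in-memory inode flag bits. */
#define SKETCH_S_APPEND    0x1u
#define SKETCH_S_IMMUTABLE 0x2u
#define SKETCH_S_NOATIME   0x4u

/* Reject unsupported flags, then set all inode flags in one store. */
static int sketch_set_flags(unsigned int fsflags, unsigned int *inode_flags)
{
	unsigned int f = 0;

	if (fsflags & ~(unsigned int)SKETCH_ALLOWED_FL)
		return -EOPNOTSUPP;
	if (fsflags & FS_APPEND_FL)
		f |= SKETCH_S_APPEND;
	if (fsflags & FS_IMMUTABLE_FL)
		f |= SKETCH_S_IMMUTABLE;
	if (fsflags & FS_NOATIME_FL)
		f |= SKETCH_S_NOATIME;
	*inode_flags = f; /* FS_NODUMP_FL has no in-memory counterpart here */
	return 0;
}
```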
      
      Link: https://lkml.kernel.org/r/2961dcb0-ddf3-b9f0-3268-12a4ff996856@google.com
      Fixes: e408e695 ("mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs")
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Radoslaw Burny <rburny@google.com>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: support write-faults in shared mappings · 1d8d1464
      David Hildenbrand authored
      If we ever get a write-fault on a write-protected page in a shared
      mapping, we'd be in trouble (again).  Instead, we can simply map the page
      writable.
      
      And in fact, there is even a way right now to trigger that code via
      uffd-wp ever since we started to support it for shmem in 5.19:
      
      --------------------------------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <errno.h>
       #include <sys/mman.h>
       #include <sys/syscall.h>
       #include <sys/ioctl.h>
       #include <linux/userfaultfd.h>
      
       #define HUGETLB_SIZE (2 * 1024 * 1024u)
      
       static char *map;
       int uffd;
      
       static int temp_setup_uffd(void)
       {
       	struct uffdio_api uffdio_api;
       	struct uffdio_register uffdio_register;
       	struct uffdio_writeprotect uffd_writeprotect;
       	struct uffdio_range uffd_range;
      
       	uffd = syscall(__NR_userfaultfd,
       		       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
       	if (uffd < 0) {
       		fprintf(stderr, "syscall() failed: %d\n", errno);
       		return -errno;
       	}
      
       	uffdio_api.api = UFFD_API;
       	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
       	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
       		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
       		return -errno;
       	}
      
       	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
       		fprintf(stderr, "UFFD_FEATURE_PAGEFAULT_FLAG_WP missing\n");
       		return -ENOSYS;
       	}
      
       	/* Register UFFD-WP */
       	uffdio_register.range.start = (unsigned long) map;
       	uffdio_register.range.len = HUGETLB_SIZE;
       	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
       	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
       		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
       		return -errno;
       	}
      
       	/* Writeprotect a single page. */
       	uffd_writeprotect.range.start = (unsigned long) map;
       	uffd_writeprotect.range.len = HUGETLB_SIZE;
       	uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
       	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
       		fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
       		return -errno;
       	}
      
       	/* Unregister UFFD-WP without prior writeunprotection. */
       	uffd_range.start = (unsigned long) map;
       	uffd_range.len = HUGETLB_SIZE;
       	if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_range)) {
       		fprintf(stderr, "UFFDIO_UNREGISTER failed: %d\n", errno);
       		return -errno;
       	}
      
       	return 0;
       }
      
       int main(int argc, char **argv)
       {
       	int fd;
      
       	fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
       	if (fd < 0) {
       		fprintf(stderr, "open() failed\n");
       		return -errno;
       	}
       	if (ftruncate(fd, HUGETLB_SIZE)) {
       		fprintf(stderr, "ftruncate() failed\n");
       		return -errno;
       	}
      
       	map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
       	if (map == MAP_FAILED) {
       		fprintf(stderr, "mmap() failed\n");
       		return -errno;
       	}
      
       	*map = 0;
      
       	if (temp_setup_uffd())
       		return 1;
      
       	*map = 0;
      
       	return 0;
       }
      --------------------------------------------------------------------------
      
      The above test fails with SIGBUS when there is only a single free hugetlb page.
       # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       Bus error (core dumped)
      
      And worse, with sufficient free hugetlb pages it will map an anonymous page
      into a shared mapping, for example, messing up accounting during unmap
      and breaking MAP_SHARED semantics:
       # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       # cat /proc/meminfo | grep HugePages_
       HugePages_Total:       2
       HugePages_Free:        1
       HugePages_Rsvd:    18446744073709551615
       HugePages_Surp:        0
      
      The reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
      unregistering and consequently keeps the PTE write-protected, which
      avoids additional overhead when unregistering.  Note that
      this is the case also for !hugetlb and that we will end up with writable
      PTEs that still have the uffd-wp PTE bit set once we return from
      hugetlb_wp().  I'm not touching the uffd-wp PTE bit for now, because it
      seems to be a generic thing -- wp_page_reuse() also doesn't clear it.
      
      VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE indicates
      that MAP_SHARED handling was at least envisioned, but could never have
      worked as expected.
      
      While at it, make sure that we never end up in hugetlb_wp() on write
      faults without VM_WRITE, because we don't support maybe_mkwrite()
      semantics as commonly used in the !hugetlb case -- for example, in
      wp_page_reuse().
      
      Note that there is no need to do any kind of reservation in
      hugetlb_fault() in this case ...  because we already have a hugetlb page
      mapped R/O that we will simply map writable and we are not dealing with
      COW/unsharing.
      
      Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com
      Fixes: b1f9e876 ("mm/uffd: enable write protection for shmem & hugetlbfs")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[5.19]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • David Hildenbrand's avatar
      mm/hugetlb: fix hugetlb not supporting softdirty tracking · f96f7a40
      David Hildenbrand authored
      Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
      
      I observed that hugetlb does not support/expect write-faults in shared
      mappings that would have to map the R/O-mapped page writable -- and I
      found two cases where we could currently get such faults and would
      erroneously map an anon page into a shared mapping.
      
      The reproducers are part of the patches.
      
      I propose to backport both fixes to stable trees.  The first fix needs a
      small adjustment.
      
      
      This patch (of 2):
      
      Staring at hugetlb_wp(), one might wonder where all the logic for shared
      mappings is when stumbling over a write-protected page in a shared
      mapping.  In fact, there is none, and so far we thought we could get away
      with that because, e.g., mprotect() should always do the right thing and
      map all pages directly writable.
      
      Looks like we were wrong:
      
      --------------------------------------------------------------------------
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <errno.h>
       #include <sys/mman.h>
      
       #define HUGETLB_SIZE (2 * 1024 * 1024u)
      
       /* Writing "4" to clear_refs clears the soft-dirty bits of the process. */
       static void clear_softdirty(void)
       {
               int fd = open("/proc/self/clear_refs", O_WRONLY);
               const char *ctrl = "4";
               int ret;
      
               if (fd < 0) {
                       fprintf(stderr, "open(clear_refs) failed\n");
                       exit(1);
               }
               ret = write(fd, ctrl, strlen(ctrl));
               if (ret != strlen(ctrl)) {
                       fprintf(stderr, "write(clear_refs) failed\n");
                       exit(1);
               }
               close(fd);
       }
      
       int main(int argc, char **argv)
       {
               char *map;
               int fd;
      
               /* Assumes hugetlbfs is mounted at /dev/hugepages. */
               fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT, 0600);
               if (fd < 0) {
                       fprintf(stderr, "open() failed\n");
                       return -errno;
               }
               if (ftruncate(fd, HUGETLB_SIZE)) {
                       fprintf(stderr, "ftruncate() failed\n");
                       return -errno;
               }
      
               map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
               if (map == MAP_FAILED) {
                       fprintf(stderr, "mmap() failed\n");
                       return -errno;
               }
      
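               /* Fault in the page through the writable MAP_SHARED mapping. */
               *map = 0;

               /* Write-protect the mapping. */
               if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                       fprintf(stderr, "mprotect() failed\n");
                       return -errno;
               }

               /* Clear soft-dirty state; this clears VM_SOFTDIRTY on the VMA. */
               clear_softdirty();

               /*
                * Make the mapping writable again.  With soft-dirty tracking
                * armed, the page stays mapped write-protected.
                */
               if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                       fprintf(stderr, "mprotect() failed\n");
                       return -errno;
               }

               /* Write fault on the R/O-mapped page in a shared mapping. */
               *map = 0;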
      
               return 0;
       }
      --------------------------------------------------------------------------
      
      The above test fails with SIGBUS when there is only a single free hugetlb page:
       # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       Bus error (core dumped)
      
      Worse, with sufficient free hugetlb pages, it will map an anonymous page
      into a shared mapping, which, for example, messes up accounting during
      unmap and breaks MAP_SHARED semantics:
       # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
       # ./test
       # cat /proc/meminfo | grep HugePages_
       HugePages_Total:       2
       HugePages_Free:        1
       HugePages_Rsvd:    18446744073709551615
       HugePages_Surp:        0
      
      The reason in this particular case is that vma_wants_writenotify() will
      return "true", removing VM_SHARED in vma_set_page_prot() to map pages
      write-protected.  Let's teach vma_wants_writenotify() that hugetlb does
      not support softdirty tracking.
      
      Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com
      Fixes: 64e45507 ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Jamie Liu <jamieliu@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.18+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>