1. 27 Feb, 2024 10 commits
    • parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds · 0568b6f0
      Guenter Roeck authored
      IPv6 checksum tests with unaligned addresses on 64-bit builds result
      in unexpected failures.
      
      Expected expected == csum_result, but
          expected == 46591 (0xb5ff)
          csum_result == 46381 (0xb52d)
      with alignment offset 1
      
      Oddly enough, the problem disappeared after adding test code at
      the beginning of csum_ipv6_magic().
      
      As it turns out, the 'sum' parameter of csum_ipv6_magic() is declared as
      __wsum, which is a 32-bit variable. However, it is treated as a 64-bit
      variable in the 64-bit assembler code. Tests showed that the upper 32 bits
      of the register used to pass the variable are _not_ cleared when entering
      the function. This can result in checksum calculation errors.
      
      Clearing the upper 32 bits of 'sum' as the first operation in the assembler
      code fixes the problem.
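
      A portable C model of the failure, as a sketch only (this is not the
      parisc assembly; the helper names and test values are hypothetical):
      a 32-bit __wsum arrives in a 64-bit register whose upper half holds
      stale bits, and folding the 64-bit sum gives a different 16-bit result
      unless the upper half is masked off first.

```c
#include <stdint.h>

/* Fold a 64-bit one's-complement sum down to 16 bits (not complemented). */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffffffffULL) + (sum >> 32);
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Hypothetical model: 'reg' is the 64-bit register carrying a 32-bit
 * __wsum argument. If its upper half holds stale bits and the code treats
 * it as a 64-bit value, those bits leak into the checksum.
 */
static uint16_t csum_with_reg(uint64_t acc, uint64_t reg, int clear_upper)
{
    if (clear_upper)
        reg &= 0xffffffffULL;   /* keep only the real 32-bit 'sum' */
    return fold64(acc + reg);
}
```

      With acc = 0x12345678 and a register holding 0xdeadbeef0000abcd, the
      masked path folds to 0x147a while the unmasked path folds to 0xb217.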
      Acked-by: Helge Deller <deller@gmx.de>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Fix csum_ipv6_magic on 64-bit systems · 4b75b12d
      Guenter Roeck authored
      hppa 64-bit systems calculate the IPv6 checksum using 64-bit add
      operations. The last add folds protocol and length fields into the 64-bit
      result. While unlikely, this operation can overflow. The overflow can be
      triggered with a code sequence such as the following.
      
      	/* try to trigger massive overflows */
      	memset(tmp_buf, 0xff, sizeof(struct in6_addr));
      	csum_result = csum_ipv6_magic((struct in6_addr *)tmp_buf,
      				      (struct in6_addr *)tmp_buf,
      				      0xffff, 0xff, 0xffffffff);
      
      Fix the problem by adding any overflows from the final add operation into
      the calculated checksum. Fortunately, we can do this without additional
      cost by replacing the add operation used to fold the checksum into 32 bits
      with "add,dc" to add in the missing carry.
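
      The end-around carry that the fix restores can be sketched portably
      in C. This models the arithmetic only, not the add,dc instruction
      itself, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Add two 64-bit one's-complement partial sums, wrapping any carry out of
 * bit 63 back into bit 0 (an "end-around carry").
 */
static uint64_t add_with_end_around_carry(uint64_t a, uint64_t b)
{
    uint64_t sum = a + b;   /* unsigned addition wraps modulo 2^64 */

    sum += (sum < a);       /* a carry out happened iff the result wrapped below a */
    return sum;             /* cannot overflow again: a wrapped sum is <= 2^64 - 2 */
}

/* The buggy variant: the carry out of the final add is silently lost. */
static uint64_t add_dropping_carry(uint64_t a, uint64_t b)
{
    return a + b;
}
```

      With all-ones inputs like the memset(0xff) case above, the difference
      shows up immediately: adding 1 to 0xffffffffffffffff yields 1 with the
      carry folded back in, but 0 when it is dropped.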
      
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Fix csum_ipv6_magic on 32-bit systems · 4408ba75
      Guenter Roeck authored
      Calculating the IPv6 checksum on 32-bit systems missed overflows when
      adding the proto+len fields into the checksum. This results in the
      following unit test failure.
      
          # test_csum_ipv6_magic: ASSERTION FAILED at lib/checksum_kunit.c:506
          Expected ( u64)csum_result == ( u64)expected, but
              ( u64)csum_result == 46722 (0xb682)
              ( u64)expected == 46721 (0xb681)
          not ok 5 test_csum_ipv6_magic
      
      This is probably rarely seen in the real world because proto+len are
      usually small values which will rarely result in overflows when calculating
      the checksum. However, the unit test code uses large values for the length
      field, causing the test to fail.
      
      Fix the problem by adding the missing carry into the final checksum.
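
      A minimal C sketch of adding a value into a 32-bit one's-complement
      accumulator without losing the carry, plus a 16-bit fold. The helper
      names are hypothetical; this illustrates the arithmetic, not the
      parisc assembly.

```c
#include <stdint.h>

/* Add 'addend' into a 32-bit one's-complement sum, keeping the carry. */
static uint32_t csum32_add(uint32_t sum, uint32_t addend)
{
    uint32_t res = sum + addend;    /* wraps modulo 2^32 */

    return res + (res < addend);    /* add back the carry out of bit 31 */
}

/* Fold a 32-bit one's-complement sum down to 16 bits (not complemented). */
static uint16_t csum32_fold(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

      A large proto+len value, like the unit test's large length field,
      is exactly the case where res wraps and the extra 1 matters.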
      
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Charlie Jenkins <charlie@rivosinc.com>
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Fix ip_fast_csum · a2abae8f
      Guenter Roeck authored
      IP checksum unit tests report the following error when run on hppa/hppa64.
      
          # test_ip_fast_csum: ASSERTION FAILED at lib/checksum_kunit.c:463
          Expected ( u64)csum_result == ( u64)expected, but
              ( u64)csum_result == 33754 (0x83da)
              ( u64)expected == 10946 (0x2ac2)
          not ok 4 test_ip_fast_csum
      
      0x83da is the expected result if the IP header length is 20 bytes. 0x2ac2
      is the expected result if the IP header length is 24 bytes. The test fails
      with an IP header length of 24 bytes. It appears that ip_fast_csum()
      always returns the checksum for a 20-byte header, no matter how long
      the header actually is.
      
      Code analysis shows a suspicious assembler sequence in ip_fast_csum().
      
       "      addc            %0, %3, %0\n"
       "1:    ldws,ma         4(%1), %3\n"
       "      addib,<         0, %2, 1b\n"	<---
      
      While my understanding of HPPA assembler is limited, it does not seem
      to make much sense to subtract 0 from a register and to expect the result
      to ever be negative. Subtracting 1 from the length parameter makes more
      sense. On top of that, the operation should be repeated if and only if
      the result is still > 0, so change the suspicious instruction to
       "      addib,>         -1, %2, 1b\n"
      
      The IP checksum unit test passes after this change.
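
      A C reference model, as a sketch only (ip_csum_model is a hypothetical
      name, not the kernel's generic implementation): the loop uses the
      header length to decide how many 32-bit words to sum, decrementing
      the count and repeating only while it is still > 0, which is the
      behaviour the corrected addib,> -1 instruction is meant to provide.

```c
#include <stdint.h>

/*
 * One's-complement sum of an IPv4 header of 'ihl' 32-bit words
 * (ihl >= 1; real headers have ihl >= 5), folded to 16 bits and
 * complemented. Words are summed as stored in memory.
 */
static uint16_t ip_csum_model(const uint32_t *iph, unsigned int ihl)
{
    uint64_t sum = 0;
    unsigned int n = ihl;

    /* Process a word, decrement the count, loop again only while > 0. */
    do {
        sum += *iph++;
    } while (--n > 0);

    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

      A 20-byte header (ihl = 5) and a 24-byte header (ihl = 6) give
      different results, which is what the failing test expected.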
      
      Cc: Palmer Dabbelt <palmer@rivosinc.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Charlie Jenkins <charlie@rivosinc.com>
      Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros · 4603fbaa
      John David Anglin authored
      Use add,l to avoid clobbering the C/B bits in the PSW.
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # v5.10+
    • parisc/unaligned: Rewrite 64-bit inline assembly of emulate_ldd() · e5db6a74
      Guenter Roeck authored
      Convert to use real temp variables instead of clobbering processor
      registers. This aligns the 64-bit inline assembly code with the 32-bit
      assembly code which was rewritten with commit 427c1073
      ("parisc/unaligned: Rewrite 32-bit inline assembly of emulate_ldd()").
      
      While at it, fix a comment in the 32-bit rewrite code. Temporary variables are
      now used for both 32-bit and 64-bit code, so move their declarations
      to the function header.
      
      No functional change intended.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: stable@vger.kernel.org # v6.0+
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: make parisc_bus_type const · 0b9ec151
      Ricardo B. Marliere authored
      Since commit d492cc25 ("driver core: device.h: make struct
      bus_type a const *"), the driver core can properly handle constant
      struct bus_type, so move the parisc_bus_type variable to be a constant
      structure as well, placing it into read-only memory where it cannot be
      modified at runtime.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: avoid c23 'nullptr' idenitifier · cf159848
      Arnd Bergmann authored
      Starting with C23, this is a reserved keyword, so in the future, using it
      will start causing build failures:
      
      arch/parisc/math-emu/frnd.c:36:23: error: expected ';', ',' or ')' before 'nullptr'
      
      Since I can't think of a good replacement name, add a leading underscore
      to the function argument to avoid this namespace conflict. Apparently
      all of these arguments are unused.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Show kernel unaligned memory accesses · 94a1b192
      Helge Deller authored
      Warn if some kernel function triggers unaligned memory accesses.
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: Use irq_enter_rcu() to fix warning at kernel/context_tracking.c:367 · 73cb4a2d
      Helge Deller authored
      Use irq*_rcu() functions to fix this kernel warning:
      
       WARNING: CPU: 0 PID: 0 at kernel/context_tracking.c:367 ct_irq_enter+0xa0/0xd0
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.7.0-rc3-64bit+ #1037
       Hardware name: 9000/785/C3700
      
       IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000412cd758 00000000412cd75c
        IIR: 03ffe01f    ISR: 0000000000000000  IOR: 0000000043c20c20
        CPU:        0   CR30: 0000000041caa000 CR31: 0000000000000000
        ORIG_R28: 0000000000000005
        IAOQ[0]: ct_irq_enter+0xa0/0xd0
        IAOQ[1]: ct_irq_enter+0xa4/0xd0
        RP(r2): irq_enter+0x34/0x68
       Backtrace:
        [<000000004034a3ec>] irq_enter+0x34/0x68
        [<000000004030dc48>] do_cpu_irq_mask+0xc0/0x450
        [<0000000040303070>] intr_return+0x0/0xc
      Signed-off-by: Helge Deller <deller@gmx.de>
  2. 25 Feb, 2024 30 commits
    • Linux 6.8-rc6 · d206a76d
      Linus Torvalds authored
    • Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs · e231dbd4
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Some more mostly boring fixes, but some not
      
        User reported ones:
      
         - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
           performance bug; user reported an untar initially taking two
           seconds and then ~2 minutes
      
         - kill a __GFP_NOFAIL in the buffered read path; this was a leftover
           from the trickier fix to kill __GFP_NOFAIL in readahead, where we
           can't return errors (and have to silently truncate the read
           ourselves).
      
           bcachefs can't use GFP_NOFAIL for folio state, unlike iomap-based
           filesystems, because our folio state is just barely too big: 2MB
           hugepages cause us to exceed the 2-page threshold for GFP_NOFAIL.

           Additionally, the flags argument was just buggy; we weren't
           supplying GFP_KERNEL previously (!)"
      
      * tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: fix bch2_save_backtrace()
        bcachefs: Fix check_snapshot() memcpy
        bcachefs: Fix bch2_journal_flush_device_pins()
        bcachefs: fix iov_iter count underflow on sub-block dio read
        bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
        bcachefs: Kill __GFP_NOFAIL in buffered read path
        bcachefs: fix backpointer_to_text() when dev does not exist
    • bcachefs: fix bch2_save_backtrace() · 5197728f
      Kent Overstreet authored
      Missed a call in the previous fix.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linux · 70ff1fe6
      Linus Torvalds authored
      Pull two documentation build fixes from Jonathan Corbet:
      
       - The XFS online fsck documentation uses incredibly deep subsection
         and list nesting; that broke the PDF docs build. Tweak a parameter
         to tell LaTeX to allow the deeper nesting.
      
       - Fix a 6.8 PDF-build regression
      
      * tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
        docs: translations: use attribute to store current language
        docs: Instruct LaTeX to cope with deeper nesting
    • Merge tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c46ac50e
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 6.8-rc6 to resolve some reported
        problems. These include:
      
         - regression fixes for the typec tcpm code, as reported by many
      
         - cdnsp and cdns3 driver fixes
      
         - usb role setting code bugfixes
      
         - build fix for uhci driver
      
         - ncm gadget driver bugfix
      
         - MAINTAINERS entry update
      
        All of these have been in linux-next all week with no reported issues
        and there is at least one fix in here that is in Thorsten's regression
        list that is being tracked"
      
      * tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: tpcm: Fix issues with power being removed during reset
        MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
        usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
        Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
        usb: gadget: omap_udc: fix USB gadget regression on Palm TE
        usb: dwc3: gadget: Don't disconnect if not started
        usb: cdns3: fix memory double free when handle zero packet
        usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
        usb: roles: don't get/set_role() when usb_role_switch is unregistered
        usb: roles: fix NULL pointer issue when put module's reference
        usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
        usb: cdnsp: blocked some cdns3 specific code
        usb: uhci-grlib: Explicitly include linux/platform_device.h
    • Merge tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 1e592e95
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
        the following reported errors:
      
         - riscv hvc console driver fix that was reported by many
      
         - amba-pl011 serial driver fix for RS485 mode
      
         - stm32 serial driver fix for RS485 mode
      
        All of these have been in linux-next all week with no reported
        problems"
      
      * tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: amba-pl011: Fix DMA transmission in RS485 mode
        serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
        tty: hvc: Don't enable the RISC-V SBI console by default
    • Merge tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1eee4ef3
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Make sure clearing CPU buffers using VERW happens at the latest
         possible point in the return-to-userspace path, otherwise memory
         accesses after the VERW execution could cause data to land in CPU
         buffers again
      
      * tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        KVM/VMX: Move VERW closer to VMentry for MDS mitigation
        KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
        x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
        x86/entry_32: Add VERW just before userspace transition
        x86/entry_64: Add VERW just before userspace transition
        x86/bugs: Add asm helpers for executing VERW
    • Merge tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8c46ed37
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
         from silently failing to set it up
      
       - Do not call bus_get_dev_root() for the mbigen irqchip as it always
         returns NULL - use NULL directly
      
       - Fix hardware interrupt number truncation when assigning MSI
         interrupts
      
       - Correct sending end-of-interrupt messages to disabled interrupt
         lines on RISC-V PLIC
      
      * tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3-its: Do not assume vPE tables are preallocated
        irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
        PCI/MSI: Prevent MSI hardware interrupt number truncation
        irqchip/sifive-plic: Enable interrupt if needed before EOI
    • Merge tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 4ca0d989
      Linus Torvalds authored
      Pull erofs fix from Gao Xiang:
      
       - Fix page refcount leak when looking up specific inodes
         introduced by metabuf reworking
      
      * tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: fix refcount on the metabuf used for inode lookup
    • Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 66a97c2e
      Linus Torvalds authored
      Pull RCU pathwalk fixes from Al Viro:
       "We still have some races in filesystem methods when exposed to RCU
        pathwalk. This series is a result of code audit (the second round of
        it) and it should deal with most of that stuff.
      
        Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
        Up to maintainers (a note for NTFS folks - when documentation says
        that a method may not block, it *does* imply that blocking allocations
        are to be avoided. Really)"
      
      [ More explanations for people who aren't familiar with the vagaries of
        RCU path walking: most of it is hidden from filesystems, but if a
        filesystem actively participates in the low-level path walking it
        needs to make sure the fields involved in that walk are RCU-safe.
      
        That "actively participate in low-level path walking" includes things
        like having its own ->d_hash()/->d_compare() routines, or having
        its own directory permission function that doesn't just use the common
        helpers.  Having a ->d_revalidate() function will also have this issue.
      
        Note that instead of making everything RCU safe you can also choose to
        abort the RCU pathwalk if your operation cannot be done safely under
        RCU, but that obviously comes with a performance penalty. One common
        pattern is to allow the simple cases under RCU, and abort only if you
        need to do something more complicated.
      
        So not everything needs to be RCU-safe, and things like the inode etc
        that the VFS itself maintains obviously already are. But these fixes
        tend to be about properly RCU-delaying things like ->s_fs_info that
        are maintained by the filesystem and that got potentially released too
        early.   - Linus ]
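
      The "allow the simple cases under RCU, abort otherwise" pattern can be
      modelled in userspace C. In the kernel, a method such as ->d_revalidate()
      sees LOOKUP_RCU in its flags during RCU pathwalk and can return -ECHILD
      to ask the VFS to retry in ref-walk mode. The DEMO_* flag and the struct
      below are hypothetical stand-ins, not kernel definitions.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_LOOKUP_RCU 0x1u   /* hypothetical stand-in for LOOKUP_RCU */

struct demo_dentry {
    bool needs_blocking_check;  /* hypothetical: answer needs I/O or sleeping */
    bool valid;
};

/*
 * Sketch of a ->d_revalidate()-style hook: answer the cheap case during
 * RCU pathwalk, and return -ECHILD to ask for a retry in ref-walk mode
 * when the check could block.
 */
static int demo_revalidate(const struct demo_dentry *d, unsigned int flags)
{
    if ((flags & DEMO_LOOKUP_RCU) && d->needs_blocking_check)
        return -ECHILD;          /* too complicated under RCU: bail out */

    /* Simple case (RCU) or ref-walk mode, where blocking would be allowed. */
    return d->valid ? 1 : 0;
}
```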
      
      * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        ext4_get_link(): fix breakage in RCU mode
        cifs_get_link(): bail out in unsafe case
        fuse: fix UAF in rcu pathwalks
        procfs: make freeing proc_fs_info rcu-delayed
        procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
        nfs: fix UAF on pathwalk running into umount
        nfs: make nfs_set_verifier() safe for use in RCU pathwalk
        afs: fix __afs_break_callback() / afs_drop_open_mmap() race
        hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
        exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
        affs: free affs_sb_info with kfree_rcu()
        rcu pathwalk: prevent bogus hard errors from may_lookup()
        fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9b243492
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A couple of fixes - revert of regression from this cycle and a fix for
        erofs failure exit breakage (had been there since way back)"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        erofs: fix handling kern_mount() failure
        Revert "get rid of DCACHE_GENOCIDE"
    • ext4_get_link(): fix breakage in RCU mode · 9fa8e282
      Al Viro authored
      1) errors from ext4_getblk() should not be propagated to caller
      unless we are really sure that we would've gotten the same error
      in non-RCU pathwalk.
      2) we leak buffer_heads if ext4_getblk() is successful, but bh is
      not uptodate.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • cifs_get_link(): bail out in unsafe case · 0511fdb4
      Al Viro authored
      ->d_revalidate() bails out there, anyway.  It's not enough
      to prevent getting into ->get_link() in RCU mode, but that
      could happen only in a very contrived setup.  Not worth
      trying to do anything fancy here unless ->d_revalidate()
      stops kicking out of RCU mode at least in some cases.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fuse: fix UAF in rcu pathwalks · 053fc4f7
      Al Viro authored
      ->permission(), ->get_link() and ->inode_get_acl() might dereference
      ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
      as well) when called from rcu pathwalk.
      
      Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
      and dropping ->user_ns rcu-delayed too.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: make freeing proc_fs_info rcu-delayed · e31f0a57
      Al Viro authored
      makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
      is still synchronous, but that's not a problem - it does
      rcu-delay everything that needs to be)
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() · 47458802
      Al Viro authored
      that keeps both around until struct inode is freed, making access
      to them safe from rcu-pathwalk
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfs: fix UAF on pathwalk running into umount · c1b967d0
      Al Viro authored
      NFS ->d_revalidate(), ->permission() and ->get_link() need to access
      some parts of nfs_server when called in RCU mode:
      	server->flags
      	server->caps
      	*(server->io_stats)
      and, worst of all, call
      	server->nfs_client->rpc_ops->have_delegation
      (the last one - as NFS_PROTO(inode)->have_delegation()).  We really
      don't want to RCU-delay the entire nfs_free_server() (it would have
      to be done with schedule_work() from RCU callback, since it can't
      be made to run from interrupt context), but actual freeing of
      nfs_server and ->io_stats can be done via call_rcu() just fine.
      nfs_client part is handled simply by making nfs_free_client() use
      kfree_rcu().
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfs: make nfs_set_verifier() safe for use in RCU pathwalk · 10a973fc
      Al Viro authored
      nfs_set_verifier() relies upon dentry being pinned; if that's
      the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
      that ->d_parent points to a positive dentry.  For something
      we'd run into in RCU mode that is *not* true - dentry might've
      been through dentry_kill() just as we grabbed ->d_lock, with
      its parent going through the same just as we get into
      nfs_set_verifier_locked().  It might get to detaching inode
      (and zeroing ->d_inode) before nfs_set_verifier_locked() gets
      to fetching that; we get an oops as the result.
      
      That can happen in nfs{,4} ->d_revalidate(); the call chain in
      question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
      nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
      We have checked that the parent had been positive, but that's
      done before we get to nfs_set_verifier() and it's possible for
      memory pressure to pick our dentry as eviction candidate by that
      time.  If that happens, back-to-back attempts to kill dentry and
      its parent are quite normal.  Sure, in case of eviction we'll
      fail the ->d_seq check in the caller, but we need to survive
      until we return there...
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • afs: fix __afs_break_callback() / afs_drop_open_mmap() race · 275655d3
      Al Viro authored
      In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
      do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
      ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
      
      The trouble is, there's nothing to prevent __afs_break_callback() from
      seeing ->cb_nr_mmap before the decrement and doing queue_work() after both
      the decrement and flush_work().  If that happens, we might be in trouble -
      vnode might get freed before the queued work runs.
      
      __afs_break_callback() is always done under ->cb_lock, so let's make
      sure that ->cb_nr_mmap can change from non-zero to zero only while holding
      ->cb_lock (the spinlock component of it - it's a seqlock and we don't
      need to mess with the counter).
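
      The lock ordering can be modelled in userspace C, as a sketch only: a
      toy C11 atomic_flag spinlock stands in for the spinlock part of
      ->cb_lock, and the demo_* names and the 'queued' counter are
      hypothetical. The kernel fix itself may differ in detail.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model; 'queued' counts would-be queue_work() calls. */
struct demo_vnode {
    atomic_flag cb_lock;
    int cb_nr_mmap;
    int queued;
};

static void demo_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set(l))
        ;   /* spin until the flag was previously clear */
}

static void demo_unlock(atomic_flag *l)
{
    atomic_flag_clear(l);
}

/* Like __afs_break_callback(): checks the count with the lock held. */
static void demo_break_callback(struct demo_vnode *v)
{
    demo_lock(&v->cb_lock);
    if (v->cb_nr_mmap)
        v->queued++;        /* stands in for queue_work() */
    demo_unlock(&v->cb_lock);
}

/*
 * Like afs_drop_open_mmap(): the non-zero -> zero transition happens under
 * the same lock. Returns true when the caller should flush the work.
 */
static bool demo_drop_open_mmap(struct demo_vnode *v)
{
    bool last;

    demo_lock(&v->cb_lock);
    last = (--v->cb_nr_mmap == 0);
    demo_unlock(&v->cb_lock);
    return last;            /* caller would now call flush_work() */
}
```

      A callback break now runs either entirely before the decrement (so
      its queued work is caught by the subsequent flush) or entirely after
      it (so it sees a zero count and queues nothing).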
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info · af072cf6
      Al Viro authored
      ->d_hash() and ->d_compare() use those, so we need to delay freeing
      them.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper · a13d1a4d
      Al Viro authored
      That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
      a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
      that is in process of getting shut down.
      
      Besides, having nls and upcase table cleanup moved from ->put_super() towards
      the place where sbi is freed makes for simpler failure exits.
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: free affs_sb_info with kfree_rcu() · 529f89a9
      Al Viro authored
      one of the flags in it is used by ->d_hash()/->d_compare()
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rcu pathwalk: prevent bogus hard errors from may_lookup() · cdb67fde
      Al Viro authored
      If lazy call of ->permission() returns a hard error, check that
      try_to_unlazy() succeeds before returning it.  That both makes
      life easier for ->permission() instances and closes the race
      in ENOTDIR handling - it is possible that positive d_can_lookup()
      seen in link_path_walk() applies to the state *after* unlink() +
      mkdir(), while nd->inode matches the state prior to that.
      
      Normally seeing e.g. EACCES from permission check in rcu pathwalk
      means that with some timings non-rcu pathwalk would've run into
      the same; however, running into a non-executable regular file
      in the middle of a pathname would not get to permission check -
      it would fail with ENOTDIR instead.
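
      The control flow can be modelled in plain C, as a sketch under stated
      assumptions: the DEMO_ flag, the demo_nd fields and the helper names
      are hypothetical, and the final branch (redoing a check that punted
      with -ECHILD) is an assumption about the surrounding code rather than
      something this commit states. The point shown is that a hard error
      from the lazy check is returned only after leaving RCU mode succeeds.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_LOOKUP_RCU 0x1u   /* hypothetical stand-in for LOOKUP_RCU */

struct demo_nd {
    unsigned int flags;
    bool unlazy_ok;     /* would leaving RCU mode succeed (state unchanged)? */
    int rcu_perm_err;   /* result of the lazy permission check */
    int ref_perm_err;   /* result of the same check in ref-walk mode */
};

static bool demo_try_to_unlazy(struct demo_nd *nd)
{
    if (!nd->unlazy_ok)
        return false;           /* something changed under us */
    nd->flags &= ~DEMO_LOOKUP_RCU;
    return true;
}

static int demo_may_lookup(struct demo_nd *nd)
{
    bool rcu = nd->flags & DEMO_LOOKUP_RCU;
    int err = rcu ? nd->rcu_perm_err : nd->ref_perm_err;

    if (!err || !rcu)
        return err;             /* success, or a final ref-walk answer */
    if (!demo_try_to_unlazy(nd))
        return -ECHILD;         /* restart the whole walk non-lazily */
    if (err != -ECHILD)
        return err;             /* hard error, confirmed against stable state */
    return nd->ref_perm_err;    /* assumed: redo the check in ref-walk mode */
}
```

      In the unlink() + mkdir() race above, the lazy check may report
      -EACCES for a stale inode; because unlazy then fails, the walk is
      restarted instead, and the non-RCU walk can report ENOTDIR.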
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs/super.c: don't drop ->s_user_ns until we free struct super_block itself · 583340de
      Al Viro authored
      Avoids fun races in RCU pathwalk...  Same goes for freeing LSM shite
      hanging off super_block's arse.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • bcachefs: Fix check_snapshot() memcpy · c4333eb5
      Kent Overstreet authored
      check_snapshot() copies the bch_snapshot to a temporary to easily handle
      older versions that don't have all the fields of the current version,
      but it lacked a min() to correctly handle keys newer and larger than the
      current version.
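
      A sketch of a min()-bounded copy, assuming a hypothetical three-field
      value layout (demo_snapshot is not the real bch_snapshot): zero the
      temporary, then copy no more bytes than the temporary can hold.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical current-version value; 'subvol' plays the newest field. */
struct demo_snapshot {
    unsigned int flags;
    unsigned int parent;
    unsigned int subvol;
};

/*
 * Copy a value that may come from an older (shorter) or newer (longer)
 * on-disk version into a current-version temporary.
 */
static void demo_copy_snapshot(struct demo_snapshot *dst,
                               const void *src, size_t src_bytes)
{
    size_t n = src_bytes < sizeof(*dst) ? src_bytes : sizeof(*dst);

    memset(dst, 0, sizeof(*dst));   /* fields an older version lacks read as 0 */
    memcpy(dst, src, n);            /* the min() keeps a newer, larger key in bounds */
}
```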
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Fix bch2_journal_flush_device_pins() · 097471f9
      Kent Overstreet authored
      If a journal write errored, the list of devices it was written to could
      be empty - we're not supposed to mark an empty replicas list.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: fix iov_iter count underflow on sub-block dio read · b58b1b88
      Brian Foster authored
      bch2_direct_IO_read() checks the request offset and size for sector
      alignment and then falls through to a couple of calculations to shrink
      the size of the request based on the inode size. The problem is that
      these checks round up to the fs block size, which runs the risk of
      underflowing iter->count if the block size happens to be large
      enough. This is triggered by fstest generic/361 with a 4k block
      size, which subsequently leads to a crash. To avoid this crash,
      check that the shortened length doesn't exceed the overall length of
      the iter.
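
      A model of the guard, assuming the shrink amount is computed with
      unsigned size_t arithmetic as the request length minus the
      block-rounded valid length. The function names and parameters are
      hypothetical, not bcachefs code.

```c
#include <stddef.h>

static size_t demo_round_up(size_t x, size_t align)
{
    return (x + align - 1) / align * align;
}

/*
 * Shrink a read of 'count' bytes so it stops at the fs block containing
 * EOF; 'valid' is the number of bytes before EOF.
 */
static size_t demo_shrunk_count(size_t count, size_t valid, size_t block)
{
    size_t shorten = count - demo_round_up(valid, block);  /* may wrap */

    if (shorten >= count)   /* wrapped: the rounded length exceeds the request */
        shorten = 0;
    return count - shorten;
}
```

      For a 512-byte read with a 4096-byte block, 512 - 4096 wraps to a
      huge value; without the check, subtracting it would leave the count
      at 4096, larger than the 512-byte request.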
      
      Fixes:
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Su Yue <glass.su@suse.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree · 204f4514
      Kent Overstreet authored
      If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the
      keyspace where no keys are visible in the current snapshot, we have a
      problem - we'll scan for a very long time before scanning terminates.
      
      A while back, this was fixed for most cases with peek_upto() (and
      assertions that enforce that it's being used).
      
      But the fix missed the fact that the inodes btree is different - every
      key offset is in a different snapshot tree, not just the inode field.
      
      Fixes:
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Kill __GFP_NOFAIL in buffered read path · 04fee68d
      Kent Overstreet authored
      Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the
      easy one in read_single_folio() (where we can return an error) was
      missed - oops.
      
      Fixes:
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Kent Overstreet's avatar