1. 09 Mar, 2024 6 commits
    • nfs: make the rpc_stat per net namespace · 1548036e
      Josef Bacik authored
      Now that we're exposing the rpc stats on a per-network namespace basis,
      move this struct into struct nfs_net and use that to make sure only the
      per-network namespace stats are exposed.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
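      A minimal userspace sketch of the idea, with hypothetical names throughout (toy_net, toy_rpc_stats and the helpers are illustrations, not kernel API): once the counters live inside the per-namespace structure, as the rpc_stats now do inside struct nfs_net, calls made in one namespace are never visible through another namespace's view.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct rpc_stats. */
struct toy_rpc_stats {
	unsigned long rpccnt;      /* RPC calls issued */
	unsigned long rpcretrans;  /* retransmissions */
};

/* Hypothetical stand-in for struct nfs_net: per-namespace state that
 * embeds its own stats instead of pointing at one global struct. */
struct toy_net {
	int id;
	struct toy_rpc_stats rpcstats;
};

/* Account one call against the namespace that issued it. */
static void toy_count_call(struct toy_net *net, int retransmit)
{
	assert(net != NULL);
	net->rpcstats.rpccnt++;
	if (retransmit)
		net->rpcstats.rpcretrans++;
}

/* What a per-namespace /proc reader would report: only this
 * namespace's counter. */
static unsigned long toy_show_calls(const struct toy_net *net)
{
	assert(net != NULL);
	return net->rpcstats.rpccnt;
}
```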
    • nfs: expose /proc/net/sunrpc/nfs in net namespaces · d47151b7
      Josef Bacik authored
      We're using nfs mounts inside of containers in production and noticed
      that the nfs stats are not exposed in /proc.  This is a problem for us
      as we use these stats for monitoring, and have to do this awkward bind
      mount from the main host into the container in order to get to these
      stats.
      
      Add the rpc_proc_register call to the pernet operations entry and exit
      points so these stats can be exposed inside of network namespaces.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • sunrpc: add a struct rpc_stats arg to rpc_create_args · 2057a48d
      Josef Bacik authored
      We want to be able to have our rpc stats handled in a per network
      namespace manner, so add an option to rpc_create_args to specify a
      different rpc_stats struct instead of using the one on the rpc_program.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
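      The new option amounts to a fallback when the client is created: account against the caller-supplied stats if there are any, otherwise against the rpc_program's own. The sketch below models that choice in userspace; the toy_ types only loosely mirror rpc_create_args, rpc_program and rpc_clnt and are assumptions, not the sunrpc definitions.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rpc_stats { unsigned long rpccnt; };

/* Hypothetical analogue of struct rpc_program: carries a default,
 * effectively global, stats struct. */
struct toy_rpc_program {
	const char *name;
	struct toy_rpc_stats *stats;
};

/* Hypothetical analogue of struct rpc_create_args with the new,
 * optional stats override. */
struct toy_create_args {
	const struct toy_rpc_program *program;
	struct toy_rpc_stats *stats;   /* NULL means "use program->stats" */
};

struct toy_clnt {
	struct toy_rpc_stats *cl_stats;
};

/* Pick the stats struct the new client will account against. */
static void toy_clnt_init(struct toy_clnt *clnt,
			  const struct toy_create_args *args)
{
	assert(clnt != NULL && args != NULL && args->program != NULL);
	clnt->cl_stats = args->stats ? args->stats : args->program->stats;
}
```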
    • nfs: remove unused NFS_CALL macro · edc99a2d
      Jeff Layton authored
      Nothing uses this, and thank goodness, as the syntax looks horrid.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • NFSv4.1: add tracepoint to trunked nfs4_exchange_id calls · 7e5ae43b
      Olga Kornievskaia authored
      Add a tracepoint to track when the client sends EXCHANGE_ID to test
      a new transport for session trunking.
      
      nfs4_detect_session_trunking() tests for trunking and returns
      EINVAL if trunking can't be done; add an EINVAL mapping to
      show_nfs4_status() in tracepoints.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
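      Why the tracepoint needs the extra mapping can be shown with a toy status-to-name lookup: a code missing from the table can only be printed as a raw number. This is a hypothetical userspace table, not the kernel's show_nfs4_status() definition, and it uses positive errno values for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct toy_status_name {
	int code;
	const char *name;
};

/* Hypothetical, heavily abridged status table. */
static const struct toy_status_name toy_status_table[] = {
	{ 0,      "OK" },
	{ EPERM,  "EPERM" },
	{ EIO,    "EIO" },
	{ EINVAL, "EINVAL" },   /* the kind of entry the commit adds */
};

/* Return a symbolic name, or format the raw number into buf when the
 * code has no table entry. */
static const char *toy_show_status(int code, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	for (i = 0; i < sizeof(toy_status_table) / sizeof(toy_status_table[0]); i++)
		if (toy_status_table[i].code == code)
			return toy_status_table[i].name;
	snprintf(buf, len, "%d", code);
	return buf;
}
```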
    • NFS: Fix nfs_netfs_issue_read() xarray locking for writeback interrupt · fd5860ab
      Dave Wysochanski authored
      The loop inside nfs_netfs_issue_read() currently does not disable
      interrupts while iterating through pages in the xarray to submit
      for NFS read.  This is not safe though since after taking xa_lock,
      another page in the mapping could be processed for writeback inside
      an interrupt, and deadlock can occur.  The fix is simple and clean
      if we use xa_for_each_range(), which handles the iteration with RCU
      while reducing code complexity.
      
      The problem is easily reproduced with the following test:
       mount -o vers=3,fsc 127.0.0.1:/export /mnt/nfs
       dd if=/dev/zero of=/mnt/nfs/file1.bin bs=4096 count=1
       echo 3 > /proc/sys/vm/drop_caches
       dd if=/mnt/nfs/file1.bin of=/dev/null
       umount /mnt/nfs
      
      On the console with a lockdep-enabled kernel a message similar to
      the following will be seen:
      
       ================================
       WARNING: inconsistent lock state
       6.7.0-lockdbg+ #10 Not tainted
       --------------------------------
       inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
       test5/1708 [HC0[0]:SC0[0]:HE1:SE1] takes:
       ffff888127baa598 (&xa->xa_lock#4){+.?.}-{3:3}, at:
      nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
       {IN-SOFTIRQ-W} state was registered at:
         lock_acquire+0x144/0x380
         _raw_spin_lock_irqsave+0x4e/0xa0
         __folio_end_writeback+0x17e/0x5c0
         folio_end_writeback+0x93/0x1b0
         iomap_finish_ioend+0xeb/0x6a0
         blk_update_request+0x204/0x7f0
         blk_mq_end_request+0x30/0x1c0
         blk_complete_reqs+0x7e/0xa0
         __do_softirq+0x113/0x544
         __irq_exit_rcu+0xfe/0x120
         irq_exit_rcu+0xe/0x20
         sysvec_call_function_single+0x6f/0x90
         asm_sysvec_call_function_single+0x1a/0x20
         pv_native_safe_halt+0xf/0x20
         default_idle+0x9/0x20
         default_idle_call+0x67/0xa0
         do_idle+0x2b5/0x300
         cpu_startup_entry+0x34/0x40
         start_secondary+0x19d/0x1c0
         secondary_startup_64_no_verify+0x18f/0x19b
       irq event stamp: 176891
       hardirqs last  enabled at (176891): [<ffffffffa67a0be4>]
      _raw_spin_unlock_irqrestore+0x44/0x60
       hardirqs last disabled at (176890): [<ffffffffa67a0899>]
      _raw_spin_lock_irqsave+0x79/0xa0
       softirqs last  enabled at (176646): [<ffffffffa515d91e>]
      __irq_exit_rcu+0xfe/0x120
       softirqs last disabled at (176633): [<ffffffffa515d91e>]
      __irq_exit_rcu+0xfe/0x120
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&xa->xa_lock#4);
         <Interrupt>
           lock(&xa->xa_lock#4);
      
        *** DEADLOCK ***
      
       2 locks held by test5/1708:
        #0: ffff888127baa498 (&sb->s_type->i_mutex_key#22){++++}-{4:4}, at:
            nfs_start_io_read+0x28/0x90 [nfs]
        #1: ffff888127baa650 (mapping.invalidate_lock#3){.+.+}-{4:4}, at:
            page_cache_ra_unbounded+0xa4/0x280
      
       stack backtrace:
       CPU: 6 PID: 1708 Comm: test5 Kdump: loaded Not tainted 6.7.0-lockdbg+
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
      04/01/2014
       Call Trace:
        dump_stack_lvl+0x5b/0x90
        mark_lock+0xb3f/0xd20
        __lock_acquire+0x77b/0x3360
        _raw_spin_lock+0x34/0x80
        nfs_netfs_issue_read+0x1b2/0x4b0 [nfs]
        netfs_begin_read+0x77f/0x980 [netfs]
        nfs_netfs_readahead+0x45/0x60 [nfs]
        nfs_readahead+0x323/0x5a0 [nfs]
        read_pages+0xf3/0x5c0
        page_cache_ra_unbounded+0x1c8/0x280
        filemap_get_pages+0x38c/0xae0
        filemap_read+0x206/0x5e0
        nfs_file_read+0xb7/0x140 [nfs]
        vfs_read+0x2a9/0x460
        ksys_read+0xb7/0x140
      
      Fixes: 000dbe0b ("NFS: Convert buffered read paths to use netfs when fscache is enabled")
      Suggested-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  2. 28 Feb, 2024 11 commits
  3. 25 Feb, 2024 23 commits
    • Linux 6.8-rc6 · d206a76d
      Linus Torvalds authored
    • Merge tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs · e231dbd4
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "Some more mostly boring fixes, but some not
      
        User reported ones:
      
         - the BTREE_ITER_FILTER_SNAPSHOTS one fixes a really nasty
           performance bug; user reported an untar initially taking two
           seconds and then ~2 minutes
      
         - kill a __GFP_NOFAIL in the buffered read path; this was a leftover
           from the trickier fix to kill __GFP_NOFAIL in readahead, where we
           can't return errors (and have to silently truncate the read
           ourselves).
      
           bcachefs can't use GFP_NOFAIL for folio state unlike iomap based
           filesystems because our folio state is just barely too big, 2MB
           hugepages cause us to exceed the 2 page threshold for GFP_NOFAIL.
      
           additionally, the flags argument was just buggy, we weren't
           supplying GFP_KERNEL previously (!)"
      
      * tag 'bcachefs-2024-02-25' of https://evilpiepirate.org/git/bcachefs:
        bcachefs: fix bch2_save_backtrace()
        bcachefs: Fix check_snapshot() memcpy
        bcachefs: Fix bch2_journal_flush_device_pins()
        bcachefs: fix iov_iter count underflow on sub-block dio read
        bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
        bcachefs: Kill __GFP_NOFAIL in buffered read path
        bcachefs: fix backpointer_to_text() when dev does not exist
    • bcachefs: fix bch2_save_backtrace() · 5197728f
      Kent Overstreet authored
      Missed a call in the previous fix.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • Merge tag 'docs-6.8-fixes3' of git://git.lwn.net/linux · 70ff1fe6
      Linus Torvalds authored
      Pull two documentation build fixes from Jonathan Corbet:
      
       - The XFS online fsck documentation uses incredibly deeply nested
         subsection and list nesting; that broke the PDF docs build. Tweak a
         parameter to tell LaTeX to allow the deeper nesting.
      
       - Fix a 6.8 PDF-build regression
      
      * tag 'docs-6.8-fixes3' of git://git.lwn.net/linux:
        docs: translations: use attribute to store current language
        docs: Instruct LaTeX to cope with deeper nesting
    • Merge tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · c46ac50e
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 6.8-rc6 to resolve some reported
        problems. These include:
      
         - regression fixes with typec tcpm code as reported by many
      
         - cdnsp and cdns3 driver fixes
      
         - usb role setting code bugfixes
      
         - build fix for uhci driver
      
         - ncm gadget driver bugfix
      
         - MAINTAINERS entry update
      
        All of these have been in linux-next all week with no reported issues
        and there is at least one fix in here that is in Thorsten's regression
        list that is being tracked"
      
      * tag 'usb-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: tpcm: Fix issues with power being removed during reset
        MAINTAINERS: Drop myself as maintainer of TYPEC port controller drivers
        usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs
        Revert "usb: typec: tcpm: reset counter when enter into unattached state after try role"
        usb: gadget: omap_udc: fix USB gadget regression on Palm TE
        usb: dwc3: gadget: Don't disconnect if not started
        usb: cdns3: fix memory double free when handle zero packet
        usb: cdns3: fixed memory use after free at cdns3_gadget_ep_disable()
        usb: roles: don't get/set_role() when usb_role_switch is unregistered
        usb: roles: fix NULL pointer issue when put module's reference
        usb: cdnsp: fixed issue with incorrect detecting CDNSP family controllers
        usb: cdnsp: blocked some cdns3 specific code
        usb: uhci-grlib: Explicitly include linux/platform_device.h
    • Merge tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · 1e592e95
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are three small serial/tty driver fixes for 6.8-rc6 that resolve
        the following reported errors:
      
         - riscv hvc console driver fix that was reported by many
      
         - amba-pl011 serial driver fix for RS485 mode
      
         - stm32 serial driver fix for RS485 mode
      
        All of these have been in linux-next all week with no reported
        problems"
      
      * tag 'tty-6.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        serial: amba-pl011: Fix DMA transmission in RS485 mode
        serial: stm32: do not always set SER_RS485_RX_DURING_TX if RS485 is enabled
        tty: hvc: Don't enable the RISC-V SBI console by default
    • Merge tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1eee4ef3
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Make sure clearing CPU buffers using VERW happens at the latest
         possible point in the return-to-userspace path, otherwise memory
         accesses after the VERW execution could cause data to land in CPU
         buffers again
      
      * tag 'x86_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        KVM/VMX: Move VERW closer to VMentry for MDS mitigation
        KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
        x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
        x86/entry_32: Add VERW just before userspace transition
        x86/entry_64: Add VERW just before userspace transition
        x86/bugs: Add asm helpers for executing VERW
    • Merge tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8c46ed37
      Linus Torvalds authored
      Pull irq fixes from Borislav Petkov:
      
       - Make sure GICv4 always gets initialized to prevent a kexec-ed kernel
         from silently failing to set it up
      
       - Do not call bus_get_dev_root() for the mbigen irqchip as it always
         returns NULL - use NULL directly
      
       - Fix hardware interrupt number truncation when assigning MSI
         interrupts
      
       - Correct sending end-of-interrupt messages to disabled interrupt
         lines on RISC-V PLIC
      
      * tag 'irq_urgent_for_v6.8_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3-its: Do not assume vPE tables are preallocated
        irqchip/mbigen: Don't use bus_get_dev_root() to find the parent
        PCI/MSI: Prevent MSI hardware interrupt number truncation
        irqchip/sifive-plic: Enable interrupt if needed before EOI
    • Merge tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 4ca0d989
      Linus Torvalds authored
      Pull erofs fix from Gao Xiang:
      
       - Fix page refcount leak when looking up specific inodes
         introduced by metabuf reworking
      
      * tag 'erofs-for-6.8-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
        erofs: fix refcount on the metabuf used for inode lookup
    • Merge tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 66a97c2e
      Linus Torvalds authored
      Pull RCU pathwalk fixes from Al Viro:
       "We still have some races in filesystem methods when exposed to RCU
        pathwalk. This series is a result of code audit (the second round of
        it) and it should deal with most of that stuff.
      
        Still pending: ntfs3 ->d_hash()/->d_compare() and ceph_d_revalidate().
        Up to maintainers (a note for NTFS folks - when documentation says
        that a method may not block, it *does* imply that blocking allocations
        are to be avoided. Really)"
      
      [ More explanations for people who aren't familiar with the vagaries of
        RCU path walking: most of it is hidden from filesystems, but if a
        filesystem actively participates in the low-level path walking it
        needs to make sure the fields involved in that walk are RCU-safe.
      
        That "actively participate in low-level path walking" includes things
        like having its own ->d_hash()/->d_compare() routines, or having
        its own directory permission function that doesn't just use the common
        helpers.  Having a ->d_revalidate() function will also have this issue.
      
        Note that instead of making everything RCU safe you can also choose to
        abort the RCU pathwalk if your operation cannot be done safely under
        RCU, but that obviously comes with a performance penalty. One common
        pattern is to allow the simple cases under RCU, and abort only if you
        need to do something more complicated.
      
        So not everything needs to be RCU-safe, and things like the inode etc
        that the VFS itself maintains obviously already are. But these fixes
        tend to be about properly RCU-delaying things like ->s_fs_info that
        are maintained by the filesystem and that got potentially released too
        early.   - Linus ]
      
      * tag 'pull-fixes.pathwalk-rcu-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        ext4_get_link(): fix breakage in RCU mode
        cifs_get_link(): bail out in unsafe case
        fuse: fix UAF in rcu pathwalks
        procfs: make freeing proc_fs_info rcu-delayed
        procfs: move dropping pde and pid from ->evict_inode() to ->free_inode()
        nfs: fix UAF on pathwalk running into umount
        nfs: make nfs_set_verifier() safe for use in RCU pathwalk
        afs: fix __afs_break_callback() / afs_drop_open_mmap() race
        hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info
        exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper
        affs: free affs_sb_info with kfree_rcu()
        rcu pathwalk: prevent bogus hard errors from may_lookup()
        fs/super.c: don't drop ->s_user_ns until we free struct super_block itself
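      The rule in Linus's note, that anything an RCU-mode walker may still dereference must outlive every such walker, can be illustrated with a deliberately simplified, single-threaded model. It is not real RCU: the hypothetical toy_defer_free() plays the role of call_rcu()/kfree_rcu(), and toy_grace_period_end() frees queued objects only when no reader is active.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-superblock private data (think ->s_fs_info). */
struct toy_fs_info {
	int case_sensitive;            /* read by ->d_compare()-style code */
	struct toy_fs_info *next_free; /* link on the deferred-free list */
};

static int toy_readers;                 /* walkers currently "inside" */
static struct toy_fs_info *toy_pending; /* queued but not yet freed */

static void toy_read_lock(void)   { toy_readers++; }
static void toy_read_unlock(void) { assert(toy_readers > 0); toy_readers--; }

/* Unmount path: instead of free(), queue the object. */
static void toy_defer_free(struct toy_fs_info *fi)
{
	fi->next_free = toy_pending;
	toy_pending = fi;
}

/* Release queued objects only when no reader can still hold a pointer.
 * Returns how many objects this call freed. */
static int toy_grace_period_end(void)
{
	int n = 0;

	if (toy_readers > 0)
		return 0;               /* a walker may still be using them */
	while (toy_pending) {
		struct toy_fs_info *fi = toy_pending;

		toy_pending = fi->next_free;
		free(fi);
		n++;
	}
	return n;
}
```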
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9b243492
      Linus Torvalds authored
      Pull vfs fixes from Al Viro:
       "A couple of fixes - revert of regression from this cycle and a fix for
        erofs failure exit breakage (had been there since way back)"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        erofs: fix handling kern_mount() failure
        Revert "get rid of DCACHE_GENOCIDE"
    • ext4_get_link(): fix breakage in RCU mode · 9fa8e282
      Al Viro authored
      1) errors from ext4_getblk() should not be propagated to caller
      unless we are really sure that we would've gotten the same error
      in non-RCU pathwalk.
      2) we leak buffer_heads if ext4_getblk() is successful, but bh is
      not uptodate.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
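      A hypothetical userspace model of the two rules (toy_bh and toy_get_link_rcu() are illustrative, not ext4 code): in RCU mode, failing to get an up-to-date block yields -ECHILD, which asks the caller to redo the walk in non-RCU mode where a real error would be reported, and a buffer that was obtained but is not up to date has its reference dropped before bailing out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, minimal buffer_head stand-in. */
struct toy_bh {
	int refcount;
	int uptodate;
	const char *data;
};

static void toy_brelse(struct toy_bh *bh)
{
	if (bh) {
		assert(bh->refcount > 0);
		bh->refcount--;
	}
}

/* RCU-mode lookup of the link body. bh is what the block getter
 * returned (a referenced buffer, or NULL on any failure). On success
 * *out holds the referenced buffer and 0 is returned; every failure
 * maps to -ECHILD so the walk is retried in non-RCU mode. */
static int toy_get_link_rcu(struct toy_bh *bh, struct toy_bh **out)
{
	*out = NULL;
	if (!bh)
		return -ECHILD;         /* don't propagate errors from RCU mode */
	if (!bh->uptodate) {
		toy_brelse(bh);         /* otherwise the reference would leak */
		return -ECHILD;
	}
	*out = bh;
	return 0;
}
```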
    • cifs_get_link(): bail out in unsafe case · 0511fdb4
      Al Viro authored
      ->d_revalidate() bails out there, anyway.  It's not enough
      to prevent getting into ->get_link() in RCU mode, but that
      could happen only in a very contrived setup.  Not worth
      trying to do anything fancy here unless ->d_revalidate()
      stops kicking out of RCU mode at least in some cases.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fuse: fix UAF in rcu pathwalks · 053fc4f7
      Al Viro authored
      ->permission(), ->get_link() and ->inode_get_acl() might dereference
      ->s_fs_info (and, in case of ->permission(), ->s_fs_info->fc->user_ns
      as well) when called from rcu pathwalk.
      
      Freeing ->s_fs_info->fc is rcu-delayed; we need to make freeing ->s_fs_info
      and dropping ->user_ns rcu-delayed too.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: make freeing proc_fs_info rcu-delayed · e31f0a57
      Al Viro authored
      makes proc_pid_ns() safe from rcu pathwalk (put_pid_ns()
      is still synchronous, but that's not a problem - it does
      rcu-delay everything that needs to be)
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • procfs: move dropping pde and pid from ->evict_inode() to ->free_inode() · 47458802
      Al Viro authored
      that keeps both around until struct inode is freed, making access
      to them safe from rcu-pathwalk
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfs: fix UAF on pathwalk running into umount · c1b967d0
      Al Viro authored
      NFS ->d_revalidate(), ->permission() and ->get_link() need to access
      some parts of nfs_server when called in RCU mode:
      	server->flags
      	server->caps
      	*(server->io_stats)
      and, worst of all, call
      	server->nfs_client->rpc_ops->have_delegation
      (the last one - as NFS_PROTO(inode)->have_delegation()).  We really
      don't want to RCU-delay the entire nfs_free_server() (it would have
      to be done with schedule_work() from RCU callback, since it can't
      be made to run from interrupt context), but actual freeing of
      nfs_server and ->io_stats can be done via call_rcu() just fine.
      nfs_client part is handled simply by making nfs_free_client() use
      kfree_rcu().
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • nfs: make nfs_set_verifier() safe for use in RCU pathwalk · 10a973fc
      Al Viro authored
      nfs_set_verifier() relies upon dentry being pinned; if that's
      the case, grabbing ->d_lock stabilizes ->d_parent and guarantees
      that ->d_parent points to a positive dentry.  For something
      we'd run into in RCU mode that is *not* true - dentry might've
      been through dentry_kill() just as we grabbed ->d_lock, with
      its parent going through the same just as we get into
      nfs_set_verifier_locked().  It might get to detaching inode
      (and zeroing ->d_inode) before nfs_set_verifier_locked() gets
      to fetching that; we get an oops as the result.
      
      That can happen in nfs{,4} ->d_revalidate(); the call chain in
      question is nfs_set_verifier_locked() <- nfs_set_verifier() <-
      nfs_lookup_revalidate_delegated() <- nfs{,4}_do_lookup_revalidate().
      We have checked that the parent had been positive, but that's
      done before we get to nfs_set_verifier() and it's possible for
      memory pressure to pick our dentry as eviction candidate by that
      time.  If that happens, back-to-back attempts to kill dentry and
      its parent are quite normal.  Sure, in case of eviction we'll
      fail the ->d_seq check in the caller, but we need to survive
      until we return there...
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • afs: fix __afs_break_callback() / afs_drop_open_mmap() race · 275655d3
      Al Viro authored
      In __afs_break_callback() we might check ->cb_nr_mmap and if it's non-zero
      do queue_work(&vnode->cb_work).  In afs_drop_open_mmap() we decrement
      ->cb_nr_mmap and do flush_work(&vnode->cb_work) if it reaches zero.
      
      The trouble is, there's nothing to prevent __afs_break_callback() from
      seeing ->cb_nr_mmap before the decrement and doing queue_work() after both
      the decrement and flush_work().  If that happens, we might be in trouble -
      vnode might get freed before the queued work runs.
      
      __afs_break_callback() is always done under ->cb_lock, so let's make
      sure that ->cb_nr_mmap can only change from non-zero to zero while holding
      ->cb_lock (the spinlock component of it - it's a seqlock and we don't
      need to mess with the counter).
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
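      One way to make the final non-zero-to-zero transition happen under the lock is a dec-and-lock primitive: decrement locklessly while the count will stay above zero, but take the lock for the decrement that may reach zero and return with it still held. The kernel has an atomic_dec_and_lock() helper with these semantics; the sketch below is a userspace rendering with C11 atomics and a pthread mutex, and it assumes rather than quotes how the afs fix is written. toy_dec_and_lock() is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Decrement *cnt. If the result is zero, return true with *lock held;
 * otherwise return false without the lock. */
static bool toy_dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: while we are not the last user, no lock is needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
		/* CAS failed: old now holds the fresh value, retry */
	}

	/* Slow path: we may be taking it to zero, so decrement under the
	 * same lock that readers of the count hold. */
	pthread_mutex_lock(lock);
	old = atomic_fetch_sub(cnt, 1);
	assert(old > 0);
	if (old == 1)
		return true;            /* count is now 0, lock still held */
	pthread_mutex_unlock(lock);
	return false;
}
```

      Because __afs_break_callback() checks the count under the same lock, it either queues its work before the count drops to zero (and the subsequent flush_work() waits for it) or sees zero and queues nothing.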
    • hfsplus: switch to rcu-delayed unloading of nls and freeing ->s_fs_info · af072cf6
      Al Viro authored
      ->d_hash() and ->d_compare() use those, so we need to delay freeing
      them.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • exfat: move freeing sbi, upcase table and dropping nls into rcu-delayed helper · a13d1a4d
      Al Viro authored
      That stuff can be accessed by ->d_hash()/->d_compare(); as it is, we have
      a hard-to-hit UAF if rcu pathwalk manages to get into ->d_hash() on a filesystem
      that is in process of getting shut down.
      
      Besides, having nls and upcase table cleanup moved from ->put_super() towards
      the place where sbi is freed makes for simpler failure exits.
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • affs: free affs_sb_info with kfree_rcu() · 529f89a9
      Al Viro authored
      one of the flags in it is used by ->d_hash()/->d_compare()
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rcu pathwalk: prevent bogus hard errors from may_lookup() · cdb67fde
      Al Viro authored
      If lazy call of ->permission() returns a hard error, check that
      try_to_unlazy() succeeds before returning it.  That both makes
      life easier for ->permission() instances and closes the race
      in ENOTDIR handling - it is possible that positive d_can_lookup()
      seen in link_path_walk() applies to the state *after* unlink() +
      mkdir(), while nd->inode matches the state prior to that.
      
      Normally seeing e.g. EACCES from permission check in rcu pathwalk
      means that with some timings non-rcu pathwalk would've run into
      the same; however, running into a non-executable regular file
      in the middle of a pathname would not get to permission check -
      it would fail with ENOTDIR instead.
      Reviewed-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
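      The ordering the commit describes can be written as a small decision function. This is a hedged model of the RCU-mode branch, not the fs/namei.c code: lazy_err stands for the result of the non-blocking ->permission() call, unlazy_ok for whether try_to_unlazy() succeeded, and full_err for what a blocking permission check would return; all three parameters are hypothetical inputs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Decide what an RCU-mode may_lookup()-style check returns.
 *  lazy_err  - result of the non-blocking ->permission() call
 *  unlazy_ok - whether switching out of RCU mode succeeded
 *  full_err  - what the blocking permission check would return
 */
static int toy_may_lookup_rcu(int lazy_err, bool unlazy_ok, int full_err)
{
	if (lazy_err == 0)
		return 0;           /* allowed: keep walking in RCU mode */
	if (!unlazy_ok)
		return -ECHILD;     /* state may be stale: redo non-lazily */
	if (lazy_err != -ECHILD)
		return lazy_err;    /* hard error, confirmed on stable state */
	return full_err;            /* ->permission() needed to block */
}
```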