1. 20 Sep, 2024 21 commits
    • nfsd: map the EBADMSG to nfserr_io to avoid warning · 340e61e4
      Li Lingfeng authored
      Ext4 will throw -EBADMSG through ext4_readdir when a checksum error
      occurs, resulting in the following WARNING.
      
      Fix it by mapping EBADMSG to nfserr_io.
      
      nfsd_buffered_readdir
       iterate_dir // -EBADMSG -74
        ext4_readdir // .iterate_shared
         ext4_dx_readdir
          ext4_htree_fill_tree
           htree_dirblock_to_tree
            ext4_read_dirblock
             __ext4_read_dirblock
              ext4_dirblock_csum_verify
               warn_no_space_for_csum
                __warn_no_space_for_csum
              return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
       nfserrno // WARNING
      
      [  161.115610] ------------[ cut here ]------------
      [  161.116465] nfsd: non-standard errno: -74
      [  161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
      [  161.118596] Modules linked in:
      [  161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
      [  161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  161.123601] RIP: 0010:nfserrno+0x9d/0xd0
      [  161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
      [  161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
      [  161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
      [  161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
      [  161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
      [  161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
      [  161.135244] FS:  0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
      [  161.136695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
      [  161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  161.141519] PKRU: 55555554
      [  161.142076] Call Trace:
      [  161.142575]  ? __warn+0x9b/0x140
      [  161.143229]  ? nfserrno+0x9d/0xd0
      [  161.143872]  ? report_bug+0x125/0x150
      [  161.144595]  ? handle_bug+0x41/0x90
      [  161.145284]  ? exc_invalid_op+0x14/0x70
      [  161.146009]  ? asm_exc_invalid_op+0x12/0x20
      [  161.146816]  ? nfserrno+0x9d/0xd0
      [  161.147487]  nfsd_buffered_readdir+0x28b/0x2b0
      [  161.148333]  ? nfsd4_encode_dirent_fattr+0x380/0x380
      [  161.149258]  ? nfsd_buffered_filldir+0xf0/0xf0
      [  161.150093]  ? wait_for_concurrent_writes+0x170/0x170
      [  161.151004]  ? generic_file_llseek_size+0x48/0x160
      [  161.151895]  nfsd_readdir+0x132/0x190
      [  161.152606]  ? nfsd4_encode_dirent_fattr+0x380/0x380
      [  161.153516]  ? nfsd_unlink+0x380/0x380
      [  161.154256]  ? override_creds+0x45/0x60
      [  161.155006]  nfsd4_encode_readdir+0x21a/0x3d0
      [  161.155850]  ? nfsd4_encode_readlink+0x210/0x210
      [  161.156731]  ? write_bytes_to_xdr_buf+0x97/0xe0
      [  161.157598]  ? __write_bytes_to_xdr_buf+0xd0/0xd0
      [  161.158494]  ? lock_downgrade+0x90/0x90
      [  161.159232]  ? nfs4svc_decode_voidarg+0x10/0x10
      [  161.160092]  nfsd4_encode_operation+0x15a/0x440
      [  161.160959]  nfsd4_proc_compound+0x718/0xe90
      [  161.161818]  nfsd_dispatch+0x18e/0x2c0
      [  161.162586]  svc_process_common+0x786/0xc50
      [  161.163403]  ? nfsd_svc+0x380/0x380
      [  161.164137]  ? svc_printk+0x160/0x160
      [  161.164846]  ? svc_xprt_do_enqueue.part.0+0x365/0x380
      [  161.165808]  ? nfsd_svc+0x380/0x380
      [  161.166523]  ? rcu_is_watching+0x23/0x40
      [  161.167309]  svc_process+0x1a5/0x200
      [  161.168019]  nfsd+0x1f5/0x380
      [  161.168663]  ? nfsd_shutdown_threads+0x260/0x260
      [  161.169554]  kthread+0x1c4/0x210
      [  161.170224]  ? kthread_insert_work_sanity_check+0x80/0x80
      [  161.171246]  ret_from_fork+0x1f/0x30
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
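The errno-to-status translation that this commit extends can be sketched as a small lookup table. This is a userspace illustration only, not the kernel's nfserrno(): every `sketch_` name is hypothetical, the table is deliberately tiny, and the fallback (flag a warning, then report an I/O error) is an assumption modelled on the "non-standard errno" WARNING quoted above.

```c
#include <errno.h>
#include <stddef.h>

/* A few NFSv3 wire status values (RFC 1813); names are hypothetical. */
enum sketch_nfsstat {
	SKETCH_NFS_OK       = 0,
	SKETCH_NFSERR_IO    = 5,
	SKETCH_NFSERR_ACCES = 13,
	SKETCH_NFSERR_NOSPC = 28,
};

struct sketch_errmap {
	int syserr;                 /* negative errno from the filesystem */
	enum sketch_nfsstat nfserr; /* status reported to the client */
};

static const struct sketch_errmap sketch_errtbl[] = {
	{ 0,        SKETCH_NFS_OK },
	{ -EIO,     SKETCH_NFSERR_IO },
	{ -EACCES,  SKETCH_NFSERR_ACCES },
	{ -ENOSPC,  SKETCH_NFSERR_NOSPC },
	{ -EBADMSG, SKETCH_NFSERR_IO },  /* the kind of entry this commit adds */
};

/*
 * Translate a negative errno. An errno missing from the table sets
 * *warned (standing in for the kernel WARNING) and falls back to an
 * I/O error; that fallback is an assumption of this sketch.
 */
static enum sketch_nfsstat sketch_nfserrno(int syserr, int *warned)
{
	size_t i;

	*warned = 0;
	for (i = 0; i < sizeof(sketch_errtbl) / sizeof(sketch_errtbl[0]); i++) {
		if (sketch_errtbl[i].syserr == syserr)
			return sketch_errtbl[i].nfserr;
	}
	*warned = 1;
	return SKETCH_NFSERR_IO;
}
```

With the -EBADMSG entry present, a checksum failure reported by the filesystem is translated without setting the warning flag; removing that entry from the table reproduces the warned path.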
    • NFSD: remove redundant assignment operation · 2039c5da
      Li Lingfeng authored
      Commit 5826e09b ("NFSD: OP_CB_RECALL_ANY should recall both read and
      write delegations") added a new assignment statement to add
      RCA4_TYPE_MASK_WDATA_DLG to ra_bmval bitmask of OP_CB_RECALL_ANY. So the
      old one should be removed.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • .mailmap: Add an entry for my work email address · ecbf8494
      Chuck Lever authored
      Collect a few very old previous employers as well.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Fix NFSv4's PUTPUBFH operation · 202f3903
      Chuck Lever authored
      According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.
      
      Replace the XDR decoder for PUTPUBFH with a "noop" since we no
      longer want the minorversion check, and PUTPUBFH has no arguments to
      decode. (Ideally nfsd4_decode_noop should really be called
      nfsd4_decode_void).
      
      PUTPUBFH should now behave just like PUTROOTFH.
      Reported-by: Cedric Blancher <cedric.blancher@gmail.com>
      Fixes: e1a90ebd ("NFSD: Combine decode operations for v4 and v4.1")
      Cc: Dan Shelton <dan.f.shelton@gmail.com>
      Cc: Roland Mainz <roland.mainz@nrubsig.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Add quotes to client info 'callback address' · 32b34fa4
      Mark Grimes authored
      The 'callback address' in client_info_show is output without quotes
      causing yaml parsers to fail on processing IPv6 addresses.
      Adding quotes to 'callback address' also matches that used by
      the 'address' field.
      Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
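The quoting change can be illustrated with a tiny formatter. This is a hedged sketch, not the code in client_info_show(): the helper name and the sample address format are assumptions. It shows only the idea of emitting the value inside double quotes so that a value containing colons or brackets is read as a plain string.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write one "key: value" line with the value
 * double-quoted. Returns what snprintf() returns.
 */
static int sketch_emit_quoted(char *buf, size_t len,
			      const char *key, const char *value)
{
	return snprintf(buf, len, "%s: \"%s\"\n", key, value);
}
```

Called with the key `callback address` and an illustrative value such as `[::1]:2049`, the helper produces a line whose value is a quoted scalar, matching how the commit message says the 'address' field is already printed.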
    • svcrdma: Handle device removal outside of the CM event handler · c4de97f7
      Chuck Lever authored
      Synchronously wait for all disconnects to complete to ensure the
      transports have divested all hardware resources before the
      underlying RDMA device can safely be removed.
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: move error choice for incorrect object types to version-specific code. · 438f81e0
      NeilBrown authored
      If an NFS operation expects a particular sort of object (file, dir, link,
      etc) but gets a file handle for a different sort of object, it must
      return an error.  The actual error varies among NFS versions in non-trivial
      ways.
      
      For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
      INVAL is suitable.
      
      For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
      was found when not expected.  This takes precedence over NOTDIR.
      
      For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
      preference to EINVAL when none of the specific error codes apply.
      
      When nfsd_mode_check() finds a symlink where it expected a directory it
      needs to return an error code that can be converted to NOTDIR for v2 or
      v3 but will be SYMLINK for v4.  It must be different from the error
      code returned when it finds a symlink but expects a regular file - that
      must be converted to EINVAL or SYMLINK.
      
      So we introduce an internal error code nfserr_symlink_not_dir which each
      version converts as appropriate.
      
      nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
      only used by NFSv4 and only for OPEN.  NFSERR_INVAL is never a suitable
      error if the object is the wrong type.  For v4.0 we use nfserr_symlink
      for non-dirs even if not a symlink.  For v4.1 we have nfserr_wrong_type.
      We handle this difference in-place in nfsd_check_obj_isreg() as there is
      nothing to be gained by delaying the choice to nfsd4_map_status().
      
      As a result of these changes, nfsd_mode_check() doesn't need an rqstp
      arg any more.
      
      Note that NFSv4 operations are actually performed in the xdr code(!!!)
      so the only place that we can map the status code successfully is in
      nfsd4_encode_operation().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
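The per-version conversion described in this commit message can be sketched as one mapping function. This is a reading of the message, not the kernel source: the enum, its values, and `sketch_map_status()` are hypothetical, and only the three cases the message spells out are modelled.

```c
/* Hypothetical status set; SK_INT_* values never go on the wire. */
enum sketch_status {
	SK_OK,
	SK_NOTDIR,
	SK_ISDIR,
	SK_INVAL,
	SK_SYMLINK,              /* exists in NFSv4 only */
	SK_WRONG_TYPE,           /* exists in NFSv4.1+ only */
	SK_INT_SYMLINK_NOT_DIR,  /* internal: symlink found, directory expected */
};

/* Map the most specific status to what a given protocol version can express. */
static enum sketch_status sketch_map_status(int vers, int minor,
					    enum sketch_status st)
{
	switch (st) {
	case SK_INT_SYMLINK_NOT_DIR:
		/* NOTDIR for v2/v3, SYMLINK for v4 */
		return vers >= 4 ? SK_SYMLINK : SK_NOTDIR;
	case SK_SYMLINK:
		/* symlink found, regular file expected: INVAL before v4 */
		return vers >= 4 ? SK_SYMLINK : SK_INVAL;
	case SK_WRONG_TYPE:
		/* only v4.1+ has WRONG_TYPE; older versions fall back to INVAL */
		return (vers >= 4 && minor >= 1) ? SK_WRONG_TYPE : SK_INVAL;
	default:
		return st;
	}
}
```

Keeping the internal value distinct from SK_SYMLINK is what lets v2 and v3 tell "expected a directory" (NOTDIR) apart from "expected a regular file" (INVAL), while v4 reports SYMLINK in both cases.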
    • nfsd: be more systematic about selecting error codes for internal use. · 36ffa3d0
      NeilBrown authored
      Rather than using ad hoc values for internal errors (30000, 11000, ...)
      use 'enum' to sequentially allocate numbers starting from the first
      known available number - now visible as NFS4ERR_FIRST_FREE.
      
      The goal is values that are distinct from all be32 error codes.  To get
      those we must first select integers that are not already used, then
      convert them with cpu_to_be32().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
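The allocation scheme described here, picking unused integers with an enum and then converting them to big-endian, can be sketched in userspace as follows. The names are hypothetical, `SKETCH_FIRST_FREE` is a placeholder rather than the real NFS4ERR_FIRST_FREE value, and `htonl()` stands in for the kernel's cpu_to_be32().

```c
#include <stdint.h>
#include <arpa/inet.h>	/* htonl(): userspace stand-in for cpu_to_be32() */

/* Placeholder for NFS4ERR_FIRST_FREE; the real value is not stated here. */
#define SKETCH_FIRST_FREE 20000

/* The enum hands out consecutive numbers, so no two codes can collide. */
enum sketch_internal_err {
	SKETCH_ERR_ONE = SKETCH_FIRST_FREE,
	SKETCH_ERR_TWO,
	SKETCH_ERR_THREE,
};

/* Convert an internal code to the big-endian form used for status values. */
static uint32_t sketch_to_wire(enum sketch_internal_err code)
{
	return htonl((uint32_t)code);
}
```

Because the integers are distinct before conversion and the byte swap is a one-to-one mapping, the converted values stay distinct from each other and from any protocol code below the first free number.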
    • nfsd: Move error code mapping to per-version proc code. · 1459ad57
      NeilBrown authored
      There is code scattered around nfsd which chooses an error status based
      on the particular version of nfs being used.  It is cleaner to have the
      version specific choices in version specific code.
      
      With this patch common code returns the most specific error code
      possible and the version specific code maps that if necessary.
      
      Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
      function which is called to map the resp->status before each non-trivial
      nfsd_proc_* or nfsd3_proc_* function returns.
      
      NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
      are left for a later patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: move V4ROOT version check to nfsd_set_fh_dentry() · ef7f6c49
      NeilBrown authored
      This further centralizes version number checks.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: further centralize protocol version checks. · c689bdd3
      NeilBrown authored
      With this patch the only places that test ->rq_vers against a specific
      version are nfsd_v4client() and nfsd_set_fh_dentry().
      The latter sets some flags in the svc_fh, which now includes:
        fh_64bit_cookies
        fh_use_wgather
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: use nfsd_v4client() in nfsd_breaker_owns_lease() · 4f67d24f
      NeilBrown authored
      nfsd_breaker_owns_lease() currently open-codes the same test that
      nfsd_v4client() performs.
      
      With this patch we use nfsd_v4client() instead.
      
      Also as i_am_nfsd() is only used in combination with kthread_data(),
      replace it with nfsd_current_rqst() which combines the two and returns a
      valid svc_rqst, or NULL.
      
      The test for NULL is moved into nfsd_v4client() for code clarity.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Pass 'cred' instead of 'rqstp' to some functions. · 9fd45c16
      NeilBrown authored
      nfsd_permission(), exp_rdonly(), nfsd_setuser(), and nfsexp_flags()
      only ever need the cred out of rqstp, so pass it explicitly instead of
      the whole rqstp.
      
      This makes the interfaces cleaner.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Don't pass all of rqst into rqst_exp_find() · c55aeef7
      NeilBrown authored
      Rather than passing the whole rqst, pass the pieces that are actually
      needed.  This makes the inputs to rqst_exp_find() more obvious.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: don't assume copy notify when preprocessing the stateid · 11673b2a
      Sagi Grimberg authored
      Move the stateid handling to nfsd4_copy_notify.
      If nfs4_preprocess_stateid_op did not produce an output stateid, error out.
      
      Copy notify specifically does not permit the use of special stateids,
      so enforce that outside generic stateid pre-processing.
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Olga Kornievskaia <aglo@umich.edu>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: allow svc threads to fail initialisation cleanly · 3391fc92
      NeilBrown authored
      If an svc thread needs to perform some initialisation that might fail,
      it has no good way to handle the failure.
      
      Before the thread can exit it must call svc_exit_thread(), but that
      requires the service mutex to be held.  The thread cannot simply take
      the mutex as that could deadlock if there is a concurrent attempt to
      shut down all threads (which is unlikely, but not impossible).
      
      nfsd currently calls svc_exit_thread() unprotected in the unlikely event
      that unshare_fs_struct() fails.
      
      We can clean this up by introducing svc_thread_init_status() by which an
      svc thread can report whether initialisation has succeeded.  If it has,
      it continues normally into the action loop.  If it has not,
      svc_thread_init_status() immediately aborts the thread.
      svc_start_kthread() waits for either of these to happen, and calls
      svc_exit_thread() (under the mutex) if the thread aborted.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: merge svc_rqst_alloc() into svc_prepare_thread() · 59f3b138
      NeilBrown authored
      The only caller of svc_rqst_alloc() is svc_prepare_thread().  So merge
      the one into the other and simplify.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: don't take ->sv_lock when updating ->sv_nrthreads. · 9dcbc4e0
      NeilBrown authored
      As documented in svc_xprt.c, sv_nrthreads is protected by the service
      mutex, and it does not need ->sv_lock.
      (->sv_lock is needed only for sv_permsocks, sv_tempsocks, and
      sv_tmpcnt).
      
      So remove the unnecessary locking.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: change sp_nrthreads from atomic_t to unsigned int. · 60749cbe
      NeilBrown authored
      sp_nrthreads is only ever accessed under the service mutex
        nlmsvc_mutex nfs_callback_mutex nfsd_mutex
      so there is no need for it to be an atomic_t.
      
      The fact that all code using it is single-threaded means that we can
      simplify svc_pool_victim and remove the temporary elevation of
      sp_nrthreads.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: document locking rules for svc_exit_thread() · 16ef80ee
      NeilBrown authored
      The locking required for svc_exit_thread() is not obvious, so document
      it in a kdoc comment.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: don't allocate the versions array. · 73598a0c
      NeilBrown authored
      Instead of using kmalloc to allocate an array for storing active version
      info, just declare an array to the max size - it is only 5 or so.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  2. 01 Sep, 2024 13 commits
    • nfsd: move nfsd_pool_stats_open into nfsctl.c · c9f10f81
      NeilBrown authored
      nfsd_pool_stats_open() is used in nfsctl.c, so move it there.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: make various functions static, or not exported. · f2b27e1d
      NeilBrown authored
      Various functions are only used within the sunrpc module, and several
      are only used in the one file.  So clean up:
      
      These are marked static, and any EXPORT is removed.
        svc_rpcb_setup()
        svc_rqst_alloc()
        svc_rqst_free()  - also moved before first use
        svc_rpcbind_set_version()
        svc_drop() - also moved to svc.c
      
      These are now not EXPORTed, but are not static.
        svc_authenticate()
        svc_sock_update_bufs()
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: discard nlmsvc_timeout · 4ed9ef32
      NeilBrown authored
      nlmsvc_timeout always has the same value as (nlm_timeout * HZ), so use
      that in the one place that nlmsvc_timeout is used.
      
      In truth it *might* not always be the same, as nlmsvc_timeout is only set
      when lockd is started while nlm_timeout can be set at any time via
      sysctl.  I think this difference is not helpful so removing it is good.
      
      Also remove the test for nlm_timeout being 0.  This is not possible -
      unless a module parameter is used to set the minimum timeout to 0, and
      if that happens then it probably should be honoured.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: don't EXPORT_SYMBOL nfsd4_ssc_init_umount_work() · 8203ab8a
      NeilBrown authored
      nfsd4_ssc_init_umount_work() is only used in the nfsd module, so there
      is no need to EXPORT it.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFS: trace: show TIMEDOUT instead of 0x6e · cef48236
      Chen Hanxiao authored
      __nfs_revalidate_inode may return ETIMEDOUT.
      
      Print the symbol for ETIMEDOUT in the nfs trace:
      
      before:
      cat-5191 [005] 119.331127: nfs_revalidate_inode_exit: error=-110 (0x6e)
      
      after:
      cat-1738 [004] 44.365509: nfs_revalidate_inode_exit: error=-110 (TIMEDOUT)
      Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
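The before/after trace output can be reproduced with a small symbol lookup. This is a userspace sketch, not the tracepoint macro the kernel uses: the table, its "IO" entry, and `sketch_format_error()` are illustrative assumptions. A value missing from the table is printed in hex, which is the "before" behaviour shown above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct sketch_sym {
	int value;        /* positive errno value */
	const char *name; /* symbol shown in the trace line */
};

/* Illustrative table; only the TIMEDOUT name comes from the commit message. */
static const struct sketch_sym sketch_syms[] = {
	{ EIO,       "IO" },
	{ ETIMEDOUT, "TIMEDOUT" },
};

/* Format "error=<n> (<symbol>)", falling back to hex for unknown values. */
static int sketch_format_error(char *buf, size_t len, int error)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_syms) / sizeof(sketch_syms[0]); i++) {
		if (sketch_syms[i].value == -error)
			return snprintf(buf, len, "error=%d (%s)",
					error, sketch_syms[i].name);
	}
	return snprintf(buf, len, "error=%d (0x%x)", error, (unsigned int)-error);
}
```

Dropping the ETIMEDOUT row from the table makes the same call fall through to the hex branch, which is how a timeout ended up rendered as a bare number before the change.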
    • nfsd: use system_unbound_wq for nfsd_file_gc_worker() · 4b84551a
      Youzhong Yang authored
      After many rounds of changes in filecache.c, the fix by commit
      ce7df055 ("NFSD: Make the file_delayed_close workqueue UNBOUND")
      is gone, and we are now getting syslog messages like these:
      
      [ 1618.186688] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 4 times, consider switching to WQ_UNBOUND
      [ 1638.661616] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 8 times, consider switching to WQ_UNBOUND
      [ 1665.284542] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 16 times, consider switching to WQ_UNBOUND
      [ 1759.491342] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 32 times, consider switching to WQ_UNBOUND
      [ 3013.012308] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 64 times, consider switching to WQ_UNBOUND
      [ 3154.172827] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 128 times, consider switching to WQ_UNBOUND
      [ 3422.461924] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 256 times, consider switching to WQ_UNBOUND
      [ 3963.152054] workqueue: nfsd_file_gc_worker [nfsd] hogged CPU for >13333us 512 times, consider switching to WQ_UNBOUND
      
      Consider using system_unbound_wq instead of system_wq for
      nfsd_file_gc_worker().
      Signed-off-by: Youzhong Yang <youzhong@gmail.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: count nfsd_file allocations · 700bb4ff
      Jeff Layton authored
      We already count the frees (via nfsd_file_releases). Count the
      allocations as well. Also switch the direct call to nfsd_file_slab_free
      in nfsd_file_do_acquire to nfsd_file_free, so that the allocs and
      releases match up.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: fix refcount leak when file is unhashed after being found · 8a792617
      Jeff Layton authored
      If we wait_for_construction and find that the file is no longer hashed,
      and we're going to retry the open, the old nfsd_file reference is
      currently leaked. Put the reference before retrying.
      
      Fixes: c6593366 ("nfsd: don't kill nfsd_files because of lease break error")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Tested-by: Youzhong Yang <youzhong@gmail.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire · 81a95c2b
      Jeff Layton authored
      Given that we do the search and insertion while holding the i_lock, I
      don't think it's possible for us to get EEXIST here. Remove this case.
      
      Fixes: c6593366 ("nfsd: don't kill nfsd_files because of lease break error")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Tested-by: Youzhong Yang <youzhong@gmail.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: add list_head nf_gc to struct nfsd_file · 8e6e2ffa
      Youzhong Yang authored
      nfsd_file_put() in one thread can race with another thread doing
      garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
      nfsd_file_lru_cb()):
      
        * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
        * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set)
        * garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and
          returns LRU_ROTATE.
        * garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref
          to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose
          list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
        * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
          the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added
          to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply
          treats it as part of the LRU and removes it, which leads to its removal from
          the 'dispose' list.
        * At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU.
          nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))],
          as nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf'
          gets no chance of being freed.
      
      nfsd_file_put() can also race with nfsd_file_cond_queue():
        * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
        * nfsd_file_lru_add() sets REFERENCED bit and returns true.
        * Some userland application runs 'exportfs -f' or something like that, which triggers
          __nfsd_file_cache_purge() -> nfsd_file_cond_queue().
        * In nfsd_file_cond_queue(), it runs [if (!nfsd_file_unhash(nf))], unhash is done
          successfully.
        * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2.
        * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds.
        * nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))]
          (with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the
          dispose list [list_add(&nf->nf_lru, dispose)]
        * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
          the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. Although the 'nf' is not
          in the LRU, it is linked in the 'dispose' list, so nfsd_file_lru_remove() simply
          treats it as part of the LRU and removes it. This leads to its removal from
          the 'dispose' list!
        * Now nf->nf_ref is 0 and the 'nf' is unhashed. nfsd_file_put() continues its
          execution and sets nf->nf_ref to REFCOUNT_SATURATED.
      
      As shown in the above analysis, using nf_lru for both the LRU list and dispose list
      can cause the leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses
      it for the dispose list. This does not fix the nfsd_file leaking issue completely.
      Signed-off-by: Youzhong Yang <youzhong@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
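The core of the analysis above, one list node doing double duty for the LRU and the dispose list, can be modelled in a few lines of userspace C. This is a simplified sketch, not the filecache code: the `sk_` names are hypothetical, and the "LRU remove" here only checks whether the node is linked anywhere, which is the confusion the commit message describes.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, in the style of the kernel's list_head. */
struct sk_list { struct sk_list *next, *prev; };

static void sk_init(struct sk_list *h) { h->next = h->prev = h; }
static bool sk_empty(const struct sk_list *h) { return h->next == h; }

static void sk_add(struct sk_list *n, struct sk_list *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void sk_del_init(struct sk_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	sk_init(n);
}

struct sk_file {
	struct sk_list lru; /* LRU membership */
	struct sk_list gc;  /* dispose-list membership (the field the patch adds) */
};

/* "Remove from LRU": cannot tell which list the lru node is actually on. */
static bool sk_lru_remove(struct sk_file *f)
{
	if (sk_empty(&f->lru))
		return false;
	sk_del_init(&f->lru);
	return true;
}

/* Old scheme: queue for disposal through the lru node. */
static void sk_dispose_shared(struct sk_file *f, struct sk_list *dispose)
{
	sk_add(&f->lru, dispose);
}

/* New scheme: queue for disposal through the dedicated gc node. */
static void sk_dispose_separate(struct sk_file *f, struct sk_list *dispose)
{
	sk_add(&f->gc, dispose);
}
```

With `sk_dispose_shared()`, a later `sk_lru_remove()` succeeds and silently unlinks the file from the dispose list, so nothing ever frees it. With `sk_dispose_separate()`, the same call finds the lru node unlinked and leaves the dispose list intact.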
    • Linux 6.11-rc6 · 431c1646
      Linus Torvalds authored
    • Merge tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 6b9ffc45
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - copy_file_range fix
      
       - two read fixes including read past end of file rc fix and read retry
         crediting fix
      
       - falloc zero range fix
      
      * tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
        cifs: Fix copy offload to flush destination region
        netfs, cifs: Fix handling of short DIO read
        cifs: Fix lack of credit renegotiation on read retry
    • Merge tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs · a4c76312
      Linus Torvalds authored
      Pull bcachefs fixes from Kent Overstreet:
       "The data corruption in the buffered write path is troubling; inode
        lock should not have been able to cause that...
      
         - Fix a rare data corruption in the rebalance path, caught as a nonce
           inconsistency on encrypted filesystems
      
         - Revert lockless buffered write path
      
         - Mark more errors as autofix"
      
      * tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
        bcachefs: Mark more errors as autofix
        bcachefs: Revert lockless buffered IO path
        bcachefs: Fix bch2_extents_match() false positive
        bcachefs: Fix failure to return error in data_update_index_update()
  3. 31 Aug, 2024 6 commits
    • bcachefs: Mark more errors as autofix · 3d3020c4
      Kent Overstreet authored
      errors that are known to always be safe to fix should be autofix: this
      should be most errors even at this point, but that will need some
      thorough review.
      
      note that errors are still logged in the superblock, so we'll still know
      that they happened.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Revert lockless buffered IO path · e3e69409
      Kent Overstreet authored
      We had a report of data corruption on nixos when building installer
      images.
      
      https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
      
      It seems that writes are being dropped, but only when issued by QEMU,
      and possibly only in snapshot mode. It's undetermined whether it's write
      calls that are being dropped or dirty folios.
      
      Further testing, via minimizing the original patch to just the change
      that skips the inode lock on non appends/truncates, reveals that it
      really is just not taking the inode lock that causes the corruption: it
      has nothing to do with the other logic changes for preserving write
      atomicity in corner cases.
      
      It's also kernel config dependent: it doesn't reproduce with the minimal
      kernel config that ktest uses, but it does reproduce with nixos's distro
      config. Bisecting the kernel config initially pointed the finger at page
      migration or compaction, but it appears that was erroneous; we haven't
      yet determined what kernel config option actually triggers it.
      
      Sadly it appears this will have to be reverted since we're getting too
      close to release and my plate is full, but we'd _really_ like to fully
      debug it.
      
      My suspicion is that this patch is exposing a preexisting bug - the
      inode lock actually covers very little in IO paths, and we have a
      different lock (the pagecache add lock) that guards against races with
      truncate here.
      
      Fixes: 7e64c86c ("bcachefs: Buffered write path now can avoid the inode lock")
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
      e3e69409
    • Linus Torvalds's avatar
      Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging · 6cd90e5e
      Linus Torvalds authored
      Pull misc fixes from Guenter Roeck.
      
      These are fixes for regressions that Guenter has been reporting, and
      that the maintainers haven't picked up and sent in. With rc6 fairly imminent,
      I'm taking them directly from Guenter.
      
      * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        apparmor: fix policy_unpack_test on big endian systems
        Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
        microblaze: don't treat zero reserved memory regions as error
      6cd90e5e
    • Linus Torvalds's avatar
      Merge tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux · 8463be84
      Linus Torvalds authored
      Pull power sequencing fix from Bartosz Golaszewski:
       "A follow-up fix for the power sequencing subsystem. It turned out the
        previous fix for this driver was incomplete and broke the WLAN support
        on some platforms. This addresses the issue.
      
         - set the direction of the wlan-enable GPIO to output after
           requesting it as-is"
      
      * tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
        power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
      8463be84
    • Bartosz Golaszewski's avatar
      power: sequencing: qcom-wcn: set the wlan-enable GPIO to output · d8b76207
      Bartosz Golaszewski authored
      Commit a9aaf1ff ("power: sequencing: request the WLAN enable GPIO
      as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
      wifi module isn't in output mode by default. We need to set direction to
      output while retaining the value that was already set to keep the ath
      module on if it's already started.
      
      Fixes: a9aaf1ff ("power: sequencing: request the WLAN enable GPIO as-is")
      Link: https://lore.kernel.org/r/20240823115500.37280-1-brgl@bgdev.pl
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      d8b76207
    • Linus Torvalds's avatar
      Merge tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · e8784b0a
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB fixes for 6.11-rc6.  Included in here are:
      
         - dwc3 driver fixes for reported issues
      
         - MAINTAINERS file update, marking a driver as unsupported :(
      
         - cdnsp driver fixes
      
         - USB gadget driver fix
      
         - USB sysfs fix
      
         - other tiny fixes
      
         - new device ids for usb serial driver
      
        All of these have been in linux-next this week with no reported
        issues"
      
      * tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        USB: serial: option: add MeiG Smart SRM825L
        usb: cdnsp: fix for Link TRB with TC
        usb: dwc3: st: add missing depopulate in probe error path
        usb: dwc3: st: fix probed platform device ref count on probe error path
        usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
        usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
        usb: typec: fsa4480: Relax CHIP_ID check
        usb: dwc3: xilinx: add missing depopulate in probe error path
        usb: dwc3: omap: add missing depopulate in probe error path
        dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
        usb: gadget: uvc: queue pump work in uvcg_video_enable()
        cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
        usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
        usb: dwc3: core: Prevent USB core invalid event buffer address access
        MAINTAINERS: Mark UVC gadget driver as orphan
      e8784b0a