1. 23 Sep, 2024 3 commits
    • nfs: simplify and guarantee owner uniqueness. · d98f7227
      NeilBrown authored
      I have evidence of a Linux NFS client getting NFS4ERR_BAD_SEQID in
      response to a v4.0 LOCK request to a Linux server (one that already
      had the RELEASE_LOCKOWNER bug fixed).
      The LOCK request presented a "new" lock owner, so there are two seqids
      in the request: one for the open file, and one for the new lock.
      Given the context, I am confident that it was the new lock owner that
      was reported to have the wrong seqid.  As lock owner identifiers are
      reused, the server must still have a lock owner active which the
      client thinks is no longer active.

      I wasn't able to determine a root cause, but the simplest fix seems to
      be to ensure lock owners are always unique, much as open owners are
      (thanks to a timestamp).  The easiest way to ensure uniqueness is with
      a 64-bit counter for each server.  That will never cycle: if updated
      once per nanosecond, it would last 584 years.  A single NFS server
      would not handle open/lock requests nearly that fast, and a Linux node
      is unlikely to have an uptime approaching that.

      This patch removes the two IDAs and instead uses a per-server
      atomic64_t to provide uniqueness.

      Note that the lock owner already encodes the id as 64 bits even though
      it is a 32-bit value, so changing to a 64-bit value does not change the
      encoding of the lock owner.  The open owner encoding is now 4 bytes
      larger.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
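The uniqueness argument rests on a monotonically increasing counter plus some simple arithmetic. The C11 sketch below shows both under stated assumptions: `struct server_ids` and `next_owner_id()` are hypothetical names, and a `_Atomic uint64_t` stands in for the kernel's per-server `atomic64_t`; none of the real NFS client structures are reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-server state; not the kernel's struct nfs_server. */
struct server_ids {
	_Atomic uint64_t owner_id;   /* plays the role of a per-server atomic64_t */
};

/* Hand out the next owner id. Concurrent callers each get a distinct
 * value, and no value repeats until the 64-bit counter wraps. */
static uint64_t next_owner_id(struct server_ids *s)
{
	return atomic_fetch_add(&s->owner_id, 1);
}

/* Years before a 64-bit counter wraps if bumped once per nanosecond. */
static double years_to_wrap_at_1ns(void)
{
	const double ns_per_year = 1e9 * 60 * 60 * 24 * 365.25;

	return 18446744073709551616.0 / ns_per_year;   /* 2^64 / ns per year */
}
```

With a zero-initialized counter the first two calls return 0 and 1, and `years_to_wrap_at_1ns()` comes out at roughly 584.5, which is where the "584 years" figure comes from.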
    • nfs: fix memory leak in error path of nfs4_do_reclaim · 8f6a7c94
      Li Lingfeng authored
      Commit c77e2283 ("NFSv4: Fix a potential sleep while atomic in
      nfs4_do_reclaim()") separated the freeing of the state owners out of
      nfs4_purge_state_owners() and finished it outside the RCU lock.
      However, the error path was overlooked. As a result, the state owners
      in "freeme" are never released.
      Fix it by adding the freeing to the error path.

      Fixes: c77e2283 ("NFSv4: Fix a potential sleep while atomic in nfs4_do_reclaim()")
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Cc: stable@vger.kernel.org # v5.3+
      Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
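A minimal sketch of the "collect under the lock, free afterwards" pattern described above, using hypothetical names (`struct owner`, `reclaim()`, `free_owner_list()`) rather than the NFS client's code. The point is that the list gathered for deferred freeing must be released on the error return as well as on success; the `freed` counter only exists to make a leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner node; stands in for a state owner on "freeme". */
struct owner {
	struct owner *next;
};

static int freed;   /* counts frees so a leak (or its absence) is visible */

static void free_owner_list(struct owner *head)
{
	while (head) {
		struct owner *next = head->next;

		free(head);
		freed++;
		head = next;
	}
}

/* Owners are moved onto a local "freeme" list while the lock would be
 * held, then freed only after it is dropped, on every return path. */
static int reclaim(int nr_owners, int fail)
{
	struct owner *freeme = NULL;

	/* (lock would be taken here) */
	for (int i = 0; i < nr_owners; i++) {
		struct owner *o = malloc(sizeof(*o));

		if (!o)
			break;
		o->next = freeme;
		freeme = o;
	}
	/* (lock would be dropped here) */

	if (fail) {
		free_owner_list(freeme);   /* the step the error path was missing */
		return -1;
	}
	free_owner_list(freeme);
	return 0;
}
```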
    • Merge tag 'nfsd-6.12' into linux-next-with-localio · 8c04a6d6
      Anna Schumaker authored
      NFSD 6.12 Release Notes
      
      Notable features of this release include:
      
      - Pre-requisites for automatically determining the RPC server thread
        count
      - Clean-up and preparation for supporting LOCALIO, which will be
        merged via the NFS client tree
      - Enhancements and fixes to NFSv4.2 COPY offload
      - A new Python-based tool for generating kernel SunRPC XDR encoding
        and decoding functions, added as an aid for prototyping features
        in protocols based on the Linux kernel's SunRPC implementation.
      
      As always I am grateful to the NFSD contributors, reviewers,
      testers, and bug reporters who participated during this cycle.
  2. 20 Sep, 2024 37 commits
    • xdrgen: Prevent reordering of encoder and decoder functions · 509abfc7
      Chuck Lever authored
      I noticed that "xdrgen source" reorders the procedure encoder and
      decoder functions every time it is run. I would prefer that the
      generated code be more deterministic: it enables a reader to better
      see exactly what has changed between runs of the tool.

      The problem is that Python sets are not ordered. I use a Python set
      to ensure that, when multiple procedures use a particular argument or
      result type, the encoder/decoder for that type is emitted only once.

      Python dictionaries, however, preserve insertion order, so using a
      dictionary for this purpose ensures the procedure functions are always
      emitted in the same order as long as the .x file does not change.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
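The property the fix relies on is "deduplicate, but keep first-use order". xdrgen itself is Python; the C sketch below only illustrates the same idea with a hypothetical `unique_in_order()` helper and made-up type names, keeping each name once in the order it is first seen, which is what an insertion-ordered container provides and an unordered set does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy each distinct name from in[] to out[] once, in first-use order.
 * A linear scan is fine for the handful of types in one .x file. */
static size_t unique_in_order(const char *const *in, size_t n,
			      const char **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < count; j++)
			if (strcmp(out[j], in[i]) == 0)
				break;
		if (j == count)          /* not seen yet: append */
			out[count++] = in[i];
	}
	return count;
}
```

Running the same input through it always yields the same output order, so regenerated code diffs cleanly.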
    • xdrgen: typedefs should use the built-in string and opaque functions · fed8a17c
      Chuck Lever authored
      'typedef opaque yada<XYZ>' should use xdrgen's built-in opaque
      encoder and decoder, to enable better compiler optimization.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • xdrgen: Fix return code checking in built-in XDR decoders · 663ad8b1
      Chuck Lever authored
      xdr_stream_encode_u32() returns XDR_UNIT on success.
      xdr_stream_decode_u32() returns zero or -EMSGSIZE, but never
      XDR_UNIT.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
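The bug class here is checking a decoder's result against the encoder's success value. The sketch below uses hypothetical user-space stand-ins (`mock_encode_u32()`, `mock_decode_u32()`) that follow the return conventions stated in the message; they are not the kernel's `xdr_stream` functions, which operate on a `struct xdr_stream`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

#define XDR_UNIT 4   /* size of one XDR word in bytes */

/* Stand-in encoder: XDR_UNIT on success, negative errno on failure. */
static ssize_t mock_encode_u32(size_t space_left)
{
	return space_left >= XDR_UNIT ? XDR_UNIT : -EMSGSIZE;
}

/* Stand-in decoder: zero on success, negative errno on failure. */
static ssize_t mock_decode_u32(size_t bytes_left, uint32_t *val)
{
	if (bytes_left < XDR_UNIT)
		return -EMSGSIZE;
	*val = 0;
	return 0;
}

/* Correct checks: compare the encoder with XDR_UNIT, the decoder with 0.
 * Testing a decoder result against XDR_UNIT would treat every successful
 * decode as a failure. */
static int encode_ok(size_t space_left)
{
	return mock_encode_u32(space_left) == XDR_UNIT;
}

static int decode_ok(size_t bytes_left)
{
	uint32_t v;

	return mock_decode_u32(bytes_left, &v) == 0;
}
```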
    • tools: Add xdrgen · 4b132aac
      Chuck Lever authored
      Add a Python-based tool for translating XDR specifications into XDR
      encoder and decoder functions written in the Linux kernel's C coding
      style. The generator attempts to match the usual C coding style of
      the Linux kernel's SunRPC consumers.

      This approach is similar to the netlink code generator in
      tools/net/ynl.

      The maintainability benefits of machine-generated XDR code include:

      - Stronger type checking
      - Fewer bugs introduced by human error
      - XDR code that is easier to audit and analyze
      - Rapid prototyping of new RPC-based protocols
      - Hardened layering between protocol logic and marshaling
      - Easier addition of observability on demand
      - Unit tests that can be built both for the tool and (automatically)
        for the generated code

      In addition, converting the XDR layer to a memory-safe language
      such as Rust will be easier if much of the code can be converted
      automatically.
      Tested-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
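To show the kind of code such a generator emits, here is a hand-written encoder/decoder pair for a hypothetical XDR type `struct item { unsigned int id; opaque data<16>; };`. It is not xdrgen output: it uses a plain byte buffer instead of `struct xdr_stream`, but it follows the XDR rules of big-endian 4-byte words and zero padding of opaque data to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ITEM_DATA_MAX 16   /* the <16> bound from the hypothetical .x type */

struct item {
	uint32_t id;
	uint32_t len;
	uint8_t data[ITEM_DATA_MAX];
};

/* Bytes of zero padding needed to round n up to a multiple of 4. */
static size_t xdr_pad(size_t n) { return (4 - (n & 3)) & 3; }

static void put_u32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24; p[1] = v >> 16; p[2] = v >> 8; p[3] = v;
}

static uint32_t get_u32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Returns bytes written, or 0 if the buffer is too small or len too big. */
static size_t encode_item(uint8_t *buf, size_t size, const struct item *it)
{
	size_t need = 8 + it->len + xdr_pad(it->len);

	if (it->len > ITEM_DATA_MAX || need > size)
		return 0;
	put_u32(buf, it->id);
	put_u32(buf + 4, it->len);
	memcpy(buf + 8, it->data, it->len);
	memset(buf + 8 + it->len, 0, xdr_pad(it->len));   /* zero padding */
	return need;
}

/* Returns bytes consumed, or 0 on a short or oversized message. */
static size_t decode_item(const uint8_t *buf, size_t size, struct item *it)
{
	if (size < 8)
		return 0;
	it->id = get_u32(buf);
	it->len = get_u32(buf + 4);
	if (it->len > ITEM_DATA_MAX)
		return 0;
	size_t need = 8 + it->len + xdr_pad(it->len);
	if (need > size)
		return 0;
	memcpy(it->data, buf + 8, it->len);
	return need;
}
```

Encoding an item with 5 bytes of data takes 16 bytes on the wire: two 4-byte words, the 5 data bytes, and 3 bytes of padding.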
    • nfsd: fix delegation_blocked() to block correctly for at least 30 seconds · 45bb63ed
      NeilBrown authored
      The pair of Bloom filters used by delegation_blocked() was intended to
      block delegations on given filehandles for between 30 and 60 seconds.  A
      new filehandle would be recorded in the "new" bit set.  That would then
      be switched to the "old" bit set between 0 and 30 seconds later, and it
      would remain as the "old" bit set for 30 seconds.

      Unfortunately the code intended to clear the old bit set once it reached
      30 seconds old, preparing it to be the next new bit set, instead cleared
      the *new* bit set before switching it to be the old bit set.  This means
      that the "old" bit set is always empty and delegations are blocked for
      only between 0 and 30 seconds.

      This patch updates bd->new before clearing the set with that index,
      instead of afterwards.
      Reported-by: Olga Kornievskaia <okorniev@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 6282cd56 ("NFSD: Don't hand out delegations for 30 seconds after recalling them.")
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
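The two-set scheme is easier to see in code. This is a user-space sketch, not the nfsd source: time is passed in explicitly, the filehandle is a string, and FNV-1a stands in for the kernel's `jhash()`. Each filehandle sets three bits (one per hash byte) in a 256-bit set; every 30 seconds the sets rotate, with the index flipped before the expired set is cleared, which is the corrected order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLOOM_BITS 256   /* each set is 256 bits, indexed by one hash byte */
#define SWAP_SECS 30     /* how long a set stays "new" before it turns "old" */

struct bloom_pair {
	int entries;       /* filehandles recorded in both sets */
	int old_entries;   /* filehandles recorded in the "old" set */
	long swap_time;    /* when the sets were last rotated */
	int new;           /* index of the set currently receiving entries */
	uint8_t set[2][BLOOM_BITS / 8];
};

/* FNV-1a, standing in for jhash(). */
static uint32_t fh_hash(const char *fh)
{
	uint32_t h = 2166136261u;

	for (; *fh; fh++) {
		h ^= (uint8_t)*fh;
		h *= 16777619u;
	}
	return h;
}

static void bit_set(uint8_t *bits, unsigned n) { bits[n / 8] |= (uint8_t)(1u << (n % 8)); }
static bool bit_test(const uint8_t *bits, unsigned n) { return bits[n / 8] & (1u << (n % 8)); }

/* A filehandle is "in" a set if all three of its bits are set. */
static bool in_set(const uint8_t *bits, uint32_t h)
{
	return bit_test(bits, h & 255) && bit_test(bits, (h >> 8) & 255) &&
	       bit_test(bits, (h >> 16) & 255);
}

static void block_fh(struct bloom_pair *bd, const char *fh, long now)
{
	uint32_t h = fh_hash(fh);

	bit_set(bd->set[bd->new], h & 255);
	bit_set(bd->set[bd->new], (h >> 8) & 255);
	bit_set(bd->set[bd->new], (h >> 16) & 255);
	if (bd->entries == 0)
		bd->swap_time = now;
	bd->entries++;
}

static bool fh_blocked(struct bloom_pair *bd, const char *fh, long now)
{
	if (bd->entries == 0)
		return false;
	if (now - bd->swap_time > SWAP_SECS) {
		bd->entries -= bd->old_entries;
		bd->old_entries = bd->entries;
		/* Flip first, so the set cleared is the expired "old" one,
		 * which now becomes the next "new" set. */
		bd->new = 1 - bd->new;
		memset(bd->set[bd->new], 0, sizeof(bd->set[0]));
		bd->swap_time = now;
	}
	uint32_t h = fh_hash(fh);
	return in_set(bd->set[0], h) || in_set(bd->set[1], h);
}
```

A filehandle blocked at t=0 is still reported as blocked at t=31 (it now sits in the "old" set) and is released at t=62. With the original order (clear, then flip), the set holding it would already have been wiped at t=31.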
    • nfsd: fix initial getattr on write delegation · bf92e500
      Jeff Layton authored
      At this point in compound processing, currentfh refers to the parent of
      the file, not the file itself. Get the correct dentry from the delegation
      stateid instead.

      Fixes: c5967721 ("NFSD: handle GETATTR conflict with write delegation")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: untangle code in nfsd4_deleg_getattr_conflict() · a078a7dc
      NeilBrown authored
      The code in nfsd4_deleg_getattr_conflict() is convoluted and buggy.

      With this patch we:
       - properly handle non-nfsd leases.  We must not assume flc_owner is a
         delegation unless fl_lmops == &nfsd_lease_mng_ops
       - move the main code out of the for loop
       - have a single exit which calls nfs4_put_stid()
         (and other exits which don't need to call that)

      [ jlayton: refactored on top of Neil's other patch: nfsd: fix
        nfsd4_deleg_getattr_conflict in presence of third party lease ]

      Fixes: c5967721 ("NFSD: handle GETATTR conflict with write delegation")
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
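The first bullet describes a general C pattern: an opaque owner pointer may only be cast to a specific type once the ops-table pointer proves who installed the object. The sketch below uses hypothetical types (`struct lease`, `struct lease_ops`, `struct delegation`, `my_lease_ops`); it mirrors the `fl_lmops == &nfsd_lease_mng_ops` check in spirit only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types: a lease carries an ops table, and the meaning of its
 * owner pointer depends on which subsystem installed the lease. */
struct lease_ops {
	const char *name;
};

struct lease {
	const struct lease_ops *ops;
	void *owner;   /* opaque unless ops says who owns it */
};

struct delegation {
	int recalled;
};

/* The ops table that marks a lease as one of "ours". */
static const struct lease_ops my_lease_ops = { "nfsd-like" };

/* Return the owner as a delegation only when the ops pointer proves the
 * lease was installed by us; any third-party lease yields NULL. */
static struct delegation *lease_to_deleg(const struct lease *l)
{
	if (l->ops != &my_lease_ops)
		return NULL;
	return l->owner;
}
```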
    • nfsd: enforce upper limit for namelen in __cld_pipe_inprogress_downcall() · 5559c157
      Scott Mayhew authored
      This patch is intended to go on top of "nfsd: return -EINVAL when
      namelen is 0" from Li Lingfeng.  Li's patch checks for 0, but we should
      be enforcing an upper bound as well.

      Note that if nfsdcld somehow gets an id > NFS4_OPAQUE_LIMIT in its
      database, it'll truncate it to NFS4_OPAQUE_LIMIT when it does the
      downcall anyway.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
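Taken together with the zero check from Li's patch, the validation amounts to a closed range. A minimal sketch, assuming a hypothetical `check_namelen()` helper; `NFS4_OPAQUE_LIMIT` is redefined locally with the value 1024 used in the kernel's `include/linux/nfs4.h` so the example compiles on its own.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NFS4_OPAQUE_LIMIT 1024   /* local copy of the kernel constant */

/* Reject a zero length (memdup_user() would hand back ZERO_SIZE_PTR)
 * and anything longer than an NFSv4 opaque id may be. */
static int check_namelen(size_t namelen)
{
	if (namelen == 0 || namelen > NFS4_OPAQUE_LIMIT)
		return -EINVAL;
	return 0;
}
```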
    • nfsd: return -EINVAL when namelen is 0 · 22451a16
      Li Lingfeng authored
      When main.sqlite in /var/lib/nfs/nfsdcld/ is corrupted, namelen may
      end up as 0, which causes memdup_user() to return ZERO_SIZE_PTR.
      When nfs4_client_to_reclaim() then accesses name.data, which has been
      assigned the value ZERO_SIZE_PTR, a null pointer dereference is
      triggered.
      
      [ T1205] ==================================================================
      [ T1205] BUG: KASAN: null-ptr-deref in nfs4_client_to_reclaim+0xe9/0x260
      [ T1205] Read of size 1 at addr 0000000000000010 by task nfsdcld/1205
      [ T1205]
      [ T1205] CPU: 11 PID: 1205 Comm: nfsdcld Not tainted 5.10.0-00003-g2c1423731b8d #406
      [ T1205] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      [ T1205] Call Trace:
      [ T1205]  dump_stack+0x9a/0xd0
      [ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  __kasan_report.cold+0x34/0x84
      [ T1205]  ? nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  kasan_report+0x3a/0x50
      [ T1205]  nfs4_client_to_reclaim+0xe9/0x260
      [ T1205]  ? nfsd4_release_lockowner+0x410/0x410
      [ T1205]  cld_pipe_downcall+0x5ca/0x760
      [ T1205]  ? nfsd4_cld_tracking_exit+0x1d0/0x1d0
      [ T1205]  ? down_write_killable_nested+0x170/0x170
      [ T1205]  ? avc_policy_seqno+0x28/0x40
      [ T1205]  ? selinux_file_permission+0x1b4/0x1e0
      [ T1205]  rpc_pipe_write+0x84/0xb0
      [ T1205]  vfs_write+0x143/0x520
      [ T1205]  ksys_write+0xc9/0x170
      [ T1205]  ? __ia32_sys_read+0x50/0x50
      [ T1205]  ? ktime_get_coarse_real_ts64+0xfe/0x110
      [ T1205]  ? ktime_get_coarse_real_ts64+0xa2/0x110
      [ T1205]  do_syscall_64+0x33/0x40
      [ T1205]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
      [ T1205] RIP: 0033:0x7fdbdb761bc7
      [ T1205] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 514
      [ T1205] RSP: 002b:00007fff8c4b7248 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [ T1205] RAX: ffffffffffffffda RBX: 000000000000042b RCX: 00007fdbdb761bc7
      [ T1205] RDX: 000000000000042b RSI: 00007fff8c4b75f0 RDI: 0000000000000008
      [ T1205] RBP: 00007fdbdb761bb0 R08: 0000000000000000 R09: 0000000000000001
      [ T1205] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000042b
      [ T1205] R13: 0000000000000008 R14: 00007fff8c4b75f0 R15: 0000000000000000
      [ T1205] ==================================================================
      
      Fix it by checking namelen.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Fixes: 74725959 ("nfsd: un-deprecate nfsdcld")
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Scott Mayhew <smayhew@redhat.com>
      Tested-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Wrap async copy operations with trace points · 0505de96
      Chuck Lever authored
      Add an nfsd_copy_async_done tracepoint to record the timestamp, the
      final status code, and the callback stateid of an async copy.

      Rename the nfsd_copy_do_async tracepoint to match that naming
      convention, making it easier to enable both of them with a
      single glob.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Record the callback stateid in copy tracepoints · e1d2697c
      Chuck Lever authored
      Match COPY operations up with CB_OFFLOAD operations.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Display copy stateids with conventional print formatting · 11848e98
      Chuck Lever authored
      Make it easier to grep for s2s COPY stateids in trace logs: Use the
      same display format in nfsd_copy_class as is used to display other
      stateids.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Limit the number of concurrent async COPY operations · aadc3bbe
      Chuck Lever authored
      Nothing appears to limit the number of concurrent async COPY
      operations that clients can start. In addition, AFAICT each async
      COPY can copy an unlimited number of 4MB chunks, so it can run for a
      long time. Thus IMO async COPY can become a DoS vector.

      Add a restriction mechanism that bounds the number of concurrent
      background COPY operations. Start simple and try to be fair -- this
      patch implements a per-namespace limit.

      An async COPY request that occurs while this limit is exceeded gets
      NFS4ERR_DELAY. The requesting client can choose to send the request
      again after a delay or fall back to a traditional read/write style
      copy.

      If there is a need to make the mechanism more sophisticated, we can
      visit that in future patches.

      Cc: stable@vger.kernel.org
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
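A bounded counter is enough to express the limit described above. This C11 sketch uses a hypothetical `struct ns_copy_state` and helper names, and the limit value is chosen by the caller; it is not the nfsd implementation. A failed reservation corresponds to replying NFS4ERR_DELAY.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-namespace accounting for background COPYs. */
struct ns_copy_state {
	atomic_int pending_async_copies;
	int max_async_copies;
};

/* Try to reserve a slot. Incrementing first and backing out on overflow
 * keeps the check race-free without a lock. On false, the caller would
 * answer NFS4ERR_DELAY. */
static bool async_copy_start(struct ns_copy_state *ns)
{
	if (atomic_fetch_add(&ns->pending_async_copies, 1) >=
	    ns->max_async_copies) {
		atomic_fetch_sub(&ns->pending_async_copies, 1);
		return false;
	}
	return true;
}

/* Release the slot when the background copy finishes. */
static void async_copy_done(struct ns_copy_state *ns)
{
	atomic_fetch_sub(&ns->pending_async_copies, 1);
}
```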
    • NFSD: Async COPY result needs to return a write verifier · 9ed666eb
      Chuck Lever authored
      Currently, when NFSD handles an asynchronous COPY, it returns a
      zero write verifier, relying on the subsequent CB_OFFLOAD callback
      to pass the write verifier and a stable_how4 value to the client.

      However, if the CB_OFFLOAD never arrives at the client (for example,
      if a network partition occurs just as the server sends the
      CB_OFFLOAD operation), the client will never receive this verifier.
      Thus, if the client sends a follow-up COMMIT, there is no way for
      the client to assess the COMMIT result.

      The usual recovery for a missing CB_OFFLOAD is for the client to
      send an OFFLOAD_STATUS operation, but that operation does not carry
      a write verifier in its result. Neither does it carry a stable_how4
      value, so the client /must/ send a COMMIT in this case -- which will
      always fail because currently there's still no write verifier in the
      COPY result.

      Thus the server needs to return a normal write verifier in its COPY
      result even if the COPY operation is to be performed asynchronously.

      If the server recognizes the callback stateid in subsequent
      OFFLOAD_STATUS operations, then obviously it has not restarted, and
      the write verifier the client received in the COPY result is still
      valid and can be used to assess a COMMIT of the copied data, if one
      is needed.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
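"Assessing a COMMIT" comes down to comparing two 8-byte verifiers. The sketch below is a hypothetical client-side helper, not kernel code: if the verifier returned by COMMIT matches the one from the COPY result, the copied data is stable; otherwise the server may have restarted and the client would need to redo the copy. It also shows why a zeroed COPY verifier is useless: it never matches a real one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8   /* verifier4 is an 8-byte opaque in NFSv4 */

/* True when COMMIT returned the same verifier the COPY result carried. */
static bool commit_matches_copy(const unsigned char copy_verf[NFS4_VERIFIER_SIZE],
				const unsigned char commit_verf[NFS4_VERIFIER_SIZE])
{
	return memcmp(copy_verf, commit_verf, NFS4_VERIFIER_SIZE) == 0;
}
```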
    • nfsd: avoid races with wake_up_var() · 15392c8c
      NeilBrown authored
      wake_up_var() needs a barrier after the important change is made in the
      var and before wake_up_var() is called, else it is possible that a wake
      up won't be sent when it should.

      In each case here the var is changed in an "atomic" manner, so
      smp_mb__after_atomic() is sufficient.

      In one case the important change (removing the lease) is performed
      *after* the wake_up, which is backwards.  The code survives in part
      because the wait_var_event is given a timeout.

      This patch adds the required barriers and calls destroy_delegation()
      *before* waking any threads waiting for the delegation to be destroyed.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: use clear_and_wake_up_bit() · 985eeae9
      NeilBrown authored
      nfsd has two places that open-code clear_and_wake_up_bit().  One has
      the required memory barriers.  The other does not.

      Change both to use clear_and_wake_up_bit() so we have the barriers
      without the noise.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • sunrpc: xprtrdma: Use ERR_CAST() to return · aeddf8e6
      Yan Zhen authored
      Using ERR_CAST() is more reasonable and safer when it is necessary
      to convert the type of an error pointer and return it.
      Signed-off-by: Yan Zhen <yanzhen@vivo.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Annotate struct pnfs_block_deviceaddr with __counted_by() · 2869b3a0
      Thorsten Blum authored
      Add the __counted_by compiler attribute to the flexible array member
      volumes to improve access bounds-checking via CONFIG_UBSAN_BOUNDS and
      CONFIG_FORTIFY_SOURCE.

      Use struct_size() instead of manually calculating the number of bytes to
      allocate for a pnfs_block_deviceaddr with a single volume.
      Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com>
      Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
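A sketch of the two ideas in user space, under stated assumptions: `struct deviceaddr` and `struct volume` are hypothetical stand-ins loosely modeled on the struct named in the title, `COUNTED_BY()` is a local macro that uses the `counted_by` attribute only when the compiler advertises it (the kernel spells it `__counted_by()`), and `STRUCT_SIZE()` computes what `struct_size()` does, without the kernel helper's overflow saturation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Use the attribute when available; otherwise expand to nothing. */
#if defined(__has_attribute)
#if __has_attribute(counted_by)
#define COUNTED_BY(member) __attribute__((counted_by(member)))
#endif
#endif
#ifndef COUNTED_BY
#define COUNTED_BY(member)
#endif

struct volume {
	uint32_t type;
};

struct deviceaddr {
	uint32_t nr_volumes;
	struct volume volumes[] COUNTED_BY(nr_volumes);
};

/* Header size plus n elements of the flexible array. */
#define STRUCT_SIZE(type, member, n) \
	(sizeof(type) + sizeof(((type *)0)->member[0]) * (n))

static struct deviceaddr *alloc_deviceaddr(uint32_t n)
{
	struct deviceaddr *dev =
		calloc(1, STRUCT_SIZE(struct deviceaddr, volumes, n));

	if (dev)
		dev->nr_volumes = n;   /* set the count before using volumes[] */
	return dev;
}
```

Setting the count field right after allocation matters once the attribute is active, because bounds checks on `volumes[]` read it.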
    • nfsd: call cache_put if xdr_reserve_space returns NULL · d078cbf5
      Guoqing Jiang authored
      If there is not enough buffer space available, but idmap_lookup has
      already triggered lookup_fn, which calls cache_get and returns
      successfully, then we fail to call the cache_put that pairs with that
      cache_get.

      Fixes: ddd1ea56 ("nfsd4: use xdr_reserve_space in attribute encoding")
      Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
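The rule being restored is that every reference taken by a successful lookup is dropped on every exit. A minimal sketch with a hypothetical refcounted `struct entry` in place of the sunrpc cache entry, and a hypothetical `encode_name()` in place of the attribute encoder:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted entry standing in for a cache entry. */
struct entry {
	int refs;
};

static void entry_get(struct entry *e) { e->refs++; }
static void entry_put(struct entry *e) { e->refs--; }

/* A successful lookup hands back a referenced entry. */
static struct entry *lookup(struct entry *e)
{
	entry_get(e);
	return e;
}

/* If the output buffer is too small, drop the lookup's reference
 * before returning the error. */
static int encode_name(struct entry *cache, size_t space_left, size_t need)
{
	struct entry *e = lookup(cache);

	if (space_left < need) {
		entry_put(e);   /* the put the error path was missing */
		return -1;
	}
	/* ... copy the name into the buffer ... */
	entry_put(e);
	return 0;
}
```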
    • nfsd: add more nfsd_cb tracepoints · ba017fd3
      Jeff Layton authored
      Add some tracepoints in the callback client RPC operations. Also
      add a tracepoint to nfsd4_cb_getattr_done.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: track the main opcode for callbacks · c1c9f3ea
      Jeff Layton authored
      Keep track of the "main" opcode for the callback, and display it in the
      tracepoint. This makes it simpler to discern what's happening when there
      is more than one callback in flight.

      The one special case is the CB_NULL RPC. That's not a CB_COMPOUND
      opcode, so designate the value 0 for that.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: add more info to WARN_ON_ONCE on failed callbacks · e8581a91
      Jeff Layton authored
      Currently, you get the warning and stack trace, but nothing is printed
      about the relevant error codes. Add that in.
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: fix some spelling errors in comments · 76a3f3f1
      Li Lingfeng authored
      Fix spelling errors in comments of nfsd4_release_lockowner and
      nfs4_set_delegation.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: remove unused parameter of nfsd_file_mark_find_or_create · eb059a41
      Li Lingfeng authored
      Commit 427f5f83 ("NFSD: Ensure nf_inode is never dereferenced") passes
      inode directly to nfsd_file_mark_find_or_create instead of getting it from
      nf, so there is no need to pass nf.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: use LIST_HEAD() to simplify code · c2feb7ee
      Hongbo Li authored
      A list_head can be declared and initialized in one step with
      LIST_HEAD() instead of calling INIT_LIST_HEAD().
      Signed-off-by: Hongbo Li <lihongbo22@huawei.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
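The two forms produce the same empty list. The definitions below are minimal user-space copies of the kernel's `list_head` helpers (from `include/linux/list.h`), and `before()`/`after()` are hypothetical functions showing the code shape prior to and after this kind of cleanup.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points at itself in both directions. */
#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static inline bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Before: declare, then initialize at run time. */
static bool before(void)
{
	struct list_head dispose;

	INIT_LIST_HEAD(&dispose);
	return list_empty(&dispose);
}

/* After: a single declaration that is already an empty list. */
static bool after(void)
{
	LIST_HEAD(dispose);

	return list_empty(&dispose);
}
```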
    • nfsd: map the EBADMSG to nfserr_io to avoid warning · 340e61e4
      Li Lingfeng authored
      ext4 returns -EBADMSG from ext4_readdir when a checksum error occurs.
      nfserrno() does not recognize that errno, resulting in the following
      WARNING.

      Fix it by mapping EBADMSG to nfserr_io.
      
      nfsd_buffered_readdir
       iterate_dir // -EBADMSG -74
        ext4_readdir // .iterate_shared
         ext4_dx_readdir
          ext4_htree_fill_tree
           htree_dirblock_to_tree
            ext4_read_dirblock
             __ext4_read_dirblock
              ext4_dirblock_csum_verify
               warn_no_space_for_csum
                __warn_no_space_for_csum
              return ERR_PTR(-EFSBADCRC) // -EBADMSG -74
       nfserrno // WARNING
      
      [  161.115610] ------------[ cut here ]------------
      [  161.116465] nfsd: non-standard errno: -74
      [  161.117315] WARNING: CPU: 1 PID: 780 at fs/nfsd/nfsproc.c:878 nfserrno+0x9d/0xd0
      [  161.118596] Modules linked in:
      [  161.119243] CPU: 1 PID: 780 Comm: nfsd Not tainted 5.10.0-00014-g79679361fd5d #138
      [  161.120684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [  161.123601] RIP: 0010:nfserrno+0x9d/0xd0
      [  161.124676] Code: 0f 87 da 30 dd 00 83 e3 01 b8 00 00 00 05 75 d7 44 89 ee 48 c7 c7 c0 57 24 98 89 44 24 04 c6 05 ce 2b 61 03 01 e8 99 20 d8 00 <0f> 0b 8b 44 24 04 eb b5 4c 89 e6 48 c7 c7 a0 6d a4 99 e8 cc 15 33
      [  161.127797] RSP: 0018:ffffc90000e2f9c0 EFLAGS: 00010286
      [  161.128794] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  161.130089] RDX: 1ffff1103ee16f6d RSI: 0000000000000008 RDI: fffff520001c5f2a
      [  161.131379] RBP: 0000000000000022 R08: 0000000000000001 R09: ffff8881f70c1827
      [  161.132664] R10: ffffed103ee18304 R11: 0000000000000001 R12: 0000000000000021
      [  161.133949] R13: 00000000ffffffb6 R14: ffff8881317c0000 R15: ffffc90000e2fbd8
      [  161.135244] FS:  0000000000000000(0000) GS:ffff8881f7080000(0000) knlGS:0000000000000000
      [  161.136695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  161.137761] CR2: 00007fcaad70b348 CR3: 0000000144256006 CR4: 0000000000770ee0
      [  161.139041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  161.140291] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  161.141519] PKRU: 55555554
      [  161.142076] Call Trace:
      [  161.142575]  ? __warn+0x9b/0x140
      [  161.143229]  ? nfserrno+0x9d/0xd0
      [  161.143872]  ? report_bug+0x125/0x150
      [  161.144595]  ? handle_bug+0x41/0x90
      [  161.145284]  ? exc_invalid_op+0x14/0x70
      [  161.146009]  ? asm_exc_invalid_op+0x12/0x20
      [  161.146816]  ? nfserrno+0x9d/0xd0
      [  161.147487]  nfsd_buffered_readdir+0x28b/0x2b0
      [  161.148333]  ? nfsd4_encode_dirent_fattr+0x380/0x380
      [  161.149258]  ? nfsd_buffered_filldir+0xf0/0xf0
      [  161.150093]  ? wait_for_concurrent_writes+0x170/0x170
      [  161.151004]  ? generic_file_llseek_size+0x48/0x160
      [  161.151895]  nfsd_readdir+0x132/0x190
      [  161.152606]  ? nfsd4_encode_dirent_fattr+0x380/0x380
      [  161.153516]  ? nfsd_unlink+0x380/0x380
      [  161.154256]  ? override_creds+0x45/0x60
      [  161.155006]  nfsd4_encode_readdir+0x21a/0x3d0
      [  161.155850]  ? nfsd4_encode_readlink+0x210/0x210
      [  161.156731]  ? write_bytes_to_xdr_buf+0x97/0xe0
      [  161.157598]  ? __write_bytes_to_xdr_buf+0xd0/0xd0
      [  161.158494]  ? lock_downgrade+0x90/0x90
      [  161.159232]  ? nfs4svc_decode_voidarg+0x10/0x10
      [  161.160092]  nfsd4_encode_operation+0x15a/0x440
      [  161.160959]  nfsd4_proc_compound+0x718/0xe90
      [  161.161818]  nfsd_dispatch+0x18e/0x2c0
      [  161.162586]  svc_process_common+0x786/0xc50
      [  161.163403]  ? nfsd_svc+0x380/0x380
      [  161.164137]  ? svc_printk+0x160/0x160
      [  161.164846]  ? svc_xprt_do_enqueue.part.0+0x365/0x380
      [  161.165808]  ? nfsd_svc+0x380/0x380
      [  161.166523]  ? rcu_is_watching+0x23/0x40
      [  161.167309]  svc_process+0x1a5/0x200
      [  161.168019]  nfsd+0x1f5/0x380
      [  161.168663]  ? nfsd_shutdown_threads+0x260/0x260
      [  161.169554]  kthread+0x1c4/0x210
      [  161.170224]  ? kthread_insert_work_sanity_check+0x80/0x80
      [  161.171246]  ret_from_fork+0x1f/0x30
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
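The fix is one more row in an errno-to-status table. This sketch is a simplified stand-in for `nfserrno()` with only a few rows and a hypothetical `map_errno()` name; the status numbers are the NFSv4 values from RFC 7530 (NFS4ERR_NOENT is 2, NFS4ERR_IO is 5). An unmapped errno returns -1 here, which is the point where the kernel emits the "non-standard errno" warning shown above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum {
	NFS4_OK = 0,
	NFS4ERR_NOENT = 2,
	NFS4ERR_IO = 5,
};

/* A checksum failure from the local file system (EBADMSG) is reported
 * to the client as a plain I/O error. */
static const struct {
	int err;
	int status;
} errno_map[] = {
	{ 0, NFS4_OK },
	{ -ENOENT, NFS4ERR_NOENT },
	{ -EIO, NFS4ERR_IO },
	{ -EBADMSG, NFS4ERR_IO },
};

static int map_errno(int err)
{
	for (size_t i = 0; i < sizeof(errno_map) / sizeof(errno_map[0]); i++)
		if (errno_map[i].err == err)
			return errno_map[i].status;
	return -1;   /* unmapped: the kernel warns at this point */
}
```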
    • NFSD: remove redundant assignment operation · 2039c5da
      Li Lingfeng authored
      Commit 5826e09b ("NFSD: OP_CB_RECALL_ANY should recall both read and
      write delegations") added a new assignment statement to add
      RCA4_TYPE_MASK_WDATA_DLG to the ra_bmval bitmask of OP_CB_RECALL_ANY,
      so the old one should be removed.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • .mailmap: Add an entry for my work email address · ecbf8494
      Chuck Lever authored
      Collect a few very old previous employers as well.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Fix NFSv4's PUTPUBFH operation · 202f3903
      Chuck Lever authored
      According to RFC 8881, all minor versions of NFSv4 support PUTPUBFH.

      Replace the XDR decoder for PUTPUBFH with a "noop" since we no
      longer want the minorversion check, and PUTPUBFH has no arguments to
      decode. (Ideally, nfsd4_decode_noop would be called
      nfsd4_decode_void.)

      PUTPUBFH should now behave just like PUTROOTFH.
      Reported-by: Cedric Blancher <cedric.blancher@gmail.com>
      Fixes: e1a90ebd ("NFSD: Combine decode operations for v4 and v4.1")
      Cc: Dan Shelton <dan.f.shelton@gmail.com>
      Cc: Roland Mainz <roland.mainz@nrubsig.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Add quotes to client info 'callback address' · 32b34fa4
      Mark Grimes authored
      The 'callback address' in client_info_show is output without quotes,
      causing YAML parsers to fail when processing IPv6 addresses.
      Adding quotes to 'callback address' also matches the format used by
      the 'address' field.
      Signed-off-by: Mark Grimes <mark.grimes@ixsystems.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • svcrdma: Handle device removal outside of the CM event handler · c4de97f7
      Chuck Lever authored
      Synchronously wait for all disconnects to complete to ensure the
      transports have divested all hardware resources before the
      underlying RDMA device can safely be removed.
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NeilBrown's avatar
      nfsd: move error choice for incorrect object types to version-specific code. · 438f81e0
      NeilBrown authored
      If an NFS operation expects a particular sort of object (file, dir, link,
      etc) but gets a file handle for a different sort of object, it must
      return an error.  The actual error varies among NFS versions in non-trivial
      ways.
      
      For v2 and v3 there are ISDIR and NOTDIR errors and, for NFSv4 only,
      INVAL is suitable.
      
      For v4.0 there is also NFS4ERR_SYMLINK which should be used if a SYMLINK
      was found when not expected.  This take precedence over NOTDIR.
      
      For v4.1+ there is also NFS4ERR_WRONG_TYPE which should be used in
      preference to EINVAL when none of the specific error codes apply.
      
      When nfsd_mode_check() finds a symlink where it expected a directory, it
      needs to return an error code that can be converted to NOTDIR for v2 or
      v3 but will be SYMLINK for v4.  It must be different from the error
      code returned when it finds a symlink but expects a regular file; that
      must be converted to EINVAL or SYMLINK.
      
      So we introduce an internal error code nfserr_symlink_not_dir which each
      version converts as appropriate.
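      A sketch of the per-version conversion described above follows. It uses hypothetical enum
      values in place of nfsd's big-endian nfserr_* constants. The function names are
      illustrative, and the real nfsd mapping may differ in detail. The internal code for "symlink
      where a directory was expected" becomes NOTDIR for v2/v3 and SYMLINK for v4. WRONG_TYPE is
      passed through only for v4.1 and later.

```c
#include <assert.h>

/* Hypothetical status codes; ERR_SYMLINK_NOT_DIR is internal only. */
enum status {
    ERR_OK,
    ERR_NOTDIR,
    ERR_INVAL,
    ERR_SYMLINK,          /* NFS4ERR_SYMLINK: v4 only */
    ERR_WRONG_TYPE,       /* NFS4ERR_WRONG_TYPE: v4.1+ only */
    ERR_SYMLINK_NOT_DIR,  /* internal: found a symlink, expected a dir */
};

/* v2/v3: neither SYMLINK nor WRONG_TYPE exists on the wire. */
static enum status map_status_v2v3(enum status s)
{
    enum status out = s;

    switch (s) {
    case ERR_SYMLINK_NOT_DIR:
        out = ERR_NOTDIR;
        break;
    case ERR_SYMLINK:
    case ERR_WRONG_TYPE:
        out = ERR_INVAL;
        break;
    default:
        break;
    }
    assert(out != ERR_SYMLINK_NOT_DIR);  /* internal code must not leak */
    return out;
}

static enum status map_status_v4(enum status s, unsigned int minorversion)
{
    enum status out = s;

    switch (s) {
    case ERR_SYMLINK_NOT_DIR:
        out = ERR_SYMLINK;     /* SYMLINK takes precedence over NOTDIR */
        break;
    case ERR_WRONG_TYPE:
        /* WRONG_TYPE exists only in v4.1+; fall back to INVAL for v4.0. */
        out = minorversion == 0 ? ERR_INVAL : ERR_WRONG_TYPE;
        break;
    default:
        break;
    }
    assert(out != ERR_SYMLINK_NOT_DIR);
    return out;
}
```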
      
      nfsd_check_obj_isreg() is similar to nfsd_mode_check() except that it is
      only used by NFSv4 and only for OPEN.  NFSERR_INVAL is never a suitable
      error if the object is the wrong type.  For v4.0 we use nfserr_symlink
      for non-dirs even if not a symlink.  For v4.1 we have nfserr_wrong_type.
      We handle this difference in-place in nfsd_check_obj_isreg() as there is
      nothing to be gained by delaying the choice to nfsd4_map_status().
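      The in-place choice made for OPEN might look like the following sketch. The status names
      are hypothetical stand-ins, and returning ISDIR for directories is an assumption. The point
      is that the v4.0/v4.1 difference is decided at the check itself and INVAL is never produced.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>

/* Hypothetical status values for illustration only. */
enum status { ERR_OK, ERR_ISDIR, ERR_SYMLINK, ERR_WRONG_TYPE };

/* OPEN-only check: the object must be a regular file. */
static enum status check_obj_isreg(mode_t mode, unsigned int minorversion)
{
    assert(minorversion <= 2);   /* NFSv4.0 through v4.2 */

    if (S_ISREG(mode))
        return ERR_OK;
    if (S_ISDIR(mode))
        return ERR_ISDIR;        /* assumed choice for directories */
    if (S_ISLNK(mode))
        return ERR_SYMLINK;
    /* Any other non-directory (fifo, socket, device): version-dependent. */
    return minorversion == 0 ? ERR_SYMLINK : ERR_WRONG_TYPE;
}
```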
      
      As a result of these changes, nfsd_mode_check() doesn't need an rqstp
      arg any more.
      
      Note that NFSv4 operations are actually performed in the xdr code(!!!)
      so the only place that we can map the status code successfully is in
      nfsd4_encode_operation().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: be more systematic about selecting error codes for internal use. · 36ffa3d0
      NeilBrown authored
      Rather than using ad hoc values for internal errors (30000, 11000, ...)
      use 'enum' to sequentially allocate numbers starting from the first
      known available number - now visible as NFS4ERR_FIRST_FREE.
      
      The goal is values that are distinct from all be32 error codes.  To get
      those we must first select integers that are not already used, then
      convert them with cpu_to_be32().
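      A userspace sketch of the allocation scheme follows. It assumes a placeholder value for
      NFS4ERR_FIRST_FREE; the real constant is defined in the kernel headers and is not
      reproduced here. It uses htonl() as a stand-in for cpu_to_be32(). The enum hands out
      consecutive integers above the placeholder, and distinctness is checked once both sides
      are in wire byte order. The enum and function names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder only: NOT the kernel's NFS4ERR_FIRST_FREE value. */
#define FIRST_FREE_PLACEHOLDER 10100

/* Internal-only codes, allocated sequentially from the first free value. */
enum internal_err {
    INTERNAL_ERR_A = FIRST_FREE_PLACEHOLDER,
    INTERNAL_ERR_B,     /* FIRST_FREE_PLACEHOLDER + 1 */
    INTERNAL_ERR_C,     /* FIRST_FREE_PLACEHOLDER + 2 */
};

/* Userspace stand-in for cpu_to_be32(). */
static uint32_t to_wire(uint32_t host_value)
{
    return htonl(host_value);
}

/* Return 1 if a wire-order code differs from every known wire-order code. */
static int distinct_on_wire(uint32_t wire, const uint32_t *known, size_t n)
{
    assert(known != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (known[i] == wire)
            return 0;
    return 1;
}
```

      Because the byte-order conversion is one-to-one, integers chosen to be distinct before
      conversion stay distinct after it.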
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Move error code mapping to per-version proc code. · 1459ad57
      NeilBrown authored
      There is code scattered around nfsd which chooses an error status based
      on the particular version of NFS being used.  It is cleaner to have the
      version-specific choices in version-specific code.
      
      With this patch common code returns the most specific error code
      possible and the version specific code maps that if necessary.
      
      Both v2 (nfsproc.c) and v3 (nfs3proc.c) now have a "map_status()"
      function which is called to map the resp->status before each non-trivial
      nfsd_proc_* or nfsd3_proc_* function returns.
      
      NFS4ERR_SYMLINK and NFS4ERR_WRONG_TYPE introduce extra complications and
      are left for a later patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: move V4ROOT version check to nfsd_set_fh_dentry() · ef7f6c49
      NeilBrown authored
      This further centralizes version number checks.
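      The following sketch shows the V4ROOT test done once, at file-handle setup time, rather than
      at each call site. EXPORT_FLAG_V4ROOT and export_allowed_for_version() are hypothetical
      stand-ins. The rule that only NFSv4 clients may use a V4ROOT (pseudo-root) export is an
      assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for the V4ROOT export flag. */
#define EXPORT_FLAG_V4ROOT 0x1u

/* Single, centralized version check made while setting up the handle. */
static bool export_allowed_for_version(unsigned int ex_flags, unsigned int vers)
{
    assert(vers >= 2 && vers <= 4);
    if ((ex_flags & EXPORT_FLAG_V4ROOT) && vers < 4)
        return false;   /* assumed: pseudo-root is for NFSv4 only */
    return true;
}
```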
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: further centralize protocol version checks. · c689bdd3
      NeilBrown authored
      With this patch the only places that test ->rq_vers against a specific
      version are nfsd_v4client() and nfsd_set_fh_dentry().
      The latter sets some flags in the svc_fh, which now includes:
        fh_64bit_cookies
        fh_use_wgather
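      The idea of recording version-dependent behavior as file-handle flags can be sketched like
      this. struct fh_sketch and both functions are hypothetical; only the flag names
      fh_64bit_cookies and fh_use_wgather come from the text. The policy used to set the flags is
      an assumption: 64-bit cookies for v3 and v4, and write gathering only for v2 when the export
      requests it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for struct svc_fh. */
struct fh_sketch {
    bool fh_64bit_cookies;
    bool fh_use_wgather;
};

/* The one place that inspects the version: translate it into flags. */
static void set_fh_version_flags(struct fh_sketch *fh, unsigned int vers,
                                 bool export_wgather)
{
    assert(vers >= 2 && vers <= 4);
    fh->fh_64bit_cookies = vers >= 3;                  /* assumed policy */
    fh->fh_use_wgather = vers == 2 && export_wgather;  /* assumed policy */
}

/* Later code tests the flag rather than the protocol version. */
static unsigned int cookie_bits(const struct fh_sketch *fh)
{
    return fh->fh_64bit_cookies ? 64 : 32;
}
```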
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>