1. 25 Apr, 2024 1 commit
  2. 24 Apr, 2024 2 commits
    • Revert "NFSD: Convert the callback workqueue to use delayed_work" · 8ddb7142
      Chuck Lever authored
      This commit was a pre-requisite for commit c1ccfcf1 ("NFSD:
      Reschedule CB operations when backchannel rpc_clnt is shut down"),
      which has already been reverted.
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down" · 9c8ecb93
      Chuck Lever authored
      The reverted commit attempted to enable NFSD to retransmit pending
      callback operations if an NFS client disconnects, but
      unintentionally introduces a hazardous behavior regression if the
      client becomes permanently unreachable while callback operations are
      still pending.
      
      A disconnect can occur due to network partition or if the NFS server
      needs to force the NFS client to retransmit (for example, if a GSS
      window under-run occurs).
      
      Reverting the commit will make NFSD behave the same as it did in
      v6.8 and before. Pending callback operations are permanently lost if
      the client connection is terminated before the client receives them.
      
      For some callback operations, this loss is not harmful.
      
      However, for CB_RECALL, the loss means a delegation might be revoked
      unnecessarily. For CB_OFFLOAD, pending COPY operations will never
      complete unless the NFS client subsequently sends an OFFLOAD_STATUS
      operation, which the Linux NFS client does not currently implement.
      
      These issues still need to be addressed somehow.
      
      Reported-by: Dai Ngo <dai.ngo@oracle.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=218735
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  3. 20 Apr, 2024 1 commit
    • Revert "svcrdma: Add Write chunk WRs to the RPC's Send WR chain" · 32cf5a4e
      Chuck Lever authored
      Performance regression reported with NFS/RDMA using Omnipath,
      bisected to commit e084ee67 ("svcrdma: Add Write chunk WRs to
      the RPC's Send WR chain").
      
      Tracing on the server reports:
      
        nfsd-7771  [060]  1758.891809: svcrdma_sq_post_err:
      	cq.id=205 cid=226 sc_sq_avail=13643/851 status=-12
      
      sq_post_err reports ENOMEM, and the rdma->sc_sq_avail (13643) is
      larger than rdma->sc_sq_depth (851). The number of available Send
      Queue entries is always supposed to be smaller than the Send Queue
      depth. That seems like a Send Queue accounting bug in svcrdma.
      
      As it's getting to be late in the 6.9-rc cycle, revert this commit.
      It can be revisited in a subsequent kernel release.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=218743
      Fixes: e084ee67 ("svcrdma: Add Write chunk WRs to the RPC's Send WR chain")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
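The invariant described in this commit message (available Send Queue entries never exceed the Send Queue depth) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sq_model` and the `sq_*` helpers are hypothetical names, not svcrdma code, and the commit message does not identify the actual cause of the accounting error. The model merely shows that releasing more entries than were reserved is one way a counter like `sc_sq_avail` could end up larger than `sc_sq_depth`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a Send Queue credit counter. */
struct sq_model {
	int depth;        /* total SQ entries, fixed at setup */
	atomic_int avail; /* entries currently free */
};

static void sq_init(struct sq_model *sq, int depth)
{
	sq->depth = depth;
	atomic_init(&sq->avail, depth);
}

/* Reserve n entries; undo the reservation and fail if too few are free. */
static bool sq_reserve(struct sq_model *sq, int n)
{
	if (atomic_fetch_sub(&sq->avail, n) - n < 0) {
		atomic_fetch_add(&sq->avail, n);
		return false;
	}
	return true;
}

/* Return n entries to the pool. Nothing stops a caller from over-releasing. */
static void sq_release(struct sq_model *sq, int n)
{
	atomic_fetch_add(&sq->avail, n);
}

/* The invariant from the commit message: 0 <= avail <= depth. */
static bool sq_accounting_ok(struct sq_model *sq)
{
	int avail = atomic_load(&sq->avail);

	return avail >= 0 && avail <= sq->depth;
}
```

In this model, reserving 3 entries and releasing 3 keeps `sq_accounting_ok()` true, while one extra `sq_release()` pushes `avail` past `depth`, the same shape of symptom as the `13643/851` value in the trace.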
  4. 11 Apr, 2024 1 commit
  5. 10 Apr, 2024 1 commit
    • SUNRPC: Fix rpcgss_context trace event acceptor field · a4833e3a
      Steven Rostedt (Google) authored
      The rpcgss_context trace event acceptor field is a dynamically sized
      string that records the "data" parameter. But this parameter is also
      dependent on the "len" field to determine the size of the data.
      
      It needs to use __string_len() helper macro where the length can be passed
      in. It also incorrectly uses strncpy() to save it instead of
      __assign_str(). As these macros can change, it is not wise to open code
      them in trace events.
      
      As of commit c759e609 ("tracing: Remove __assign_str_len()"),
      __assign_str() can be used for both __string() and __string_len() fields.
      Before that commit, __assign_str_len() is required to be used. This needs
      to be noted for backporting. (In actuality, commit c1fa617c ("tracing:
      Rework __assign_str() and __string() to not duplicate getting the string")
      is the commit that makes __assign_str_len() obsolete).
      
      Cc: stable@vger.kernel.org
      Fixes: 0c77668d ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
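The underlying issue, a string whose size is given by a separate length field rather than by a terminating NUL, can be shown with a userspace analogy. This is a sketch only: `copy_counted()` is a hypothetical helper and is not the kernel's `__string_len()` or `__assign_str()` machinery. It shows the general pattern of copying exactly `len` bytes and terminating the result explicitly.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a counted (not necessarily NUL-terminated)
 * buffer into dst, truncating to fit, and always NUL-terminate.
 * Returns the number of bytes copied, excluding the terminator.
 */
static size_t copy_counted(char *dst, size_t dst_size,
			   const char *data, size_t len)
{
	if (dst_size == 0)
		return 0;
	if (len > dst_size - 1)
		len = dst_size - 1;
	memcpy(dst, data, len);
	dst[len] = '\0';
	return len;
}
```

Because the copy is driven by `len`, bytes that follow the counted region in the source buffer never reach the destination, and the destination is terminated even when the source is not.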
  6. 05 Apr, 2024 1 commit
    • nfsd: hold a lighter-weight client reference over CB_RECALL_ANY · 10396f4d
      Jeff Layton authored
      Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
      client. While a callback job is technically an RPC, that counter is
      really more for client-driven RPCs, and this has the effect of
      preventing the client from being unhashed until the callback completes.
      
      If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
      can end up in a situation where the callback can't complete on the (now
      dead) callback channel, but the new client can't connect because the old
      client can't be unhashed. This usually manifests as an NFS4ERR_DELAY
      return on the CREATE_SESSION operation.
      
      The job is only holding a reference to the client so it can clear a flag
      after the RPC completes. Fix this by having CB_RECALL_ANY instead hold a
      reference to the cl_nfsdfs.cl_ref. Typically we only take that sort of
      reference when dealing with the nfsdfs info files, but it should work
      appropriately here to ensure that the nfs4_client doesn't disappear.
      
      Fixes: 44df6f43 ("NFSD: add delegation reaper to react to low memory condition")
      Reported-by: Vladimir Benes <vbenes@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
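The difference between the two kinds of reference can be sketched with a toy model. Everything here is an assumption made for illustration: `struct client_model` and its helpers are hypothetical and much simpler than NFSD's `nfs4_client` handling. The point is only that one counter blocks unhashing while the other merely keeps the object alive.

```c
#include <stdbool.h>

/* Hypothetical model of a client with two distinct reference counts. */
struct client_model {
	int rpc_users;     /* in-flight client-driven RPCs; blocks unhashing */
	int lifetime_refs; /* keeps the object's memory alive, nothing more */
	bool hashed;       /* still reachable through the lookup table */
};

/* Unhashing fails while any "heavy" reference is held. */
static bool client_try_unhash(struct client_model *c)
{
	if (c->rpc_users > 0)
		return false;
	c->hashed = false;
	return true;
}

/* The object may be freed only once it is unhashed and unreferenced. */
static bool client_can_free(const struct client_model *c)
{
	return !c->hashed && c->rpc_users == 0 && c->lifetime_refs == 0;
}
```

In this model, a callback that holds `rpc_users` keeps the old client hashed until the callback finishes, whereas a callback that holds only `lifetime_refs` lets the old client be unhashed immediately and delays only the final free.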
  7. 04 Apr, 2024 1 commit
  8. 27 Mar, 2024 1 commit
    • NFSD: CREATE_SESSION must never cache NFS4ERR_DELAY replies · 99dc2ef0
      Chuck Lever authored
      There are one or two cases where CREATE_SESSION returns
      NFS4ERR_DELAY in order to force the client to wait a bit and try
      CREATE_SESSION again. However, after commit e4469c6c ("NFSD: Fix
      the NFSv4.1 CREATE_SESSION operation"), NFSD caches that response in
      the CREATE_SESSION slot. Thus, when the client resends the
      CREATE_SESSION, the server always returns the cached NFS4ERR_DELAY
      response rather than actually executing the request and properly
      recording its outcome. This blocks the client from making further
      progress.
      
      RFC 8881 Section 15.1.1.3 says:
      > If NFS4ERR_DELAY is returned on an operation other than SEQUENCE
      > that validly appears as the first operation of a request ... [t]he
      > request can be retried in full without modification. In this case
      > as well, the replier MUST avoid returning a response containing
      > NFS4ERR_DELAY as the response to an initial operation of a request
      > solely on the basis of its presence in the reply cache.
      
      Neither the original NFSD code nor the discussion in section 18.36.4
      refers explicitly to this important requirement, so I missed it.
      
      Note also that not only must the server not cache NFS4ERR_DELAY, but
      it has to not advance the CREATE_SESSION slot sequence number so
      that it can properly recognize and accept the client's retry.
      
      Reported-by: Dai Ngo <dai.ngo@oracle.com>
      Fixes: e4469c6c ("NFSD: Fix the NFSv4.1 CREATE_SESSION operation")
      Tested-by: Dai Ngo <dai.ngo@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
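The two rules stated in this commit message (do not cache NFS4ERR_DELAY, and do not advance the slot sequence number when returning it) can be sketched as a simplified slot handler. This is a minimal model under stated assumptions, not NFSD's implementation: `struct cs_slot` and `cs_process()` are hypothetical, the `exec_status` parameter stands in for the result of actually executing the request, and the numeric values are the protocol's codes for the two errors.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
	CS_OK = 0,
	CS_ERR_DELAY = 10008,          /* NFS4ERR_DELAY */
	CS_ERR_SEQ_MISORDERED = 10063, /* NFS4ERR_SEQ_MISORDERED */
};

/* Hypothetical model of the per-client CREATE_SESSION slot. */
struct cs_slot {
	uint32_t seqid;    /* sequence ID of the last cached request */
	bool has_cached;   /* whether cached_status is valid */
	int cached_status; /* cached reply, reduced here to a status code */
};

/*
 * exec_status is what executing the request would return; it is used
 * only when the request is new, never for a replay.
 */
static int cs_process(struct cs_slot *slot, uint32_t seqid, int exec_status)
{
	/* Replay of the last cached request: return the cached reply. */
	if (slot->has_cached && seqid == slot->seqid)
		return slot->cached_status;

	/* Anything other than the next sequence ID is misordered. */
	if (seqid != slot->seqid + 1)
		return CS_ERR_SEQ_MISORDERED;

	/* NFS4ERR_DELAY: neither cache the reply nor advance the slot. */
	if (exec_status == CS_ERR_DELAY)
		return CS_ERR_DELAY;

	slot->seqid = seqid;
	slot->cached_status = exec_status;
	slot->has_cached = true;
	return exec_status;
}
```

With this shape, a client that receives NFS4ERR_DELAY can resend the same request with the same sequence ID, and the handler executes it again instead of answering from the cache.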
  9. 22 Mar, 2024 2 commits
  10. 09 Mar, 2024 1 commit
    • NFSD: Clean up nfsd4_encode_replay() · 9b350d3e
      Chuck Lever authored
      Replace open-coded encoding logic with the use of conventional XDR
      utility functions. Add a tracepoint to make replays observable in
      field troubleshooting situations.
      
      The WARN_ON is removed. A stack trace is of little use, as there is
      only one call site for nfsd4_encode_replay(), and a buffer length
      shortage here is unlikely.
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
  11. 05 Mar, 2024 2 commits
  12. 01 Mar, 2024 26 commits