1. 05 Jan, 2024 1 commit
  2. 18 Dec, 2023 3 commits
    • Chuck Lever's avatar
      SUNRPC: Revert 5f7fc5d6 · bd018b98
      Chuck Lever authored
      Guillaume says:
      > I believe commit 5f7fc5d6 ("SUNRPC: Resupply rq_pages from
      > node-local memory") in Linux 6.5+ is incorrect. It passes
      > unconditionally rq_pool->sp_id as the NUMA node.
      >
      > While the comment in the svc_pool declaration in sunrpc/svc.h says
      > that sp_id is also the NUMA node id, it might not be the case if
      > the svc is created using svc_create_pooled(). svc_created_pooled()
      > can use the per-cpu pool mode therefore in this case sp_id would
      > be the cpu id.
      
      Fix this by reverting now. At a later point this minor optimization,
      and the deceptive labeling of the sp_id field, can be revisited.
      Reported-by: default avatarGuillaume Morin <guillaume@morinfr.org>
      Closes: https://lore.kernel.org/linux-nfs/ZYC9rsno8qYggVt9@bender.morinfr.org/T/#u
      Fixes: 5f7fc5d6 ("SUNRPC: Resupply rq_pages from node-local memory")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      bd018b98
    • Chuck Lever's avatar
      NFSD: Revert 738401a9 · 1227561c
      Chuck Lever authored
      There's nothing wrong with this commit, but this is dead code now
      that nothing triggers a CB_GETATTR callback. It can be re-introduced
      once the issues with handling conflicting GETATTRs are resolved.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      1227561c
    • Chuck Lever's avatar
      NFSD: Revert 6c41d9a9 · 862bee84
      Chuck Lever authored
      For some reason, the wait_on_bit() in nfsd4_deleg_getattr_conflict()
      is waiting forever, preventing a clean server shutdown. The
      requesting client might also hang waiting for a reply to the
      conflicting GETATTR.
      
      Invoking wait_on_bit() in an nfsd thread context is a hazard. The
      correct fix is to replace this wait_on_bit() call site with a
      mechanism that defers the conflicting GETATTR until the CB_GETATTR
      completes or is known to have failed.
      
      That will require some surgery and extended testing and it's late
      in the v6.7-rc cycle, so I'm reverting now in favor of trying again
      in a subsequent kernel release.
      
      This is my fault: I should have recognized the ramifications of
      calling wait_on_bit() in here before accepting this patch.
      
      Thanks to Dai Ngo <dai.ngo@oracle.com> for diagnosing the issue.
      Reported-by: default avatarWolfgang Walter <linux-nfs@stwm.de>
      Closes: https://lore.kernel.org/linux-nfs/e3d43ecdad554fbdcaa7181833834f78@stwm.de/Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      862bee84
  3. 15 Dec, 2023 2 commits
  4. 17 Nov, 2023 4 commits
    • Chuck Lever's avatar
      NFSD: Fix checksum mismatches in the duplicate reply cache · bf51c52a
      Chuck Lever authored
      nfsd_cache_csum() currently assumes that the server's RPC layer has
      been advancing rq_arg.head[0].iov_base as it decodes an incoming
      request, because that's the way it used to work. On entry, it
      expects that buf->head[0].iov_base points to the start of the NFS
      header, and excludes the already-decoded RPC header.
      
      These days however, head[0].iov_base now points to the start of the
      RPC header during all processing. It no longer points at the NFS
      Call header when execution arrives at nfsd_cache_csum().
      
      In a retransmitted RPC the XID and the NFS header are supposed to
      be the same as the original message, but the contents of the
      retransmitted RPC header can be different. For example, for krb5,
      the GSS sequence number will be different between the two. Thus if
      the RPC header is always included in the DRC checksum computation,
      the checksum of the retransmitted message might not match the
      checksum of the original message, even though the NFS part of these
      messages is identical.
      
      The result is that, even if a matching XID is found in the DRC,
      the checksum mismatch causes the server to execute the
      retransmitted RPC transaction again.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Tested-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      bf51c52a
    • Chuck Lever's avatar
      NFSD: Fix "start of NFS reply" pointer passed to nfsd_cache_update() · 1caf5f61
      Chuck Lever authored
      The "statp + 1" pointer that is passed to nfsd_cache_update() is
      supposed to point to the start of the egress NFS Reply header. In
      fact, it does point there for AUTH_SYS and RPCSEC_GSS_KRB5 requests.
      
      But both krb5i and krb5p add fields between the RPC header's
      accept_stat field and the start of the NFS Reply header. In those
      cases, "statp + 1" points at the extra fields instead of the Reply.
      The result is that nfsd_cache_update() caches what looks to the
      client like garbage.
      
      A connection break can occur for a number of reasons, but the most
      common reason when using krb5i/p is a GSS sequence number window
      underrun. When an underrun is detected, the server is obliged to
      drop the RPC and the connection to force a retransmit with a fresh
      GSS sequence number. The client presents the same XID, it hits in
      the server's DRC, and the server returns the garbage cache entry.
      
      The "statp + 1" argument has been used since the oldest changeset
      in the kernel history repo, so it has been in nfsd_dispatch()
      literally since before history began. The problem arose only when
      the server-side GSS implementation was added twenty years ago.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Tested-by: Jeff Layton <jlayton@kernel.org
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      1caf5f61
    • Chuck Lever's avatar
      NFSD: Update nfsd_cache_append() to use xdr_stream · 49cecd86
      Chuck Lever authored
      When inserting a DRC-cached response into the reply buffer, ensure
      that the reply buffer's xdr_stream is updated properly. Otherwise
      the server will send a garbage response.
      
      Cc: stable@vger.kernel.org # v6.3+
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Tested-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      49cecd86
    • Mahmoud Adam's avatar
      nfsd: fix file memleak on client_opens_release · bc1b5acb
      Mahmoud Adam authored
      seq_release should be called to free the allocated seq_file
      
      Cc: stable@vger.kernel.org # v5.3+
      Signed-off-by: default avatarMahmoud Adam <mngyadam@amazon.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Fixes: 78599c42 ("nfsd4: add file to display list of client's opens")
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Tested-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      bc1b5acb
  5. 16 Oct, 2023 30 commits