1. 16 Oct, 2023 40 commits
    • NFSD: Add nfsd4_encode_fattr4_change() · 263453d9
      Chuck Lever authored
      Refactor the encoder for FATTR4_CHANGE into a helper. In a
      subsequent patch, this helper will be called from a bitmask loop.
      
      The code is restructured a bit to use the modern xdr_stream flow,
      and the encoded cinfo value is made const so that callers of the
      encoders can be passed a const cinfo.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add nfsd4_encode_fattr4_fh_expire_type() · 36ed7e64
      Chuck Lever authored
      Refactor the encoder for FATTR4_FH_EXPIRE_TYPE into a helper. In a
      subsequent patch, this helper will be called from a bitmask loop.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add nfsd4_encode_fattr4_type() · b06cf375
      Chuck Lever authored
      Refactor the encoder for FATTR4_TYPE into a helper. In a subsequent
      patch, this helper will be called from a bitmask loop.
      
      In addition, restructure the code so that byte-swapping is done on
      constant values rather than at run time. Run-time swapping can be
      costly on some platforms, and "type" is a frequently-requested
      attribute.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
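The idea of byte-swapping constants rather than run-time values can be sketched in userspace C. This is an illustrative sketch only, not the kernel helper: `SKETCH_CONST_BE32`, `enum sketch_kind`, and `sketch_type_wire()` are hypothetical names, the `__BYTE_ORDER__` check assumes GCC or Clang, and the numeric values are the nfs_ftype4 values from the NFSv4 specification. Returning 0 for an unknown type is also an assumption of the sketch.

```c
#include <stdint.h>

/* Swap a constant to big-endian at compile time (GCC/Clang macros assumed). */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_CONST_BE32(x) \
	((uint32_t)((((uint32_t)(x) & 0x000000ffu) << 24) | \
		    (((uint32_t)(x) & 0x0000ff00u) <<  8) | \
		    (((uint32_t)(x) & 0x00ff0000u) >>  8) | \
		    (((uint32_t)(x) & 0xff000000u) >> 24)))
#else
#define SKETCH_CONST_BE32(x) ((uint32_t)(x))
#endif

/* Hypothetical stand-in for the file type bits of an inode mode. */
enum sketch_kind {
	SKETCH_KIND_REG, SKETCH_KIND_DIR, SKETCH_KIND_BLK, SKETCH_KIND_CHR,
	SKETCH_KIND_LNK, SKETCH_KIND_SOCK, SKETCH_KIND_FIFO, SKETCH_KIND_UNKNOWN
};

/*
 * Return the on-the-wire (big-endian) nfs_ftype4 value for a file kind.
 * Every case yields a value the compiler can fold, so no swap instruction
 * runs per request. Returns 0 (not a valid nfs_ftype4) for unknown kinds.
 */
static uint32_t sketch_type_wire(enum sketch_kind kind)
{
	switch (kind) {
	case SKETCH_KIND_REG:  return SKETCH_CONST_BE32(1); /* NF4REG */
	case SKETCH_KIND_DIR:  return SKETCH_CONST_BE32(2); /* NF4DIR */
	case SKETCH_KIND_BLK:  return SKETCH_CONST_BE32(3); /* NF4BLK */
	case SKETCH_KIND_CHR:  return SKETCH_CONST_BE32(4); /* NF4CHR */
	case SKETCH_KIND_LNK:  return SKETCH_CONST_BE32(5); /* NF4LNK */
	case SKETCH_KIND_SOCK: return SKETCH_CONST_BE32(6); /* NF4SOCK */
	case SKETCH_KIND_FIFO: return SKETCH_CONST_BE32(7); /* NF4FIFO */
	default:               return 0;
	}
}
```

The alternative, mapping the kind to a host-order integer and swapping it afterwards, would perform the swap on every call.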
    • NFSD: Add nfsd4_encode_fattr4_supported_attrs() · c9090e27
      Chuck Lever authored
      Refactor the encoder for FATTR4_SUPPORTED_ATTRS into a helper. In a
      subsequent patch, this helper will be called from a bitmask loop.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add nfsd4_encode_fattr4__false() · 8c442288
      Chuck Lever authored
      Add an encoding helper that encodes a single boolean "false" value.
      Attributes that always return "false" can use this helper.
      
      In a subsequent patch, this helper will be called from a bitmask
      loop, so it is given a standardized synopsis.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add nfsd4_encode_fattr4__true() · c88cb472
      Chuck Lever authored
      Add an encoding helper that encodes a single boolean "true" value.
      Attributes that always return "true" can use this helper.
      
      In a subsequent patch, this helper will be called from a bitmask
      loop, so it is given a standardized synopsis.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add struct nfsd4_fattr_args · 83ab8678
      Chuck Lever authored
      I'm about to split nfsd4_encode_fattr() into a number of smaller
      functions. Instead of passing a large number of arguments to each of
      the smaller functions, create a struct that can gather the common
      argument variables into something with a convenient handle on it.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Clean up nfsd4_encode_setattr() · c3dcb45b
      Chuck Lever authored
      De-duplicate the encoding of bitmap4 results in
      nfsd4_encode_setattr().
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Rename nfsd4_encode_bitmap() · e64301f5
      Chuck Lever authored
      For alignment with the specification, the name of NFSD's encoder
      function should match the name of the XDR type.
      
      I've also replaced a few "naked integers" with symbolic constants
      that better reflect the usage of these values.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: Add simple u32, u64, and bool encoders · 6cc58291
      Chuck Lever authored
      The generic XDR encoders return a length or a negative errno. NFSv4
      encoders want to know simply whether the encode ran out of stream
      buffer space. The return values for server-side encoding are either
      nfs_ok or nfserr_resource.
      
      So far I've found it adds a lot of duplicate code to try to use the
      generic XDR encoder utilities when encoding the simple data types in
      the NFSv4 operation encoders.
      
      Add a set of NFSv4-specific utilities that handle the basic XDR data
      types. These are added in xdr4.h so they might eventually be used by
      the callback server and pNFS driver encoders too.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
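The shape of such encoders, which report only whether the stream had room, might look like the following userspace sketch. All names here (`struct sketch_stream`, `sketch_encode_u32()`, and so on) are hypothetical, and `enum sketch_status` merely stands in for the nfs_ok / nfserr_resource pair named above; the real utilities live in xdr4.h and operate on the kernel's xdr_stream.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the two outcomes the commit message names. */
enum sketch_status { SKETCH_OK = 0, SKETCH_ERR_RESOURCE = 1 };

/* A trivial output stream: a cursor and an end-of-buffer pointer. */
struct sketch_stream {
	unsigned char *p;   /* next free byte */
	unsigned char *end; /* one past the last usable byte */
};

/* Claim nbytes of buffer space, or return NULL if the stream is full. */
static unsigned char *sketch_reserve(struct sketch_stream *xdr, size_t nbytes)
{
	unsigned char *p = xdr->p;

	if ((size_t)(xdr->end - xdr->p) < nbytes)
		return NULL;
	xdr->p += nbytes;
	return p;
}

/* Encode a 32-bit value in big-endian (XDR) byte order. */
static enum sketch_status sketch_encode_u32(struct sketch_stream *xdr, uint32_t val)
{
	unsigned char *p = sketch_reserve(xdr, 4);

	if (!p)
		return SKETCH_ERR_RESOURCE;
	p[0] = (unsigned char)(val >> 24);
	p[1] = (unsigned char)(val >> 16);
	p[2] = (unsigned char)(val >> 8);
	p[3] = (unsigned char)val;
	return SKETCH_OK;
}

/* Encode a 64-bit value; all 8 bytes are reserved up front. */
static enum sketch_status sketch_encode_u64(struct sketch_stream *xdr, uint64_t val)
{
	unsigned char *p = sketch_reserve(xdr, 8);
	int i;

	if (!p)
		return SKETCH_ERR_RESOURCE;
	for (i = 0; i < 8; i++)
		p[i] = (unsigned char)(val >> (56 - 8 * i));
	return SKETCH_OK;
}

/* XDR booleans are 32-bit integers holding 0 or 1. */
static enum sketch_status sketch_encode_bool(struct sketch_stream *xdr, bool val)
{
	return sketch_encode_u32(xdr, val ? 1 : 0);
}
```

A caller can then chain encoders and bail out on the first non-OK status, without translating lengths or negative errno values at every call site.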
    • SUNRPC: Remove BUG_ON call sites · 789ce196
      Chuck Lever authored
      There is no need to take down the whole system for these assertions.
      
      I'd rather not attempt a heroic save here, as some bug has occurred
      that has left the transport data structures in an unknown state.
      Just warn and then leak the left-over resources.
      Acked-by: Christian Brauner <brauner@kernel.org>
      Reviewed-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • Kinglong Mee's avatar
    • Chuck Lever's avatar
    • NFSD: add rpc_status netlink support · bd9d6a3e
      Lorenzo Bianconi authored
      Introduce rpc_status netlink support for NFSD so that debugging
      information about pending RPC requests can be dumped from userspace.
      
      Closes: https://bugzilla.linux-nfs.org/show_bug.cgi?id=366
      Tested-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: introduce netlink stubs · 13727f85
      Lorenzo Bianconi authored
      Generate stubs and uAPI for nfsd netlink protocol. For the moment,
      the new protocol has one operation: rpc_status.
      
      The generated header and source files are created by running:
      
        tools/net/ynl/ynl-regen.sh
      Tested-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: handle GETATTR conflict with write delegation · 6c41d9a9
      Dai Ngo authored
      If a GETATTR request targets a file that has a write delegation in
      effect, and the requested attributes include the change info and size
      attributes, then the request is handled as follows:
      
      Server sends CB_GETATTR to client to get the latest change info and file
      size. If these values are the same as the server's cached values then
      the GETATTR proceeds as normal.
      
      If either the change info or file size is different from the server's
      cached values, or the file was already marked as modified, then:
      
          . update time_modify and time_metadata into file's metadata
            with current time
      
          . encode GETATTR as normal except the file size is encoded with
            the value returned from CB_GETATTR
      
          . mark the file as modified
      
      If the CB_GETATTR fails for any reason, the delegation is recalled
      and NFS4ERR_DELAY is returned for the GETATTR.
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
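The decision described in this commit message can be restated as a small function. This is a sketch of the described logic only, not the NFSD code: the structures, the enum, and `sketch_getattr_conflict()` are hypothetical names, and the actual timestamp update, encoding, and delegation recall are left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-delegation state kept by the server. */
struct sketch_deleg {
	uint64_t cached_change; /* server's cached change info */
	uint64_t cached_size;   /* server's cached file size */
	bool modified;          /* file already marked as modified */
};

/* Hypothetical result of a CB_GETATTR round trip. */
struct sketch_cb_reply {
	bool ok;         /* false if the callback failed for any reason */
	uint64_t change; /* change info reported by the client */
	uint64_t size;   /* file size reported by the client */
};

enum sketch_action {
	SKETCH_PROCEED,         /* GETATTR proceeds as normal */
	SKETCH_ENCODE_CB_SIZE,  /* update timestamps, encode size from CB_GETATTR */
	SKETCH_RECALL_AND_DELAY /* recall delegation, return NFS4ERR_DELAY */
};

static enum sketch_action
sketch_getattr_conflict(struct sketch_deleg *d, const struct sketch_cb_reply *r)
{
	if (!r->ok)
		return SKETCH_RECALL_AND_DELAY;

	/* Values unchanged and file never marked modified: nothing special. */
	if (!d->modified &&
	    r->change == d->cached_change && r->size == d->cached_size)
		return SKETCH_PROCEED;

	/* Either value differs, or the file was already marked modified. */
	d->modified = true;
	return SKETCH_ENCODE_CB_SIZE;
}
```

Once the file has been marked as modified, later GETATTRs keep taking the second path even if the client reports the cached values again.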
    • NFSD: add support for CB_GETATTR callback · 738401a9
      Dai Ngo authored
      Includes:
         . CB_GETATTR proc for nfs4_cb_procedures[]
         . XDR encoding and decoding functions for CB_GETATTR request/reply
         . add nfs4_cb_fattr to nfs4_delegation for sending CB_GETATTR
           and storing file attributes from the client's reply.
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: change the back-channel queue to lwq · 15d39883
      NeilBrown authored
      This removes the need to store and update back-links in the list.
      It also removes the need for the _bh version of spin_lock().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: discard sp_lock · 580a2575
      NeilBrown authored
      sp_lock is now only used to protect sp_all_threads.  This isn't needed
      as sp_all_threads is only manipulated through svc_set_num_threads(),
      which is already serialized.  Read access only requires rcu_read_lock().
      So no more locking is needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: change sp_nrthreads to atomic_t · 2e8fc923
      NeilBrown authored
      Using an atomic_t avoids the need to take a spinlock (which can soon be
      removed).
      
      Care is needed when choosing a thread to kill, as we cannot set the "die
      now" bit atomically with the test on the count.  Instead we temporarily
      increase the count.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
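One way to read "temporarily increase the count" is sketched below with C11 atomics in userspace. It is an assumption about the general technique, not a copy of the SUNRPC code: `struct sketch_pool`, `sketch_inc_not_zero()`, and `sketch_pick_victim()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_pool {
	atomic_int nrthreads;    /* number of running threads */
	atomic_bool need_victim; /* the "die now" request */
};

/* Increment *v unless it is zero; returns false if it was zero. */
static bool sketch_inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(v, &old, old + 1));
	return true;
}

/*
 * Ask one thread in the pool to exit. Holding a temporary extra count
 * keeps the pool from looking empty while the flag is being set.
 * Returns true if a thread remains that will see the flag.
 */
static bool sketch_pick_victim(struct sketch_pool *pool)
{
	if (!sketch_inc_not_zero(&pool->nrthreads))
		return false; /* no threads to kill */

	atomic_store(&pool->need_victim, true);

	/* Drop the temporary count; a previous value above 1 means threads remain. */
	if (atomic_fetch_sub(&pool->nrthreads, 1) != 1)
		return true;

	/* Every real thread exited in the meantime: withdraw the request. */
	atomic_store(&pool->need_victim, false);
	return false;
}
```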
    • SUNRPC: use lwq for sp_sockets - renamed to sp_xprts · 9a0e6acc
      NeilBrown authored
      lwq avoids using back pointers in lists, and uses less locking.
      This introduces a new spinlock, but the other one will be removed in a
      future patch.
      
      For svc_clean_up_xprts(), we now dequeue the entire queue, walk it to
      remove and process the xprts that need cleaning up, then re-enqueue the
      remaining queue.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: only have one thread waking up at a time · 5b80147e
      NeilBrown authored
      Currently if several items of work become available in quick succession,
      that number of threads (if available) will be woken.  By the time some
      of them wake up another thread that was already cache-warm might have
      come along and completed the work.  Anecdotal evidence suggests as many
      as 15% of wakes find nothing to do once they get to the point of
      looking.
      
      This patch changes svc_pool_wake_idle_thread() to wake the first thread
      on the queue but NOT remove it.  Subsequent calls will wake the same
      thread.  Once that thread starts it will dequeue itself and after
      dequeueing some work to do, it will wake the next thread if there is more
      work ready.  This results in a more orderly increase in the number of
      busy threads.
      
      As a bonus, this allows us to reduce locking around the idle queue.
      svc_pool_wake_idle_thread() no longer needs to take a lock (beyond
      rcu_read_lock()) as it doesn't manipulate the queue, it just looks at
      the first item.
      
      The thread itself can avoid locking by using the new
      llist_del_first_this() interface.  This will safely remove the thread
      itself if it is the head.  If it isn't the head, it will do nothing.
      If multiple threads call this concurrently only one will succeed.  The
      others will do nothing, so no corruption can result.
      
      If a thread wakes up and finds that it cannot dequeue itself that means
      either
      - that it wasn't woken because it was the head of the queue.  Maybe the
        freezer woke it.  In that case it can go back to sleep (after trying
        to freeze of course).
      - some other thread found there was nothing to do very recently, and
        placed itself on the head of the queue in front of this thread.
        It must check again after placing itself there, so it can be deemed to
        be responsible for any pending work, and this thread can go back to
        sleep until woken.
      
      No code ever tests for busy threads any more.  Only each thread itself
      cares if it is busy.  So svc_thread_busy() is no longer needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: rename some functions from rqst_ to svc_thread_ · d7926ee8
      NeilBrown authored
      Functions which directly manipulate a 'struct rqst', such as
      svc_rqst_alloc() or svc_rqst_release_pages(), can reasonably
      have "rqst" in their name.
      However, functions that act on the running thread, such as
      XX_should_sleep() or XX_wait_for_work(), seem more
      natural with a "svc_thread_" prefix.
      
      So make those changes.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lib: add light-weight queuing mechanism. · de9e82c3
      NeilBrown authored
      lwq is a FIFO single-linked queue that only requires a spinlock
      for dequeueing, which happens in process context.  Enqueueing is atomic
      with no spinlock and can happen in any context.
      
      This is particularly useful when work items are queued from BH or IRQ
      context, and when they are handled one at a time by dedicated threads.
      
      Avoiding any locking when enqueueing means there is no need to disable
      BH or interrupts, which is generally best avoided (particularly when
      there are any RT tasks on the machine).
      
      This solution is superior to using "list_head" links because we need
      half as many pointers in the data structures, and because list_head
      lists would need locking to add items to the queue.
      
      This solution is superior to a bespoke solution as all locking and
      container_of casting is integrated, so the interface is simple.
      
      Despite the similar name, this solution meets a distinctly different
      need to kfifo.  kfifo provides a fixed sized circular buffer to which
      data can be added at one end and removed at the other, and does not
      provide any locking.  lwq does not have any size limit and works with
      data structures (objects?) rather than data (bytes).
      
      A unit test for basic functionality, which runs at boot time, is included.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: David Gow <davidgow@google.com>
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20230911111333.4d1a872330e924a00acb905b@linux-foundation.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
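A queue with lock-free enqueue and locked dequeue, as described above, can be approximated in userspace with C11 atomics. The sketch below is an assumption about how such a structure can work, not the contents of lib/lwq.c: all `sketch_` names are hypothetical, a simple atomic_flag spin loop stands in for the kernel spinlock, and the container_of integration mentioned in the commit message is omitted.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_lwq {
	_Atomic(struct sketch_node *) fresh; /* lock-free LIFO of new nodes */
	struct sketch_node *ready;           /* FIFO-ordered nodes, guarded by lock */
	atomic_flag lock;                    /* taken only by dequeuers */
};

static void sketch_lwq_init(struct sketch_lwq *q)
{
	atomic_init(&q->fresh, NULL);
	q->ready = NULL;
	atomic_flag_clear(&q->lock);
}

/* Enqueue without any lock: push onto the "fresh" stack with a CAS loop. */
static void sketch_lwq_enqueue(struct sketch_lwq *q, struct sketch_node *n)
{
	struct sketch_node *head = atomic_load(&q->fresh);

	do {
		n->next = head;
	} while (!atomic_compare_exchange_weak(&q->fresh, &head, n));
}

/* Dequeue under the lock; returns NULL when the queue is empty. */
static struct sketch_node *sketch_lwq_dequeue(struct sketch_lwq *q)
{
	struct sketch_node *n;

	while (atomic_flag_test_and_set(&q->lock))
		; /* spin until the lock is free */

	if (!q->ready) {
		/* Take every fresh node at once and reverse LIFO into FIFO order. */
		struct sketch_node *batch = atomic_exchange(&q->fresh, NULL);

		while (batch) {
			struct sketch_node *next = batch->next;

			batch->next = q->ready;
			q->ready = batch;
			batch = next;
		}
	}
	n = q->ready;
	if (n)
		q->ready = n->next;

	atomic_flag_clear(&q->lock);
	return n;
}
```

Each node carries a single pointer, which is the "half as many pointers" saving over a doubly linked list_head that the commit message mentions.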
    • llist: add llist_del_first_this() · 8a3e5975
      NeilBrown authored
      llist_del_first_this() deletes a specific entry from an llist, providing
      it is at the head of the list.  Multiple threads can call this
      concurrently providing they each offer a different entry.
      
      This can be used for a set of worker threads which are on the llist when
      they are idle.  The head can always be woken, and when it is woken it
      can remove itself, and possibly wake the next if there is an excess of
      work to do.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
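The "delete this entry only if it is the head" operation can be illustrated with a compare-and-swap loop. This is a userspace sketch using C11 atomics with hypothetical `sketch_` names; it illustrates the idea and makes no claim about the memory-ordering details of the kernel's llist implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_head {
	_Atomic(struct sketch_node *) first;
};

/* Lock-free push onto the front of the list. */
static void sketch_push(struct sketch_head *head, struct sketch_node *n)
{
	struct sketch_node *first = atomic_load(&head->first);

	do {
		n->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, n));
}

/*
 * Remove this_node only if it is currently the first entry.
 * Returns true on removal, false if some other entry is at the head.
 */
static bool sketch_del_first_this(struct sketch_head *head,
				  struct sketch_node *this_node)
{
	struct sketch_node *entry = atomic_load(&head->first);

	do {
		if (entry != this_node)
			return false; /* not the head: do nothing */
	} while (!atomic_compare_exchange_weak(&head->first, &entry,
					       this_node->next));
	return true;
}
```

Because each caller passes a different entry, at most one concurrent caller can match the head, so the others return false and leave the list untouched.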
    • SUNRPC: change service idle list to be an llist · 9bd4161c
      NeilBrown authored
      With an llist we don't need to take a lock to add a thread to the list,
      though we still need a lock to remove it.  That will go in the next
      patch.
      
      Unlike double-linked lists, a thread cannot reliably remove itself from
      the list.  Only the first thread can be removed, and that can change
      asynchronously.  So some care is needed.
      
      We already check if there is pending work to do, so we are unlikely to
      add ourselves to the idle list and then want to remove ourselves again.
      
      If we DO find something needs to be done after adding ourselves to the
      list, we simply wake up the first thread on the list.  If that was us,
      we successfully removed ourselves and can continue.  If it was some
      other thread, they will do the work that needs to be done.  We can
      safely sleep until woken.
      
      We also remove the test on freezing() from rqst_should_sleep().  Instead
      we set TASK_FREEZABLE before scheduling.  This makes it safe to
      schedule() when a freeze is pending.  As we now loop waiting to be
      removed from the idle queue, this is a cleaner way to handle freezing.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • llist: add interface to check if a node is on a list. · d6b3358a
      NeilBrown authored
      With list.h lists, it is easy to test if a node is on a list, providing
      it was initialised and that it is removed with list_del_init().
      
      This patch provides similar functionality for llist.h lists.
      
       init_llist_node()
      marks a node as being not-on-any-list by setting the ->next pointer to
      the node itself.
       llist_on_list()
      tests if the node is on any list.
       llist_del_first_init()
      removes the first element from a llist, and marks it as being off-list.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
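The self-pointer convention described above is easy to show in a single-threaded sketch. The `sketch_` functions below are hypothetical stand-ins that mirror the three interfaces by behavior only; unlike the real llist helpers, they make no attempt at lock-free operation.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
	struct sketch_node *next;
};

struct sketch_list {
	struct sketch_node *first;
};

/* Mark a node as off-list: it points at itself. */
static void sketch_init_node(struct sketch_node *n)
{
	n->next = n;
}

/* A node is on a list exactly when it does not point at itself. */
static bool sketch_on_list(const struct sketch_node *n)
{
	return n->next != n;
}

/* Add a node to the front of the list (non-atomic in this sketch). */
static void sketch_add(struct sketch_node *n, struct sketch_list *l)
{
	n->next = l->first;
	l->first = n;
}

/* Remove the first node, if any, and mark it as off-list again. */
static struct sketch_node *sketch_del_first_init(struct sketch_list *l)
{
	struct sketch_node *n = l->first;

	if (n) {
		l->first = n->next;
		sketch_init_node(n);
	}
	return n;
}
```

The test works because a node that is on a list points either at another node or at NULL (the last entry), never at itself.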
    • SUNRPC: discard SP_CONGESTED · 2b65a226
      NeilBrown authored
      We can tell if a pool is congested by checking if the idle list is
      empty.  We don't need a separate flag.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: add list of idle threads · 5ff817b2
      NeilBrown authored
      Rather than searching a list of threads to find an idle one, having a
      list of idle threads allows an idle thread to be found immediately.
      
      This adds some spin_lock calls which is not ideal, but as the hold-time
      is tiny it is still faster than searching a list.  A future patch will
      remove them using llist.h.  This involves some subtlety and so is left
      to a separate patch.
      
      This removes the need for the RQ_BUSY flag.  The rqst is "busy"
      precisely when it is not on the "idle" list.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: change how svc threads are asked to exit. · fa341560
      NeilBrown authored
      svc threads are currently stopped using kthread_stop().  This requires
      identifying a specific thread.  However we don't care which thread
      stops, just as long as one does.
      
      So instead, set a flag in the svc_pool to say that a thread needs to
      die, and have each thread check this flag instead of calling
      kthread_should_stop().  The first thread to find and clear this flag
      then moves towards exiting.
      
      This removes an explicit dependency on sp_all_threads which will make a
      future patch simpler.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
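The "first thread to find and clear the flag exits" pattern reduces to an atomic test-and-clear. A minimal userspace sketch follows, with hypothetical names (`struct sketch_pool`, `sketch_request_exit()`, `sketch_thread_should_exit()`) and a C11 atomic exchange standing in for whatever bit operation the kernel code uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_pool {
	atomic_bool need_victim; /* set when some thread, any thread, should exit */
};

/* Called by the controller: ask for one thread to exit. */
static void sketch_request_exit(struct sketch_pool *pool)
{
	atomic_store(&pool->need_victim, true);
}

/*
 * Called by each worker in its main loop. The exchange clears the flag
 * and reports its previous value in one atomic step, so exactly one
 * caller sees true per request.
 */
static bool sketch_thread_should_exit(struct sketch_pool *pool)
{
	return atomic_exchange(&pool->need_victim, false);
}
```

No specific thread has to be identified, which is the property the commit message is after.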
    • lockd: hold a reference to nlmsvc_serv while stopping the thread. · f4578ba1
      NeilBrown authored
      Both nfsd and nfsv4-callback take a temporary reference to the svc_serv
      while calling svc_set_num_threads() to stop the last thread.  lockd does
      not.
      
      This extra reference prevents the svc_serv from being freed when the
      last thread drops its reference count.  This is not currently needed
      for lockd as the svc_serv is not accessed after the last thread is told
      to exit.
      
      However a future patch will require svc_exit_thread() to access the
      svc_serv after the svc_put() so it will need the code that calls
      svc_set_num_threads() to keep a reference and keep the svc_serv active.
      
      So copy the pattern from nfsd and nfsv4-cb to lockd, and take a
      reference around svc_set_num_threads(.., 0)
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Tested-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: integrate back-channel processing with svc_recv() · 063ab935
      NeilBrown authored
      Using svc_recv() for (NFSv4.1) back-channel handling means we have just
      one mechanism for waking threads.
      
      Also change kthread_freezable_should_stop() in nfs4_callback_svc() to
      kthread_should_stop() as used elsewhere.
      kthread_freezable_should_stop() effectively adds a try_to_freeze() call,
      and svc_recv() already contains that at an appropriate place.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Clean up bc_svc_process() · 6ed8cdf9
      Chuck Lever authored
      The test robot complained that, in some build configurations, the
      @error variable in bc_svc_process's only caller is set but never
      used. This happens because dprintk() is the only consumer of that
      value.
      
       - Remove the dprintk() call sites in favor of the svc_process
         tracepoint
       - The @error variable and the return value of bc_svc_process() are
         now unused, so get rid of them.
       - The @serv parameter is set to rqstp->rq_serv by the only caller,
         and bc_svc_process() then uses it only to set rqstp->rq_serv. It
         can be removed.
       - Rename bc_svc_process() according to the convention that
         globally-visible RPC server functions have names that begin with
         "svc_"; and because it is globally-visible, give it a proper
         kdoc comment.
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202308121314.HA8Rq2XG-lkp@intel.com/
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: rename and refactor svc_get_next_xprt() · 7b31f4da
      NeilBrown authored
      svc_get_next_xprt() does a lot more than just get an xprt.  It also
      decides if it needs to sleep, depending not only on the availability of
      xprts but also on the need to exit or handle external work.
      
      So rename it to svc_rqst_wait_for_work() and only do the testing and
      waiting.  Move all the waiting-related code out of svc_recv() into the
      new svc_rqst_wait_for_work().
      
      Move the dequeueing code out of svc_get_next_xprt() into svc_recv().
      
      Previously svc_xprt_dequeue() would be called twice, once before waiting
      and possibly once after.  Now instead rqst_should_sleep() is called
      twice: once to decide if waiting is needed, and once to check again
      after setting the task state to see if we might have missed a wakeup.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: move all of xprt handling into svc_xprt_handle() · e3274026
      NeilBrown authored
      svc_xprt_handle() does lots of things itself, but leaves some to the
      caller - svc_recv().  This isn't elegant.
      
      Move that code out of svc_recv() into svc_xprt_handle()
      
      Move the calls to svc_xprt_release() from svc_send() and svc_drop()
      (the two possible final steps in svc_process()) and from svc_recv() (in
      the case where svc_process() wasn't called) into svc_xprt_handle().
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: add doc to enable EXPORT_OP_ASYNC_LOCK · e70da176
      Alexander Aring authored
      This patch adds a note to enable EXPORT_OP_ASYNC_LOCK for
      asynchronous lock request handling.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: fix race in async lock request handling · afb13302
      Alexander Aring authored
      This patch fixes a race in async lock request handling. The relevant
      struct nlm_block was added to the nlm_blocked list only after the
      request had been sent by vfs_lock_file(), while nlmsvc_grant_deferred()
      looks up the nlm_block in that list. The async request could therefore
      complete before the nlm_block was added to the list. This would end
      in a -ENOENT and a kernel log message of "lockd: grant for unknown
      block".
      
      To solve this issue we add the nlm_block before the vfs_lock_file() call
      to be sure it has been added when a possible nlmsvc_grant_deferred() is
      called. If vfs_lock_file() returns a result for which the nlm_block
      would not have been added to the nlm_blocked list, the nlm_block is
      removed from the list again.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: don't call vfs_lock_file() for pending requests · b743612c
      Alexander Aring authored
      This patch returns nlm_lck_blocked in nlmsvc_lock() when an asynchronous
      lock request is pending. During testing I ran into a case where lockd
      was waiting for only one lm_grant() callback because the nlm_block was
      already part of the nlm_blocked list. If another asynchronous request
      for the same nlm_block is triggered, two lm_grant() callbacks will
      occur, but lockd was only waiting for one.
      
      To avoid any change for existing users, this handling is only applied
      when export_op_support_safe_async_lock() returns true.
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • lockd: introduce safe async lock op · 2dd10de8
      Alexander Aring authored
      This patch mostly reverts commit 40595cdc ("nfs: block notification
      on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
      export flag to signal that the "own ->lock" implementation supports
      async lock requests. The only main user is DLM, which is used by the
      GFS2 and OCFS2 filesystems. Those implement their own lock() and
      return FILE_LOCK_DEFERRED. Since commit 40595cdc
      ("nfs: block notification on fs with its own ->lock") the DLM
      implementation was never updated. This patch prepares for DLM
      to set the EXPORT_OP_ASYNC_LOCK export flag and to update the DLM
      plock implementation accordingly.
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • nfsd: Don't reset the write verifier on a commit EAGAIN · 1b2021bd
      Trond Myklebust authored
      If fsync() is returning EAGAIN, then we can assume that the filesystem
      being exported is something like NFS with the 'softerr' mount option
      enabled, and that it is just asking us to replay the fsync() operation
      at a later date.
      
      If we see an ESTALE, then ditto: the file is gone, so there is no danger
      of losing the error.
      
      For those cases, do not reset the write verifier. A write verifier
      change has a global effect, causing retransmission by all clients of
      all uncommitted unstable writes for all files, so it is worth
      mitigating where possible.
      
      Link: https://lore.kernel.org/linux-nfs/20230911184357.11739-1-trond.myklebust@hammerspace.com/
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
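The rule in this commit message amounts to a small predicate over the fsync() result. The sketch below is a restatement under assumptions: `sketch_commit_needs_verifier_reset()` is a hypothetical name, and the kernel-style convention of negative errno values is assumed for the argument.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether a commit result should trigger a write verifier reset.
 * err is 0 on success or a negative errno value, kernel style.
 */
static bool sketch_commit_needs_verifier_reset(int err)
{
	switch (err) {
	case 0:       /* success: nothing was lost */
	case -EAGAIN: /* underlying filesystem asks for a later replay */
	case -ESTALE: /* file is gone, so no error can be lost */
		return false;
	default:      /* any other failure may have lost unstable writes */
		return true;
	}
}
```

Keeping the exempt cases narrow matters because a verifier change makes every client resend all of its uncommitted unstable writes.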