1. 28 Sep, 2023 1 commit
    • Chuck Lever's avatar
      NFSD: Fix zero NFSv4 READ results when RQ_SPLICE_OK is not set · 0d32a6bb
      Chuck Lever authored
      nfsd4_encode_readv() uses xdr->buf->page_len as a starting point for
      the nfsd_iter_read() sink buffer -- page_len is going to be offset
      by the parts of the COMPOUND that have already been encoded into
      xdr->buf->pages.
      
      However, that value must be captured /before/
      xdr_reserve_space_vec() advances page_len by the expected size of
      the read payload. Otherwise, the whole front part of the first
      page of the payload in the reply will be uninitialized.
      
      Mantas hit this because sec=krb5i forces RQ_SPLICE_OK off, which
      invokes the readv part of the nfsd4_encode_read() path. Also,
      older Linux NFS clients appear to send shorter READ requests
      for files smaller than a page, whereas newer clients just send
      page-sized requests and let the server send as many bytes as
      are in the file.
      Reported-by: default avatarMantas Mikulėnas <grawity@gmail.com>
      Closes: https://lore.kernel.org/linux-nfs/f1d0b234-e650-0f6e-0f5d-126b3d51d1eb@gmail.com/
      Fixes: 703d7521 ("NFSD: Hoist rq_vec preparation into nfsd_read() [step two]")
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      0d32a6bb
  2. 12 Sep, 2023 1 commit
    • NeilBrown's avatar
      NFSD: fix possible oops when nfsd/pool_stats is closed. · 88956eab
      NeilBrown authored
      If /proc/fs/nfsd/pool_stats is open when the last nfsd thread exits, then
      when the file is closed a NULL pointer is dereferenced.
      This is because nfsd_pool_stats_release() assumes that the
      pointer to the svc_serv cannot become NULL while a reference is held.
      
      This used to be the case but a recent patch split nfsd_last_thread() out
      from nfsd_put(), and clearing the pointer is done in nfsd_last_thread().
      
      This is easily reproduced by running
         rpc.nfsd 8 ; ( rpc.nfsd 0;true) < /proc/fs/nfsd/pool_stats
      
      Fortunately nfsd_pool_stats_release() has easy access to the svc_serv
      pointer, and so can call svc_put() on it directly.
      
      Fixes: 9f28a971 ("nfsd: separate nfsd_last_thread() from nfsd_put()")
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      88956eab
  3. 09 Sep, 2023 1 commit
  4. 29 Aug, 2023 37 commits
    • Chuck Lever's avatar
      Documentation: Add missing documentation for EXPORT_OP flags · b38a6023
      Chuck Lever authored
      The commits that introduced these flags neglected to update the
      Documentation/filesystems/nfs/exporting.rst file.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      b38a6023
    • Yue Haibing's avatar
      SUNRPC: Remove unused declaration rpc_modcount() · 899525e8
      Yue Haibing authored
      These declarations are never implemented since the beginning of git
      history. Remove these, then merge the two #ifdef block for
      simplification.
      Signed-off-by: default avatarYue Haibing <yuehaibing@huawei.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      899525e8
    • Yue Haibing's avatar
      SUNRPC: Remove unused declarations · 07dc19db
      Yue Haibing authored
      Commit c7d7ec8f ("SUNRPC: Remove svc_shutdown_net()") removed
      svc_close_net() implementation but left declaration in place. Remove
      it.
      
      Commit 1f11a034 ("SUNRPC new transport for the NFSv4.1 shared
      back channel") removed svc_sock_create()/svc_sock_destroy() but not
      the declarations.
      Signed-off-by: default avatarYue Haibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      07dc19db
    • Chuck Lever's avatar
      NFSD: da_addr_body field missing in some GETDEVICEINFO replies · 6372e2ee
      Chuck Lever authored
      The XDR specification in RFC 8881 looks like this:
      
      struct device_addr4 {
      	layouttype4	da_layout_type;
      	opaque		da_addr_body<>;
      };
      
      struct GETDEVICEINFO4resok {
      	device_addr4	gdir_device_addr;
      	bitmap4		gdir_notification;
      };
      
      union GETDEVICEINFO4res switch (nfsstat4 gdir_status) {
      case NFS4_OK:
      	GETDEVICEINFO4resok gdir_resok4;
      case NFS4ERR_TOOSMALL:
      	count4		gdir_mincount;
      default:
      	void;
      };
      
      Looking at nfsd4_encode_getdeviceinfo() ....
      
      When the client provides a zero gd_maxcount, then the Linux NFS
      server implementation encodes the da_layout_type field and then
      skips the da_addr_body field completely, proceeding directly to
      encode gdir_notification field.
      
      There does not appear to be an option in the specification to skip
      encoding da_addr_body. Moreover, Section 18.40.3 says:
      
      > If the client wants to just update or turn off notifications, it
      > MAY send a GETDEVICEINFO operation with gdia_maxcount set to zero.
      > In that event, if the device ID is valid, the reply's da_addr_body
      > field of the gdir_device_addr field will be of zero length.
      
      Since the layout drivers are responsible for encoding the
      da_addr_body field, put this fix inside the ->encode_getdeviceinfo
      methods.
      
      Fixes: 9cf514cc ("nfsd: implement pNFS operations")
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: Tom Haynes <loghyr@gmail.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      6372e2ee
    • NeilBrown's avatar
      SUNRPC: Remove return value of svc_pool_wake_idle_thread() · 2a455745
      NeilBrown authored
      The returned value is not used (any more), so don't return it.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      2a455745
    • NeilBrown's avatar
      SUNRPC: make rqst_should_sleep() idempotent() · 6859d1f2
      NeilBrown authored
      Based on its name you would think that rqst_should_sleep() would be
      read-only, not changing anything.  But in fact it will clear
      SP_TASK_PENDING if that was set.  This is surprising, and it blurs the
      line between "check for work to do" and "dequeue work to do".
      
      So change the "test_and_clear" to simple "test" and clear the bit once
      the thread has decided to wake up and return to the caller.
      
      With this, it makes sense to *always* set SP_TASK_PENDING when asked,
      rather than to set it only if no thread could be woken up.
      
      [ cel: Previously TASK_PENDING indicated there is work waiting but no
      idle threads were found to pick up that work. After this patch, it acts
      as an XPT_BUSY flag for wake-ups that have no associated xprt. ]
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      6859d1f2
    • Chuck Lever's avatar
      SUNRPC: Clean up svc_set_num_threads · d2f0ef1c
      Chuck Lever authored
      Document the API contract and remove stale or obvious comments.
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      d2f0ef1c
    • Chuck Lever's avatar
      SUNRPC: Count ingress RPC messages per svc_pool · f208e950
      Chuck Lever authored
      svc_xprt_enqueue() can be costly, since it involves selecting and
      waking up a process.
      
      More than one enqueue is done per incoming RPC. For example,
      svc_data_ready() enqueues, and so does svc_xprt_receive(). Also, if
      an RPC message requires more than one call to ->recvfrom() to
      receive it fully, each one of those calls does an enqueue.
      
      To get a sense of the average number of transport enqueue operations
      needed to process an incoming RPC message, re-use the "packets" pool
      stat. Track the number of complete RPC messages processed by each
      thread pool.
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      f208e950
    • Chuck Lever's avatar
      SUNRPC: Deduplicate thread wake-up code · 850bac3a
      Chuck Lever authored
      Refactor: Extract the loop that finds an idle service thread from
      svc_xprt_enqueue() and svc_wake_up(). Both functions do just about
      the same thing.
      
      Note that svc_wake_up() currently does not hold the RCU read lock
      while waking the target thread. It indeed should hold the lock, just
      as svc_xprt_enqueue() does, to ensure the rqstp does not vanish
      during the wake-up. This patch adds the RCU lock for svc_wake_up().
      
      Note that shrinking the pool thread count is rare, and calls to
      svc_wake_up() are also quite infrequent. In practice, this race is
      very unlikely to be hit, so we are not marking the lock fix for
      stable backport at this time.
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      850bac3a
    • Chuck Lever's avatar
      SUNRPC: Move trace_svc_xprt_enqueue · 82e5d82a
      Chuck Lever authored
      The xpt_flags field frequently changes between the time that
      svc_xprt_ready() grabs a copy and execution flow arrives at the
      tracepoint at the tail of svc_xprt_enqueue(). In fact, there's
      usually a sleep/wake-up in there, so those flags are almost
      guaranteed to be different.
      
      It would be more useful to record the exact flags that were used to
      decide whether the transport is ready, so move the tracepoint.
      
      Moving it means the tracepoint can't pick up the waker's pid. That
      can be added to struct svc_rqst if it turns out that is important.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      82e5d82a
    • Chuck Lever's avatar
      SUNRPC: Add enum svc_auth_status · 78c542f9
      Chuck Lever authored
      In addition to the benefits of using an enum rather than a set of
      macros, we now have a named type that can improve static type
      checking of function return values.
      
      As part of this change, I removed a stale comment from svcauth.h;
      the return values from current implementations of the
      auth_ops::release method are all zero/negative errno, not the SVC_OK
      enum values as the old comment suggested.
      Suggested-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      78c542f9
    • Chuck Lever's avatar
      SUNRPC: change svc_xprt::xpt_flags bits to enum · d75e490f
      Chuck Lever authored
      When a sequence of numbers are needed for internal-use only, an enum is
      typically best.  The sequence will inevitably need to be changed one
      day, and having an enum means the developer doesn't need to think about
      renumbering after insertion or deletion.  Such patches will be easier
      to review.
      Suggested-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      d75e490f
    • NeilBrown's avatar
      SUNRPC: change svc_rqst::rq_flags bits to enum · a6b4ec39
      NeilBrown authored
      When a sequence of numbers are needed for internal-use only, an enum is
      typically best.  The sequence will inevitably need to be changed one
      day, and having an enum means the developer doesn't need to think about
      renumbering after insertion or deletion.  Such patches will be easier
      to review.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      a6b4ec39
    • NeilBrown's avatar
      SUNRPC: change svc_pool::sp_flags bits to enum · 3275694a
      NeilBrown authored
      When a sequence of numbers are needed for internal-use only, an enum is
      typically best.  The sequence will inevitably need to be changed one
      day, and having an enum means the developer doesn't need to think about
      renumbering after insertion or deletion.  Such patches will be easier
      to review.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      3275694a
    • NeilBrown's avatar
      SUNRPC: change cache_head.flags bits to enum · ba4bba6c
      NeilBrown authored
      When a sequence of numbers are needed for internal-use only, an enum is
      typically best.  The sequence will inevitably need to be changed one
      day, and having an enum means the developer doesn't need to think about
      renumbering after insertion or deletion.  Such patches will be easier
      to review.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      ba4bba6c
    • NeilBrown's avatar
      SUNRPC: remove timeout arg from svc_recv() · c743b425
      NeilBrown authored
      Most svc threads have no interest in a timeout.
      nfsd sets it to 1 hour, but this is a wart of no significance.
      
      lockd uses the timeout so that it can call nlmsvc_retry_blocked().
      It also sometimes calls svc_wake_up() to ensure this is called.
      
      So change lockd to be consistent and always use svc_wake_up() to trigger
      nlmsvc_retry_blocked() - using a timer instead of a timeout to
      svc_recv().
      
      And change svc_recv() to not take a timeout arg.
      
      This makes the sp_threads_timedout counter always zero.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      c743b425
    • NeilBrown's avatar
      SUNRPC: change svc_recv() to return void. · 7b719e2b
      NeilBrown authored
      svc_recv() currently returns a 0 on success or one of two errors:
       - -EAGAIN means no message was successfully received
       - -EINTR means the thread has been told to stop
      
      Previously nfsd would stop as the result of a signal as well as
      following kthread_stop().  In that case the difference was useful: EINTR
      means stop unconditionally.  EAGAIN means stop if kthread_should_stop(),
      continue otherwise.
      
      Now threads only exit when kthread_should_stop() so we don't need the
      distinction.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      7b719e2b
    • NeilBrown's avatar
      SUNRPC: call svc_process() from svc_recv(). · f78116d3
      NeilBrown authored
      All callers of svc_recv() go on to call svc_process() on success.
      Simplify callers by having svc_recv() do that for them.
      
      This loses one call to validate_process_creds() in nfsd.  That was
      debugging code added 14 years ago.  I don't think we need to keep it.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      f78116d3
    • NeilBrown's avatar
      nfsd: separate nfsd_last_thread() from nfsd_put() · 9f28a971
      NeilBrown authored
      Now that the last nfsd thread is stopped by an explicit act of calling
      svc_set_num_threads() with a count of zero, we only have a limited
      number of places that can happen, and don't need to call
      nfsd_last_thread() in nfsd_put()
      
      So separate that out and call it at the two places where the number of
      threads is set to zero.
      
      Move the clearing of ->nfsd_serv and the call to svc_xprt_destroy_all()
      into nfsd_last_thread(), as they are really part of the same action.
      
      nfsd_put() is now a thin wrapper around svc_put(), so make it a static
      inline.
      
      nfsd_put() cannot be called after nfsd_last_thread(), so in a couple of
      places we have to use svc_put() instead.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      9f28a971
    • NeilBrown's avatar
      nfsd: Simplify code around svc_exit_thread() call in nfsd() · 18e4cf91
      NeilBrown authored
      Previously a thread could exit asynchronously (due to a signal) so some
      care was needed to hold nfsd_mutex over the last svc_put() call.  Now a
      thread can only exit when svc_set_num_threads() is called, and this is
      always called under nfsd_mutex.  So no care is needed.
      
      Not only is the mutex held when a thread exits now, but the svc refcount
      is elevated, so the svc_put() in svc_exit_thread() will never be a final
      put, so the mutex isn't even needed at this point in the code.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      18e4cf91
    • NeilBrown's avatar
      nfsd: don't allow nfsd threads to be signalled. · 39039024
      NeilBrown authored
      The original implementation of nfsd used signals to stop threads during
      shutdown.
      In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
      internally it if was asked to run "0" threads.  After this user-space
      transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
      threads was no longer an important part of the API.
      
      In commit 3ebdbe52 ("SUNRPC: discard svo_setup and rename
      svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
      use of signals for stopping threads, using kthread_stop() instead.
      
      This patch makes the "obvious" next step and removes the ability to
      signal nfsd threads - or any svc threads.  nfsd stops allowing signals
      and we don't check for their delivery any more.
      
      This will allow for some simplification in later patches.
      
      A change worth noting is in nfsd4_ssc_setup_dul().  There was previously
      a signal_pending() check which would only succeed when the thread was
      being shut down.  It should really have tested kthread_should_stop() as
      well.  Now it just does the latter, not the former.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      39039024
    • NeilBrown's avatar
      lockd: remove SIGKILL handling · 8db14cad
      NeilBrown authored
      lockd allows SIGKILL and responds by dropping all locks and restarting
      the grace period.  This functionality has been present since 2.1.32 when
      lockd was added to Linux.
      
      This functionality is undocumented and most likely added as a useful
      debug aid.  When there is a need to drop locks, the better approach is
      to use /proc/fs/nfsd/unlock_*.
      
      This patch removes SIGKILL handling as part of preparation for removing
      all signal handling from sunrpc service threads.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      8db14cad
    • Su Hui's avatar
      fs: lockd: avoid possible wrong NULL parameter · de8d38cf
      Su Hui authored
      clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
      Null pointer passed as 2nd argument to memory copy function.
      
      Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
      pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
      NULL if 'hostname' is invalid.
      
      Fixes: 77a3ef33 ("NSM: More clean up of nsm_get_handle()")
      Signed-off-by: default avatarSu Hui <suhui@nfschina.com>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      de8d38cf
    • Zhu Wang's avatar
      exportfs: remove kernel-doc warnings in exportfs · 7afdc0c9
      Zhu Wang authored
      Remove kernel-doc warning in exportfs:
      
      fs/exportfs/expfs.c:395: warning: Function parameter or member 'parent'
      not described in 'exportfs_encode_inode_fh'
      Signed-off-by: default avatarZhu Wang <wangzhu9@huawei.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      7afdc0c9
    • Chuck Lever's avatar
      SUNRPC: Reduce thread wake-up rate when receiving large RPC messages · 2b877fc5
      Chuck Lever authored
      With large NFS WRITE requests on TCP, I measured 5-10 thread wake-
      ups to receive each request. This is because the socket layer
      calls ->sk_data_ready() frequently, and each call triggers a
      thread wake-up. Each recvmsg() seems to pull in less than 100KB.
      
      Have the socket layer hold ->sk_data_ready() calls until the full
      incoming message has arrived to reduce the wake-up rate.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      2b877fc5
    • Chuck Lever's avatar
      SUNRPC: Revert e0a912e8 · 89d2d9fb
      Chuck Lever authored
      Flamegraph analysis showed that the cork/uncork calls consume
      nearly a third of the CPU time spent in svc_tcp_sendto(). The
      other two consumers are mutex lock/unlock and svc_tcp_sendmsg().
      
      Now that svc_tcp_sendto() coalesces RPC messages properly, there
      is no need to introduce artificial delays to prevent sending
      partial messages.
      
      After applying this change, I measured a 1.2K read IOPS increase
      for 8KB random I/O (several percent) on 56Gb IP over IB.
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      89d2d9fb
    • Chuck Lever's avatar
      SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array · baabf59c
      Chuck Lever authored
      Commit da1661b9 ("SUNRPC: Teach server to use xprt_sock_sendmsg
      for socket sends") modified svc_udp_sendto() to use xprt_sock_sendmsg()
      because we originally believed xprt_sock_sendmsg() would be needed
      for TLS support. That does not actually appear to be the case.
      
      In addition, the linkage between the client and server send code has
      been a bit of a maintenance headache because of the distinct ways
      that the client and server handle memory allocation.
      
      Going forward, eventually the XDR layer will deal with its buffers
      in the form of bio_vec arrays, so convert this function accordingly.
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      baabf59c
    • Chuck Lever's avatar
      SUNRPC: Send RPC message on TCP with a single sock_sendmsg() call · e18e157b
      Chuck Lever authored
      There is now enough infrastructure in place to combine the stream
      record marker into the biovec array used to send each outgoing RPC
      message on TCP. The whole message can be more efficiently sent with
      a single call to sock_sendmsg() using a bio_vec iterator.
      
      Note that this also helps with RPC-with-TLS: the TLS implementation
      can now clearly see where the upper layer message boundaries are.
      Before, it would send each component of the xdr_buf (record marker,
      head, page payload, tail) in separate TLS records.
      Suggested-by: default avatarDavid Howells <dhowells@redhat.com>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      e18e157b
    • Chuck Lever's avatar
      SUNRPC: Convert svc_tcp_sendmsg to use bio_vecs directly · 2eb2b935
      Chuck Lever authored
      Add a helper to convert a whole xdr_buf directly into an array of
      bio_vecs, then send this array instead of iterating piecemeal over
      the xdr_buf containing the outbound RPC message.
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      2eb2b935
    • Jeff Layton's avatar
      nfsd: inherit required unset default acls from effective set · d4247970
      Jeff Layton authored
      A well-formed NFSv4 ACL will always contain OWNER@/GROUP@/EVERYONE@
      ACEs, but there is no requirement for inheritable entries for those
      entities. POSIX ACLs must always have owner/group/other entries, even for a
      default ACL.
      
      nfsd builds the default ACL from inheritable ACEs, but the current code
      just leaves any unspecified ACEs zeroed out. The result is that adding a
      default user or group ACE to an inode can leave it with unwanted deny
      entries.
      
      For instance, a newly created directory with no acl will look something
      like this:
      
      	# NFSv4 translation by server
      	A::OWNER@:rwaDxtTcCy
      	A::GROUP@:rxtcy
      	A::EVERYONE@:rxtcy
      
      	# POSIX ACL of underlying file
      	user::rwx
      	group::r-x
      	other::r-x
      
      ...if I then add new v4 ACE:
      
      	nfs4_setfacl -a A:fd:1000:rwx /mnt/local/test
      
      ...I end up with a result like this today:
      
      	user::rwx
      	user:1000:rwx
      	group::r-x
      	mask::rwx
      	other::r-x
      	default:user::---
      	default:user:1000:rwx
      	default:group::---
      	default:mask::rwx
      	default:other::---
      
      	A::OWNER@:rwaDxtTcCy
      	A::1000:rwaDxtcy
      	A::GROUP@:rxtcy
      	A::EVERYONE@:rxtcy
      	D:fdi:OWNER@:rwaDx
      	A:fdi:OWNER@:tTcCy
      	A:fdi:1000:rwaDxtcy
      	A:fdi:GROUP@:tcy
      	A:fdi:EVERYONE@:tcy
      
      ...which is not at all expected. Adding a single inheritable allow ACE
      should not result in everyone else losing access.
      
      The setfacl command solves a silimar issue by copying owner/group/other
      entries from the effective ACL when none of them are set:
      
          "If a Default ACL entry is created, and the  Default  ACL  contains  no
           owner,  owning group,  or  others  entry,  a  copy of the ACL owner,
           owning group, or others entry is added to the Default ACL.
      
      Having nfsd do the same provides a more sane result (with no deny ACEs
      in the resulting set):
      
      	user::rwx
      	user:1000:rwx
      	group::r-x
      	mask::rwx
      	other::r-x
      	default:user::rwx
      	default:user:1000:rwx
      	default:group::r-x
      	default:mask::rwx
      	default:other::r-x
      
      	A::OWNER@:rwaDxtTcCy
      	A::1000:rwaDxtcy
      	A::GROUP@:rxtcy
      	A::EVERYONE@:rxtcy
      	A:fdi:OWNER@:rwaDxtTcCy
      	A:fdi:1000:rwaDxtcy
      	A:fdi:GROUP@:rxtcy
      	A:fdi:EVERYONE@:rxtcy
      Reported-by: default avatarOndrej Valousek <ondrej.valousek@diasemi.com>
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2136452Suggested-by: default avatarAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      d4247970
    • YueHaibing's avatar
      sunrpc: Remove unused extern declarations · f8077478
      YueHaibing authored
      Since commit 49b28684 ("nfsd: Remove deprecated nfsctl system call and related code.")
      these declarations are unused, so can remove it.
      Signed-off-by: default avatarYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      f8077478
    • Alexander Aring's avatar
      lockd: nlm_blocked list race fixes · be2be5f7
      Alexander Aring authored
      This patch fixes races when lockd accesses the global nlm_blocked list.
      It was mostly safe to access the list because everything was accessed
      from the lockd kernel thread context but there exist cases like
      nlmsvc_grant_deferred() that could manipulate the nlm_blocked list and
      it can be called from any context.
      Signed-off-by: default avatarAlexander Aring <aahringo@redhat.com>
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      be2be5f7
    • Jeff Layton's avatar
      nfsd: set missing after_change as before_change + 1 · f2b7019d
      Jeff Layton authored
      In the event that we can't fetch post_op_attr attributes, we still need
      to set a value for the after_change. The operation has already happened,
      so we're not able to return an error at that point, but we do want to
      ensure that the client knows that its cache should be invalidated.
      
      If we weren't able to fetch post-op attrs, then just set the
      after_change to before_change + 1. The atomic flag should already be
      clear in this case.
      Suggested-by: default avatarNeil Brown <neilb@suse.de>
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      f2b7019d
    • Jeff Layton's avatar
      nfsd: remove unsafe BUG_ON from set_change_info · 97662607
      Jeff Layton authored
      At one time, nfsd would scrape inode information directly out of struct
      inode in order to populate the change_info4. At that time, the BUG_ON in
      set_change_info made some sense, since having it unset meant a coding
      error.
      
      More recently, it calls vfs_getattr to get this information, which can
      fail. If that fails, fh_pre_saved can end up not being set. While this
      situation is unfortunate, we don't need to crash the box.
      
      Move set_change_info to nfs4proc.c since all of the callers are there.
      Revise the condition for setting "atomic" to also check for
      fh_pre_saved. Drop the BUG_ON and just have it zero out both
      change_attr4s when this occurs.
      Reported-by: default avatarBoyang Xue <bxue@redhat.com>
      Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2223560Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      97662607
    • Jeff Layton's avatar
      nfsd: handle failure to collect pre/post-op attrs more sanely · a332018a
      Jeff Layton authored
      Collecting pre_op_attrs can fail, in which case it's probably best to
      fail the whole operation.
      
      Change fh_fill_pre_attrs and fh_fill_both_attrs to return __be32, and
      have the callers check the return code and abort the operation if it's
      not nfs_ok.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      a332018a
    • Jeff Layton's avatar
      nfsd: add a MODULE_DESCRIPTION · 5865bafa
      Jeff Layton authored
      I got this today from modpost:
      
          WARNING: modpost: missing MODULE_DESCRIPTION() in fs/nfsd/nfsd.o
      
      Add a module description.
      Signed-off-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      5865bafa
    • Chuck Lever's avatar
      NFSD: Rename struct svc_cacherep · e7421ce7
      Chuck Lever authored
      The svc_ prefix is identified with the SunRPC layer. Although the
      duplicate reply cache caches RPC replies, it is only for the NFS
      protocol. Rename the struct to better reflect its purpose.
      Reviewed-by: default avatarJeff Layton <jlayton@kernel.org>
      Signed-off-by: default avatarChuck Lever <chuck.lever@oracle.com>
      e7421ce7