Commits · 10717f45639f6c1bc27b56405252c3a027406d92 · nexedi / linux

03 Feb, 2020 9 commits

NFSv4: Limit the total number of cached delegations · 10717f45

Trond Myklebust authored Jan 27, 2020

Delegations can be expensive to return, and can cause scalability issues
for the server. Let's therefore try to limit the number of inactive
delegations we hold.
Once the number of delegations is above a certain threshold, start
to return them on close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

10717f45

NFSv4: Add accounting for the number of active delegations held · d2269ea1

Trond Myklebust authored Jan 27, 2020

In order to better manage our delegation caching, add a counter
to track the number of active delegations.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d2269ea1

NFSv4: Try to return the delegation immediately when marked for return on close · b7b7dac6

Trond Myklebust authored Jan 27, 2020

Add a routine to return the delegation immediately upon close of the
file if it was marked for return-on-close.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b7b7dac6

NFS: Clear NFS_DELEGATION_RETURN_IF_CLOSED when the delegation is returned · 0d104167

Trond Myklebust authored Jan 27, 2020

If a delegation is marked as needing to be returned when the file is
closed, then don't clear that marking until we're ready to return
it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

0d104167

NFSv4: nfs_inode_evict_delegation() should set NFS_DELEGATION_RETURNING · f885ea64

Trond Myklebust authored Jan 27, 2020

In particular, the pnfs return-on-close code will check for that flag,
so ensure we set it appropriately.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f885ea64

NFS: nfs_find_open_context() should use cred_fscmp() · 65f51603

Trond Myklebust authored Jan 26, 2020

We want to find open contexts that match our filesystem access
properties. They don't have to exactly match the cred.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

65f51603

NFS: nfs_access_get_cached_rcu() should use cred_fscmp() · 9a206de2

Trond Myklebust authored Jan 26, 2020

We do not need to have the rcu lookup method fail in the case where
the fsuid/fsgid and supplemental groups match.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

9a206de2

NFSv4: pnfs_roc() must use cred_fscmp() to compare creds · 38712247

Trond Myklebust authored Jan 26, 2020

When comparing two 'struct cred' for equality w.r.t. behaviour under
filesystem access, we need to use cred_fscmp().

Fixes: a52458b4 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

38712247

NFS: remove unused macros · c0399cf6

Alex Shi authored Jan 21, 2020

MNT_fhs_status_sz/MNT_fhandle3_sz are never used after they were
introduced. So better to remove them.
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c0399cf6

24 Jan, 2020 3 commits

nfs: Return EINVAL rather than ERANGE for mount parse errors · 3a21409a

David Howells authored Jan 17, 2020

Return EINVAL rather than ERANGE for mount parse errors as the userspace
mount command doesn't necessarily understand what to do with anything other
than EINVAL.

The old code returned -ERANGE as an intermediate error that then get
converted to -EINVAL, whereas the new code returns -ERANGE.

This was induced by passing minorversion=1 to a v4 mount where
CONFIG_NFS_V4_1 was disabled in the kernel build.

Fixes: 68f65ef40e1e ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3a21409a

NFS: allow deprecation of NFS UDP protocol · b24ee6c6

Olga Kornievskaia authored Dec 16, 2019

Add a kernel config CONFIG_NFS_DISABLE_UDP_SUPPORT to disallow NFS
UDP mounts and enable it by default.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b24ee6c6

NFS: Add softreval behaviour to nfs_lookup_revalidate() · f7b37b8b

Trond Myklebust authored Jan 14, 2020

If the server is unavaliable, we want to allow the revalidating
lookup to time out, and to default to validating the cached dentry
if the 'softreval' mount option is set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f7b37b8b

15 Jan, 2020 28 commits

NFSv3: FIx bug when using chacl and chmod to change acl · fe1e8dbe

Su Yanjun authored Dec 25, 2019

We find a bug when running test under nfsv3  as below.
1)
chacl u::r--,g::rwx,o:rw- file1
2)
chmod u+w file1
3)
chacl -l file1

We expect u::rw-, but it shows u::r--, more likely it returns the
cached acl in inode.

We dig the code find that the code path is different.

chacl->..->__nfs3_proc_setacls->nfs_zap_acl_cache
Then nfs_zap_acl_cache clears the NFS_INO_INVALID_ACL in
NFS_I(inode)->cache_validity.

chmod->..->nfs3_proc_setattr
Because NFS_INO_INVALID_ACL has been cleared by chacl path,
nfs_zap_acl_cache wont be called.

nfs_setattr_update_inode will set NFS_INO_INVALID_ACL so let it
before nfs_zap_acl_cache call.
Signed-off-by: Su Yanjun <suyanjun218@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

fe1e8dbe

NFSv4.x recover from pre-mature loss of openstateid · d826e5b8

Olga Kornievskaia authored Dec 18, 2019

Ever since the commit 0e0cb35b, it's possible to lose an open stateid
while retrying a CLOSE due to ERR_OLD_STATEID. Once that happens,
operations that require openstateid fail with EAGAIN which is propagated
to the application then tests like generic/446 and generic/168 fail with
"Resource temporarily unavailable".

Instead of returning this error, initiate state recovery when possible to
recover the open stateid and then try calling nfs4_select_rw_stateid()
again.

Fixes: 0e0cb35b ("NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d826e5b8

NFSv4 fix acl retrieval over krb5i/krb5p mounts · 62a1573f

Olga Kornievskaia authored Jan 02, 2020

For the krb5i and krb5p mount, it was problematic to truncate the
received ACL to the provided buffer because an integrity check
could not be preformed.

Instead, provide enough pages to accommodate the largest buffer
bounded by the largest RPC receive buffer size.

Note: I don't think it's possible for the ACL to be truncated now.
Thus NFS4_ACL_TRUNC flag and related code could be possibly
removed but since I'm unsure, I'm leaving it.

v2: needs +1 page.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

62a1573f

NFS: Add mount option 'softreval' · c74dfe97

Trond Myklebust authored Jan 06, 2020

Add a mount option 'softreval' that allows attribute revalidation 'getattr'
calls to time out, and causes them to fall back to using the cached
attributes.
The use case for this option is for ensuring that we can still (slowly)
traverse paths and use cached information even when the server is down.
Once the server comes back up again, the getattr calls start succeeding,
and the caches will revalidate as usual.

The 'softreval' mount option is automatically enabled if you have
specified 'softerr'.  It can be turned off using the options
'nosoftreval', or 'hard'.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c74dfe97

NFS: Trust cached access if we've already revalidated the inode once · 5c965db8

Trond Myklebust authored Jan 06, 2020

If we've already revalidated the inode once then don't distrust the
access cache unless the NFS_INO_INVALID_ACCESS flag is actually set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

5c965db8

NFS: Fix nfs_direct_write_reschedule_io() · 4daaeba9

Trond Myklebust authored Jan 06, 2020

The 'hdr->good_bytes' is defined as the number of bytes we expect to
read or write starting at offset hdr->io_start. In the case of a partial
read/write we may end up adjusting hdr->args.offset and hdr->args.count
to skip I/O for data that was already read/written, and so we must ensure
the calculation takes that into account.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

4daaeba9

NFS: When resending after a short write, reset the reply count to zero · 8c9cb714

Trond Myklebust authored Jan 06, 2020

If we're resending a write due to a short read or write, ensure we
reset the reply count to zero.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

8c9cb714

NFS: Improve tracing of permission calls · e8194b7d

Trond Myklebust authored Jan 06, 2020

On exit from nfs_do_access(), record the mask representing the requested
permissions, as well as the server-supplied set of access rights for
this user.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

e8194b7d

pNFS/flexfiles: Add tracing for layout errors · 088f3e68

Trond Myklebust authored Jan 06, 2020

Trace layout errors for pNFS/flexfiles on read/write/commit operations.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

088f3e68

NFS: Clean up generic file commit tracepoint · 7bdd297e

Trond Myklebust authored Jan 06, 2020

Clean up the generic file commit tracepoints to use a 64-bit value
for the verifier, and to display the pNFS filehandle, if it exists.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

7bdd297e

NFS: Clean up generic writeback tracepoints · 5bb2a7cb

Trond Myklebust authored Jan 06, 2020

Clean up the generic writeback tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually written.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

5bb2a7cb

NFS: Clean up generic file read tracepoints · 2343172d

Trond Myklebust authored Jan 06, 2020

Clean up the generic file read tracepoints so they do pass the
full structures as arguments. Also ensure we report the number
of bytes actually read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

2343172d

pNFS/flexfiles: Record resend attempts on I/O failure · 0722dc9f

Trond Myklebust authored Jan 06, 2020

If the attempt to do pNFS fails, then record what action we
take to recover (resend, reset to pnfs or reset to mds).
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

0722dc9f

NFS: Fix fix of show_nfs_errors · 118b6292

Trond Myklebust authored Jan 06, 2020

Casting a negative value to an unsigned long is not the same as
converting it to its absolute value.

Fixes: 96650e2e ("NFS: Fix show_nfs_errors macros again")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

118b6292

NFSv4: Improve read/write/commit tracing · 25925b00

Trond Myklebust authored Jan 06, 2020

Ensure we always return the number of bytes read/written. Also display
the pnfs filehandle if it is in use.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

25925b00

NFS/pnfs: Fix pnfs_generic_prepare_to_resend_writes() · 221203ce

Trond Myklebust authored Jan 06, 2020

Instead of making assumptions about the commit verifier contents, change
the commit code to ensure we always check that the verifier was set
by the XDR code.

Fixes: f54bcf2e ("pnfs: Prepare for flexfiles by pulling out common code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

221203ce

NFS: Fix up fsync() when the server rebooted · 2197e9b0

Trond Myklebust authored Jan 06, 2020

Don't clear the NFS_CONTEXT_RESEND_WRITES flag until after calling
nfs_commit_inode(). Otherwise, if nfs_commit_inode() returns an
error, we end up with dirty pages in the page cache, but no tag
to tell us that those pages need resending.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

2197e9b0

SUNRPC: Remove broken gss_mech_list_pseudoflavors() · b32d2855

Trond Myklebust authored Jan 06, 2020

Remove gss_mech_list_pseudoflavors() and its callers. This is part of
an unused API, and could leak an RCU reference if it were ever called.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b32d2855

NFS: Revalidate the file mapping on all fatal writeback errors · b8946d7b

Trond Myklebust authored Jan 06, 2020

If a write or commit failed, and the mapping sees a fatal error, we
need to revalidate the contents of that mapping.

Fixes: 06c9fdf3 ("NFS: On fatal writeback errors, we need to call nfs_inode_remove_request()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b8946d7b

NFS: Revalidate the file size on a fatal write error · 0df68ced

Trond Myklebust authored Jan 06, 2020

If we suffer a fatal error upon writing a file, which causes us to
need to revalidate the entire mapping, then we should also revalidate
the file size.

Fixes: d2ceb7e5 ("NFS: Don't use page_file_mapping after removing the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

0df68ced

xprtrdma: DMA map rr_rdma_buf as each rpcrdma_rep is created · e515dd9d

Chuck Lever authored Jan 03, 2020

Clean up: This simplifies the logic in rpcrdma_post_recvs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

e515dd9d

xprtrdma: Destroy reps from previous connection instance · b7ff0185

Chuck Lever authored Jan 03, 2020

To safely get rid of all rpcrdma_reps from a particular connection
instance, xprtrdma has to wait until each of those reps is finished
being used. A rep may be backing the rq_rcv_buf of an RPC that has
just completed, for example.

Since it is safe to invoke rpcrdma_rep_destroy() only in the Receive
completion handler, simply mark reps remaining in the rb_all_reps
list after the transport is drained. These will then be deleted as
rpcrdma_post_recvs pulls them off the rep free list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b7ff0185

xprtrdma: Destroy rpcrdma_rep when Receive is flushed · 85810388

Chuck Lever authored Jan 03, 2020

This reduces the hardware and memory footprint of an unconnected
transport.

At some point in the future, transport reconnect will allow
resolving the destination IP address through a different device. The
current change enables reps for the new connection to be allocated
on whichever NUMA node the new device affines to after a reconnect.

Note that this does not destroy _all_ the transport's reps... there
will be a few that are still part of a running RPC completion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

85810388

xprtrdma: Allocate and map transport header buffers at connect time · b78de1dc

Chuck Lever authored Jan 03, 2020

Currently the underlying RDMA device is chosen at transport set-up
time. But it will soon be at connect time instead.

The maximum size of a transport header is based on device
capabilities. Thus transport header buffers have to be allocated
_after_ the underlying device has been chosen (via address and route
resolution); ie, in the connect worker.

Thus, move the allocation of transport header buffers to the connect
worker, after the point at which the underlying RDMA device has been
chosen.

This also means the RDMA device is available to do a DMA mapping of
these buffers at connect time, instead of in the hot I/O path. Make
that optimization as well.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b78de1dc

xprtrdma: Refactor frwr_is_supported · 25868e61

Chuck Lever authored Jan 03, 2020

Refactor: Perform the "is supported" check in rpcrdma_ep_create()
instead of in rpcrdma_ia_open(). frwr_open() is where most of the
logic to query device attributes is already located.

The current code displays a redundant error message when the device
does not support FRWR. As an additional clean-up, this patch removes
the extra message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

25868e61

xprtrdma: Eliminate per-transport "max pages" · 18d065a5

Chuck Lever authored Jan 03, 2020

To support device hotplug and migrating a connection between devices
of different capabilities, we have to guarantee that all in-kernel
devices can support the same max NFS payload size (1 megabyte).

This means that possibly one or two in-tree devices are no longer
supported for NFS/RDMA because they cannot support 1MB rsize/wsize.
The only one I confirmed was cxgb3, but it has already been removed
from the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

18d065a5

xprtrdma: Refactor initialization of ep->rep_max_requests · 7581d901

Chuck Lever authored Jan 03, 2020

Clean up: there is no need to keep two copies of the same value.
Also, in subsequent patches, rpcrdma_ep_create() will be called in
the connect worker rather than at set-up time.

Minor fix: Initialize the transport's sendctx to the value based on
the capabilities of the underlying device, not the maximum setting.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

7581d901

xprtrdma: Make sendctx queue lifetime the same as connection lifetime · cb586dec

Chuck Lever authored Jan 03, 2020

The size of the sendctx queue depends on the value stored in
ia->ri_max_send_sges. This value is determined by querying the
underlying device.

Eventually, rpcrdma_ia_open() and rpcrdma_ep_create() will be called
in the connect worker rather than at transport set-up time. The
underlying device will not have been chosen device set-up time.

The sendctx queue will thus have to be created after the underlying
device has been chosen via address and route resolution; in other
words, in the connect worker.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

cb586dec