Commits · 39fd01863616964f009599e50ca5c6ea9ebf88d6 · Kirill Smelkov / linux

16 Apr, 2021 2 commits

NFS: Don't discard pNFS layout segments that are marked for return · 39fd0186

Trond Myklebust authored Apr 15, 2021

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: e0b7d420 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting · 8926cc83

Trond Myklebust authored Apr 15, 2021

If the NFS super block is being unmounted, then we currently may end up
telling the server that we've forgotten the layout while it is actually
still in use by the client.
In that case, just assume that the client will soon return the layout
anyway, and so return NFS4ERR_DELAY in response to the layout recall.

Fixes: 58ac3e59 ("NFSv4/pnfs: Clean up nfs_layout_find_inode()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


14 Apr, 2021 15 commits

NFSv42: Don't force attribute revalidation of the copy offload source · febfeaae

Trond Myklebust authored Apr 14, 2021

When a copy offload is performed, we do not expect the source file to
change other than perhaps to see the atime be updated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv42: Copy offload should update the file size when appropriate · 94d202d5

Trond Myklebust authored Apr 14, 2021

If the result of a copy offload or clone operation is to grow the
destination file size, then we should update it. The reason is that when
a client holds a delegation, it is authoritative for the file size.

Fixes: 16abd2a0 ("NFSv4.2: fix client's attribute cache management for copy_file_range")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


SUNRPC: Handle major timeout in xprt_adjust_timeout() · 09252177

Chris Dion authored Apr 04, 2021

Currently if a major timeout value is reached, but the minor value has
not been reached, an ETIMEDOUT will not be sent back to the caller.
This can occur if the v4 server is not responding to requests and
retrans is configured larger than the default of two.

For example, a TCP mount with a configured timeout value of 50 and a
retransmission count of 3 to a v4 server which is not responding:

1. Initial value and increment set to 5s, maxval set to 20s, retries at 3
2. Major timeout is set to 20s, minor timeout set to 5s initially
3. xprt_adjust_timeout() is called after 5s, retry with 10s timeout,
   minor timeout is bumped to 10s
4. And again after another 10s, 15s total time with minor timeout set
   to 15s
5. After 20s total time xprt_adjust_timeout() is called as major timeout is
   reached, but skipped because the minor timeout is not reached
       - After this time the cpu spins continually calling
         xprt_adjust_timeout() and returning 0 for 10 seconds.
         As seen on perf sched:
         39243.913182 [0005]  mount.nfs[3794] 4607.938      0.017   9746.863
6. This continues until the 15s minor timeout condition is reached (in
   this case for 10 seconds). After which the ETIMEDOUT is processed
   back to the caller, the cpu spinning stops, and normal operations
   continue

Fixes: 7de62bc0 ("SUNRPC dont update timeout value on connection reset")
Signed-off-by: Chris Dion <Christopher.Dion@dell.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
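
The timeline in the major-timeout commit above can be modelled with a small sketch. This is not the kernel's code: `struct sketch_req`, `sketch_adjust_old()` and `sketch_adjust_fixed()` are hypothetical names, time is a plain count of seconds rather than jiffies, the timeout bumping is left out, and the "fixed" ordering is only an assumption about what the patch does, based on its subject line.

```c
#include <errno.h>

/* Hypothetical, simplified request state. Times are absolute seconds. */
struct sketch_req {
	unsigned long major_deadline;	/* the whole request gives up here */
	unsigned long minor_deadline;	/* the current retry interval ends here */
};

/*
 * Behaviour described in the commit message: while the minor deadline
 * has not been reached, return 0, even if the major deadline has
 * already passed.
 */
static int sketch_adjust_old(const struct sketch_req *req, unsigned long now)
{
	if (now < req->minor_deadline)
		return 0;
	return now >= req->major_deadline ? -ETIMEDOUT : 0;
}

/* Assumed fix: test the major deadline before the minor one. */
static int sketch_adjust_fixed(const struct sketch_req *req, unsigned long now)
{
	if (now >= req->major_deadline)
		return -ETIMEDOUT;
	return 0;
}
```

With `major_deadline = 20` and `minor_deadline = 30` (the state after step 4), the old ordering keeps returning 0 from 20s until 30s, which matches the 10 seconds of spinning in step 5, while the assumed ordering reports the timeout at 20s.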


SUNRPC: Remove trace_xprt_transmit_queued · 6cf23783

Chuck Lever authored Mar 31, 2021

This tracepoint can crash when dereferencing snd_task because
when some transports connect, they put a cookie in that field
instead of a pointer to an rpc_task.

BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
Read of size 2 at addr ffff8881a83bd3a0 by task git/331872

CPU: 11 PID: 331872 Comm: git Tainted: G S                5.12.0-rc2-00007-g3ab6e585a7f9 #1453
Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Call Trace:
 dump_stack+0x9c/0xcf
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x174/0x1b0
 trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
 xprt_prepare_transmit+0x8e/0xc1 [sunrpc]
 call_transmit+0x4d/0xc6 [sunrpc]

Fixes: 9ce07ae5 ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


SUNRPC: Add tracepoint that fires when an RPC is retransmitted · e936a597

Chuck Lever authored Mar 31, 2021

A separate tracepoint can be left enabled all the time to capture
rare but important retransmission events. So for example:

kworker/u26:3-568 [009] 156.967933: xprt_retransmit: task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2

Or, for example, enable all nfs and nfs4 tracepoints, and set up a
trigger to disable tracing when xprt_retransmit fires to capture
everything that leads up to it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


SUNRPC: Move fault injection call sites · 7638e0bf

Chuck Lever authored Mar 31, 2021

I've hit some crashes that occur in the xprt_rdma_inject_disconnect
path. It appears that, for some providers, rdma_disconnect() can
take so long that the transport can disconnect and release its
hardware resources while rdma_disconnect() is still running,
resulting in a UAF in the provider.

The transport's fault injection method may depend on the stability
of transport data structures. That means it needs to be invoked
only from contexts that hold the transport write lock.

Fixes: 4a068258 ("SUNRPC: Transport fault injection")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4.2 fix handling of sr_eof in SEEK's reply · 73f5c88f

Olga Kornievskaia authored Mar 31, 2021

Currently the client ignores the value of sr_eof in the reply to the
SEEK operation. According to the spec, if the server didn't find the
requested extent and reached the end of the file, the server
would return sr_eof=true. If the request was for DATA and no
data was found (i.e. in the middle of a hole), then lseek
expects ENXIO to be returned.

Fixes: 1c6dcbe5 ("NFS: Implement SEEK")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
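
A minimal sketch of the sr_eof handling described in the SEEK commit above. The types and names (`enum sketch_whence`, `struct sketch_seek_res`, `sketch_seek_result()`) are hypothetical stand-ins rather than the client's real structures, and the sketch only assumes that a DATA request which hits end of file maps to ENXIO, as the message states.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the lseek() whence values of interest. */
enum sketch_whence { SKETCH_SEEK_DATA, SKETCH_SEEK_HOLE };

/* Hypothetical, reduced view of a SEEK reply. */
struct sketch_seek_res {
	bool sr_eof;		/* server reached the end of the file */
	long long sr_offset;	/* offset reported by the server */
};

static long long sketch_seek_result(enum sketch_whence whence,
				    const struct sketch_seek_res *res)
{
	/* Looking for data, but the server hit EOF first: nothing to seek to. */
	if (whence == SKETCH_SEEK_DATA && res->sr_eof)
		return -ENXIO;
	return res->sr_offset;
}
```

A HOLE request that reaches end of file still returns the reported offset in this sketch, since the end of the file counts as an implicit hole for lseek().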


pNFS/flexfiles: fix incorrect size check in decode_nfs_fh() · ed34695e

Nikola Livic authored Mar 29, 2021

We (Adam Zabrocki, Alexander Matrosov, Alexander Tereshkin, Maksym
Bazalii) observed the check:

	if (fh->size > sizeof(struct nfs_fh))

should not use the size of the nfs_fh struct, which includes two extra
bytes for the size field.

struct nfs_fh {
	unsigned short         size;
	unsigned char          data[NFS_MAXFHSIZE];
};

but should determine the size from data[NFS_MAXFHSIZE] so the memcpy
will not write 2 bytes beyond the destination.  The proposed fix is to
compare against NFS_MAXFHSIZE directly, as is done elsewhere in the fs
code base.

Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Nikola Livic <nlivic@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
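
The off-by-two in the decode_nfs_fh() commit above can be reproduced in isolation. In this sketch `SKETCH_MAXFHSIZE` stands in for NFS_MAXFHSIZE (the value 128 is an assumption here), `struct sketch_fh` copies the layout quoted in the message, and the two helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAXFHSIZE 128	/* stands in for NFS_MAXFHSIZE; value assumed */

/* Same layout as the struct quoted in the commit message. */
struct sketch_fh {
	unsigned short size;
	unsigned char data[SKETCH_MAXFHSIZE];
};

/* Original check: the limit also counts the bytes of the size field. */
static bool sketch_size_ok_old(size_t len)
{
	return len <= sizeof(struct sketch_fh);
}

/* Proposed check: the limit is the capacity of data[] only. */
static bool sketch_size_ok_fixed(size_t len)
{
	return len <= SKETCH_MAXFHSIZE;
}
```

A length of `SKETCH_MAXFHSIZE + 1` passes the old check but is rejected by the proposed one, which is the case where a memcpy into `data[]` would run past the end of the array.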


NFSv4: Catch and trace server filehandle encoding errors · eb3d58c6

Trond Myklebust authored Apr 01, 2021

If the server returns a filehandle with an invalid length, then trace
that, and return an EREMOTEIO error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Convert nfs_xdr_status tracepoint to an event class · 3d66bae1

Trond Myklebust authored Apr 01, 2021

We would like the ability to record other XDR errors, particularly
those that are due to server bugs.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Add tracing for COMPOUND errors · da934ae0

Trond Myklebust authored Apr 01, 2021

When the server returns a different operation than we expected, then
trace that.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Split attribute support out from the server capabilities · ce62b114

Trond Myklebust authored Mar 05, 2018

There are lots of attributes, and they are crowding out the bit space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Don't store NFS_INO_REVAL_FORCED · cc7f2dae

Trond Myklebust authored Apr 11, 2021

NFS_INO_REVAL_FORCED is intended to tell us that the cache needs
revalidation despite the fact that we hold a delegation. We shouldn't
need to store it anymore, though.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: link must update the inode nlink. · 1301e421

Trond Myklebust authored Apr 01, 2021

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: nfs4_inc/dec_nlink_locked should also invalidate ctime · 82eae5a4

Trond Myklebust authored Apr 01, 2021

If the nlink changes, then so will the ctime.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

13 Apr, 2021 15 commits

NFS: Another inode revalidation improvement · 7b24dacf

Trond Myklebust authored Apr 09, 2021

If we're trying to update the inode because a previous update left the
cache in a partially unrevalidated state, then allow the update if the
change attrs match.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Use information about the change attribute to optimise updates · 6f9be83d

Trond Myklebust authored Mar 26, 2021

If the NFSv4.2 server supports the 'change_attr_type' attribute, then
allow the client to optimise its attribute cache update strategy.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Add support for the NFSv4.2 "change_attr_type" attribute · 7f08a335

Trond Myklebust authored Mar 26, 2021

The change_attr_type allows the server to provide a description of how
the change attribute will behave. This again will allow the client to
optimise its behaviour w.r.t. attribute revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Don't modify the change attribute cached in the inode · 993e2d4b

Trond Myklebust authored Apr 12, 2021

When the client is caching data and a write delegation is held, then the
server may send a CB_GETATTR to query the attributes. When this happens,
the client is supposed to bump the change attribute value that it
returns if it holds cached data.
However that process uses a value that is stored in the delegation. We
do not want to bump the change attribute held in the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Fix value of decode_fsinfo_maxsz · 57a789a1

Trond Myklebust authored Mar 26, 2021

At least two extra fields have been added to fsinfo since this was last
updated.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Simplify cache consistency in nfs_check_inode_attributes() · 04c63498

Trond Myklebust authored Mar 25, 2021

We should not be invalidating the access or acl caches in
nfs_check_inode_attributes(), since the point is we're unsure about
whether the contents of the struct nfs_fattr are fully up to date.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Remove a line of code that has no effect in nfs_update_inode() · c88c696c

Trond Myklebust authored Mar 25, 2021

Commit 0b467264 ("NFS: Fix attribute revalidation") changed the way
we populate the 'invalid' attribute, and made the line that strips away
the NFS_INO_INVALID_ATTR bits redundant.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Fix up handling of outstanding layoutcommit in nfs_update_inode() · 709fa576

Trond Myklebust authored Mar 25, 2021

If there is an outstanding layoutcommit, then the list of attributes
whose values are expected to change is not the full set. So let's
be explicit about the full list.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Separate tracking of file mode cache validity from the uid/gid · 720869eb

Trond Myklebust authored Apr 13, 2021

chown()/chgrp() and chmod() are separate operations, and in addition,
there are mode operations that are performed automatically by the
server. So let's track mode validity separately from the file ownership
validity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Separate tracking of file nlinks cache validity from the mode/uid/gid · fabf2b34

Trond Myklebust authored Mar 25, 2021

Rename can cause us to revalidate the access cache, so let's track the
nlinks separately from the mode/uid/gid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFSv4: Fix nfs4_bitmap_copy_adjust() · a71029b8

Trond Myklebust authored Apr 10, 2021

Don't remove flags from the set retrieved from the cache_validity.
We do want to retrieve all attributes that are listed as being
invalid, whether or not there is a delegation set.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Don't set NFS_INO_REVAL_PAGECACHE in the inode cache validity · 36a9346c

Trond Myklebust authored Mar 25, 2021

It is no longer necessary to preserve the NFS_INO_REVAL_PAGECACHE flag.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Replace use of NFS_INO_REVAL_PAGECACHE when checking cache validity · 13c0b082

Trond Myklebust authored Mar 25, 2021

When checking cache validity, be more specific than just 'we want to
check the page cache validity'. In almost all cases, we want to check
the change attribute, and possibly also the size.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Add a cache validity flag argument to nfs_revalidate_inode() · 1f3208b2

Trond Myklebust authored Mar 25, 2021

Add an argument to nfs_revalidate_inode() to allow callers to specify
which attributes they need to check for validity.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: nfs_setattr_update_inode() should clear the suid/sgid bits · 1f9f4328

Trond Myklebust authored Apr 12, 2021

When we do a 'chown' or 'chgrp', the server will clear the suid/sgid
bits. Ensure that we mirror that in nfs_setattr_update_inode().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
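
A rough sketch of the mirroring described in the suid/sgid commit above. `sketch_mode_after_chown()` is a hypothetical helper, not the body of nfs_setattr_update_inode(), and it assumes the server clears both bits unconditionally. A real server may apply extra conditions, for example only dropping sgid on group-executable files.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: return the mode a client could cache after a
 * successful chown/chgrp, assuming the server cleared both bits.
 */
static mode_t sketch_mode_after_chown(mode_t mode, bool owner_or_group_changed)
{
	if (owner_or_group_changed)
		mode &= ~(mode_t)(S_ISUID | S_ISGID);
	return mode;
}
```

Only the setuid and setgid bits are touched; the permission bits are returned unchanged, and a call that changes neither owner nor group leaves the mode as it was.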


12 Apr, 2021 8 commits

NFS: Fix up statx() results · 63cdd7ed

Trond Myklebust authored Mar 24, 2021

If statx has valid attributes available that weren't asked for, then
return them and set the result mask appropriately.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Don't revalidate attributes that are not being asked for · e8764a6f

Trond Myklebust authored Mar 24, 2021

If the user doesn't set STATX_UID/GID/MODE, then don't care if they are
known to be stale. Ditto if we're not being asked for the file size.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
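
The idea in the revalidation commit above reduces to a mask intersection. The bit names below are hypothetical and do not match the kernel's STATX_* or NFS_INO_INVALID_* values; the sketch only assumes that both the request and the cache state can be expressed over the same set of attribute bits.

```c
/* Hypothetical attribute bits, shared by the request and the cache state. */
#define SKETCH_ATTR_MODE	(1u << 0)
#define SKETCH_ATTR_UID		(1u << 1)
#define SKETCH_ATTR_GID		(1u << 2)
#define SKETCH_ATTR_SIZE	(1u << 3)

/*
 * Only attributes that are both requested and known to be stale need a
 * round trip to the server; stale attributes nobody asked for can wait.
 */
static unsigned int sketch_need_revalidate(unsigned int request_mask,
					   unsigned int stale_mask)
{
	return request_mask & stale_mask;
}
```

A caller asking only for the size while the uid and mode are stale gets 0 back, so no revalidation is triggered on its behalf.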


NFS: Fix up revalidation of space used · 4cdfeb64

Trond Myklebust authored Mar 24, 2021

Ensure that when the change attribute or the size changes, we also
remember to revalidate the space used.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: NFS_INO_REVAL_PAGECACHE should mark the change attribute invalid · 50c7a799

Trond Myklebust authored Mar 25, 2021

When we're looking to revalidate the page cache, we should just ensure
that we mark the change attribute invalid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Mask out unsupported attributes in nfs_getattr() · 4eb6a823

Trond Myklebust authored Mar 25, 2021

We don't currently support STATX_BTIME, so don't advertise it in the
return values for nfs_getattr().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Fix up inode cache tracing · 8a27c7cc

Trond Myklebust authored Mar 25, 2021

Add missing enum definitions and missing entries for
nfs_show_cache_validity().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>


NFS: Deal correctly with attribute generation counter overflow · 9fdbfad1

Trond Myklebust authored Mar 29, 2021

We need to use unsigned long subtraction and then convert to signed in
order to deal correctly with C overflow rules.

Fixes: f5062003 ("NFS: Set an attribute barrier on all updates")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
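
The comparison technique named in the generation-counter commit above can be shown on its own. `sketch_counter_after()` is a hypothetical helper, not the function the patch touches. Unsigned subtraction wraps in a well-defined way; converting an out-of-range difference to `long` is implementation-defined in ISO C, so the sketch assumes the usual two's complement behaviour.

```c
#include <stdbool.h>

/*
 * Returns true if counter value a was issued after b, even if the
 * counter wrapped in between. The subtraction is done on unsigned
 * long, and only the difference is reinterpreted as signed.
 */
static bool sketch_counter_after(unsigned long a, unsigned long b)
{
	return (long)(a - b) > 0;
}
```

For example, a counter that has just wrapped to 1 still compares as newer than `~0UL`, because the unsigned difference is 2. Comparing the two values directly with `>` would get that case wrong.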


NFSv4.2: Always flush out writes in nfs42_proc_fallocate() · 99f23783

Trond Myklebust authored Mar 28, 2021

Whether we're allocating or deallocating space, we should flush out the
pending writes in order to avoid races with attribute updates.

Fixes: 1e564d3d ("NFSv4.2: Fix a race in nfs42_proc_deallocate()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
