Commits · e71708d4df1d4b81427badb9ac4bc4a813338b17 · Kirill Smelkov / linux

19 Dec, 2016 14 commits

pNFS: Return RW layouts on OPEN_DOWNGRADE · e71708d4

Trond Myklebust authored Nov 21, 2016

If the client holds no more writeable open state, and does not hold a
write delegation, then send a layoutreturn as part of the OPEN_DOWNGRADE.

We do this only for writes, since some layout drivers may require you to
also hold a read layout if you are doing a R/W workload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e71708d4

NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE · b6808145

Trond Myklebust authored Nov 20, 2016

While we do not need to return the RW layout when downgrading from a
read/write open state to read-only, we might want to do so in order
to reduce the burden on the metadata server so that it does not need
to check for changed data when responding to GETATTR requests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b6808145

NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID · 86cfb041

NeilBrown authored Dec 19, 2016

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up. nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree. As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing. The request is then retried, with a
predictable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned. For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner. With this the indefinite retries no longer happen.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

86cfb041

NFSv4: ensure __nfs4_find_lock_state returns consistent result. · 3f8f2548

NeilBrown authored Dec 19, 2016

If a file has both flock locks and OFD locks, then it is possible that
two different nfs4 lock states could apply to file accesses from a
single process.

It is not possible to know, efficiently, which one is "correct".
Presumably the state which represents a lock that covers the region
undergoing IO would be the "correct" one to use, but finding that has
a non-trivial cost and would provide minuscule value.

Currently we just return whichever is first in the list, which could
result in inconsistent behaviour if an application ever put itself in
this position.  As consistent behaviour is preferable (when perfectly
correct behaviour is not available), change the search to return a
consistent result in this circumstance.
Specifically: if there is both a flock and OFD lock state, always return
the flock one.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f8f2548
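
The lookup rule described in the `__nfs4_find_lock_state` commit above can be illustrated with a minimal userspace sketch. This is not the kernel code: `struct lock_state`, `find_lock_state()` and the `preferred`/`fallback` parameters are hypothetical names, and the list is a plain singly linked list. The point is only that the result no longer depends on list order when two owners match.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-owner lock state on a linked list. */
struct lock_state {
    const void *owner;          /* opaque owner key (e.g. flock or OFD owner) */
    struct lock_state *next;
};

/*
 * Return the state owned by `preferred` if one exists anywhere in the list,
 * otherwise the state owned by `fallback`, otherwise NULL.  The answer is
 * the same whatever order the two states appear in.
 */
struct lock_state *find_lock_state(struct lock_state *head,
                                   const void *preferred,
                                   const void *fallback)
{
    struct lock_state *ret = NULL;

    for (struct lock_state *pos = head; pos != NULL; pos = pos->next) {
        if (pos->owner == preferred)
            return pos;         /* preferred owner always wins */
        if (pos->owner == fallback)
            ret = pos;          /* remember it, but keep looking */
    }
    return ret;
}
```

With the flock owner passed as `preferred` and the OFD owner as `fallback`, the flock state is returned whenever both are present.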

NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. · cfd278c2

NeilBrown authored Dec 19, 2016

Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessarily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d5 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cfd278c2
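
The guard added by the `nfs4_fl_prepare_ds` commit above can be sketched in isolation. The types and the `prepare_ds()` helper below are simplified, hypothetical stand-ins rather than the kernel structures; only the `ds_clp == NULL` test comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real structures carry far more state. */
struct client { int id; };
struct data_server { struct client *ds_clp; };

/*
 * Report success only when the data server has a usable client pointer.
 * Callers may then assume that a non-NULL return implies a non-NULL ds_clp.
 */
struct data_server *prepare_ds(struct data_server *ds)
{
    if (ds == NULL || ds->ds_clp == NULL)
        return NULL;    /* connect was interrupted or never completed */
    return ds;
}
```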

pNFS/flexfiles: delete deviceid, don't mark inactive · 1c48cee8

Weston Andros Adamson authored Dec 14, 2016

Instead of marking a device inactive, remove it from the cache entirely.

Flexfiles has a way to report errors back to the server, so we don't want
to stop devices from being tried again for 120 seconds.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1c48cee8

NFS: Clean up nfs_attribute_timeout() · 187e593d
Trond Myklebust authored Dec 16, 2016
```
It can be made static.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
187e593d
NFS: Remove unused function nfs_revalidate_inode_rcu() · 3f642a13
Trond Myklebust authored Dec 16, 2016
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
3f642a13

NFS: Fix and clean up the access cache validity checking · 21c3ba7e

Trond Myklebust authored Dec 16, 2016

The access cache needs to check whether or not the mode bits, ownership,
or ACL has changed or the cache has timed out.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

21c3ba7e

NFS: Only look at the change attribute cache state in nfs_weak_revalidate() · 9cdd1d3f

Trond Myklebust authored Dec 16, 2016

Just like in nfs_check_verifier(), we want to use
nfs_mapping_need_revalidate_inode() to check that our knowledge of the
change attribute is up to date.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9cdd1d3f

NFS: Clean up cache validity checking · 61540bf6

Trond Myklebust authored Dec 08, 2016

Consolidate the open-coded checking of NFS_I(inode)->cache_validity
into a couple of helper functions.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

61540bf6

NFS: Don't revalidate the file on close if we hold a delegation · 58ff4184

Trond Myklebust authored Dec 16, 2016

If we're holding a delegation, we can skip sending the close-to-open
GETATTR until we're returning that delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

58ff4184

NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN · 0bc2c9b4

Trond Myklebust authored Dec 16, 2016

DELEGRETURN will always carry a reference to the inode except when
the latter is being freed, so let's ensure that we always use that
inode information to ensure close-to-open cache consistency, even
when the DELEGRETURN call is asynchronous.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0bc2c9b4

NFSv4: Update the attribute cache info in update_changeattr · e603a4c1

Trond Myklebust authored Dec 16, 2016

If we successfully updated the change attribute, we should timestamp the
cache. While we do know that the other attributes are not completely up
to date, we have the NFS_INO_INVALID_ATTR flag that lets us know that,
so it is valid to say that the cache has not timed out.
We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute
is now known to be valid.

Conversely, if the change attribute did not match, we should make sure to
also revalidate the access and ACL caches.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e603a4c1

10 Dec, 2016 5 commits

Merge tag 'nfs-rdma-4.10-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma · 2549f307

Trond Myklebust authored Dec 10, 2016

NFS: NFSoRDMA Client Side Changes

New Features:
- Support for SG_GAP devices

Bugfixes and cleanups:
- Cap size of callback buffer resources
- Improve send queue and RPC metric accounting
- Fix coverity warning
- Avoid calls to ro_unmap_safe()
- Refactor FRMR invalidation
- Error message improvements

2549f307

SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2

NeilBrown authored Dec 05, 2016

There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted.  It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However there is no guarantee of this.  I have a report of a
NULL dereference in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
  This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
  calls __gss_find_upcall(pipe, U, NULL) and it finds the
  *second* message, as new messages are placed at the head
  of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
  by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
  to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details.  In any case, I think the
reference counting irregularity became a measurable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.

Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: stable@vger.kernel.org
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1cded9d2
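
The reference counts described in the auth_gss commit above can be traced with a small userspace model. Everything here is an illustrative assumption: `struct upcall_msg`, `msg_get()`, `msg_put()` and `msg_setup()` are hypothetical names, and `queue_ok` merely simulates whether the queueing step succeeds. The sketch shows one reference per holder (caller, in_downcall list, pipe list) and a failure path that drops all of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct upcall_msg {
    int count;      /* one reference per holder */
    bool *freed;    /* test hook: set to true when the message is released */
};

void msg_get(struct upcall_msg *m) { m->count++; }

void msg_put(struct upcall_msg *m)
{
    if (--m->count == 0) {
        *m->freed = true;
        free(m);
    }
}

/*
 * Allocate a message and take one reference for each list it joins.
 * On queueing failure every reference is dropped, so nothing leaks.
 */
struct upcall_msg *msg_setup(bool queue_ok, bool *freed)
{
    struct upcall_msg *m = malloc(sizeof(*m));

    if (m == NULL)
        return NULL;
    m->count = 1;       /* caller's reference */
    m->freed = freed;
    *freed = false;

    msg_get(m);         /* reference held by the in_downcall list */
    msg_get(m);         /* reference for the pipe list, taken before queueing */
    if (!queue_ok) {
        msg_put(m);     /* queueing failed: drop the pipe-list reference */
        msg_put(m);     /* unhash from the in_downcall list */
        msg_put(m);     /* drop the caller's reference; message is freed */
        return NULL;
    }
    return m;
}
```

In this model a message removed from the in_downcall list and released by its caller still holds one reference until the pipe side is done with it, which is the property the first fix is after.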

nfs: add support for the umask attribute · dff25ddb

Andreas Gruenbacher authored Dec 02, 2016

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory. That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more details.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

dff25ddb

pNFS/flexfiles: Ensure we have enough buffer for layoutreturn · d9152114

Trond Myklebust authored Dec 09, 2016

The flexfiles client can piggyback both layout errors and layoutstats
as part of the layoutreturn. Both these payloads can get large, with
20 layout error entries taking up about 1.2K, and 4 layoutstats entries
taking up another 1K.
This patch allows a maximum payload of 4k by allocating a full page.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d9152114

pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr() · 5ba6a09e
Trond Myklebust authored Dec 09, 2016
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
5ba6a09e

09 Dec, 2016 1 commit

pNFS/flexfiles: Fix a deadlock on LAYOUTGET · 65990d1a

Fred Isaman authored Sep 30, 2016

  We encountered a deadlock where the SEQUENCE that accompanied the
LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg
triggered a GETDEVICEINFO.  The GETDEVICEINFO hung waiting for the
session drain, while the LAYOUTGET held the slot waiting for
alloc_lseg to finish.
  Avoid this by moving the call to nfs4_find_get_deviceid out of
ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
[dros@primarydata.com: pNFS/flexfiles: fix races in ff_layout_mirror_valid]
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

65990d1a

07 Dec, 2016 3 commits

pNFS: Layoutreturn must free the layout after the layout-private data · 2f065ddb

Trond Myklebust authored Dec 07, 2016

The layout-private data may depend on the layout and/or the inode
still existing when it does post-processing and frees its data, so we
need to free them after calling lrp->ld_private.ops->free().

This fixes a mirror list corruption issue in the flexfiles driver.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2f065ddb
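
The ordering constraint in the layoutreturn commit above can be shown with a toy model. The structures and function names below are hypothetical; the `alive` flag stands in for the layout's lifetime. The sketch only demonstrates that the private free hook runs while the layout still exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct layout { bool alive; };

struct ld_private {
    struct layout *lo;
    bool saw_live_layout;                   /* recorded by the free hook */
    void (*free_hook)(struct ld_private *); /* layout-private cleanup */
};

/* The private cleanup inspects the layout, so the layout must still exist. */
void private_free(struct ld_private *p)
{
    p->saw_live_layout = p->lo->alive;
}

/* Stand-in for dropping the layout. */
void layout_put(struct layout *lo)
{
    lo->alive = false;
}

/* Release in the safe order: private data first, then the layout. */
void layoutreturn_release(struct ld_private *p, struct layout *lo)
{
    if (p->free_hook != NULL)
        p->free_hook(p);
    layout_put(lo);
}
```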

pNFS/flexfiles: Fix ff_layout_add_ds_error_locked() · cb067935

Trond Myklebust authored Dec 06, 2016

When we're merging an old entry into our new entry, we want to ensure that
we add the list entry in the correct place.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cb067935

NFSv4: Add missing nfs_put_lock_context() · 7a0566b3

NeilBrown authored Dec 06, 2016

Otherwise the lock context won't be freed when we're done with it.

From: NeilBrown <neilb@suse.com>
Fixes: 5bd3f817 ("NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner")
Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7a0566b3

06 Dec, 2016 1 commit

pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid · 362fb578

Trond Myklebust authored Dec 05, 2016

Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

362fb578

05 Dec, 2016 2 commits

NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() · d94cbf6c

Trond Myklebust authored Dec 04, 2016

If the session has an error, then we want to start by recovering the
session, as any SEQUENCE we send is going to fail with a session
error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d94cbf6c

NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE · 2cf10cdd

Trond Myklebust authored Dec 04, 2016

In the case where SEQUENCE receives a NFS4ERR_BADSESSION or
NFS4ERR_DEADSESSION error, we just want to report the session as needing
recovery, and then we want to retry the operation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2cf10cdd

04 Dec, 2016 3 commits

NFS: Only look at the change attribute cache state in nfs_check_verifier · 1cd9cb05

Trond Myklebust authored Dec 04, 2016

When looking at whether or not our dcache is valid, we really don't care
about the general state of the directory attribute cache. Instead, we
only care about the state of the change attribute.

This fixes a performance issue when the client is responsible for
changing the directory contents; a number of NFSv4 operations will
atomically update the directory change attribute, but may not return
all the other attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1cd9cb05

NFS: Fix incorrect size revalidation when holding a delegation · 9310b224

Trond Myklebust authored Dec 04, 2016

We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9310b224

NFS: Fix incorrect mapping revalidation when holding a delegation · 10727772

Trond Myklebust authored Dec 04, 2016

We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

10727772
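
The condition stated in the two delegation commits above amounts to a test that both flags are set. The sketch below uses made-up bit values and a hypothetical helper name; the real `NFS_INO_REVAL_PAGECACHE` and `NFS_INO_REVAL_FORCED` constants live in the kernel headers and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; not the kernel's definitions. */
#define INO_REVAL_PAGECACHE (1u << 0)
#define INO_REVAL_FORCED    (1u << 1)

/*
 * While a delegation is held, attributes need checking only when the page
 * cache is marked dubious AND revalidation has been forced.
 */
bool need_check_under_delegation(unsigned int cache_validity)
{
    const unsigned int both = INO_REVAL_PAGECACHE | INO_REVAL_FORCED;

    return (cache_validity & both) == both;
}
```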

03 Dec, 2016 7 commits

pNFS/flexfiles: Support sending layoutstats in layoutreturn · 230bc962

Trond Myklebust authored Oct 19, 2016

Add the ability to send an array of layoutstats entries as part of
layoutreturn.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

230bc962

pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn · 422c93c8
Trond Myklebust authored Oct 06, 2016
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
422c93c8

NFS: Fix up read of mirror stats · 2f8220c1

Trond Myklebust authored Oct 03, 2016

Need to lock while reading in order to ensure 64-bit reads are correct.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2f8220c1

pNFS/flexfiles: Clean up layoutstats · 08e2e5bc
Trond Myklebust authored Sep 29, 2016
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
08e2e5bc

pNFS/flexfiles: Refactor encoding of the layoutreturn payload · 5b9b3c85

Trond Myklebust authored Dec 02, 2016

Add the layout error payload to the flexfiles layoutreturn private
data, and set up the encoding mechanisms. This is a refactoring in
preparation for adding the layout iostats payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5b9b3c85

pNFS: Add a layoutreturn callback to perform layout-private setup · 287bd3e9

Trond Myklebust authored Dec 02, 2016

Add a callback to allow the flexfiles layout driver to initialise the
layout private payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

287bd3e9

pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn · 4d796d75

Trond Myklebust authored Sep 23, 2016

Cleanup to allow layout drivers to attach private data to layoutreturn,
and manage the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4d796d75

02 Dec, 2016 4 commits

NFSv4: Add a generic structure for managing layout-private information · f8c3cf9d
Trond Myklebust authored Oct 20, 2016
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
f8c3cf9d

pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated · 06946c6a

Trond Myklebust authored Nov 25, 2016

If there have been no reads or writes to a given mirror since the last
layoutstats update, then don't resend the same data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

06946c6a

pNFS/flexfiles: Don't attempt to send layoutstats if there are no entries · 46c98c6d

Trond Myklebust authored Nov 25, 2016

If the list of mirrors is empty, then don't send an RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

46c98c6d
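
The two layoutstats commits above (06946c6a and 46c98c6d) describe a filter followed by an early exit. A minimal sketch, assuming a hypothetical `struct mirror` with an `updated` flag and hypothetical helpers `collect_updated()` and `should_send()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mirror {
    int id;
    bool updated;   /* set when the mirror saw reads or writes */
};

/*
 * Copy the ids of mirrors that changed since the last report into `out`
 * (at most `max` entries), clear their flags, and return how many were found.
 */
size_t collect_updated(struct mirror *m, size_t n, int *out, size_t max)
{
    size_t k = 0;

    for (size_t i = 0; i < n && k < max; i++) {
        if (!m[i].updated)
            continue;           /* nothing new for this mirror */
        out[k++] = m[i].id;
        m[i].updated = false;   /* reported; do not resend the same data */
    }
    return k;
}

/* With no entries there is nothing to report, so no RPC call is made. */
bool should_send(size_t entries)
{
    return entries > 0;
}
```

Calling `collect_updated()` twice in a row without intervening I/O yields zero entries the second time, at which point `should_send()` returns false.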

NFS: Allow getattr to also report readdirplus cache hits · 1bcf4c5c

Trond Myklebust authored Dec 02, 2016

If the user called stat() on an 'ls -l' workload, and the attribute
cache was successfully revalidated by READDIRPLUS, then we want to
report that back so that the readdir code continues to use
readdirplus.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1bcf4c5c