Commits · cefa587a40bb5333901486632d4062f40a146585 · Kirill Smelkov / linux

02 Mar, 2019 14 commits

NFS/flexfiles: Clean up mirror DS initialisation · cefa587a

Trond Myklebust authored Feb 28, 2019

Get rid of the redundant parameter and rename the function
ff_layout_mirror_valid() to ff_layout_init_mirror_ds() for clarity.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Remove dead code in ff_layout_mirror_valid() · 29a23909

Trond Myklebust authored Feb 28, 2019

nfs4_ff_alloc_deviceid_node() guarantees that if mirror->mirror_ds is
a valid pointer, then so is mirror->mirror_ds->ds.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfile: Simplify nfs4_ff_layout_select_ds_stateid() · 4cbc8a57

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfile: Simplify nfs4_ff_layout_ds_version() · 626d48b1

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify ff_layout_get_ds_cred() · 312cd4cb

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify nfs4_ff_find_or_create_ds_client() · 561d6f8a

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than forcing another
array access.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Simplify nfs4_ff_layout_select_ds_fh() · 749da527

Trond Myklebust authored Feb 28, 2019

Pass in a pointer to the mirror rather than having to retrieve it from
the array and then verify the resulting pointer.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Speed up read failover when DSes are down · 76c66905

Trond Myklebust authored Feb 14, 2019

If we notice that a DS may be down, we should attempt to read from the
other mirrors first before we go back to retry the dead DS.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
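
The ordering described in this commit can be pictured with a small Python model (a sketch for illustration, not kernel code). The `Mirror` class, its `ds_unavailable` flag and `read_attempt_order()` are hypothetical names; the only point is that mirrors whose DS is suspected to be down move to the back of the list a read will try.

```python
from dataclasses import dataclass


@dataclass
class Mirror:
    name: str
    ds_unavailable: bool = False  # hypothetical "DS may be down" marker


def read_attempt_order(mirrors):
    """Return the mirrors in the order a read would try them.

    Mirrors that look healthy come first; mirrors whose DS is suspected
    to be down are only retried after every other mirror has been tried.
    """
    healthy = [m for m in mirrors if not m.ds_unavailable]
    suspect = [m for m in mirrors if m.ds_unavailable]
    return healthy + suspect
```

With mirrors `a` (marked unavailable), `b` and `c`, the read is attempted on `b`, then `c`, and only then goes back to `a`.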

NFS/flexfiles: Don't invalidate DS deviceids for being unresponsive · 17aaec81

Trond Myklebust authored Feb 26, 2019

If the DS is unresponsive, we want to just mark it as such, while
reporting the errors. If the server later returns the same deviceid
in a new layout, then we don't want to have to look it up again.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Remove bogus checks for invalid deviceids · d082d4b5

Trond Myklebust authored Feb 26, 2019

We already check the deviceids before we start the RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Avoid unnecessary layout invalidations · 0a156dd5

Trond Myklebust authored Feb 27, 2019

In ff_layout_mirror_valid() we may not want to invalidate the layout
segment despite the call to GETDEVICEINFO failing. The reason is that
a read may still be able to make progress on another mirror.

So instead we let the caller (in this case nfs4_ff_layout_prepare_ds())
decide whether or not it needs to invalidate.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: refactor calls to nfs4_ff_layout_prepare_ds() · 2444ff27

Trond Myklebust authored Feb 14, 2019

While we may want to skip attempting to connect to a downed mirror
when we're deciding which mirror to select for a read, we do not
want to do so once we've committed to attempting the I/O in
ff_layout_read/write_pagelist() or ff_layout_initiate_commit().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4: Handle early exit in layoutget by returning an error · 18c0778a

Trond Myklebust authored Feb 13, 2019

If the LAYOUTGET rpc call exits early without an error, convert it to
EAGAIN.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
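
As a rough Python model of that conversion (not the kernel code): the function name and the `reply_processed` flag are assumptions made for illustration. A task that stops early with a status of 0 is reported to the caller as `-EAGAIN` instead of looking like a success.

```python
import errno


def layoutget_result(rpc_status, reply_processed):
    """Map the outcome of a LAYOUTGET task to the status the caller sees.

    rpc_status      -- 0 or a negative errno reported by the RPC task
    reply_processed -- hypothetical flag: True once a reply was decoded
    """
    if rpc_status == 0 and not reply_processed:
        # Early exit without an error: there is no layout to use, so
        # ask the caller to try again instead of reporting success.
        return -errno.EAGAIN
    return rpc_status
```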

NFS/flexfiles: Send LAYOUTERROR when failing over mirrored reads · f0922a6c

Trond Myklebust authored Feb 10, 2019

When a read to the preferred mirror returns an error, the flexfiles
driver records the error in the inode list and currently marks the
layout for return before failing over the attempted read to the next
mirror.
What we actually want to do is fire off a LAYOUTERROR to notify the
MDS that there is an issue with the preferred mirror, then we fail
over. Only once we've failed to read from all mirrors should we
return the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
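
The intended sequence can be sketched as a small Python model (illustrative only, not the flexfiles driver). `read_with_failover()` and the event tuples are hypothetical; the sketch records a LAYOUTERROR for each mirror that fails and a layout return only once every mirror has failed.

```python
import errno


def read_with_failover(mirrors, read_from):
    """Try each mirror in turn and record what would be sent to the MDS.

    read_from(mirror) returns the data or raises OSError on failure.
    Returns (data, events); data is None if every mirror failed.
    """
    events = []
    for mirror in mirrors:
        try:
            data = read_from(mirror)
        except OSError as err:
            # Tell the MDS about the bad mirror, then fail over.
            events.append(("LAYOUTERROR", mirror, err.errno))
            continue
        return data, events
    # Only after all mirrors have failed is the layout returned.
    events.append(("LAYOUTRETURN", None, None))
    return None, events
```

If the first of two mirrors fails with `EIO`, the model reports one LAYOUTERROR for it and still returns the data read from the second mirror, without returning the layout.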

01 Mar, 2019 8 commits

NFSv4.2: Add client support for the generic 'layouterror' RPC call · 3eb86093

Trond Myklebust authored Feb 08, 2019

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated · a79f194a

Trond Myklebust authored Feb 27, 2019

If a layout segment gets invalidated while a pNFS I/O operation
is queued for transmission, then we ideally want to abort
immediately. This is particularly the case when there is a large
number of I/O related RPCs queued in the RPC layer, and the layout
segment gets invalidated due to an ENOSPC error, or an EACCES (because
the client was fenced). If we don't abort, we may end up forced to spam
the MDS with a lot of otherwise unnecessary LAYOUTERRORs after that I/O
fails.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4/pnfs: Fix barriers in nfs4_mark_deviceid_unavailable() · 39a5201a

Trond Myklebust authored Feb 26, 2019

Fix the memory barriers in nfs4_mark_deviceid_unavailable() and
nfs4_test_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS/flexfiles: Fix up sparse RCU annotations · 762bb7e9

Trond Myklebust authored Feb 26, 2019

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4/flexfiles: Fix invalid deref in FF_LAYOUT_DEVID_NODE() · 108bb4af

Trond Myklebust authored Feb 26, 2019

If the attempt to instantiate the mirror's layout DS pointer failed,
then that pointer may hold a value of type ERR_PTR(), so we need
to check that before we dereference it.

Fixes: 65990d1a ("pNFS/flexfiles: Fix a deadlock on LAYOUTGET")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Add missing encode / decode sequence_maxsz to v4.2 operations · 1a3466ae

Anna Schumaker authored Mar 01, 2019

These really should have been there from the beginning, but we never
noticed because there was enough slack in the RPC request for the extra
bytes. Chuck's recent patch to use au_cslack and au_rslack to compute
buffer size shrunk the buffer enough that this was now a problem for
SEEK operations on my test client.

Fixes: f4ac1674 ("nfs: Add ALLOCATE support")
Fixes: 2e72448b ("NFS: Add COPY nfs operation")
Fixes: cb95deea ("NFS OFFLOAD_CANCEL xdr")
Fixes: 624bd5b7 ("nfs: Add DEALLOCATE support")
Fixes: 1c6dcbe5 ("NFS: Implement SEEK")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4.1: Don't process the sequence op more than once. · c71c46f0

Trond Myklebust authored Mar 01, 2019

Ensure that if we call nfs41_sequence_process() a second time for the
same rpc_task, then we only process the results once.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
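
A generic "process once" guard of this kind can be modelled in a few lines of Python (a sketch only; `SequenceResult`, its `processed` flag and `sequence_process()` are hypothetical names and do not mirror the kernel structures).

```python
class SequenceResult:
    """Toy stand-in for the per-request SEQUENCE result state."""

    def __init__(self, slot):
        self.slot = slot
        self.processed = False  # hypothetical "already handled" marker


def sequence_process(res, log):
    """Process the SEQUENCE result at most once per request.

    Returns True if the result was processed by this call, False if an
    earlier call for the same request had already done so.
    """
    if res.processed:
        return False
    res.processed = True
    log.append(("processed", res.slot))
    return True
```

Calling `sequence_process()` twice on the same result appends a single entry to `log`; the second call is a no-op.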

NFSv4.1: Reinitialise sequence results before retransmitting a request · c1dffe0b

Trond Myklebust authored Mar 01, 2019

If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org

26 Feb, 2019 1 commit

SUNRPC: Fix an Oops in udp_poll() · a73881c9

Trond Myklebust authored Feb 26, 2019

udp_poll() checks the struct file for the O_NONBLOCK flag, so we must not
call it with a NULL file pointer.

Fixes: 0ffe86f4 ("SUNRPC: Use poll() to fix up the socket requeue races")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

25 Feb, 2019 1 commit

Merge tag 'nfs-rdma-for-5.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs · 06b5fc3a

Trond Myklebust authored Feb 25, 2019

NFSoRDMA client updates for 5.1

New features:
- Convert rpc auth layer to use xdr_streams
- Config option to disable insecure enctypes
- Reduce size of RPC receive buffers

Bugfixes and cleanups:
- Fix sparse warnings
- Check inline size before providing a write chunk
- Reduce the receive doorbell rate
- Various tracepoint improvements

[Trond: Fix up merge conflicts]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

23 Feb, 2019 1 commit

NFS/pnfs: Bulk destroy of layouts needs to be safe w.r.t. umount · 5085607d

Trond Myklebust authored Feb 22, 2019

If a bulk layout recall or a metadata server reboot coincides with a
umount, then holding a reference to an inode is unsafe unless we
also hold a reference to the super block.

Fixes: fd9a8d71 ("NFSv4.1: Fix bulk recall and destroy of layouts")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

21 Feb, 2019 2 commits

NFS: Fix a soft lockup in the delegation recovery code · 6f9449be

Trond Myklebust authored Feb 21, 2019

Fix a soft lockup when NFS client delegation recovery is attempted
but the inode is in the process of being freed. When the
igrab(inode) call fails, and we have to restart the recovery process,
we need to ensure that we won't attempt to recover the same delegation
again.

Fixes: 45870d69 ("NFSv4.1: Test delegation stateids when server...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
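
One way to guarantee that a restarted scan makes progress is sketched below as a Python model (not the kernel code). The dictionaries, the `need_recovery` flag and the `igrab` callback are hypothetical stand-ins; the sketch clears the flag before trying to pin the inode, so a restart skips the entry whose inode is being freed.

```python
def recover_delegations(delegations, igrab, max_restarts=100):
    """Recover every delegation flagged as needing recovery.

    igrab(deleg) returns False when the inode is being freed. The flag
    is cleared before igrab() is tried, so a restarted scan never picks
    the same delegation again. Returns (recovered_ids, restarts).
    """
    recovered = []
    restarts = 0
    while restarts <= max_restarts:
        for deleg in delegations:
            if not deleg["need_recovery"]:
                continue
            deleg["need_recovery"] = False
            if not igrab(deleg):
                restarts += 1
                break  # restart the scan from the beginning
            recovered.append(deleg["id"])
        else:
            return recovered, restarts
    raise RuntimeError("scan kept restarting")
```

Without the early clearing of the flag, the first delegation whose `igrab()` fails would be selected again on every restart, which is the kind of endless loop the commit message describes.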

NFSv4.1: Avoid false retries when RPC calls are interrupted · 3453d570

Trond Myklebust authored Jun 20, 2018

A 'false retry' in NFSv4.1 occurs when the client attempts to transmit a
new RPC call using a slot+sequence number combination that references an
already cached one. Currently, the Linux NFS client will do this if a
user process interrupts an RPC call that is in progress.
The problem with doing so is that we defeat the main mechanism used by
the server to differentiate between a new call and a replayed one. Even
if the server is able to perfectly cache the arguments of the old call,
it cannot know if the client intended to replay or send a new call.

The obvious fix is to bump the sequence number pre-emptively if an
RPC call is interrupted, but in order to deal with the corner cases
where the interrupted call is not actually received and processed by
the server, we need to interpret the error NFS4ERR_SEQ_MISORDERED
as a sign that we need to either wait or locate a correct sequence
number that lies between the value we sent, and the last value that
was acked by a SEQUENCE call on that slot.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Jason Tibbitts <tibbs@math.uh.edu>
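
The slot behaviour discussed in this commit can be illustrated with a toy Python model (a sketch under simplifying assumptions, not the client or server implementation). `SlotServer` and `send_after_interrupt()` are hypothetical: the server treats a repeated sequence number as a replay, the next number as a new call, and anything else as NFS4ERR_SEQ_MISORDERED; the client bumps the number after an interrupt and, if that is rejected, walks back toward the last acked value.

```python
class SlotServer:
    """Toy model of one NFSv4.1 session slot on the server."""

    def __init__(self):
        self.last_seq = 0
        self.cached_reply = None

    def call(self, seq, op):
        if seq == self.last_seq and self.cached_reply is not None:
            # Same slot+sequence: the server assumes a replay and
            # answers from its cache, whatever op the client sent.
            return "REPLAY", self.cached_reply
        if seq == self.last_seq + 1:
            self.last_seq = seq
            self.cached_reply = "done: " + op
            return "NEW", self.cached_reply
        return "NFS4ERR_SEQ_MISORDERED", None


def send_after_interrupt(server, last_acked, interrupted_seq, op):
    """Send a new call on a slot whose previous call was interrupted."""
    seq = interrupted_seq + 1  # bump pre-emptively
    while seq > last_acked:
        status, reply = server.call(seq, op)
        if status != "NFS4ERR_SEQ_MISORDERED":
            return seq, status, reply
        seq -= 1  # the interrupted call may never have arrived
    raise RuntimeError("no usable sequence number found")
```

In this model, reusing sequence number 1 for a different operation returns the cached reply of the old one, which is the false retry. After an interrupt at sequence 2, the bumped value 3 is accepted if the server saw the interrupted call; if it did not, 3 is rejected as misordered and 2 is accepted instead.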

20 Feb, 2019 13 commits

SUNRPC: Remove the redundant 'zerocopy' argument to xs_sendpages() · 6f903b11

Trond Myklebust authored Feb 19, 2019

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Further cleanups of xs_sendpages() · c87dc4c7

Trond Myklebust authored Feb 19, 2019

Now that we send the pages using a struct msghdr, instead of
using sendpage(), we no longer need to 'prime the socket' with
an address for unconnected UDP messages.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Convert socket page send code to use iov_iter() · 0472e476

Trond Myklebust authored Feb 19, 2019

Simplify the page send code using iov_iter and bvecs.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Convert xs_send_kvec() to use iov_iter_kvec() · e791f8e9

Trond Myklebust authored Feb 19, 2019

Prepare the socket transmission code to use iov_iter.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Initiate a connection close on an ESHUTDOWN error in stream receive · 5f52a9d4

Trond Myklebust authored Feb 16, 2019

If the client stream receive code receives an ESHUTDOWN error either
because the server closed the connection, or because it sent a
callback which cannot be processed, then we should shut down
the connection.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Don't suppress socket errors when a message read completes · 727fcc64

Trond Myklebust authored Feb 15, 2019

If the message read completes, but the socket returned an error
condition, we should ensure that the error is propagated.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Handle zero length fragments correctly · e92053a5

Trond Myklebust authored Feb 15, 2019

A zero length fragment is really a bug, but let's ensure we don't
go nuts when one turns up.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
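
For context, RPC over a stream transport frames each record as fragments with a 4-byte big-endian marker: the top bit flags the last fragment and the low 31 bits give the fragment length. The Python sketch below (an illustration, not the SUNRPC receive path; `read_record()` is a hypothetical helper) shows a reader that tolerates a zero-length fragment by consuming its marker and moving on.

```python
import struct

LAST_FRAGMENT = 0x80000000
LENGTH_MASK = 0x7FFFFFFF


def read_record(buf):
    """Reassemble one RPC record from record-marked fragments.

    Returns (payload, bytes_consumed). A zero-length fragment adds no
    payload; only its 4-byte marker is consumed, so the loop always
    makes progress.
    """
    offset = 0
    parts = []
    while True:
        if len(buf) - offset < 4:
            raise ValueError("truncated fragment marker")
        (marker,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        length = marker & LENGTH_MASK
        if len(buf) - offset < length:
            raise ValueError("truncated fragment body")
        parts.append(buf[offset:offset + length])
        offset += length
        if marker & LAST_FRAGMENT:
            return b"".join(parts), offset
```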

SUNRPC: Don't reset the stream record info when the receive worker is running · ae053551

Trond Myklebust authored Feb 20, 2019

To ensure that the receive worker has exclusive access to the stream record
info, we must not reset the contents other than when holding the
transport->recv_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

nfs: fix xfstest generic/099 failed on nfsv3 · ded52fbe

ZhangXiaoxu authored Feb 18, 2019

After setxattr, the NFSv3 client caches the ACL that was set by the user.

But at the backend, the shared file system (e.g. ext4) checks the ACL,
and if it can be merged with the mode, it won't add the ACL to the file.
So the ACL cached by the NFSv3 client is redundant.

Don't call 'set_cached_acl' on setxattr.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Avoid read/modify/write when it is not necessary · 2cde04e9

Kazuo Ito authored Feb 14, 2019

As the block and SCSI layouts can only read/write fixed-length
blocks, we must perform read-modify-write when data to be written is
not aligned to a block boundary or smaller than the block size.
(612aa983 pnfs: add flag to force read-modify-write in ->write_begin)

The current code tries to see if we have to do read-modify-write
on block-oriented pNFS layouts by just checking !PageUptodate(page),
but the same condition also applies to overwriting any uncached
portions of existing files, making such operations excessively slow
even when they are block-aligned.

The change does not affect the optimization for read-modify-write
cases (38c73044 NFS: read-modify-write page updating),
because partial update of !PageUptodate() pages can only happen
in layouts that can do arbitrary length read/write and never
in block-based ones.
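
The rule stated at the top of this message can be written down as a small Python predicate (a sketch for illustration; the function and its parameters are hypothetical and do not reproduce nfs_want_read_modify_write()). A write needs read-modify-write only if the page is not already up to date and the write does not cover whole, aligned blocks.

```python
def needs_read_modify_write(offset, length, block_size, page_uptodate):
    """Decide whether a write to a fixed-block layout must read first.

    offset, length -- byte range being written
    block_size     -- size of the fixed-length blocks of the layout
    page_uptodate  -- True if the cached data is already valid
    """
    if page_uptodate:
        return False  # nothing to read, the cache already has the data
    starts_on_boundary = offset % block_size == 0
    covers_whole_blocks = length > 0 and length % block_size == 0
    # Aligned, whole-block writes replace every byte they touch, so the
    # old contents are not needed even if they are not cached.
    return not (starts_on_boundary and covers_whole_blocks)
```

For a 4096-byte block size, an uncached 4096-byte write at offset 0 needs no read, while an uncached 100-byte write, or a 4096-byte write at offset 512, does.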

Testing results:

We ran fio on one of the pNFS clients running a 4.20 kernel
(vanilla and patched) in this configuration to read/write/overwrite
files on the storage array, exported as a pNFS share by the server.

 pNFS clients ---1G Ethernet--- pNFS server
 (HP DL360 G8)                  (HP DL360 G8)
       |                              |
       |                              |
       +------8G Fiber Channel--------+
                     |
               Storage Array
                 (HP P6350)

Throughput of overwrite (both buffered and O_SYNC) is noticeably
improved.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
buffered |         4|  21.3 |  232   |
overwrite|        32|  22.2 |  256   |
         |       512|  22.4 |  260   |
---------+----------+----------------+
O_SYNC   |         4|   3.84|    4.77|
overwrite|        32|  12.2 |   32.0 |
         |       512|  18.5 |  152   |
---------+----------+----------------+
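
To put numbers on "noticeably improved", the ratios can be computed directly from the overwrite table above. The short Python snippet below only restates the figures from that table; `speedup()` is a hypothetical helper.

```python
# (operation, block size in KiB, MiB/s on 4.20, MiB/s patched),
# copied from the overwrite table above.
OVERWRITE_RESULTS = [
    ("buffered", 4, 21.3, 232.0),
    ("buffered", 32, 22.2, 256.0),
    ("buffered", 512, 22.4, 260.0),
    ("O_SYNC", 4, 3.84, 4.77),
    ("O_SYNC", 32, 12.2, 32.0),
    ("O_SYNC", 512, 18.5, 152.0),
]


def speedup(before, after):
    """Throughput ratio of the patched kernel over vanilla 4.20."""
    return after / before


for op, block_kib, before, after in OVERWRITE_RESULTS:
    print(f"{op:8s} {block_kib:4d} KiB: {speedup(before, after):.1f}x")
```

The buffered overwrite rows come out at roughly an order of magnitude, while the O_SYNC rows range from a modest gain at 4 KiB to several times faster at 512 KiB.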

Read and write (buffered and O_SYNC) throughput on the same client is not
affected by the patch, either negatively or positively, as expected.

Ops.     |block size|   Throughput   |
         |  (KiB)   |    (MiB/s)     |
         |          |  4.20 | patched|
---------+----------+----------------+
read     |         4| 548   |  550   |
         |        32| 547   |  551   |
         |       512| 548   |  551   |
---------+----------+----------------+
buffered |         4| 237   |  244   |
write    |        32| 261   |  268   |
         |       512| 265   |  272   |
---------+----------+----------------+
O_SYNC   |         4|   0.46|    0.46|
write    |        32|   3.60|    3.57|
         |       512| 105   |  106   |
---------+----------+----------------+
Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Tested-by: Hiroyuki Watanabe <watanabe.hiroyuki@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Fix potential corruption of page being written · 97ae91bb

Kazuo Ito authored Feb 14, 2019

nfs_want_read_modify_write() didn't check for !PagePrivate when pNFS
block or SCSI layout was in use, therefore we could lose data forever
if the page being written was filled by a read before completion.
Signed-off-by: Kazuo Ito <ito_kazuo_g3@lab.ntt.co.jp>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Fix typo in comments of nfs_readdir_alloc_pages() · bf211ca1

zhangliguang authored Feb 16, 2019

This fixes the typo in the comments of nfs_readdir_alloc_pages(), which
arose because nfs_readdir_large_page and nfs_readdir_free_pagearray had
been renamed.
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

NFS: Remove redundant semicolon · 42f72cf3

zhangliguang authored Feb 12, 2019

This removes a redundant semicolon at the end of a statement.

Fixes: c7944ebb ("NFSv4: Fix lookup revalidate of regular files")
Signed-off-by: Liguang Zhang <zhangliguang@linux.alibaba.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
