Commits · 3471648a7569512e10f154cdfe5076c341a5c099 · nexedi / linux

28 Jul, 2015 1 commit

nfs: plug memory leak when ->prepare_layoutcommit fails · 3471648a

Jeff Layton authored Jul 10, 2015

"data" is currently leaked when the prepare_layoutcommit operation
returns an error. Put the cred before taking the spinlock in that
case, take the lock and then goto out_unlock which will drop the
lock and then free "data".
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3471648a

27 Jul, 2015 5 commits

SUNRPC: Report TCP errors to the caller · f580dd04
Trond Myklebust authored Jul 11, 2015
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
f580dd04

sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. · 743c69e7

NeilBrown authored Jul 27, 2015

The networking layer does not reliably report the distinction between
a non-block write failing because:
 1/ the queue is too full already and
 2/ a memory allocation attempt failed.

The distinction is important because in the first case it is
appropriate to retry as soon as the socket reports that it is
writable, and in the second case a small delay is required as the
socket will most likely report as writable but kmalloc could still
fail.

sk_stream_wait_memory() exhibits this distinction nicely, setting
'vm_wait' if a small wait is needed.  However in the non-blocking case
it always returns -EAGAIN no matter the cause of the failure.  This
-EAGAIN call get all the way to sunrpc.

The sunrpc layer expects EAGAIN to indicate the first cause, and
ENOBUFS to indicate the second.  Various documentation suggests that
this is not unreasonable, but does not guarantee the desired error
codes.

The result of getting -EAGAIN when -ENOBUFS is expected is that the
send is tried again in a tight loop and soft lockups are reported.

so: add tests after calls to xs_sendpages() to translate -EAGAIN into
-ENOBUFS if the socket is writable.  This cannot happen inside
xs_sendpages() as the test for "is socket writable" is different
between TCP and UDP.

With this change, the tight loop retrying xs_sendpages() becomes a
loop which only retries every 250ms, and so will not trigger a
soft-lockup warning.

It is possible that the write did fail because the queue was too full
and by the time xs_sendpages() completed, the queue was writable
again.  In this case an extra 250ms delay is inserted that isn't
really needed.  This circumstance suggests a degree of congestion so a
delay is not necessarily a bad thing, and it can only cause a single
250ms delay, not a series of them.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

743c69e7

NFSv4.2: handle NFS-specific llseek errors · bdcc2cd1

J. Bruce Fields authored Jul 23, 2015

Handle NFS-specific llseek errors instead of letting them leak out to
userspace.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bdcc2cd1

NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() · d4c30454

Trond Myklebust authored Jul 24, 2015

Recoalescing does not affect whether or not we've already sent off
I/O, and doing so means that we end up sending a bunch of synchronous
for cases where we actually need to be using unstable writes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d4c30454

NFS: Fix a memory leak in nfs_do_recoalesce · 03d5eb65

Trond Myklebust authored Jul 27, 2015

If the function exits early, then we must put those requests that were
not processed back onto the &mirror->pg_list so they can be cleaned up
by nfs_pgio_error().

Fixes: a7d42ddb ("nfs: add mirroring support to pgio layer")
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

03d5eb65

22 Jul, 2015 8 commits

NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE · 115c48d7

Trond Myklebust authored Jul 05, 2015

I'm not aware of any existing bugs around this, but the expectation is
that nfs_mark_for_revalidate() should always force a revalidation of
the cached metadata.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

115c48d7

NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability · cd812599

Trond Myklebust authored Jul 05, 2015

Setting the change attribute has been mandatory for all NFS versions, since
commit 3a1556e8 ("NFSv2/v3: Simulate the change attribute"). We should
therefore not have anything be conditional on it being set/unset.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cd812599

NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised · 5c675d64

Trond Myklebust authored Jul 05, 2015

We can't allow caching of data until the change attribute has been
initialised correctly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5c675d64

NFS: Don't revalidate the mapping if both size and change attr are up to date · 85a23cee

Trond Myklebust authored Jul 05, 2015

If we've ensured that the size and the change attribute are both correct,
then there is no point in marking those attributes as needing revalidation
again. Only do so if we know the size is incorrect and was not updated.

Fixes: f2467b6f ("NFS: Clear NFS_INO_REVAL_PAGECACHE when...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

85a23cee

NFSv4/pnfs: Ensure we don't miss a file extension · 2b83d3de

Trond Myklebust authored Jul 05, 2015

pNFS writes don't return attributes, however that doesn't mean that we
should ignore the fact that they may be extending the file. This patch
ensures that if a write is seen to extend the file, then we always set
an attribute barrier, and update the cached file size.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2b83d3de

NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked · 3c38cbe2

Trond Myklebust authored Jul 22, 2015

Otherwise, nfs4_select_rw_stateid() will always return the zero stateid
instead of the correct open stateid.

Fixes: f95549cf ("NFSv4: More CLOSE/OPEN races")
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3c38cbe2

SUNRPC: xprt_complete_bc_request must also decrement the free slot count · 1980bd4d

Trond Myklebust authored Jul 22, 2015

Calling xprt_complete_bc_request() effectively causes the slot to be allocated,
so it needs to decrement the backchannel free slot count as well.

Fixes: 0d2a970d ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1980bd4d

SUNRPC: Fix a backchannel deadlock · 68514471

Trond Myklebust authored Jul 22, 2015

xprt_alloc_bc_request() cannot call xprt_free_bc_request() without
deadlocking, since it already holds the xprt->bc_pa_lock.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: 0d2a970d ("SUNRPC: Fix a backchannel race")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

68514471

11 Jul, 2015 5 commits

pNFS: Don't throw out valid layout segments · faa4a54f

Trond Myklebust authored Jul 09, 2015

It is OK for layout segments to remain hashed even if no-one holds any
references to them, provided that the segments are still valid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

faa4a54f

pNFS: pnfs_roc_drain() fix a race with open · bdc59cf2

Trond Myklebust authored Jul 09, 2015

If a process reopens the file before we can send off the CLOSE/DELEGRETURN,
then pnfs_roc_drain() may end up waiting for a new set of layout segments
that are marked as return-on-close, but haven't yet been returned.

Fix this by only waiting for those layout segments that were invalidated in
pnfs_roc().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bdc59cf2

pNFS: Fix races between return-on-close and layoutreturn. · 7f27392c

Trond Myklebust authored Jul 09, 2015

If one or more of the layout segments reports an error during I/O, then
we may have to send a layoutreturn to report the error back to the NFS
metadata server.
This patch ensures that the return-on-close code can detect the
outstanding layoutreturn, and not preempt it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7f27392c

pNFS: pnfs_roc_drain should return 'true' when sleeping · df9cecc1

Trond Myklebust authored Jul 09, 2015

Also clean up the case where we don't find a return-on-close layout segment.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

df9cecc1

pNFS: Layoutreturn must invalidate all existing layout segments. · c5d73716
Trond Myklebust authored Jul 09, 2015
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
c5d73716

08 Jul, 2015 1 commit
- NFSv4.2/flexfiles: Fix a typo in the flexfiles layoutstats code · 690edcfa
  Trond Myklebust authored Jul 08, 2015
```
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
```
  690edcfa
05 Jul, 2015 5 commits

NFSv4: Leases are renewed in sequence_done when we have sessions · be824167

Trond Myklebust authored Jul 05, 2015

Ensure that the calls to renew_lease() in open_done() etc. only apply
to session-less versions of NFSv4.x (i.e. NFSv4.0).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

be824167

NFSv4.1: nfs41_sequence_done should handle sequence flag errors · b15c7cdd

Trond Myklebust authored Jul 05, 2015

Instead of just kicking off lease recovery, we should look into the
sequence flag errors and handle them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b15c7cdd

NFSv4.1: Handle SEQ4_STATUS_BACKCHANNEL_FAULT correctly · b1352905

Trond Myklebust authored Jul 05, 2015

RFC5661 states:

      The server has encountered an unrecoverable fault with the
      backchannel (e.g., it has lost track of the sequence ID for a slot
      in the backchannel).  The client MUST stop sending more requests
      on the session's fore channel, wait for all outstanding requests
      to complete on the fore and back channel, and then destroy the
      session.

Ensure we do so...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b1352905

NFSv4.1: Handle SEQ4_STATUS_RECALLABLE_STATE_REVOKED status bit correctly · 4099287f

Trond Myklebust authored Jul 05, 2015

Try to handle this for now by invalidating all outstanding layouts for this
server and then testing all the open+lock+delegation stateids.
At some later stage, we may want to optimise by separating out the testing of
delegation stateids only, and adding testing of layout stateids.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4099287f

NFSv4.1: Handle SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED status bit correctly. · 8b895ce6

Trond Myklebust authored Jul 05, 2015

If the server tells us that only some state has been revoked, then we
need to run the full TEST_STATEID dog and pony show in order to discover
which locks and delegations are still OK. Currently we blow away all
state, which means that we lose all locks!
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8b895ce6

03 Jul, 2015 2 commits

SUNRPC: Don't confuse ENOBUFS with a write_space issue · b5872f0c

Trond Myklebust authored Jul 03, 2015

ENOBUFS means that memory allocations are failing due to an actual
low memory situation. It should not be confused with being out of
socket buffer space.

Handle the problem by just punting to the delay in call_status.
Reported-by: Neil Brown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b5872f0c

SUNRPC: Don't reencode message if transmission failed with ENOBUFS · 93aa6c7b

Trond Myklebust authored Jul 03, 2015

If we're running out of buffer memory when transmitting data, then
we want to just delay for a moment, and then continue transmitting
the remainder of the message.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

93aa6c7b

01 Jul, 2015 9 commits

nfs: Remove invalid tk_pid from debug message · b4839ebe

Kinglong Mee authored Jul 01, 2015

Before rpc_run_task(), tk_pid is uninitiated as 0 always.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b4839ebe

nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh · d356a7d1

Kinglong Mee authored Jul 01, 2015

NFS_ATTR_FATTR_V4_REFERRAL is only set in nfs4_proc_lookup_common.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d356a7d1

nfs: Drop bad comment in nfs41_walk_client_list() · bc4da2a2

Kinglong Mee authored Jul 01, 2015

Commit 7b1f1fd1 "NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list"
have change the logical of the list_for_each_entry().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bc4da2a2

nfs: Remove unneeded micro checking of CONFIG_PROC_FS · cd738ee9

Kinglong Mee authored Jul 01, 2015

Have checking CONFIG_PROC_FS in include/linux/sunrpc/stats.h.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cd738ee9

nfs: Don't setting FILE_CREATED flags always · 2785110d

Kinglong Mee authored Jul 01, 2015

Commit 5bc2afc2 "NFSv4: Honour the 'opened' parameter in the atomic_open()
 filesystem method" have support the opened arguments now.

Also,
Commit 03da633a "atomic_open: take care of EEXIST in no-open case with
 O_CREAT|O_EXCL in fs/namei.c" have change vfs's logical.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2785110d

nfs: Use remove_proc_subtree() instead remove_proc_entry() · 6a062a36

Kinglong Mee authored Jul 01, 2015

Thanks for Al Viro's comments of killing proc_fs_nfs completely.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6a062a36

nfs: Remove unused argument in nfs_server_set_fsinfo() · 71f81e51

Kinglong Mee authored Jul 01, 2015

Commit e38eb650 "NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()"
have remove the using of mntfh from nfs_server_set_fsinfo().
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

71f81e51

nfs: Fix a memory leak when meeting an unsupported state protect · 6b55970b

Kinglong Mee authored Jul 01, 2015

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6b55970b

nfs: take extra reference to fl->fl_file when running a LOCKU operation · db2efec0

Jeff Layton authored Jun 30, 2015

Jean reported another crash, similar to the one fixed by feaff8e5:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
    IP: [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache vmw_vsock_vmci_transport vsock cfg80211 rfkill coretemp crct10dif_pclmul ppdev vmw_balloon crc32_pclmul crc32c_intel ghash_clmulni_intel pcspkr vmxnet3 parport_pc i2c_piix4 microcode serio_raw parport nfsd floppy vmw_vmci acpi_cpufreq auth_rpcgss shpchp nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih ata_generic mptbase i2c_core pata_acpi
    CPU: 0 PID: 329 Comm: kworker/0:1H Not tainted 4.1.0-rc7+ #2
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    30ec000
    RIP: 0010:[<ffffffff8124ef7f>]  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
    RSP: 0018:ffff8802330efc08  EFLAGS: 00010296
    RAX: ffff8802330efc58 RBX: ffff880097187c80 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
    RBP: ffff8802330efc18 R08: ffff88023fc173d8 R09: 3038b7bf00000000
    R10: 00002f1a02000000 R11: 3038b7bf00000000 R12: 0000000000000000
    R13: 0000000000000000 R14: ffff8802337a2300 R15: 0000000000000020
    FS:  0000000000000000(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000148 CR3: 000000003680f000 CR4: 00000000001407f0
    Stack:
     ffff880097187c80 ffff880097187cd8 ffff8802330efc98 ffffffff81250281
     ffff8802330efc68 ffffffffa013e7df ffff8802330efc98 0000000000000246
     ffff8801f6901c00 ffff880233d2b8d8 ffff8802330efc58 ffff8802330efc58
    Call Trace:
     [<ffffffff81250281>] __posix_lock_file+0x31/0x5e0
     [<ffffffffa013e7df>] ? rpc_wake_up_task_queue_locked.part.35+0xcf/0x240 [sunrpc]
     [<ffffffff8125088b>] posix_lock_file_wait+0x3b/0xd0
     [<ffffffffa03890b2>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa0365808>] ? nfs41_sequence_done+0xd8/0x300 [nfsv4]
     [<ffffffffa0367525>] do_vfs_lock+0x35/0x40 [nfsv4]
     [<ffffffffa03690c1>] nfs4_locku_done+0x81/0x120 [nfsv4]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e310>] ? rpc_destroy_wait_queue+0x20/0x20 [sunrpc]
     [<ffffffffa013e33c>] rpc_exit_task+0x2c/0x90 [sunrpc]
     [<ffffffffa0134400>] ? call_refreshresult+0x170/0x170 [sunrpc]
     [<ffffffffa013ece4>] __rpc_execute+0x84/0x410 [sunrpc]
     [<ffffffffa013f085>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810add67>] process_one_work+0x147/0x400
     [<ffffffff810ae42b>] worker_thread+0x11b/0x460
     [<ffffffff810ae310>] ? rescuer_thread+0x2f0/0x2f0
     [<ffffffff810b35d9>] kthread+0xc9/0xe0
     [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pmd+0xa0/0x160
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
     [<ffffffff8173c222>] ret_from_fork+0x42/0x70
     [<ffffffff810b3510>] ? kthread_create_on_node+0x170/0x170
    Code: a5 81 e8 85 75 e4 ff c6 05 31 ee aa 00 01 eb 98 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 <48> 8b 9f 48 01 00 00 48 85 db 74 08 48 89 d8 5b 41 5c 5d c3 83
    RIP  [<ffffffff8124ef7f>] locks_get_lock_context+0xf/0xa0
     RSP <ffff8802330efc08>
    CR2: 0000000000000148
    ---[ end trace 64484f16250de7ef ]---

The problem is almost exactly the same as the one fixed by feaff8e5.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.
Reported-by: Jean Spector <jean@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

db2efec0

29 Jun, 2015 1 commit

NFSv4: When returning a delegation, don't reclaim an incompatible open mode. · 39f897fd

NeilBrown authored Jun 29, 2015

It is possible to have an active open with one mode, and a delegation
for the same file with a different mode.
In particular, a WR_ONLY open and an RD_ONLY delegation.
This happens if a WR_ONLY open is followed by a RD_ONLY open which
provides a delegation, but is then close.

When returning the delegation, we currently try to claim opens for
every open type (n_rdwr, n_rdonly, n_wronly).  As there is no harm
in claiming an open for a mode that we already have, this is often
simplest.

However if the delegation only provides a subset of the modes that we
currently have open, this will produce an error from the server.

So when claiming open modes prior to returning a delegation, skip the
open request if the mode is not covered by the delegation - the open_stateid
must already cover that mode, so there is nothing to do.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

39f897fd

27 Jun, 2015 2 commits

NFSv4.2: LAYOUTSTATS is optional to implement · 6c5a0d89

Trond Myklebust authored Jun 27, 2015

Make it so, by checking the return value for NFS4ERR_MOTSUPP and
caching the information as a server capability.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6c5a0d89

NFSv4.2: Fix up a decoding error in layoutstats · da2e8127

Trond Myklebust authored Jun 27, 2015

According to the spec, the server is only returning the status,
which we decode in the op header.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

da2e8127

26 Jun, 2015 1 commit

pNFS/flexfiles: Fix the reset of struct pgio_header when resending · d6208769

Trond Myklebust authored Jun 26, 2015

hdr->good_bytes needs to be set to the length of the request, not
zero.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d6208769