Commits · 55c760cfc40d75b4d8a17d56580ec306db2ab14f · Kirill Smelkov / linux

09 Apr, 2013 8 commits

nfsd4: fix forechannel attribute negotiation · 55c760cf

J. Bruce Fields authored Apr 08, 2013

Negotiation of the 4.1 session forechannel attributes is a mess.  Fix:

	- Move it all into check_forechannel_attrs instead of spreading
	  it between that, alloc_session, and init_forechannel_attrs.
	- set a minimum "slotsize" so that our drc memory limits apply
	  even for small maxresponsesize_cached.  This also fixes some
	  bugs when slotsize becomes <= 0.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

55c760cf

nfsd4: cleanup check_forechannel_attrs · 373cd409

J. Bruce Fields authored Apr 08, 2013

Pass this struct by reference, not by value, and return an error instead
of a boolean to allow for future additions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

373cd409

nfsd4: don't close read-write opens too soon · 0c7c3e67

J. Bruce Fields authored Mar 28, 2013

Don't actually close any opens until we don't need them at all.

This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0c7c3e67

nfsd4: release lockowners on last unlock in 4.1 case · eb2099f3

J. Bruce Fields authored Apr 07, 2013

In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.

It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.

For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

eb2099f3

nfsd4: more sessions/open-owner-replay cleanup · bbc9c36c

J. Bruce Fields authored Mar 22, 2013

More logic that's unnecessary in the 4.1 case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

bbc9c36c

nfsd4: no need for replay_owner in sessions case · 3d74e6a5

J. Bruce Fields authored Mar 22, 2013

The replay_owner will never be used in the sessions case.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3d74e6a5

nfsd4: remove some redundant comments · c383747e
J. Bruce Fields authored Apr 07, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
c383747e

nfsd: use kmem_cache_free() instead of kfree() · 2c44a234

Wei Yongjun authored Apr 09, 2013

memory allocated by kmem_cache_alloc() should be freed using
kmem_cache_free(), not kfree().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2c44a234

08 Apr, 2013 1 commit

nfsd4: cleanup handling of nfsv4.0 closed stateid's · 9411b1d4

J. Bruce Fields authored Apr 01, 2013

Closed stateid's are kept around a little while to handle close replays
in the 4.0 case.  So we stash them in the last-used stateid in the
oo_last_closed_stateid field of the open owner.  We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented.  But we don't want to do that on the close itself; so we
set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then when we see that flag set
next time we free it.

This is unnecessarily baroque.

Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.

The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.
Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9411b1d4

04 Apr, 2013 4 commits

nfsd4: remove unused nfs4_check_deleg argument · 41d22663
J. Bruce Fields authored Mar 21, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
41d22663

nfsd4: make del_recall_lru per-network-namespace · e8c69d17

J. Bruce Fields authored Mar 21, 2013

If nothing else this simplifies the nfs4_state_shutdown_net logic a tad.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e8c69d17

nfsd4: shut down more of delegation earlier · 68a33961

J. Bruce Fields authored Mar 21, 2013

Once we've unhashed the delegation, it's only hanging around for the
benefit of an oustanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter.  No point keeping the file
around any longer, or keeping it hashed.

This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.

(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

68a33961

nfsd4: minor cb_recall simplification · 8be2d234
J. Bruce Fields authored Mar 21, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
8be2d234

03 Apr, 2013 25 commits

SUNRPC/cache: add module_put() on error path in cache_open() · a7823c79

Alexey Khoroshilov authored Mar 23, 2013

If kmalloc() fails in cache_open(), module cd->owner left locked.
The patch adds module_put(cd->owner) on this path.

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a7823c79

nfsd: remove /proc/fs/nfs when create /proc/fs/nfs/exports error · ff7c4b36

fanchaoting authored Mar 27, 2013

when create /proc/fs/nfs/exports error, we should remove /proc/fs/nfs,
if don't do it, it maybe cause Memory leak.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Reviewed-by: chendt.fnst <chendt.fnst@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ff7c4b36

nfsd: don't run get_file if nfs4_preprocess_stateid_op return error · b022032e

fanchaoting authored Apr 01, 2013

we should return error status directly when nfs4_preprocess_stateid_op
return error.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b022032e

nfsd: convert the file_hashtbl to a hlist · 89876f8c

Jeff Layton authored Apr 02, 2013

We only ever traverse the hash chains in the forward direction, so a
double pointer list head isn't really necessary.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

89876f8c

nfsd4: don't destroy in-use session · 66b2b9b2

J. Bruce Fields authored Mar 19, 2013

This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY.  This simplifies things somewhat and
helps meet a coming 4.2 requirement.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

66b2b9b2

nfsd4: don't destroy in-use clients · 221a6876

J. Bruce Fields authored Apr 01, 2013

When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.

The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.

This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.

The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end.  But this still
leaves some races in the current code.

I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY.  This is a case that should never happen
anyway.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

221a6876

nfsd4: simplify bind_conn_to_session locking · 4f6e6c17

J. Bruce Fields authored Mar 18, 2013

The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4f6e6c17

nfsd4: fix destroy_session race · abcdff09

J. Bruce Fields authored Mar 14, 2013

destroy_session uses the session and client without continuously holding
any reference or locks.

Put the whole thing under the state lock for now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

abcdff09

nfsd4: clientid lookup cleanup · bfa85e83
J. Bruce Fields authored Mar 14, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
bfa85e83

nfsd4: destroy_clientid simplification · c0293b01

J. Bruce Fields authored Mar 14, 2013

I'm not sure what the check for clientid expiry was meant to do here.

The check for a matching session is redundant given the previous check
for state: a client without state is, in particular, a client without
sessions.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

c0293b01

nfsd4: remove some dprintk's · 1ca50792

J. Bruce Fields authored Mar 14, 2013

E.g. printk's that just report the return value from an op are
uninteresting as we already do that in the main proc_compound loop.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

1ca50792

nfsd4: STALE_STATEID cleanup · 0eb6f20a
J. Bruce Fields authored Mar 12, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
0eb6f20a

nfsd4: warn on odd create_session state · 78389046

J. Bruce Fields authored Mar 12, 2013

This should never happen.

(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

78389046

nfsd: fix bug on nfs4 stateid deallocation · 491402a7

ycnian@gmail.com authored Mar 11, 2013

NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
THIS close procedure may be freed immediately in the coming encoding function.
Sorry that Signed-off-by was forgotten in last version.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

491402a7

nfsd: remove unused macro in nfsv4 · 9c6bdbb8

Yanchuan Nian authored Mar 11, 2013

lk_rflags is never used anywhere, and rflags is not defined in struct
nfsd4_lock.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9c6bdbb8

nfsd4: fix use-after-free of 4.1 client on connection loss · 2e4b7239

J. Bruce Fields authored Mar 08, 2013

Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.

The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2e4b7239

nfsd4: fix race on client shutdown · b0a9d3ab

J. Bruce Fields authored Mar 07, 2013

Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL.  An xpt_user
callback that encounters such a session may then crash:

[  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
[  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] PGD 37811067 PUD 3d498067 PMD 0
[  303.959061] Oops: 0002 [#8] PREEMPT SMP
[  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
[  303.959061] CPU 0
[  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
[  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
[  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
[  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
[  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
[  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
[  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
[  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
[  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
[  303.959061] Stack:
[  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
[  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
[  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
[  303.959061] Call Trace:
[  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
[  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
[  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
[  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
[  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
[  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
[  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
[  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
[  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
[  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
[  303.959061]  RSP <ffff880037877dd8>
[  303.959061] CR2: 0000000000000318
[  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
[  304.001903] note: nfsd[264] exited with preempt_count 2
Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b0a9d3ab

nfsd4: handle seqid-mutating open errors from xdr decoding · 9d313b17

J. Bruce Fields authored Feb 28, 2013

If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing that case: we have to at least look up the
owner so we can find the seqid to bump.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9d313b17

nfsd4: remove BUG_ON · b600de7a

J. Bruce Fields authored Feb 28, 2013

This BUG_ON just crashes the thread a little earlier than it would
otherwise--it doesn't seem useful.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b600de7a

nfsd: scale up the number of DRC hash buckets with cache size · 0733c7ba

Jeff Layton authored Mar 27, 2013

We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is its largest. Change the code to scale out the number of buckets
with the max size of the cache.

At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Move instead to use the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:

The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:

    old hash: 68/(39258/2048) == 3.547404
    hash_32:  45/(33773/2048) == 2.728807
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0733c7ba

nfsd: keep stats on worst hash balancing seen so far · 98d821bd

Jeff Layton authored Mar 27, 2013

The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.

With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

98d821bd

nfsd: add new reply_cache_stats file in nfsdfs · a2f999a3

Jeff Layton authored Mar 27, 2013

For presenting statistics relating to duplicate reply cache.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a2f999a3

nfsd: track memory utilization by the DRC · 6c6910cd

Jeff Layton authored Mar 27, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6c6910cd

nfsd: break out comparator into separate function · 9dc56143

Jeff Layton authored Mar 27, 2013

Break out the function that compares the rqstp and checksum against a
reply cache entry. While we're at it, track the efficacy of the checksum
over the NFS data by tracking the cases where we would have incorrectly
matched a DRC entry if we had not tracked it or the length.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9dc56143

nfsd: eliminate one of the DRC cache searches · 0b9ea37f

Jeff Layton authored Mar 27, 2013

The most common case is to do a search of the cache, followed by an
insert. In the case where we have to allocate an entry off the slab,
then we end up having to redo the search, which is wasteful.

Better optimize the code for the common case by eliminating the initial
search of the cache and always preallocating an entry. In the case of a
cache hit, we'll end up just freeing that entry but that's preferable to
an extra search.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0b9ea37f

26 Mar, 2013 1 commit

nfsd4: reject "negative" acl lengths · 64a817cf

J. Bruce Fields authored Mar 26, 2013

Since we only enforce an upper bound, not a lower bound, a "negative"
length can get through here.

The symptom seen was a warning when we attempt to a kmalloc with an
excessive size.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

64a817cf

22 Mar, 2013 1 commit

nfsd: fix bad offset use · e49dbbf3

Kent Overstreet authored Mar 22, 2013

vfs_writev() updates the offset argument - but the code then passes the
offset to vfs_fsync_range(). Since offset now points to the offset after
what was just written, this is probably not what was intended

Introduced by face1502 "nfsd: use
vfs_fsync_range(), not O_SYNC, for stable writes".
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Reviewed-by: Zach Brown <zab@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e49dbbf3