Commits · 11f779421a39b86da8a523d97e5fd3477878d44f · Kirill Smelkov / linux

15 Feb, 2013 8 commits

nfsd: containerize NFSd filesystem · 11f77942

Stanislav Kinsbursky authored Feb 01, 2013

This patch makes NFSD file system superblock to be created per net.
This makes possible to get proper network namespace from superblock instead of
using hard-coded "init_net".

Note: NFSd fs super-block holds network namespace. This garantees, that
network namespace won't disappear from underneath of it.
This, obviously, means, that in case of kill of a container's "init" (which is not a mount
namespace, but network namespace creator) netowrk namespace won't be
destroyed.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

11f77942

nfsd: fix comments on nfsd_cache_lookup · 1ac83629

Jeff Layton authored Feb 14, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

1ac83629

SUNRPC: move cache_detail->cache_request callback call to cache_read() · d94af6de

Stanislav Kinsbursky authored Feb 04, 2013

The reason to move cache_request() callback call from
sunrpc_cache_pipe_upcall() to cache_read() is that this garantees, that cache
access will be done userspace process context (only userspace process have
proper root context).
This is required for NFSd support in container: svc_export_request() (which is
cache_request callback) calls d_path(), which, in turn, traverse dentry up to
current->fs->root. Kernel threads always have global root, while container
have be in "root jail" - i.e. have it's own nested root.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d94af6de

SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function · 21cd1254

Stanislav Kinsbursky authored Feb 04, 2013

Passing this pointer is redundant since it's stored on cache_detail structure,
which is also passed to sunrpc_cache_pipe_upcall () function.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

21cd1254

SUNRPC: rework cache upcall logic · 2d438338

Stanislav Kinsbursky authored Feb 04, 2013

For most of SUNRPC caches (except NFS DNS cache) cache_detail->cache_upcall is
redundant since all that it's implementations are doing is calling
sunrpc_cache_pipe_upcall() with proper function address argument.
Cache request function address is now stored on cache_detail structure and
thus all the code can be simplified.
Now, for those cache details, which doesn't have cache_upcall callback (the
only one, which still has is nfs_dns_resolve_template)
sunrpc_cache_pipe_upcall will be called instead.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2d438338

SUNRPC: introduce cache_detail->cache_request callback · 73fb847a

Stanislav Kinsbursky authored Feb 04, 2013

This callback will allow to simplify upcalls in further patches in this
series.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

73fb847a

NFS: simplify and clean cache library · 462b8f6b

Stanislav Kinsbursky authored Feb 04, 2013

This is a cleanup patch.
Such helpers like nfs_cache_init() and nfs_cache_destroy() are redundant,
because they are just a wrappers around sunrpc_init_cache_detail() and
sunrpc_destroy_cache_detail() respectively.
So let's remove them completely and move corresponding logic to
nfs_cache_register_net() and nfs_cache_unregister_net() respectively (since
they are called together anyway).
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

462b8f6b

NFS: use SUNRPC cache creation and destruction helper for DNS cache · 483479c2

Stanislav Kinsbursky authored Feb 04, 2013

This cache was the first containerized and doesn't use net-aware cache
creation and destruction helpers.
This is a cleanup patch which just makes code looks clearer and reduce amount
of lines of code.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

483479c2

11 Feb, 2013 1 commit
- nfsd4: free_stid can be static · e56a3162
  Fengguang Wu authored Feb 11, 2013
```
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
```
  e56a3162
08 Feb, 2013 2 commits

nfsd: keep a checksum of the first 256 bytes of request · 01a7decf

Jeff Layton authored Feb 04, 2013

Now that we're allowing more DRC entries, it becomes a lot easier to hit
problems with XID collisions. In order to mitigate those, calculate a
checksum of up to the first 256 bytes of each request coming in and store
that in the cache entry, along with the total length of the request.

This initially used crc32, but Chuck Lever and Jim Rees pointed out that
crc32 is probably more heavyweight than we really need for generating
these checksums, and recommended looking at using the same routines that
are used to generate checksums for IP packets.

On an x86_64 KVM guest measurements with ftrace showed ~800ns to use
csum_partial vs ~1750ns for crc32. The difference probably isn't
terribly significant, but for now we may as well use csum_partial.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Stones-thrown-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

01a7decf

sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer · 4c190e2f

Jeff Layton authored Feb 06, 2013

When GSSAPI integrity signatures are in use, or when we're using GSSAPI
privacy with the v2 token format, there is a trailing checksum on the
xdr_buf that is returned.

It's checked during the authentication stage, and afterward nothing
cares about it. Ordinarily, it's not a problem since the XDR code
generally ignores it, but it will be when we try to compute a checksum
over the buffer to help prevent XID collisions in the duplicate reply
cache.

Fix the code to trim off the checksums after verifying them. Note that
in unwrap_integ_data, we must avoid trying to reverify the checksum if
the request was deferred since it will no longer be present when it's
revisited.
Signed-off-by: Jeff Layton <jlayton@redhat.com>

4c190e2f

05 Feb, 2013 5 commits

sunrpc: fix comment in struct xdr_buf definition · de0b65ca

Jeff Layton authored Feb 04, 2013

...these pages aren't necessarily contiguous.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

de0b65ca

sunrpc: move address copy/cmp/convert routines and prototypes from clnt.h to addr.h · 5976687a

Jeff Layton authored Feb 04, 2013

These routines are used by server and client code, so having them in a
separate header would be best.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

5976687a

sunrpc: copy scope ID in __rpc_copy_addr6 · 155a345a

Jeff Layton authored Feb 04, 2013

When copying an address, we should also copy the scopeid in the event
that this is a link-local address and the scope matters.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

155a345a

nfsd4: simplify idr allocation · 3abdb607

J. Bruce Fields authored Feb 03, 2013

We don't really need to preallocate at all; just allocate and initialize
everything at once, but leave the sc_type field initially 0 to prevent
finding the stateid till it's fully initialized.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3abdb607

nfsd: Fix memleak · 2d32b29a

majianpeng authored Jan 29, 2013

When free nfs-client, it must free the ->cl_stateids.

Cc: stable@kernel.org
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2d32b29a

04 Feb, 2013 16 commits

nfsd: register a shrinker for DRC cache entries · b4e7f2c9

Jeff Layton authored Feb 04, 2013

Since we dynamically allocate them now, allow the system to call us up
to release them if it gets low on memory. Since these entries aren't
replaceable, only free ones that are expired or that are over the cap.
The the seeks value is set to '1' however to indicate that freeing the
these entries is low-cost.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

b4e7f2c9

nfsd: add recurring workqueue job to clean the cache · aca8a23d

Jeff Layton authored Feb 04, 2013

It's not sufficient to only clean the cache when requests come in. What
if we have a flurry of activity and then the server goes idle? Add a
workqueue job that will clean the cache every RC_EXPIRE period.

Care is taken to only run this when we expect to have entries expiring.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

aca8a23d

nfsd: when updating an entry with RC_NOCACHE, just free it · 2c6b691c

Jeff Layton authored Feb 04, 2013

There's no need to keep entries around that we're declaring RC_NOCACHE.
Ditto if there's a problem with the entry.

With this change too, there's no need to test for RC_UNUSED in the
search function. If the entry's in the hash table then it's either
INPROG or DONE.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2c6b691c

nfsd: remove the cache_disabled flag · 13cc8a78

Jeff Layton authored Feb 04, 2013

With the change to dynamically allocate entries, the cache is never
disabled on the fly. Remove this flag.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

13cc8a78

nfsd: dynamically allocate DRC entries · 0338dd15

Jeff Layton authored Feb 04, 2013

The existing code keeps a fixed-size cache of 1024 entries. This is much
too small for a busy server, and wastes memory on an idle one.  This
patch changes the code to dynamically allocate and free these cache
entries.

A cap on the number of entries is retained, but it's much larger than
the existing value and now scales with the amount of low memory in the
machine.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0338dd15

nfsd: track the number of DRC entries in the cache · 0ee0bf7e

Jeff Layton authored Feb 04, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0ee0bf7e

nfsd: always move DRC entries to the end of LRU list when updating timestamp · 56c2548b

Jeff Layton authored Feb 04, 2013

...otherwise, we end up with the list ordering wrong. Currently, it's
not a problem since we skip RC_INPROG entries, but keeping the ordering
strict will be necessary for a later patch that adds a cache cleaner.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

56c2548b

nfsd: initialize the exp->ex_uuid field in svc_export_init · 2eeb9b2a

Jeff Layton authored Feb 02, 2013

commit 885c91f7 in Bruce's tree was causing oopses for me:

general protection fault: 0000 [#1] SMP
Modules linked in: nfsd(OF) nfs_acl(OF) auth_rpcgss(OF) lockd(OF) sunrpc(OF) kvm_amd kvm microcode i2c_piix4 virtio_net virtio_balloon cirrus drm_kms_helper ttm drm virtio_blk i2c_core
CPU 0
Pid: 564, comm: exportfs Tainted: GF          O 3.8.0-0.rc5.git2.1.fc19.x86_64 #1 Bochs Bochs
RIP: 0010:[<ffffffff811b1509>]  [<ffffffff811b1509>] kfree+0x49/0x280
RSP: 0018:ffff88007a3d7c50  EFLAGS: 00010203
RAX: 01adaf8dadadad80 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000001
RDX: ffffffff7fffffff RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
RBP: ffff88007a3d7c80 R08: 6b6b6b6b6b6b6b6b R09: 0000000000000000
R10: 0000000000000018 R11: 0000000000000000 R12: ffff88006a117b50
R13: ffffffffa01a589c R14: ffff8800631b0f50 R15: 01ad998dadadad80
FS:  00007fcaa3616740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f5d84b6fdd8 CR3: 0000000064db4000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process exportfs (pid: 564, threadinfo ffff88007a3d6000, task ffff88006af28000)
Stack:
 ffff88007a3d7c80 ffff88006a117b68 ffff88006a117b50 0000000000000000
 ffff8800631b0f50 ffff88006a117b50 ffff88007a3d7ca0 ffffffffa01a589c
 ffff880036be1148 ffff88007a3d7cf8 ffff88007a3d7e28 ffffffffa01a6a98
Call Trace:
 [<ffffffffa01a589c>] svc_export_put+0x5c/0x70 [nfsd]
 [<ffffffffa01a6a98>] svc_export_parse+0x328/0x7e0 [nfsd]
 [<ffffffffa016f1c7>] cache_do_downcall+0x57/0x70 [sunrpc]
 [<ffffffffa016f25e>] cache_downcall+0x7e/0x100 [sunrpc]
 [<ffffffffa016f338>] cache_write_procfs+0x58/0x90 [sunrpc]
 [<ffffffffa016f2e0>] ? cache_downcall+0x100/0x100 [sunrpc]
 [<ffffffff8123b0e5>] proc_reg_write+0x75/0xb0
 [<ffffffff811ccecf>] vfs_write+0x9f/0x170
 [<ffffffff811cd089>] sys_write+0x49/0xa0
 [<ffffffff816e0919>] system_call_fastpath+0x16/0x1b
Code: 66 66 66 90 48 83 fb 10 0f 86 c3 00 00 00 48 89 df 49 bf 00 00 00 00 00 ea ff ff e8 f2 12 ea ff 48 c1 e8 0c 48 c1 e0 06 49 01 c7 <49> 8b 07 f6 c4 80 0f 85 1d 02 00 00 49 8b 07 a8 80 0f 84 ee 01
RIP  [<ffffffff811b1509>] kfree+0x49/0x280
 RSP <ffff88007a3d7c50>

I think Majianpeng's patch is correct, but incomplete. In order for it
to be safe to free the ex_uuid unconditionally in svc_export_put, we
need to make sure it's initialized to NULL in the init routine.

Cc: majianpeng <majianpeng@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2eeb9b2a

nfsd: break out hashtable search into separate function · a4a3ec32

Jeff Layton authored Jan 28, 2013

Later, we'll need more than one call site for this, so break it out
into a new function.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a4a3ec32

nfsd: clean up and clarify the cache expiration code · d1a0774d

Jeff Layton authored Jan 28, 2013

Add a preprocessor constant for the expiry time of cache entries, and
move the test for an expired entry into a function. Note that the current
code does not test for RC_INPROG. It just assumes that it won't take more
than 2 minutes to fill out an in-progress entry.

I'm not sure how valid that assumption is though, so let's just ensure
that we never consider an RC_INPROG entry to be expired.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d1a0774d

nfsd: remove redundant test from nfsd_reply_cache_free · 25e6b8b0

Jeff Layton authored Jan 28, 2013

Entries can only get a c_type of RC_REPLBUFF iff they are
RC_DONE. Therefore the test for RC_DONE isn't necessary here.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

25e6b8b0

nfsd: add alloc and free functions for DRC entries · f09841fd

Jeff Layton authored Jan 28, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f09841fd

nfsd: create a dedicated slabcache for DRC entries · 8a8bc40d

Jeff Layton authored Jan 28, 2013

Currently we use kmalloc() which wastes a little bit of memory on each
allocation since it's a power of 2 allocator. Since we're allocating a
1024 of these now, and may need even more later, let's create a new
slabcache for them.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

8a8bc40d

nfsd: get rid of RC_INTR · 09662d58

Jeff Layton authored Jan 28, 2013

The reply cache code never returns this status.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

09662d58

nfsd: remove unneeded spinlock in nfsd_cache_update · 6dc88895

Jeff Layton authored Jan 28, 2013

The locking rules for cache entries say that locking the cache_lock
isn't needed if you're just touching the current entry. Earlier
in this function we set rp->c_state to RC_UNUSED without any locking,
so I believe it's ok to do the same here.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

6dc88895

nfsd: fix IPv6 address handling in the DRC · 7b9e8522

Jeff Layton authored Jan 28, 2013

Currently, it only stores the first 16 bytes of any address. struct
sockaddr_in6 is 28 bytes however, so we're currently ignoring the last
12 bytes of the address.

Expand the c_addr field to a sockaddr_in6, and cast it to a sockaddr_in
as necessary. Also fix the comparitor to use the existing RPC
helpers for this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7b9e8522

29 Jan, 2013 1 commit

nfsd: Fix memleak in svc_export_put · 885c91f7

majianpeng authored Jan 29, 2013

In func svc_export_parse, the uuid which used kmemdup to alloc will be
changed in func export_update.So the later kfree don't free this memory.
And it can't be free in func svc_export_parse because other place still
used.So put this operation in func svc_export_put.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

885c91f7

23 Jan, 2013 7 commits

nfsd4: require version 4 when enabling or disabling minorversion · ff89be87

J. Bruce Fields authored Jan 23, 2013

The current code will allow silly things like:

	echo "+2 +3 +4 +7.1">/proc/fs/nfsd/versions
Reported-by: Fan Chaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ff89be87

nfsd: fix unused "nn" variable warning in free_client() · bca0ec65

Stanislav Kinsbursky authored Jan 09, 2013

If CONFIG_LOCKDEP is disabled, then there would be a warning like this:

  CC [M]  fs/nfsd/nfs4state.o
fs/nfsd/nfs4state.c: In function ‘free_client’:
fs/nfsd/nfs4state.c:1051:19: warning: unused variable ‘nn’ [-Wunused-variable]

So, let's add "maybe_unused" tag to this variable.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

bca0ec65

sunrpc: Fix lockd sleeping until timeout · 35525b79

Andriy Skulysh authored Jan 07, 2013

There is a race in enqueueing thread to a pool and
waking up a thread.
lockd doesn't wake up on reception of lock granted callback
if svc_wake_up() is called before lockd's thread is added
to a pool.
Signed-off-by: Andriy Skulysh <Andriy_Skulysh@xyratex.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

35525b79

svcrpc: silence "unused variable" warning in !RPC_DEBUG case · 624ab464
J. Bruce Fields authored Jan 08, 2013
```
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
```
624ab464

nfsd: Remove write permission from file content · ec168676

Yanchuan Nian authored Jan 04, 2013

The write function doesn't be implemented in file content, and it's meaningless
to write data into this file directly. Remove write permission from it.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ec168676

nfsd: Don't unlock the state while it's not locked · 266533c6

Yanchuan Nian authored Dec 24, 2012

In the procedure of CREATE_SESSION, the state is locked after
alloc_conn_from_crses(). If the allocation fails, the function
goes to "out_free_session", and then "out" where there is an
unlock function.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

266533c6

nfsd: Pass correct slot number to nfsd4_put_drc_mem() · 74b70dde

Yanchuan Nian authored Dec 24, 2012

In alloc_session(), numslots is the correct slot number used by the session.
But the slot number passed to nfsd4_put_drc_mem() is the one from nfs client.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

74b70dde