1. 30 May, 2014 5 commits
    • J. Bruce Fields's avatar
      nfsd4: reserve space before inlining 0-copy pages · 4e21ac4b
      J. Bruce Fields authored
      Once we've included page-cache pages in the encoding it's difficult to
      remove them and restart encoding.  (xdr_truncate_encode doesn't handle
      that case.)  So, make sure we'll have adequate space to finish the
      operation first.
      
      For now COMPOUND_SLACK_SPACE checks should prevent this case happening,
      but we want to remove those checks.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      4e21ac4b
    • J. Bruce Fields's avatar
      nfsd4: teach encoders to handle reserve_space failures · d0a381dd
      J. Bruce Fields authored
      We've tried to prevent running out of space with COMPOUND_SLACK_SPACE
      and special checking in those operations (getattr) whose result can vary
      enormously.
      
      However:
      	- COMPOUND_SLACK_SPACE may be difficult to maintain as we add
      	  more protocol.
      	- BUG_ON or page faulting on failure seems overly fragile.
      	- Especially in the 4.1 case, we prefer not to fail compounds
      	  just because the returned result came *close* to session
      	  limits.  (Though perfect enforcement here may be difficult.)
      	- I'd prefer encoding to be uniform for all encoders instead of
      	  having special exceptions for encoders containing, for
      	  example, attributes.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      d0a381dd
    • J. Bruce Fields's avatar
      nfsd4: "backfill" using write_bytes_to_xdr_buf · 082d4bd7
      J. Bruce Fields authored
      Normally xdr encoding proceeds in a single pass from start of a buffer
      to end, but sometimes we have to write a few bytes to an earlier
      position.
      
      Use write_bytes_to_xdr_buf for these cases rather than saving a pointer
      to write to.  We plan to rewrite xdr_reserve_space to handle encoding
      across page boundaries using a scratch buffer, and don't want to risk
      writing to a pointer that was contained in a scratch buffer.
      
      Also it will no longer be safe to calculate lengths by subtracting two
      pointers, so use xdr_buf offsets instead.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      082d4bd7
    • J. Bruce Fields's avatar
      nfsd4: use xdr_truncate_encode · 1fcea5b2
      J. Bruce Fields authored
      Now that lengths are reliable, we can use xdr_truncate instead of
      open-coding it everywhere.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      1fcea5b2
    • J. Bruce Fields's avatar
      rpc: xdr_truncate_encode · 3e19ce76
      J. Bruce Fields authored
      This will be used in the server side in a few cases:
      	- when certain operations (read, readdir, readlink) fail after
      	  encoding a partial response.
      	- when we run out of space after encoding a partial response.
      	- in readlink, where we initially reserve PAGE_SIZE bytes for
      	  data, then truncate to the actual size.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      3e19ce76
  2. 28 May, 2014 5 commits
  3. 27 May, 2014 2 commits
  4. 23 May, 2014 9 commits
  5. 22 May, 2014 7 commits
  6. 21 May, 2014 4 commits
    • J. Bruce Fields's avatar
      nfsd4: fix delegation cleanup on error · cbf7a75b
      J. Bruce Fields authored
      We're not cleaning up everything we need to on error.  In particular,
      we're not removing our lease.  Among other problems this can cause the
      struct nfs4_file used as fl_owner to be referenced after it has been
      destroyed.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      cbf7a75b
    • Kinglong Mee's avatar
      NFSD: Don't clear SUID/SGID after root writing data · 368fe39b
      Kinglong Mee authored
      We're clearing the SUID/SGID bits on write by hand in nfsd_vfs_write,
      even though the subsequent vfs_writev() call will end up doing this for
      us (through file system write methods eventually calling
      file_remove_suid(), e.g., from __generic_file_aio_write).
      
      So, remove the redundant nfsd code.
      
      The only change in behavior is when the write is by root, in which case
      we previously cleared SUID/SGID, but will now leave it alone.  The new
      behavior is the behavior of every filesystem we've checked.
      
      It seems better to be consistent with local filesystem behavior.  And
      the security advantage seems limited as root could always restore these
      bits by hand if it wanted.
      
      SUID/SGID is not cleared after writing data with (root, local ext4),
         File: ‘test’
         Size: 0               Blocks: 0          IO Block: 4096   regular
      empty file
      Device: 803h/2051d      Inode: 1200137     Links: 1
      Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
      Context: unconfined_u:object_r:admin_home_t:s0
      Access: 2014-04-18 21:36:31.016029014 +0800
      Modify: 2014-04-18 21:36:31.016029014 +0800
      Change: 2014-04-18 21:36:31.026030285 +0800
        Birth: -
         File: ‘test’
         Size: 5               Blocks: 8          IO Block: 4096   regular file
      Device: 803h/2051d      Inode: 1200137     Links: 1
      Access: (4777/-rwsrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
      Context: unconfined_u:object_r:admin_home_t:s0
      Access: 2014-04-18 21:36:31.016029014 +0800
      Modify: 2014-04-18 21:36:31.040032065 +0800
      Change: 2014-04-18 21:36:31.040032065 +0800
        Birth: -
      
      With no_root_squash, (root, remote ext4), SUID/SGID are cleared,
         File: ‘test’
         Size: 0               Blocks: 0          IO Block: 262144 regular
      empty file
      Device: 24h/36d Inode: 786439      Links: 1
      Access: (4777/-rwsrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
      Context: system_u:object_r:nfs_t:s0
      Access: 2014-04-18 21:45:32.155805097 +0800
      Modify: 2014-04-18 21:45:32.155805097 +0800
      Change: 2014-04-18 21:45:32.168806749 +0800
        Birth: -
         File: ‘test’
         Size: 5               Blocks: 8          IO Block: 262144 regular file
      Device: 24h/36d Inode: 786439      Links: 1
      Access: (0777/-rwxrwxrwx)  Uid: ( 1000/    test)   Gid: ( 1000/    test)
      Context: system_u:object_r:nfs_t:s0
      Access: 2014-04-18 21:45:32.155805097 +0800
      Modify: 2014-04-18 21:45:32.184808783 +0800
      Change: 2014-04-18 21:45:32.184808783 +0800
        Birth: -
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      368fe39b
    • J. Bruce Fields's avatar
      nfsd4: warn on finding lockowner without stateid's · 27b11428
      J. Bruce Fields authored
      The current code assumes a one-to-one lockowner<->lock stateid
      correspondance.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      27b11428
    • J. Bruce Fields's avatar
      nfsd4: remove lockowner when removing lock stateid · a1b8ff4c
      J. Bruce Fields authored
      The nfsv4 state code has always assumed a one-to-one correspondance
      between lock stateid's and lockowners even if it appears not to in some
      places.
      
      We may actually change that, but for now when FREE_STATEID releases a
      lock stateid it also needs to release the parent lockowner.
      
      Symptoms were a subsequent LOCK crashing in find_lockowner_str when it
      calls same_lockowner_ino on a lockowner that unexpectedly has an empty
      so_stateids list.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      a1b8ff4c
  7. 15 May, 2014 1 commit
  8. 08 May, 2014 7 commits
    • Kinglong Mee's avatar
      9fa1959e
    • Kinglong Mee's avatar
    • Kinglong Mee's avatar
      ecca063b
    • J. Bruce Fields's avatar
      Merge 3.15 bugfix for 3.16 · dd15073a
      J. Bruce Fields authored
      dd15073a
    • Christoph Hellwig's avatar
      nfsd: clean up fh_auth usage · 5409e46f
      Christoph Hellwig authored
      Use fh_fsid when reffering to the fsid part of the filehandle.  The
      variable length auth field envisioned in nfsfh wasn't ever implemented.
      Also clean up some lose ends around this and document the file handle
      format better.
      
      Btw, why do we even export nfsfh.h to userspace?  The file handle very
      much is kernel private, and nothing in nfs-utils include the header
      either.
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      5409e46f
    • Kinglong Mee's avatar
      NFSD: cleanup unneeded including linux/export.h · ecc7455d
      Kinglong Mee authored
      commit 4ac7249e have remove all EXPORT_SYMBOL,
      linux/export.h is not needed, just clean it.
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      ecc7455d
    • Kinglong Mee's avatar
      NFSD: Call ->set_acl with a NULL ACL structure if no entries · aa07c713
      Kinglong Mee authored
      After setting ACL for directory, I got two problems that caused
      by the cached zero-length default posix acl.
      
      This patch make sure nfsd4_set_nfs4_acl calls ->set_acl
      with a NULL ACL structure if there are no entries.
      
      Thanks for Christoph Hellwig's advice.
      
      First problem:
      ............ hang ...........
      
      Second problem:
      [ 1610.167668] ------------[ cut here ]------------
      [ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239!
      [ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE)
      rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack
      rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables
      ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
      ip6table_mangle ip6table_security ip6table_raw ip6table_filter
      ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
      nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
      auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus
      snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev
      i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi
      [last unloaded: nfsd]
      [ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G           OE
      3.15.0-rc1+ #15
      [ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti:
      ffff88005a944000
      [ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>]  [<ffffffffa034d5ed>]
      _posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd]
      [ 1610.168320] RSP: 0018:ffff88005a945b00  EFLAGS: 00010293
      [ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX:
      0000000000000000
      [ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI:
      ffff880068233300
      [ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09:
      0000000000000000
      [ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12:
      ffff880068233300
      [ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15:
      ffff880068233300
      [ 1610.168320] FS:  0000000000000000(0000) GS:ffff880077800000(0000)
      knlGS:0000000000000000
      [ 1610.168320] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4:
      00000000000006f0
      [ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [ 1610.168320] Stack:
      [ 1610.168320]  ffffffff00000000 0000000b67c83500 000000076700bac0
      0000000000000000
      [ 1610.168320]  ffff88006700bac0 ffff880068233300 ffff88005a945c08
      0000000000000002
      [ 1610.168320]  0000000000000000 ffff88005a945b88 ffffffffa034e2d5
      000000065a945b68
      [ 1610.168320] Call Trace:
      [ 1610.168320]  [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd]
      [ 1610.168320]  [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd]
      [ 1610.168320]  [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0
      [ 1610.168320]  [<ffffffffa0327962>] ?
      nfsd_setuser_and_check_port+0x52/0x80 [nfsd]
      [ 1610.168320]  [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30
      [ 1610.168320]  [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd]
      [ 1610.168320]  [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110
      [nfsd]
      [ 1610.168320]  [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd]
      [ 1610.168320]  [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd]
      [ 1610.168320]  [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc]
      [ 1610.168320]  [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc]
      [ 1610.168320]  [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd]
      [ 1610.168320]  [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [ 1610.168320]  [<ffffffff810a5202>] kthread+0xd2/0xf0
      [ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
      [ 1610.168320]  [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0
      [ 1610.168320]  [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40
      [ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce
      41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd
      ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c
      [ 1610.168320] RIP  [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0
      [nfsd]
      [ 1610.168320]  RSP <ffff88005a945b00>
      [ 1610.257313] ---[ end trace 838254e3e352285b ]---
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      aa07c713