1. 15 Nov, 2021 1 commit
    • NFSD: Fix exposure in nfsd4_decode_bitmap() · c0019b7d
      Chuck Lever authored
      rtm@csail.mit.edu reports:
      > nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
      > directs it to do so. This can cause nfsd4_decode_state_protect4_a()
      > to write client-supplied data beyond the end of
      > nfsd4_exchange_id.spo_must_allow[] when called by
      > nfsd4_decode_exchange_id().
      
      Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
      @bmlen.
      
      Reported-by: rtm@csail.mit.edu
      Fixes: d1c263a0 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
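The bounded loop described in the commit message above can be sketched in user-space C. This is a minimal illustration under stated assumptions, not the kernel's code: `decode_bitmap_bounded()` and its parameters are hypothetical names, and the input words are assumed to be already in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a bitmap decoder. `words` holds the `count`
 * bitmap words supplied by the client; `bmval` has room for `bmlen` words.
 * Returns the number of words actually stored.
 */
static size_t decode_bitmap_bounded(const uint32_t *words, size_t count,
                                    uint32_t *bmval, size_t bmlen)
{
    size_t i;

    /* Zero the destination so words the client omits read as 0. */
    memset(bmval, 0, bmlen * sizeof(*bmval));

    /* The loop is bounded by BOTH the client's count and bmlen. */
    for (i = 0; i < count && i < bmlen; i++)
        bmval[i] = words[i];

    /* Any words past bmlen are ignored here rather than stored. */
    return i;
}
```

The point of the sketch is that the client-supplied count never controls how far the destination index advances; only the smaller of the two limits does.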
  2. 01 Nov, 2021 2 commits
  3. 19 Oct, 2021 1 commit
  4. 15 Oct, 2021 1 commit
  5. 13 Oct, 2021 5 commits
  6. 12 Oct, 2021 2 commits
  7. 04 Oct, 2021 5 commits
  8. 02 Oct, 2021 6 commits
    • NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment() · dae9a6ca
      Chuck Lever authored
      Refactor.
      
      Now that the NFSv2 and NFSv3 XDR decoders have been converted to
      use xdr_streams, the WRITE decoder functions can use
      xdr_stream_subsegment() to extract the WRITE payload into its own
      xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.
      
      That makes it possible to pass the first kvec, pages array + length,
      page_base, and total payload length via a single function parameter.
      
      The payload's page_base is not yet assigned or used, but will be in
      subsequent patches.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases · f49b68dd
      Chuck Lever authored
      xdr_stream_subsegment() was introduced in commit c1346a12
      ("NFSD: Replace the internals of the READ_BUF() macro").
      
      There are two call sites for xdr_stream_subsegment(). One is
      nfsd4_decode_write(), and the other is nfsd4_decode_setxattr().
      Currently neither of these call sites calls this API when
      xdr_buf::page_base is a non-zero value.
      
      However, I'm about to add a case where page_base will sometimes not
      be zero when nfsd4_decode_write() invokes this API. Replace the
      logic in xdr_stream_subsegment() that advances to the next data item
      in the xdr_stream with something more generic in order to handle
      this new use case.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: Initialize pointer ni with NULL and not plain integer 0 · 8e70bf27
      Colin Ian King authored
      Pointer ni is being initialized with plain integer zero. Fix
      this by initializing with NULL.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: simplify struct nfsfh · d8b26071
      NeilBrown authored
      Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
      struct) and are accessed using macros like:
      
       #define fh_FOO fh_base.fh_new.fb_FOO
      
      This patch makes the union and struct anonymous, so that "fh_FOO" can be
      a name directly within 'struct knfsd_fh' and the #defines aren't needed.
      
      The file handle as a whole is sometimes accessed as "fh_base" or
      "fh_base.fh_pad", neither of which is a particularly helpful name.
      As the struct holding the filehandle is now anonymous, we
      cannot use the name of that, so we union it with 'fh_raw' and use that
      where the raw filehandle is needed.  fh_raw also ensures the structure is
      large enough for the largest possible filehandle.
      
      fh_raw is a 'char' array, removing any need to cast it for memcpy etc.
      
      SVCFH_fmt() is simplified using the "%ph" printk format.  This
      changes the appearance of filehandles in dprintk() debugging, making
      them a little more precise.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
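The anonymous union and struct arrangement described in the commit message above can be illustrated with a small user-space sketch. The type `struct sketch_fh`, the size `SKETCH_FHSIZE`, and the helper `sketch_fh_copy()` are hypothetical names chosen for this example, and the field list is an assumption rather than the kernel's actual layout.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_FHSIZE 128   /* assumed maximum filehandle size in bytes */

/* Hypothetical layout: the union and the inner struct are both anonymous,
 * so every field is named directly on the outer struct. */
struct sketch_fh {
    unsigned int fh_size;               /* bytes of the handle in use */
    union {
        char fh_raw[SKETCH_FHSIZE];     /* whole handle as plain bytes */
        struct {
            uint8_t  fh_version;
            uint8_t  fh_auth_type;
            uint8_t  fh_fsid_type;
            uint8_t  fh_fileid_type;
            uint32_t fh_fsid[4];        /* fixed size only for this sketch */
        };
    };
};

/* Copy a handle through the raw view; no cast is needed for memcpy()
 * because fh_raw is a char array. */
static void sketch_fh_copy(struct sketch_fh *dst, const struct sketch_fh *src)
{
    dst->fh_size = src->fh_size;
    memcpy(dst->fh_raw, src->fh_raw, sizeof(dst->fh_raw));
}
```

With this shape, code can write `fh.fh_version` instead of going through a macro that expands to a two-level member path, and `fh_raw` fixes the size of the union at the largest handle it must hold.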
    • NFSD: drop support for ancient filehandles · c645a883
      NeilBrown authored
      Filehandles not in the "new" or "version 1" format have not been handed
      out for new mounts since Linux 2.4, which was released 20 years ago.
      I think it is safe to say that no such file handles are still in use,
      and that we can drop support for them.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: move filehandle format declarations out of "uapi". · ef5825e3
      NeilBrown authored
      A small part of the declarations concerning the filehandle format is
      currently in the "uapi" include directory:
         include/uapi/linux/nfsd/nfsfh.h
      
      There is a lot more to the filehandle format, including "enum fid_type"
      and "enum nfsd_fsid" which are not exported via "uapi".
      
      This small part of the filehandle definition is of minimal use outside
      of the kernel, and I can find no evidence that any other code is using
      it. Certainly nfs-utils and wireshark (the most likely candidates) do not
      use these declarations.
      
      So move it out of "uapi" by copying the content from
        include/uapi/linux/nfsd/nfsfh.h
      into
        fs/nfsd/nfsfh.h
      
      A few unnecessary "#include" directives are not copied, and neither is
      the #define of fh_auth, which is annotated as being for userspace only.
      
      The copyright claims in the uapi file are identical to those in the nfsd
      file, so there is no need to copy those.
      
      The "__u32" style integer types are only needed in "uapi".  In
      kernel-only code we can use the more familiar "u32" style.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  9. 23 Sep, 2021 1 commit
  10. 21 Sep, 2021 3 commits
    • NFSD: Optimize DRC bucket pruning · 8847ecc9
      Chuck Lever authored
      DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
      every NFSv2 and NFSv3 dispatch (i.e., it's done while the client is
      waiting).
      
      I added a trace_printk() in prune_bucket() to see just how long
      it takes to prune. Here are two ends of the spectrum:
      
       prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
       prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
      ...
       prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining
      
      Pruning latency is noticeable on fast transports with fast storage.
      By noticeable, I mean that the latency measured here in the worst
      case is the same order of magnitude as the round trip time for
      cached server operations.
      
      We could do something like moving expired entries to an expired list
      and then free them later instead of freeing them right in
      prune_bucket(). But limiting the number of entries that can
      be pruned by a lookup is simple and retains more entries in the
      cache, making the DRC somewhat more effective.
      
      Comparison with a 70/30 fio 8KB 12 thread direct I/O test:
      
      Before:
      
        write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets
      
      WRITE:
      	1848726 ops (30%)
      	avg bytes sent per op: 8340 avg bytes received per op: 136
      	backlog wait: 0.635158 	RTT: 0.128525 	total execute time: 0.827242 (milliseconds)
      
      After:
      
        write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets
      
      WRITE:
      	1891144 ops (30%)
      	avg bytes sent per op: 8340 avg bytes received per op: 136
      	backlog wait: 0.616114 	RTT: 0.126842 	total execute time: 0.805348 (milliseconds)
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
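The idea of capping how many entries a single lookup may prune can be sketched as follows. This is a user-space illustration under stated assumptions: `struct sketch_entry`, `sketch_prune_bucket()`, and the `max_free` parameter are hypothetical, the bucket is modeled as a singly linked list ordered oldest first, and the real prune_bucket() may differ in all of these respects.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache entry; the bucket is a list ordered oldest first. */
struct sketch_entry {
    struct sketch_entry *next;
    long expires;               /* time after which the entry may be freed */
};

/*
 * Free expired entries from the front of the bucket, but never more than
 * max_free in one call, so the caller's latency stays bounded.
 * Returns the number of entries freed.
 */
static unsigned int sketch_prune_bucket(struct sketch_entry **head,
                                        long now, unsigned int max_free)
{
    unsigned int freed = 0;

    while (*head != NULL && freed < max_free) {
        struct sketch_entry *e = *head;

        if (e->expires > now)
            break;              /* oldest entry is still fresh; stop */

        *head = e->next;
        free(e);
        freed++;
    }
    return freed;
}
```

Entries left behind because the cap was reached stay in the bucket until a later call, which keeps the work done while a client is waiting small and leaves more entries available for cache hits in the meantime.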
    • nfs: reexport documentation · dc451bbc
      J. Bruce Fields authored
      We've supported reexport for a while but documentation is limited.  This
      is mainly a simplified version of the text I wrote for the linux-nfs
      wiki at https://wiki.linux-nfs.org/wiki/index.php/NFS_re-export.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: don't alloc under spinlock in rpc_parse_scope_id · 9b6e27d0
      J. Bruce Fields authored
      Dan Carpenter says:
      
        The patch d20c11d8: "nfsd: Protect session creation and client
        confirm using client_lock" from Jul 30, 2014, leads to the following
        Smatch static checker warning:
      
              net/sunrpc/addr.c:178 rpc_parse_scope_id()
              warn: sleeping in atomic context
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: d20c11d8 ("nfsd: Protect session creation and client...")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  11. 20 Sep, 2021 2 commits
    • Linux 5.15-rc2 · e4e737bb
      Linus Torvalds authored
    • pci_iounmap'2: Electric Boogaloo: try to make sense of it all · 316e8d79
      Linus Torvalds authored
      Nathan Chancellor reports that the recent change to pci_iounmap in
      commit 9caea000 ("parisc: Declare pci_iounmap() parisc version only
      when CONFIG_PCI enabled") causes build errors on arm64.
      
      It took me about two hours to convince myself that I think I know what
      the logic of that mess of #ifdef's in the <asm-generic/io.h> header file
      really aims to do, and rewrite it to be easier to follow.
      
      Famous last words.
      
      Anyway, the code has now been lifted from that grotty header file into
      lib/pci_iomap.c, and has fairly extensive comments about what the logic
      is.  It also avoids indirecting through another confusing (and badly
      named) helper function that has other preprocessor config conditionals.
      
      Let's see what odd architecture did something else strange in this area
      to break things.  But my arm64 cross build is clean.
      
      Fixes: 9caea000 ("parisc: Declare pci_iounmap() parisc version only when CONFIG_PCI enabled")
      Reported-by: Nathan Chancellor <nathan@kernel.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Ulrich Teichert <krypton@ulrich-teichert.org>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 19 Sep, 2021 11 commits