Commits · ce502f81ba884c1fe45dc0ebddbcaaa4ec0fc5fb · Kirill Smelkov / linux

30 Jul, 2022 34 commits

NFSD: Convert the filecache to use rhashtable · ce502f81

Chuck Lever authored Jul 08, 2022

Enable the filecache hash table to start small, then grow with the
workload. Smaller server deployments benefit because there should
be lower memory utilization. Larger server deployments should see
improved scaling with the number of open files.
Suggested-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ce502f81

NFSD: Set up an rhashtable for the filecache · fc22945e

Chuck Lever authored Jul 08, 2022

Add code to initialize and tear down an rhashtable. The rhashtable
is not used yet.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

fc22945e

NFSD: Replace the "init once" mechanism · c7b824c3

Chuck Lever authored Jul 08, 2022

In a moment, the nfsd_file_hashtbl global will be replaced with an
rhashtable. Replace the one or two spots that need to check if the
hash table is available. We can easily reuse the SHUTDOWN flag for
this purpose.

Document that this mechanism relies on callers to hold the
nfsd_mutex to prevent init, shutdown, and purging to run
concurrently.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c7b824c3

NFSD: Remove nfsd_file::nf_hashval · f0743c2b

Chuck Lever authored Jul 08, 2022

The value in this field can always be computed from nf_inode, thus
it is no longer used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f0743c2b

NFSD: nfsd_file_hash_remove can compute hashval · cb7ec76e

Chuck Lever authored Jul 08, 2022

Remove an unnecessary use of nf_hashval.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

cb7ec76e

NFSD: Refactor __nfsd_file_close_inode() · a8455110

Chuck Lever authored Jul 08, 2022

The code that computes the hashval is the same in both callers.

To prevent them from going stale, reframe the documenting comments
to remove descriptions of the underlying hash table structure, which
is about to be replaced.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a8455110

NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode · 87553263

Chuck Lever authored Jul 08, 2022

Remove an unnecessary usage of nf_hashval.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

87553263

NFSD: Remove lockdep assertion from unhash_and_release_locked() · f53cef15

Chuck Lever authored Jul 08, 2022

IIUC, holding the hash bucket lock is needed only in
nfsd_file_unhash, and there is already a lockdep assertion there.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f53cef15

NFSD: No longer record nf_hashval in the trace log · 54f7df70

Chuck Lever authored Jul 08, 2022

I'm about to replace nfsd_file_hashtbl with an rhashtable. The
individual hash values will no longer be visible or relevant, so
remove them from the tracepoints.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

54f7df70

NFSD: Never call nfsd_file_gc() in foreground paths · 6df19411

Chuck Lever authored Jul 08, 2022

The checks in nfsd_file_acquire() and nfsd_file_put() that directly
invoke filecache garbage collection are intended to keep cache
occupancy between a low- and high-watermark. The reason to limit the
capacity of the filecache is to keep filecache lookups reasonably
fast.

However, invoking garbage collection at those points has some
undesirable negative impacts. Files that are held open by NFSv4
clients often push the occupancy of the filecache over these
watermarks. At that point:

- Every call to nfsd_file_acquire() and nfsd_file_put() results in
  an LRU walk. This has the same effect on lookup latency as long
  chains in the hash table.
- Garbage collection will then run on every nfsd thread, causing a
  lot of unnecessary lock contention.
- Limiting cache capacity pushes out files used only by NFSv3
  clients, which are the type of files the filecache is supposed to
  help.

To address those negative impacts, remove the direct calls to the
garbage collector. Subsequent patches will address maintaining
lookup efficiency as cache capacity increases.
Suggested-by: Wang Yugui <wangyugui@e16-tech.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

6df19411

NFSD: Fix the filecache LRU shrinker · edead3a5

Chuck Lever authored Jul 08, 2022

Without LRU item rotation, the shrinker visits only a few items on
the end of the LRU list, and those would always be long-term OPEN
files for NFSv4 workloads. That makes the filecache shrinker
completely ineffective.

Adopt the same strategy as the inode LRU by using LRU_ROTATE.
Suggested-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

edead3a5

NFSD: Leave open files out of the filecache LRU · 4a0e73e6

Chuck Lever authored Jul 08, 2022

There have been reports of problems when running fstests generic/531
against Linux NFS servers with NFSv4. The NFS server that hosts the
test's SCRATCH_DEV suffers from CPU soft lock-ups during the test.
Analysis shows that:

fs/nfsd/filecache.c
 482                 ret = list_lru_walk(&nfsd_file_lru,
 483                                 nfsd_file_lru_cb,
 484                                 &head, LONG_MAX);

causes nfsd_file_gc() to walk the entire length of the filecache LRU
list every time it is called (which is quite frequently). The walk
holds a spinlock the entire time that prevents other nfsd threads
from accessing the filecache.

What's more, for NFSv4 workloads, none of the items that are visited
during this walk may be evicted, since they are all files that are
held OPEN by NFS clients.

Address this by ensuring that open files are not kept on the LRU
list.
Reported-by: Frank van der Linden <fllinden@amazon.com>
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=386Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

4a0e73e6

NFSD: Trace filecache LRU activity · c46203ac

Chuck Lever authored Jul 08, 2022

Observe the operation of garbage collection and the lifetime of
filecache items.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

c46203ac

NFSD: WARN when freeing an item still linked via nf_lru · 668ed92e

Chuck Lever authored Jul 08, 2022

Add a guardrail to prevent freeing memory that is still on a list.
This includes either a dispose list or the LRU list.

This is the sign of a bug, but this class of bugs can be detected
so that they don't endanger system stability, especially while
debugging.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

668ed92e

NFSD: Hook up the filecache stat file · 2e6c6e4c

Chuck Lever authored Jul 08, 2022

There has always been the capability of exporting filecache metrics
via /proc, but it was never hooked up. Let's surface these metrics
to enable better observability of the filecache.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

2e6c6e4c

NFSD: Zero counters when the filecache is re-initialized · 8b330f78

Chuck Lever authored Jul 08, 2022

If nfsd_file_cache_init() is called after a shutdown, be sure the
stat counters are reset.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

8b330f78

NFSD: Record number of flush calls · df2aff52

Chuck Lever authored Jul 08, 2022

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

df2aff52

NFSD: Report the number of items evicted by the LRU walk · 94660cc1
Chuck Lever authored Jul 08, 2022
```
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
```
94660cc1

NFSD: Refactor nfsd_file_lru_scan() · 39f1d1ff

Chuck Lever authored Jul 08, 2022

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

39f1d1ff

NFSD: Refactor nfsd_file_gc() · 3bc6d347

Chuck Lever authored Jul 08, 2022

Refactor nfsd_file_gc() to use the new list_lru helper.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

3bc6d347

NFSD: Add nfsd_file_lru_dispose_list() helper · 0bac5a26

Chuck Lever authored Jul 08, 2022

Refactor the invariant part of nfsd_file_lru_walk_list() into a
separate helper function.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0bac5a26

NFSD: Report average age of filecache items · 904940e9

Chuck Lever authored Jul 08, 2022

This is a measure of how long items stay in the filecache, to help
assess how efficient the cache is.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

904940e9

NFSD: Report count of freed filecache items · d6329327

Chuck Lever authored Jul 08, 2022

Surface the count of freed nfsd_file items.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

d6329327

NFSD: Report count of calls to nfsd_file_acquire() · 29d4bdbb

Chuck Lever authored Jul 08, 2022

Count the number of successful acquisitions that did not create a
file (ie, acquisitions that do not result in a compulsory cache
miss). This count can be compared directly with the reported hit
count to compute a hit ratio.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

29d4bdbb

NFSD: Report filecache LRU size · 0fd244c1

Chuck Lever authored Jul 08, 2022

Surface the NFSD filecache's LRU list length to help field
troubleshooters monitor filecache issues.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

0fd244c1

NFSD: Demote a WARN to a pr_warn() · ca3f9acb

Chuck Lever authored Jul 08, 2022

The call trace doesn't add much value, but it sure is noisy.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

ca3f9acb

SUNRPC: Fix server-side fault injection documentation · 36f2ef2d

Chuck Lever authored Jul 01, 2022

Fixes: 37324e6b ("SUNRPC: Cache deferral injection")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

36f2ef2d

nfsd: remove redundant assignment to variable len · 842e00ac

Colin Ian King authored Jun 28, 2022

Variable len is being assigned a value zero and this is never
read, it is being re-assigned later. The assignment is redundant
and can be removed.

Cleans up clang scan-build warning:
fs/nfsd/nfsctl.c:636:2: warning: Value stored to 'len' is never read
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

842e00ac

NFSD: Fix space and spelling mistake · f532c9ff

Zhang Jiaming authored Jun 23, 2022

Add a blank space after ','.
Change 'succesful' to 'successful'.
Signed-off-by: Zhang Jiaming <jiaming@nfschina.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

f532c9ff

NFSD: Instrument fh_verify() · 05138288

Chuck Lever authored Jun 21, 2022

Capture file handles and how they map to local inodes. In particular,
NFSv4 PUTFH uses fh_verify() so we can now observe which file handles
are the target of OPEN, LOOKUP, RENAME, and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

05138288

SUNRPC: Expand the svc_alloc_arg_err tracepoint · 28fffa6c

Chuck Lever authored Jun 21, 2022

Record not only the number of pages requested, but the number of
pages that were actually allocated, to get a measure of progress
(or lack thereof).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

28fffa6c

NLM: Defend against file_lock changes after vfs_test_lock() · 184cefbe

Benjamin Coddington authored Jun 13, 2022

Instead of trusting that struct file_lock returns completely unchanged
after vfs_test_lock() when there's no conflicting lock, stash away our
nlm_lockowner reference so we can properly release it for all cases.

This defends against another file_lock implementation overwriting fl_owner
when the return type is F_UNLCK.
Reported-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Tested-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

184cefbe

SUNRPC: Fix xdr_encode_bool() · c770f31d

Chuck Lever authored Jul 19, 2022

I discovered that xdr_encode_bool() was returning the same address
that was passed in the @p parameter. The documenting comment states
that the intent is to return the address of the next buffer
location, just like the other "xdr_encode_*" helpers.

The result was the encoded results of NFSv3 PATHCONF operations were
not formed correctly.

Fixes: ded04a58 ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

c770f31d

nfsd: eliminate the NFSD_FILE_BREAK_* flags · 23ba98de

Jeff Layton authored Jul 29, 2022

We had a report from the spring Bake-a-thon of data corruption in some
nfstest_interop tests. Looking at the traces showed the NFS server
allowing a v3 WRITE to proceed while a read delegation was still
outstanding.

Currently, we only set NFSD_FILE_BREAK_* flags if
NFSD_MAY_NOT_BREAK_LEASE was set when we call nfsd_file_alloc.
NFSD_MAY_NOT_BREAK_LEASE was intended to be set when finding files for
COMMIT ops, where we need a writeable filehandle but don't need to
break read leases.

It doesn't make any sense to consult that flag when allocating a file
since the file may be used on subsequent calls where we do want to break
the lease (and the usage of it here seems to be reverse from what it
should be anyway).

Also, after calling nfsd_open_break_lease, we don't want to clear the
BREAK_* bits. A lease could end up being set on it later (more than
once) and we need to be able to break those leases as well.

This means that the NFSD_FILE_BREAK_* flags now just mirror
NFSD_MAY_{READ,WRITE} flags, so there's no need for them at all. Just
drop those flags and unconditionally call nfsd_open_break_lease every
time.
Reported-by: Olga Kornieskaia <kolga@netapp.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2107360
Fixes: 65294c1f (nfsd: add a new struct file caching facility to nfsd)
Cc: <stable@vger.kernel.org> # 5.4.x : bb283ca1 NFSD: Clean up the show_nf_flags() macro
Cc: <stable@vger.kernel.org> # 5.4.x
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

23ba98de

17 Jul, 2022 6 commits

Linux 5.19-rc7 · ff699273
Linus Torvalds authored Jul 17, 2022

ff699273

Merge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel · 55ea9bd6

Linus Torvalds authored Jul 17, 2022

Pull intel drm build fix from Rodrigo Vivi:
 "Our 'dim' flow has a problem with fixes of fixes getting missed. We
  need to take a look on that later.

  Meanwhile, please allow me to quickly propagate this fix for the
  32-bit build issue here upstream"

* tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel:
  drm/i915/ttm: fix 32b build

55ea9bd6

Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of... · f7f4da30

Linus Torvalds authored Jul 17, 2022

Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix SIGSEGV when processing syscall args in perf.data files in 'perf
   trace'

 - Sync kvm, msr-index and cpufeatures headers with the kernel sources

 - Fix 'convert perf time to TSC' 'perf test':
     - No need to open events twice
     - Fix finding correct event on hybrid systems

* tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf trace: Fix SIGSEGV when processing syscall args
  perf tests: Fix Convert perf time to TSC test for hybrid
  perf tests: Stop Convert perf time to TSC test opening events twice
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools headers cpufeatures: Sync with the kernel sources
  tools headers UAPI: Sync linux/kvm.h with the kernel sources

f7f4da30

drm/i915/ttm: fix 32b build · ced7866d

Matthew Auld authored Jul 12, 2022

Since segment_pages is no longer a compile time constant, it looks the
DIV_ROUND_UP(node->size, segment_pages) breaks the 32b build. Simplest
is just to use the ULL variant, but really we should need not need more
than u32 for the page alignment (also we are limited by that due to the
sg->length type), so also make it all u32.
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Fixes: aff1e0b0 ("drm/i915/ttm: fix sg_table construction")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@linux.intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220712174050.592550-1-matthew.auld@intel.com
(cherry picked from commit 9306b2b2)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

ced7866d

Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2b18593e

Linus Torvalds authored Jul 17, 2022

Pull perf fix from Borislav Petkov:

 - A single data race fix on the perf event cleanup path to avoid
   endless loops due to insufficient locking

* tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix data race between perf_event_set_output() and perf_mmap_close()

2b18593e

Merge tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 59c80f05

Linus Torvalds authored Jul 17, 2022

Pull x86 fixes from Borislav Petkov:

 - Improve the check whether the kernel supports WP mappings so that it
   can accomodate a XenPV guest due to how the latter is setting up the
   PAT machinery

  - Now that the retbleed nightmare is public, here's the first round of
    fallout fixes:

      * Fix a build failure on 32-bit due to missing include

      * Remove an untraining point in espfix64 return path

      * other small cleanups

* tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/bugs: Remove apostrophe typo
  um: Add missing apply_returns()
  x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt
  x86/bugs: Mark retbleed_strings static
  x86/pat: Fix x86_has_pat_wp()
  x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

59c80f05