    dm bufio: improve concurrent IO performance · 450e8dee
    Joe Thornber authored
    When multiple threads perform IO to a thin device, the underlying
    dm_bufio object can become a bottleneck, slowing down access to the
    btree nodes that store the thin metadata. Prior to this commit, each
    bufio instance had a single mutex that was taken for every bufio
    operation.
    
    This commit concentrates on improving the common case where a user of
    dm_bufio wishes to access, but not modify, a buffer that is already
    in the dm_bufio cache.
    
    Implementation::
    
      The code has been refactored, pulling out an 'lru' abstraction and a
      'buffer cache' abstraction (see the two previous commits). This
      commit updates the higher-level bufio code (which performs buffer
      allocation, IO and eviction/cache sizing) to leverage both
      abstractions. It also deals with the delicate locking requirements
      of both abstractions to provide finer-grained locking. The result is
      significantly better concurrent IO performance.
    
      Before this commit, bufio had a global lru list that it used to
      evict the oldest clean buffers from _all_ clients. With the new
      locking we don't want different paths accessing the same buffer, so
      instead do_global_cleanup() loops over the clients asking them to
      free buffers older than a certain age.
    
      This commit also converts many old BUG_ONs to WARN_ON_ONCE; see the
      lru_evict and cache_evict code in particular. They will return
      ER_DONT_EVICT if a given buffer is somehow in a state that should
      _never_ happen. [Aside from revising this commit's header and
      fixing coding style and whitespace nits, this switch to
      WARN_ON_ONCE is Mike Snitzer's lone contribution to this commit.]
    
    Testing::
    
      Some of the low level functions have been unit tested using dm-unit:
        https://github.com/jthornber/dm-unit/blob/main/src/tests/bufio.rs
    
      Higher-level concurrency and IO are tested via a test-only target
      found here:
        https://github.com/jthornber/linux/blob/2023-03-24-thin-concurrency-9/drivers/md/dm-bufio-test.c
    
      The associated userland side of these tests is here:
        https://github.com/jthornber/dmtest-python/blob/main/src/dmtest/bufio/bufio_tests.py
    
      In addition, the full dmtest suite of tests (dm-thin, dm-cache,
      etc.) has been run (~450 tests).
    
    Performance::
    
      Most bufio operations have unchanged performance. But if multiple
      threads attempt to get buffers concurrently, and those buffers are
      already in the cache, then there is a big speed up. E.g., one test
      has 16 'hotspot' threads simulating btree lookups while another
      thread dirties the whole device. In this case the hotspot threads
      acquire the buffers about 25 times faster.
    Signed-off-by: Joe Thornber <ejt@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>