    dm bufio: improve concurrent IO performance · 450e8dee
    Joe Thornber authored
    When multiple threads perform IO to a thin device, the underlying
    dm_bufio object can become a bottleneck, slowing down access to the
    btree nodes that store the thin metadata. Prior to this commit, each
    bufio instance had a single mutex that was taken for every bufio
    operation.
    
    This commit concentrates on improving the common case where a user of
    dm_bufio wishes to access, but not modify, a buffer that is already
    in the dm_bufio cache.
    
    Implementation::
    
      The code has been refactored, pulling out an 'lru' abstraction and a
      'buffer cache' abstraction (see the two previous commits). This
      commit updates the higher-level bufio code (which performs buffer
      allocation, IO and eviction/cache sizing) to leverage both
      abstractions. It also deals with the delicate locking requirements
      of both abstractions to provide finer-grained locking. The result is
      significantly better concurrent IO performance.
    
      Before this commit, bufio had a global lru list that it used to
      evict the oldest clean buffers from _all_ clients. With the new
      locking we don't want different paths accessing the same buffer, so
      instead do_global_cleanup() loops over the clients asking them to
      free buffers older than a certain age.
    
      This commit also converts many old BUG_ONs to WARN_ON_ONCE; see the
      lru_evict and cache_evict code in particular. They will return
      ER_DONT_EVICT if a given buffer is somehow in a state that should
      _never_ happen. [Aside from revising this commit's header and
      fixing coding style and whitespace nits, this switch to
      WARN_ON_ONCE is Mike Snitzer's lone contribution to this commit.]
    
    Testing::
    
      Some of the low level functions have been unit tested using dm-unit:
        https://github.com/jthornber/dm-unit/blob/main/src/tests/bufio.rs
    
      Higher-level concurrency and IO are tested via a test-only target
      found here:
        https://github.com/jthornber/linux/blob/2023-03-24-thin-concurrency-9/drivers/md/dm-bufio-test.c
    
      The associated userland side of these tests is here:
        https://github.com/jthornber/dmtest-python/blob/main/src/dmtest/bufio/bufio_tests.py
    
      In addition, the full dmtest suite of tests (dm-thin, dm-cache,
      etc.) has been run (~450 tests).
    
    Performance::
    
      Most bufio operations have unchanged performance. But if multiple
      threads attempt to get buffers concurrently, and those buffers are
      already in the cache, then there is a big speed up. E.g., one test
      has 16 'hotspot' threads simulating btree lookups while another
      thread dirties the whole device. In this case the hotspot threads
      acquire the buffers about 25 times faster.
    Signed-off-by: Joe Thornber <ejt@redhat.com>
    Signed-off-by: Mike Snitzer <snitzer@kernel.org>