    md/raid5: avoid device_lock in read_one_chunk() · 97ae2725
    Gal Ofri authored
    There is lock contention on device_lock in read_one_chunk().
    device_lock is taken to sync conf->active_aligned_reads and
    conf->quiesce.
    read_one_chunk() takes the lock, then waits for quiesce=0 (resumed)
    before incrementing active_aligned_reads.
    raid5_quiesce() takes the lock, sets quiesce=2 (in-progress), then waits
    for active_aligned_reads to be zero before setting quiesce=1
    (suspended).
    
    Introduce a fast (lockless) path in read_one_chunk(): activate aligned
    read without taking device_lock.  In case quiesce starts while
    activating the aligned-read in fast path, deactivate it and revert to
    old behavior (take device_lock and wait for quiesce to finish).
    
    Add smp store/load in raid5_quiesce()/read_one_chunk() respectively to
    guarantee that read_one_chunk() does not miss an ongoing quiesce.
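
The fast path and its interaction with quiesce can be sketched in userspace C as below. This is only an illustrative model, not the kernel code: struct conf_model, activate_aligned_read_fast() and quiesce_begin() are hypothetical names, C11 sequentially consistent atomics stand in for the kernel's atomic and smp store/load primitives (an assumption, and a stronger ordering than a release/acquire pair), and the locked slow path with its waiting and wakeups is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the two fields named above. */
struct conf_model {
	atomic_int active_aligned_reads;
	atomic_int quiesce; /* 0 = resumed, 2 = in progress, 1 = suspended */
};

static void conf_model_init(struct conf_model *c)
{
	atomic_init(&c->active_aligned_reads, 0);
	atomic_init(&c->quiesce, 0);
}

/*
 * Reader side. Returns true if the read was activated on the fast path;
 * false means the caller must fall back to the locked slow path
 * (not modeled here).
 */
static bool activate_aligned_read_fast(struct conf_model *c)
{
	/* Optimistically announce the read first ... */
	atomic_fetch_add(&c->active_aligned_reads, 1);
	/* ... then check whether a quiesce has started. */
	if (atomic_load(&c->quiesce) == 0)
		return true;
	/* Quiesce is in progress or done: undo the activation. */
	atomic_fetch_sub(&c->active_aligned_reads, 1);
	return false;
}

/*
 * Quiesce side. Publishes quiesce=2, then reports whether no aligned
 * reads are active. The real code would wait until none are left
 * before setting quiesce=1.
 */
static bool quiesce_begin(struct conf_model *c)
{
	atomic_store(&c->quiesce, 2);
	return atomic_load(&c->active_aligned_reads) == 0;
}
```

In this model the reader increments before it loads quiesce, and the quiescing side stores before it loads the counter, so at least one side observes the other: either the reader sees quiesce != 0 and backs out, or the quiescing side sees a non-zero counter and has to wait.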
    
    My setups:
    1. 8 local nvme drives (each up to 250k iops).
    2. 8 ram disks (brd).
    
    Each setup with raid6 (6+2), 1024 io threads on a 96 cpu-cores (48 per
    socket) system. Record both iops and cpu spent on this contention with
    rand-read-4k. Record bw with sequential-read-128k.  Note: in most cases
    cpu is still busy but due to "new" bottlenecks.
    
    nvme:
                  | iops           | cpu  | bw
    -----------------------------------------------
    without patch | 1.6M           | ~50% | 5.5GB/s
    with patch    | 2M (throttled) | 0%   | 16GB/s (throttled)
    
    ram (brd):
                  | iops           | cpu  | bw
    -----------------------------------------------
    without patch | 2M             | ~80% | 24GB/s
    with patch    | 4M             | 0%   | 55GB/s
    
    CC: Song Liu <song@kernel.org>
    CC: Neil Brown <neilb@suse.de>
    Reviewed-by: NeilBrown <neilb@suse.de>
    Signed-off-by: Gal Ofri <gal.ofri@storing.io>
    Signed-off-by: Song Liu <song@kernel.org>