    MD: fix lock contention for flush bios · 5a409b4f
    Xiao Ni authored
    There is lock contention when many processes send flush bios to an md device,
    e.g. when creating many LVs on one RAID device and running mkfs.xfs on each LV.
    
    Currently flush requests can only be handled sequentially: each flush must wait,
    under mddev->lock, for mddev->flush_bio to become NULL before it can proceed.
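
    For reference, the serialized path being replaced looks roughly like the sketch
    below. This is a minimal illustration based on the description above: mddev->lock,
    mddev->sb_wait and mddev->flush_bio are the fields named in this message; the
    function name and the remaining details are assumed.

        #include <linux/bio.h>
        #include <linux/wait.h>
        #include "md.h"

        /* Old, serialized entry point: only one flush can be in flight at a time. */
        static void md_flush_request_serial(struct mddev *mddev, struct bio *bio)
        {
                spin_lock_irq(&mddev->lock);
                /* Sleep (dropping the lock while waiting) until the previous
                 * flush has cleared mddev->flush_bio. */
                wait_event_lock_irq(mddev->sb_wait,
                                    !mddev->flush_bio,
                                    mddev->lock);
                mddev->flush_bio = bio;         /* claim the single flush slot */
                spin_unlock_irq(&mddev->lock);

                /* ... submit pre-flushes to the member rdevs; when they all complete,
                 * mddev->flush_bio is set back to NULL and mddev->sb_wait is woken
                 * so the next waiting flush can proceed ... */
        }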
    
    This patch removes mddev->flush_bio and handles flush bios asynchronously (a sketch
    of the new scheme follows the test results below). I ran a test with the command
    dbench -s 128 -t 300. These are the results:
    
    =================Without the patch============================
     Operation                Count    AvgLat    MaxLat
     --------------------------------------------------
     Flush                    11165   167.595  5879.560
     Close                   107469     1.391  2231.094
     LockX                      384     0.003     0.019
     Rename                    5944     2.141  1856.001
     ReadX                   208121     0.003     0.074
     WriteX                   98259  1925.402 15204.895
     Unlink                   25198    13.264  3457.268
     UnlockX                    384     0.001     0.009
     FIND_FIRST               47111     0.012     0.076
     SET_FILE_INFORMATION     12966     0.007     0.065
     QUERY_FILE_INFORMATION   27921     0.004     0.085
     QUERY_PATH_INFORMATION  124650     0.005     5.766
     QUERY_FS_INFORMATION     22519     0.003     0.053
     NTCreateX               141086     4.291  2502.812
    
    Throughput 3.7181 MB/sec (sync open)  128 clients  128 procs  max_latency=15204.905 ms
    
    =================With the patch============================
     Operation                Count    AvgLat    MaxLat
     --------------------------------------------------
     Flush                     4500   174.134   406.398
     Close                    48195     0.060   467.062
     LockX                      256     0.003     0.029
     Rename                    2324     0.026     0.360
     ReadX                    78846     0.004     0.504
     WriteX                   66832   562.775  1467.037
     Unlink                    5516     3.665  1141.740
     UnlockX                    256     0.002     0.019
     FIND_FIRST               16428     0.015     0.313
     SET_FILE_INFORMATION      6400     0.009     0.520
     QUERY_FILE_INFORMATION   17865     0.003     0.089
     QUERY_PATH_INFORMATION   47060     0.078   416.299
     QUERY_FS_INFORMATION      7024     0.004     0.032
     NTCreateX                55921     0.854  1141.452
    
    Throughput 11.744 MB/sec (sync open)  128 clients  128 procs  max_latency=1467.041 ms
    
    The test was done on a RAID1 array backed by two rotational disks.
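
    As mentioned before the results, below is a rough sketch of the asynchronous scheme.
    It is illustrative rather than the patch itself: flush_info, flush_pending and
    mddev->flush_pool are modeled on this series, bio_alloc() is the two-argument form
    of that kernel era, rdev filtering, error handling and flushes that also carry data
    are omitted, and the per-rdev reference counting is shown separately in the V2 note
    further down.

        #include <linux/bio.h>
        #include <linux/mempool.h>
        #include "md.h"

        /* Per-flush state, allocated from a mempool instead of the single
         * mddev->flush_bio slot, so many flushes can be in flight at once. */
        struct flush_info {
                struct bio      *bio;           /* the original flush bio */
                struct mddev    *mddev;
                atomic_t        flush_pending;  /* outstanding per-rdev flushes */
        };

        /* Runs when the pre-flush sent to one member device completes. */
        static void md_end_flush(struct bio *bi)
        {
                struct flush_info *fi = bi->bi_private;
                struct mddev *mddev = fi->mddev;

                bio_put(bi);
                if (atomic_dec_and_test(&fi->flush_pending)) {
                        /* Last per-rdev flush finished: complete the original
                         * bio and recycle the flush_info. */
                        bio_endio(fi->bio);
                        mempool_free(fi, mddev->flush_pool);
                }
        }

        static void md_flush_request_async(struct mddev *mddev, struct bio *bio)
        {
                struct md_rdev *rdev;
                struct flush_info *fi;

                fi = mempool_alloc(mddev->flush_pool, GFP_NOIO);
                fi->bio = bio;
                fi->mddev = mddev;
                /* Start at 1 so the flush cannot complete before every per-rdev
                 * bio has been submitted (the bug fixed in V3). */
                atomic_set(&fi->flush_pending, 1);

                /* Iterate the member devices; the real code uses RCU and takes a
                 * reference on each rdev, see the V2 note below. */
                rdev_for_each(rdev, mddev) {
                        struct bio *bi = bio_alloc(GFP_NOIO, 0);

                        atomic_inc(&fi->flush_pending);
                        bi->bi_end_io = md_end_flush;
                        bi->bi_private = fi;
                        bio_set_dev(bi, rdev->bdev);
                        bi->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
                        submit_bio(bi);
                }

                /* Drop the initial reference taken above. */
                if (atomic_dec_and_test(&fi->flush_pending)) {
                        bio_endio(bio);
                        mempool_free(fi, mddev->flush_pool);
                }
        }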
    
    V5: V4 is more complicated than the memory-pool version, so revert to the memory-pool
    version.
    
    V4: hash the address of fbio to choose a free flush info.
    V3:
    Shaohua suggested that a mempool is overkill. In V3 the memory is allocated when the RAID
    device is created, and a simple bitmap records which resources are free (see the sketch
    below).

    Fix a bug from V2: flush_pending must be set to 1 first.
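
    For the record, the V3 slot-and-bitmap scheme (later dropped in favour of the mempool)
    looked roughly like the sketch below; the slot count, helper names and the use of file-scope
    variables instead of struct mddev fields are assumed.

        #include <linux/bitmap.h>
        #include <linux/bitops.h>

        #define NR_FLUSH_INFOS  16      /* assumed number of preallocated slots */

        static struct flush_info *flush_infos;  /* allocated when the array is created */
        static DECLARE_BITMAP(flush_info_map, NR_FLUSH_INFOS);  /* 1 = slot in use */

        /* Claim a free preallocated slot; returns NULL if all are busy. */
        static struct flush_info *get_flush_info(void)
        {
                int bit;

                do {
                        bit = find_first_zero_bit(flush_info_map, NR_FLUSH_INFOS);
                        if (bit >= NR_FLUSH_INFOS)
                                return NULL;    /* caller must wait and retry */
                } while (test_and_set_bit(bit, flush_info_map));

                return &flush_infos[bit];
        }

        /* Return a slot to the pool once its flush has completed. */
        static void put_flush_info(struct flush_info *fi)
        {
                clear_bit(fi - flush_infos, flush_info_map);
        }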
    
    V2:
    Neil pointed out two problems: a reference-counting error, and the return value
    when memory allocation fails.
    1. Counting error:
    This isn't safe. It is only safe to call rdev_dec_pending() on rdevs
    for which you previously called
                              atomic_inc(&rdev->nr_pending);
    If an rdev was added to the list between the start and end of the flush,
    this will do something bad.

    The patch no longer uses bio_chain; it uses a dedicated callback for each
    flush bio (see the sketch after this list).
    2. Returning an I/O error when kmalloc fails is wrong.
    I use the mempool suggested by Neil in V2.
    3. Fixed some places pointed out by Guoqing.
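
    To make point 1 concrete: each per-device flush bio carries the rdev it was submitted
    to, so the rdev_dec_pending() in its completion callback is always paired with the
    atomic_inc(&rdev->nr_pending) done at submission time, even if the rdev list changes
    mid-flush. A focused sketch extending the one after the test results (the flush_bio
    wrapper and helper names are modeled on the patch; allocation of fb and the rest of
    the completion path are omitted, and this md_end_flush supersedes the simplified one
    shown earlier).

        /* Wrapper passed via bi->bi_private so the callback knows both the
         * overall flush state and the specific rdev this bio was sent to. */
        struct flush_bio {
                struct flush_info       *fi;
                struct md_rdev          *rdev;
        };

        /* Completion side: drop exactly the reference taken at submission,
         * on exactly the rdev it was taken on. An rdev added to the array
         * after this bio was submitted is never touched here. */
        static void md_end_flush(struct bio *bi)
        {
                struct flush_bio *fb = bi->bi_private;

                rdev_dec_pending(fb->rdev, fb->fi->mddev);
                bio_put(bi);
                /* ... then decrement fb->fi->flush_pending and, when it reaches
                 * zero, complete fb->fi->bio as in the earlier sketch ... */
        }

        /* Submission side, run for each member device. */
        static void submit_flush_to_rdev(struct flush_info *fi,
                                         struct md_rdev *rdev,
                                         struct flush_bio *fb)
        {
                struct bio *bi = bio_alloc(GFP_NOIO, 0);

                atomic_inc(&rdev->nr_pending);  /* pin this rdev for this bio */
                fb->fi = fi;
                fb->rdev = rdev;
                bi->bi_end_io = md_end_flush;
                bi->bi_private = fb;
                bio_set_dev(bi, rdev->bdev);
                bi->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
                atomic_inc(&fi->flush_pending);
                submit_bio(bi);
        }
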
    Suggested-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Xiao Ni <xni@redhat.com>
    Signed-off-by: Shaohua Li <shli@fb.com>