staging: lustre: llite: Replace write mutex with range lock
Testing has shown the ll_inode_inode's lli_write_mutex to be a limiting factor with single shared file write performance, when using many writing threads on a single machine. Even if each thread is writing to a unique portion of the file, the lli_write_mutex will prevent no more than a single thread to ever write to the file simultaneously. This change attempts to remove this bottle neck, by replacing this mutex with a range lock. This should allow multiple threads to write to a single file simultaneously iff the threads are writing to unique regions of the file. Performance testing shows this change to garner a significant performance boost to write bandwidth. Using FIO on a single machine with 32 GB of RAM, write performance tests were run with and without this change applied; the results are below: +---------+-----------+---------+--------+--------+--------+ | fio v2.0.13 | Write Bandwidth (KB/s) | +---------+-----------+---------+--------+--------+--------+ | # Tasks | GB / Task | Test 1 | Test 2 | Test 3 | Test 4 | +---------+-----------+---------+--------+--------+--------+ | 1 | 64 | 452446 | 454623 | 457653 | 463737 | | 2 | 32 | 850318 | 565373 | 602498 | 733027 | | 4 | 16 | 1058900 | 463546 | 529107 | 976284 | | 8 | 8 | 1026300 | 468190 | 576451 | 963404 | | 16 | 4 | 1065500 | 503160 | 462902 | 830065 | | 32 | 2 | 1068600 | 462228 | 466963 | 749733 | | 64 | 1 | 991830 | 556618 | 557863 | 710912 | +---------+-----------+---------+--------+--------+--------+ * Test 1: Lustre client running 04ec54f. File per process write workload. This test was used as a baseline for what we _could_ achieve in the single shared file tests if the bottle necks were removed. * Test 2: Lustre client running 04ec54f. Single shared file workload, each task writing to a unique region. * Test 3: Lustre client running 04ec54f + I0023132b. Single shared file workload, each task writing to a unique region. * Test 4: Lustre client running 04ec54f + this patch. Single shared file workload, each task writing to a unique region. Direct IO does not use the page cache like normal IO, so concurrent direct IO reads of the same pages are not safe. As a result, direct IO reads must take the range lock in ll_file_io_generic, otherwise they will attempt to work on the same pages and hit assertions like: (osc_request.c:1219:osc_brw_prep_request()) ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 3 p_c 10 pg ffffea00017a5208 [pri 0 ind 2771] off 16384 prev_pg ffffea00017a51d0 [pri 0 ind 2256] off 16384 Signed-off-by: Prakash Surya <surya1@llnl.gov> Signed-off-by: Patrick Farrell <paf@cray.com> Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-1669 Reviewed-on: http://review.whamcloud.com/6320 Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-6227 Reviewed-on: http://review.whamcloud.com/14385Reviewed-by: Bobi Jam <bobijam@hotmail.com> Reviewed-by: Alexander Boyko <alexander.boyko@seagate.com> Reviewed-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com> Reviewed-by: Andreas Dilger <andreas.dilger@intel.com> Reviewed-by: Oleg Drokin <oleg.drokin@intel.com> Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Showing
Please register or sign in to comment