md/raid5: fix locking in handle_stripe_clean_event()
commit b8a9d66d upstream.

After commit 566c09c5 ("raid5: relieve lock contention in
get_active_stripe()") __find_stripe() is called under
conf->hash_locks + hash. But handle_stripe_clean_event() calls
remove_hash() under conf->device_lock.

Under some circumstances the hash chain can become circular, and we get
an infinite loop in __find_stripe() with interrupts disabled and the
hash lock held. This leads to a hard lockup on multiple CPUs and a
subsequent system crash.

I was able to reproduce this behavior on raid6 over 6 SSD disks.
The devices_handle_discard_safely option should be set to enable trim
support. The following script was used:

    for i in `seq 1 32`; do
        dd if=/dev/zero of=large$i bs=10M count=100 &
    done

Signed-off-by: Roman Gushchin <klamm@yandex-team.ru>
Fixes: 566c09c5 ("raid5: relieve lock contention in get_active_stripe()")
Signed-off-by: NeilBrown <neilb@suse.com>
Cc: Shaohua Li <shli@kernel.org>
[ luis: backported to 3.16: used Roman's backport to 3.14 ]
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
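
The locking rule described above (a hash chain must be modified under the
same lock that readers hold while walking it) can be illustrated with a
small user-space sketch. This is not the kernel patch: it uses pthread
mutexes instead of spinlocks with interrupts disabled, and struct table,
struct stripe, insert_stripe(), find_stripe() and remove_stripe() are
hypothetical names chosen for illustration. It also assumes one lock per
bucket, which is a simplification of the md/raid5 layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_BUCKETS 8 /* hypothetical bucket count, one lock per bucket */

struct stripe {
	unsigned long sector;
	unsigned int bucket;    /* loosely analogous to sh->hash_lock_index */
	struct stripe *next;
	struct stripe **pprev;  /* address of the pointer that points at us */
};

struct table {
	pthread_mutex_t hash_locks[NR_BUCKETS];
	struct stripe *heads[NR_BUCKETS];
};

static void table_init(struct table *t)
{
	for (int i = 0; i < NR_BUCKETS; i++) {
		pthread_mutex_init(&t->hash_locks[i], NULL);
		t->heads[i] = NULL;
	}
}

/* Insert at the head of the chain, under that chain's lock. */
static void insert_stripe(struct table *t, struct stripe *s)
{
	s->bucket = (unsigned int)(s->sector % NR_BUCKETS);
	pthread_mutex_lock(&t->hash_locks[s->bucket]);
	s->next = t->heads[s->bucket];
	if (s->next)
		s->next->pprev = &s->next;
	s->pprev = &t->heads[s->bucket];
	t->heads[s->bucket] = s;
	pthread_mutex_unlock(&t->hash_locks[s->bucket]);
}

/* Walk the chain under the per-bucket lock, as the lookup path does. */
static struct stripe *find_stripe(struct table *t, unsigned long sector)
{
	unsigned int b = (unsigned int)(sector % NR_BUCKETS);
	struct stripe *s;

	pthread_mutex_lock(&t->hash_locks[b]);
	for (s = t->heads[b]; s; s = s->next)
		if (s->sector == sector)
			break;
	pthread_mutex_unlock(&t->hash_locks[b]);
	return s;
}

/*
 * Unlink under the SAME per-bucket lock the lookup takes. Unlinking
 * under some other lock would let a concurrent walker or inserter see
 * (or create) a corrupted chain.
 */
static void remove_stripe(struct table *t, struct stripe *s)
{
	assert(s->bucket < NR_BUCKETS);
	pthread_mutex_lock(&t->hash_locks[s->bucket]);
	if (s->pprev) {
		*s->pprev = s->next;
		if (s->next)
			s->next->pprev = s->pprev;
		s->next = NULL;
		s->pprev = NULL;
	}
	pthread_mutex_unlock(&t->hash_locks[s->bucket]);
}
```

The point of the sketch is only that find_stripe() and remove_stripe()
serialize on the same lock; which lock the real code takes, and with which
interrupt state, is determined by the patch itself.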