    block: fix __blkdev_get and add_disk race condition · 9f53d2fe
    Stanislaw Gruszka authored
    The following situation might occur:
    
    __blkdev_get:			add_disk:
    
    				register_disk()
    get_gendisk()
    
    disk_block_events()
    	disk->ev == NULL
    
    				disk_add_events()
    
    __disk_unblock_events()
    	disk->ev != NULL
    	--ev->block
    
    Then we unblock events when they are supposed to be blocked. This can
    trigger event-related warnings in block/genhd.c, and can also cause
    crashes in sd_check_events() or other places.
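
    The interleaving above can be replayed in a small userspace model. This
    is only a sketch, not kernel code: the struct definitions, the model_*
    helpers and replay_open() are hypothetical stand-ins, and the model
    assumes that events start out blocked (block == 1) when the events
    structure is created. It compares the ordering from the diagram with an
    ordering where disk->ev is already set up before the disk can be found.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel structures; not the real definitions. */
struct disk_events {
	int block;	/* > 0 means event checking is blocked */
};

struct gendisk {
	struct disk_events *ev;
};

/* Assumption of this model: events start out blocked (block == 1). */
static void model_alloc_events(struct gendisk *disk)
{
	disk->ev = calloc(1, sizeof(*disk->ev));
	assert(disk->ev != NULL);	/* allocation check, enough for a sketch */
	disk->ev->block = 1;
}

static void model_block_events(struct gendisk *disk)
{
	if (!disk->ev)
		return;		/* no events structure yet: silently a no-op */
	disk->ev->block++;
}

static void model_unblock_events(struct gendisk *disk)
{
	if (!disk->ev)
		return;
	disk->ev->block--;
}

/*
 * Replay the opener's block/unblock pair and return the final block count.
 * ev_ready_before_open == 0: ordering from the diagram (ev appears between
 * the block and the unblock).
 * ev_ready_before_open != 0: ev exists before the opener can see the disk.
 */
int replay_open(int ev_ready_before_open)
{
	struct gendisk disk = { .ev = NULL };
	int count;

	if (ev_ready_before_open)
		model_alloc_events(&disk);

	model_block_events(&disk);		/* opener: block */

	if (!ev_ready_before_open)
		model_alloc_events(&disk);	/* add_disk side runs in between */

	model_unblock_events(&disk);		/* opener: unblock */

	count = disk.ev->block;
	free(disk.ev);
	return count;
}
```

    In this model replay_open(0) returns 0: the unpaired decrement cancels
    the initial block, so events are unblocked while add_disk still expects
    them to be blocked. replay_open(1) returns 1, because the block and the
    unblock both see the same disk->ev and stay balanced.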
    
    I'm able to reproduce crashes with the following script (with a USB
    dongle connected as disk sdb).
    
    <snip>
    DEV=/dev/sdb
    ENABLE=/sys/bus/usb/devices/1-2/bConfigurationValue
    
    function stop_me()
    {
    	for i in `jobs -p` ; do kill $i 2> /dev/null ; done
    	exit
    }
    
    trap stop_me SIGHUP SIGINT SIGTERM
    
    for ((i = 0; i < 10; i++)) ; do
    	while true; do fdisk -l $DEV > /dev/null 2>&1 ; done &
    done
    
    while true ; do
    echo 1 > $ENABLE
    sleep 1
    echo 0 > $ENABLE
    done
    </snip>
    
    I use the script to verify the patch fixing an oops in sd_revalidate_disk:
    http://marc.info/?l=linux-scsi&m=132935572512352&w=2
    Without Jun'ichi Nomura's patch titled "Fix NULL pointer dereference in
    sd_revalidate_disk" or this one, the script easily crashes the kernel
    within a few seconds. With both patches applied I do not observe a crash.
    Unfortunately, after some time (a dozen minutes or so), the script hangs in:
    
    [ 1563.906432]  [<c08354f5>] schedule_timeout_uninterruptible+0x15/0x20
    [ 1563.906437]  [<c04532d5>] msleep+0x15/0x20
    [ 1563.906443]  [<c05d60b2>] blk_drain_queue+0x32/0xd0
    [ 1563.906447]  [<c05d6e00>] blk_cleanup_queue+0xd0/0x170
    [ 1563.906454]  [<c06d278f>] scsi_free_queue+0x3f/0x60
    [ 1563.906459]  [<c06d7e6e>] __scsi_remove_device+0x6e/0xb0
    [ 1563.906463]  [<c06d4aff>] scsi_forget_host+0x4f/0x60
    [ 1563.906468]  [<c06cd84a>] scsi_remove_host+0x5a/0xf0
    [ 1563.906482]  [<f7f030fb>] quiesce_and_remove_host+0x5b/0xa0 [usb_storage]
    [ 1563.906490]  [<f7f03203>] usb_stor_disconnect+0x13/0x20 [usb_storage]
    
    Anyway, I think this patch is a step forward.
    
    As a drawback, I do not tear down on a sysfs file creation error, because
    I do not know how to nullify disk->ev (since it may already be in use).
    However, add_disk error handling practically does not exist either, and
    things will work without this sysfs file, except that events will not be
    exported to user space.
    Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
    Acked-by: Tejun Heo <tj@kernel.org>
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>