• Maurizio Lombardi's avatar
    cdrom: do not call check_disk_change() inside cdrom_open() · 2bbea6e1
    Maurizio Lombardi authored
    when mounting an ISO filesystem sometimes (very rarely)
    the system hangs because of a race condition between two tasks.
    
    PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
     #0 [ffff880078447ae0] __schedule at ffffffff8168d605
     #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
     #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
     #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
     #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
     #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
     #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
     #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
     #8 [ffff880078447da8] mount_bdev at ffffffff81202570
     #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
    #10 [ffff880078447e28] mount_fs at ffffffff81202d09
    #11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
    #12 [ffff880078447ea8] do_mount at ffffffff81220fee
    #13 [ffff880078447f28] sys_mount at ffffffff812218d6
    #14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
        RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
        RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
        RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
        RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
        R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
        R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
        ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
    
    This task was trying to mount the cdrom.  It allocated and configured a
    super_block struct and owned the write-lock for the super_block->s_umount
    rwsem. While exclusively owning the s_umount lock, it called
    sr_block_ioctl and waited to acquire the global sr_mutex lock.
    
    PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
     #0 [ffff880078417898] __schedule at ffffffff8168d605
     #1 [ffff880078417900] schedule at ffffffff8168dc59
     #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
     #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
     #4 [ffff8800784179d0] down_read at ffffffff8168cde0
     #5 [ffff8800784179e8] get_super at ffffffff81201cc7
     #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
     #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
     #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
     #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
    #10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
    #11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
    #12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
    #13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
    #14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
    #15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
    #16 [ffff880078417d00] do_last at ffffffff8120d53d
    #17 [ffff880078417db0] path_openat at ffffffff8120e6b2
    #18 [ffff880078417e48] do_filp_open at ffffffff8121082b
    #19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
    #20 [ffff880078417f70] sys_open at ffffffff811fde4e
    #21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
        RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
        RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
        RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
        RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
        R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
        R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
        ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
    
    This task tried to open the cdrom device, the sr_block_open function
    acquired the global sr_mutex lock. The call to check_disk_change()
    then saw an event flag indicating a possible media change and tried
    to flush any cached data for the device.
    As part of the flush, it tried to acquire the super_block->s_umount
    lock associated with the cdrom device.
    This was the same super_block as created and locked by the previous task.
    
    The first task acquires the s_umount lock and then the sr_mutex_lock;
    the second task acquires the sr_mutex_lock and then the s_umount lock.
    
    This patch fixes the issue by moving check_disk_change() out of
    cdrom_open() and let the caller take care of it.
    Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
    Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    2bbea6e1
ide-cd.c 46.4 KB