• Theodore Ts'o's avatar
    ext4: fix potential deadlock in ext4_nonda_switch() · 00d4e736
    Theodore Ts'o authored
    In ext4_nonda_switch(), if the file system is getting full we used to
    call writeback_inodes_sb_if_idle().  The problem is that we can be
    holding i_mutex already, and this causes a potential deadlock when
    writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
    lockdep output below).
    
    As it turns out we don't need need to hold s_umount; the fact that we
    are in the middle of the write(2) system call will keep the superblock
    pinned.  Unfortunately writeback_inodes_sb() checks to make sure
    s_umount is taken, and the VFS uses a different mechanism for making
    sure the file system doesn't get unmounted out from under us.  The
    simplest way of dealing with this is to just simply grab s_umount
    using a trylock, and skip kicking the writeback flusher thread in the
    very unlikely case that we can't take a read lock on s_umount without
    blocking.
    
    Also, we now check the cirteria for kicking the writeback thread
    before we decide to whether to fall back to non-delayed writeback, so
    if there are any outstanding delayed allocation writes, we try to get
    them resolved as soon as possible.
    
       [ INFO: possible circular locking dependency detected ]
       3.6.0-rc1-00042-gce894ca #367 Not tainted
       -------------------------------------------------------
       dd/8298 is trying to acquire lock:
        (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
    
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
    
       which lock already depends on the new lock.
    
       2 locks held by dd/8298:
        #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
        #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
    
       stack backtrace:
       Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
       Call Trace:
        [<c015b79c>] ? console_unlock+0x345/0x372
        [<c06d62a1>] print_circular_bug+0x190/0x19d
        [<c019906c>] __lock_acquire+0x86d/0xb6c
        [<c01999db>] ? mark_held_locks+0x5c/0x7b
        [<c0199724>] lock_acquire+0x66/0xb9
        [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
        [<c06db935>] down_read+0x28/0x58
        [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
        [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
        [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
        [<c0271ece>] ext4_da_write_begin+0x27/0x193
        [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
        [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
        [<c01ddce7>] generic_file_aio_write+0x78/0xd3
        [<c026d336>] ext4_file_write+0x480/0x4a6
        [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
        [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
        [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
        [<c018099f>] ? local_clock+0x37/0x4e
        [<c0209f2c>] do_sync_write+0x67/0x9d
        [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
        [<c020a7b9>] vfs_write+0x7b/0xe6
        [<c020a9a6>] sys_write+0x3b/0x64
        [<c06dd4bd>] syscall_call+0x7/0xb
    Signed-off-by: default avatar"Theodore Ts'o" <tytso@mit.edu>
    Cc: stable@vger.kernel.org
    00d4e736
inode.c 141 KB