• Ronnie Sahlberg's avatar
    cifs: move cifsFileInfo_put logic into a work-queue · 32546a95
    Ronnie Sahlberg authored
    This patch moves the final part of the cifsFileInfo_put() logic where we
    need a write lock on lock_sem to be processed in a separate thread that
    holds no other locks.
    This is to prevent deadlocks like the one below:
    
    > there are 6 processes looping to while trying to down_write
    > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from
    > cifs_new_fileinfo
    >
    > and there are 5 other processes which are blocked, several of them
    > waiting on either PG_writeback or PG_locked (which are both set), all
    > for the same page of the file
    >
    > 2 inode_lock() (inode->i_rwsem) for the file
    > 1 wait_on_page_writeback() for the page
    > 1 down_read(inode->i_rwsem) for the inode of the directory
    > 1 inode_lock()(inode->i_rwsem) for the inode of the directory
    > 1 __lock_page
    >
    >
    > so processes are blocked waiting on:
    >   page flags PG_locked and PG_writeback for one specific page
    >   inode->i_rwsem for the directory
    >   inode->i_rwsem for the file
    >   cifsInodeInflock_sem
    >
    >
    >
    > here are the more gory details (let me know if I need to provide
    > anything more/better):
    >
    > [0 00:48:22.765] [UN]  PID: 8863   TASK: ffff8c691547c5c0  CPU: 3
    > COMMAND: "reopen_file"
    >  #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095
    >  #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df
    >  #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7
    >  #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d
    >  #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d
    >  #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33
    >  #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6
    >  #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315
    > * (I think legitimize_path is bogus)
    >
    > in path_openat
    >         } else {
    >                 const char *s = path_init(nd, flags);
    >                 while (!(error = link_path_walk(s, nd)) &&
    >                         (error = do_last(nd, file, op)) > 0) {  <<<<
    >
    > do_last:
    >         if (open_flag & O_CREAT)
    >                 inode_lock(dir->d_inode);  <<<<
    >         else
    > so it's trying to take inode->i_rwsem for the directory
    >
    >      DENTRY           INODE           SUPERBLK     TYPE PATH
    > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR  /mnt/vm1_smb/
    > inode.i_rwsem is ffff8c691158efc0
    >
    > <struct rw_semaphore 0xffff8c691158efc0>:
    >         owner: <struct task_struct 0xffff8c6914275d00> (UN -   8856 -
    > reopen_file), counter: 0x0000000000000003
    >         waitlist: 2
    >         0xffff9965007e3c90     8863   reopen_file      UN 0  1:29:22.926
    >   RWSEM_WAITING_FOR_WRITE
    >         0xffff996500393e00     9802   ls               UN 0  1:17:26.700
    >   RWSEM_WAITING_FOR_READ
    >
    >
    > the owner of the inode.i_rwsem of the directory is:
    >
    > [0 00:00:00.109] [UN]  PID: 8856   TASK: ffff8c6914275d00  CPU: 3
    > COMMAND: "reopen_file"
    >  #0 [ffff99650065b828] __schedule at ffffffff9b6e6095
    >  #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df
    >  #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89
    >  #3 [ffff99650065b940] msleep at ffffffff9af573a9
    >  #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    >  #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs]
    >  #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs]
    >  #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd
    >  #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs]
    >  #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs]
    > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs]
    > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7
    > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33
    > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6
    > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315
    >
    > cifs_launder_page is for page 0xffffd1e2c07d2480
    >
    > crash> page.index,mapping,flags 0xffffd1e2c07d2480
    >       index = 0x8
    >       mapping = 0xffff8c68f3cd0db0
    >   flags = 0xfffffc0008095
    >
    >   PAGE-FLAG       BIT  VALUE
    >   PG_locked         0  0000001
    >   PG_uptodate       2  0000004
    >   PG_lru            4  0000010
    >   PG_waiters        7  0000080
    >   PG_writeback     15  0008000
    >
    >
    > inode is ffff8c68f3cd0c40
    > inode.i_rwsem is ffff8c68f3cd0ce0
    >      DENTRY           INODE           SUPERBLK     TYPE PATH
    > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG
    > /mnt/vm1_smb/testfile.8853
    >
    >
    > this process holds the inode->i_rwsem for the parent directory, is
    > laundering a page attached to the inode of the file it's opening, and in
    > _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem
    > for the file itself.
    >
    >
    > <struct rw_semaphore 0xffff8c68f3cd0ce0>:
    >         owner: <struct task_struct 0xffff8c6914272e80> (UN -   8854 -
    > reopen_file), counter: 0x0000000000000003
    >         waitlist: 1
    >         0xffff9965005dfd80     8855   reopen_file      UN 0  1:29:22.912
    >   RWSEM_WAITING_FOR_WRITE
    >
    > this is the inode.i_rwsem for the file
    >
    > the owner:
    >
    > [0 00:48:22.739] [UN]  PID: 8854   TASK: ffff8c6914272e80  CPU: 2
    > COMMAND: "reopen_file"
    >  #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095
    >  #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df
    >  #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2
    >  #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f
    >  #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf
    >  #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c
    >  #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs]
    >  #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4
    >  #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a
    >  #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs]
    > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd
    > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35
    > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9
    > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315
    >
    > the process holds the inode->i_rwsem for the file to which it's writing,
    > and is trying to __lock_page for the same page as in the other processes
    >
    >
    > the other tasks:
    > [0 00:00:00.028] [UN]  PID: 8859   TASK: ffff8c6915479740  CPU: 2
    > COMMAND: "reopen_file"
    >  #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095
    >  #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df
    >  #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89
    >  #3 [ffff9965007b3af0] msleep at ffffffff9af573a9
    >  #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs]
    >  #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs]
    >  #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a
    >  #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f
    >  #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33
    >  #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6
    > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315
    >
    > this is opening the file, and is trying to down_write cinode->lock_sem
    >
    >
    > [0 00:00:00.041] [UN]  PID: 8860   TASK: ffff8c691547ae80  CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.057] [UN]  PID: 8861   TASK: ffff8c6915478000  CPU: 3
    > COMMAND: "reopen_file"
    > [0 00:00:00.059] [UN]  PID: 8858   TASK: ffff8c6914271740  CPU: 2
    > COMMAND: "reopen_file"
    > [0 00:00:00.109] [UN]  PID: 8862   TASK: ffff8c691547dd00  CPU: 6
    > COMMAND: "reopen_file"
    >  #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095
    >  #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df
    >  #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89
    >  #3 [ffff9965007c3d90] msleep at ffffffff9af573a9
    >  #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs]
    >  #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs]
    >  #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e
    >  #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614
    >  #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f
    >  #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c
    >
    > closing the file, and trying to down_write cifsi->lock_sem
    >
    >
    > [0 00:48:22.839] [UN]  PID: 8857   TASK: ffff8c6914270000  CPU: 7
    > COMMAND: "reopen_file"
    >  #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095
    >  #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df
    >  #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2
    >  #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6
    >  #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028
    >  #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165
    >  #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs]
    >  #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1
    >  #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e
    >  #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315
    >
    > in __filemap_fdatawait_range
    >                         wait_on_page_writeback(page);
    > for the same page of the file
    >
    >
    >
    > [0 00:48:22.718] [UN]  PID: 8855   TASK: ffff8c69142745c0  CPU: 7
    > COMMAND: "reopen_file"
    >  #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095
    >  #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df
    >  #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7
    >  #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs]
    >  #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd
    >  #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35
    >  #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9
    >  #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315
    >
    >         inode_lock(inode);
    >
    >
    > and one 'ls' later on, to see whether the rest of the mount is available
    > (the test file is in the root, so we get blocked up on the directory
    > ->i_rwsem), so the entire mount is unavailable
    >
    > [0 00:36:26.473] [UN]  PID: 9802   TASK: ffff8c691436ae80  CPU: 4
    > COMMAND: "ls"
    >  #0 [ffff996500393d28] __schedule at ffffffff9b6e6095
    >  #1 [ffff996500393db8] schedule at ffffffff9b6e64df
    >  #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421
    >  #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2
    >  #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56
    >  #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c
    >  #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6
    >  #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315
    >
    > in iterate_dir:
    >         if (shared)
    >                 res = down_read_killable(&inode->i_rwsem);  <<<<
    >         else
    >                 res = down_write_killable(&inode->i_rwsem);
    >
    Reported-by: default avatarFrank Sorenson <sorenson@redhat.com>
    Reviewed-by: default avatarPavel Shilovsky <pshilov@microsoft.com>
    Signed-off-by: default avatarRonnie Sahlberg <lsahlber@redhat.com>
    Signed-off-by: default avatarSteve French <stfrench@microsoft.com>
    32546a95
file.c 121 KB