    [PATCH] Reduce i_sem usage during file sync operations · fbdce7d7
    Andrew Morton authored
    We hold i_sem during the various sync() operations to prevent livelocks:
    if another thread is dirtying the file, a sync() may never return.
    
    Or at least, that used to be true when we were using the per-address_space
    page lists.  Since writeback switched to radix tree traversal it is not possible
    to livelock the sync() operations, because they only visit each page a single
    time.
    
    sync_page_range() (used by O_SYNC writes) has not been holding i_sem for quite
    some time, for the above reasons.
    
    The patch converts fsync(), fdatasync() and msync() to also not hold i_sem
    during the radix-tree-based writeback.
    
    Now, we _do_ still need to hold i_sem across the file->f_op->fsync() call,
    because that is still based on a list_head walk, and is still livelockable.
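
The resulting lock scope in fsync() can be sketched in userspace C as below. This is a minimal illustration, not the patched kernel code: `struct trace`, `record()`, `sketch_fsync()` and `held_during()` are hypothetical helpers, and the exact call sequence (filemap_fdatawrite(), then f_op->fsync() under i_sem, then filemap_fdatawait()) is an assumption about how the writeback and wait phases are arranged.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EVENTS 8

/* Ordered log of the steps a sync path performs (hypothetical helper). */
struct trace {
	const char *ev[MAX_EVENTS];
	size_t n;
};

static void record(struct trace *t, const char *what)
{
	if (t->n < MAX_EVENTS)
		t->ev[t->n++] = what;
}

/*
 * Assumed shape of fsync() after the patch: only the list_head-based
 * f_op->fsync() step runs with i_sem held.  The strings stand in for
 * the real kernel calls; nothing here touches a real file.
 */
static void sketch_fsync(struct trace *t)
{
	record(t, "filemap_fdatawrite");	/* radix-tree walk, no i_sem */
	record(t, "down(i_sem)");
	record(t, "f_op->fsync");		/* list walk, livelockable */
	record(t, "up(i_sem)");
	record(t, "filemap_fdatawait");		/* radix-tree walk, no i_sem */
}

/* Returns 1 if i_sem is held when `what` runs, 0 if not, -1 if absent. */
static int held_during(const struct trace *t, const char *what)
{
	int held = 0;

	for (size_t i = 0; i < t->n; i++) {
		if (strcmp(t->ev[i], "down(i_sem)") == 0)
			held = 1;
		else if (strcmp(t->ev[i], "up(i_sem)") == 0)
			held = 0;
		else if (strcmp(t->ev[i], what) == 0)
			return held;
	}
	return -1;
}
```

Calling sketch_fsync() on a zeroed trace and then querying held_during() for each step shows that only the "f_op->fsync" step falls inside the i_sem critical section.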
    
    But in the case of msync() I deliberately left i_sem untaken.  This is because
    we're currently deadlockable in msync, because mmap_sem is already held, and
    mmap_sem nests inside i_sem, due to direct-io.c.
    
    And yes, the ranking of down_read() versus down() does matter:
    
    	Task A			Task B		Task C
    
    	down_read(rwsem)
    				down(sem)
    						down_write(rwsem)
    	down(sem)
    				down_read(rwsem)
    
    
    C's down_write() queues behind A's read hold and causes B's later
    down_read() to block.  B holds `sem', so A's down(sem) never returns and A
    will never release `rwsem'.
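
The three-task sequence above can be checked mechanically by modelling who waits for whom. A minimal sketch, assuming each blocked task waits on exactly one other task: `struct wait_graph`, `three_task_scenario()` and `is_deadlocked()` are hypothetical names, and the `writer_blocks_new_readers` flag models the assumption that a queued down_write() holds back later down_read() callers.

```c
#include <assert.h>
#include <stdbool.h>

enum task { TASK_A, TASK_B, TASK_C, NTASKS };

/* waits_for[t] is the task that t is blocked behind, or -1 if t can run. */
struct wait_graph {
	int waits_for[NTASKS];
};

/*
 * Build the wait-for graph for the sequence in the commit message.
 * If a queued writer does not hold back new readers, B's down_read()
 * succeeds and B is free to run and eventually release sem.
 */
static struct wait_graph three_task_scenario(bool writer_blocks_new_readers)
{
	struct wait_graph g;

	g.waits_for[TASK_A] = TASK_B;	/* A: down(sem), sem is held by B */
	g.waits_for[TASK_B] = writer_blocks_new_readers ? TASK_C : -1;
					/* B: down_read(rwsem) behind writer C */
	g.waits_for[TASK_C] = TASK_A;	/* C: down_write(rwsem), A holds it for read */
	return g;
}

/* Follow the wait chain from `start`; a return to `start` means deadlock. */
static bool is_deadlocked(const struct wait_graph *g, int start)
{
	int t = start;

	for (int steps = 0; steps < NTASKS; steps++) {
		t = g->waits_for[t];
		if (t < 0)
			return false;	/* chain ends at a runnable task */
		assert(t < NTASKS);
		if (t == start)
			return true;	/* cycle: A -> B -> C -> A */
	}
	return false;
}
```

With `writer_blocks_new_readers` set, every task sits on the cycle A -> B -> C -> A. With it cleared, B is runnable and the chain unwinds, which is why the ordering of down_read() against down() matters even though A and B only take the rwsem for reading.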
    
    So the patch fixes a hard-to-hit triple-task deadlock, but adds a possible
    livelock in msync().  It is possible to fix sys_msync() so that it takes i_sem
    outside i_mmap_sem.  Later.
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
buffer.c 79.7 KB