    ext4: fix a possible ABBA deadlock due to busy PA · 8c80fb31
    Chunguang Xu authored
    We found on an older kernel (3.10) that, when disk space is
    insufficient, the system may hit an ABBA deadlock. The problem appears
    to still exist in the latest kernel, so try to fix it here. The
    sequence that triggers it is: task A holds the PA and waits for the
    jbd2 transaction to finish; the jbd2 transaction waits for the
    completion of task B's IO (plug_list); and task B waits for task A to
    release the PA so that it can finish the discard. This indirectly
    forms an ABBA deadlock. The related call trace is as follows:
    
        Task A
        vfs_write
        ext4_mb_new_blocks()
        ext4_mb_mark_diskspace_used()       JBD2
        jbd2_journal_get_write_access()  -> jbd2_journal_commit_transaction()
      ->schedule()                          filemap_fdatawait()
     |                                              |
     | Task B                                       |
     | do_unlinkat()                                |
     | ext4_evict_inode()                           |
     | jbd2_journal_begin_ordered_truncate()        |
     | filemap_fdatawrite_range()                   |
     | ext4_mb_new_blocks()                         |
      -ext4_mb_discard_group_preallocations() <-----
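
    The wait chain in the trace can be modelled as a small wait-for
    graph. The sketch below is a user-space illustration only: the enum
    names and has_wait_cycle() are hypothetical and are not kernel
    symbols. It shows that the three waits close into a cycle, and that
    the cycle disappears once task B no longer blocks on task A's PA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors taken from the call trace; not kernel identifiers. */
enum actor { TASK_A, JBD2_COMMIT, TASK_B, NR_ACTORS, NOBODY = -1 };

/*
 * waits_for[x] is the actor that x is blocked on, or NOBODY.
 * Follow the chain from 'start' and report whether it leads back to it.
 */
static bool has_wait_cycle(const int waits_for[NR_ACTORS], int start)
{
    int cur = start;

    assert(start >= 0 && start < NR_ACTORS);
    for (int steps = 0; steps < NR_ACTORS; steps++) {
        cur = waits_for[cur];
        if (cur == NOBODY)
            return false;   /* chain ends: someone can make progress */
        if (cur == start)
            return true;    /* chain returned to start: deadlock */
    }
    return false;
}
```

    With A waiting on the jbd2 commit, the commit waiting on B, and B
    waiting on A, has_wait_cycle() reports a cycle; setting B's entry to
    NOBODY (B skips the busy PA instead of waiting) breaks it.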
    
    To fix this, remove the internal retry on busy PAs from
    ext4_mb_discard_group_preallocations() and instead do a limited number
    of retries inside ext4_mb_discard_preallocations(). This circumvents
    the above problem and also has some advantages:
    
    1. Since the PA is in a busy state, if other groups have free PAs,
       keeping the current PA may help to reduce fragmentation.
    2. Continue to traverse forward instead of waiting for the current
       group PA to be released. In most scenarios, the PA discard time
       can be reduced.
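
    The control flow described above can be sketched in user space as
    follows. This is a simplified model under stated assumptions, not the
    patch itself: struct model_pa, struct model_group, discard_group(),
    discard_all() and the max_retries parameter are all hypothetical
    names, and the real retry limit is whatever the patch chooses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the kernel's structures. */
struct model_pa {
    int  free_blocks;   /* blocks this PA would give back */
    bool busy;          /* still referenced by another task */
};

struct model_group {
    struct model_pa *pas;
    size_t           nr_pas;
};

/*
 * One pass over a group: free every idle PA and skip busy ones without
 * waiting. Returns the blocks freed; skipped PAs are counted in *busy.
 */
static int discard_group(struct model_group *grp, int *busy)
{
    int freed = 0;

    assert(busy != NULL);
    for (size_t i = 0; i < grp->nr_pas; i++) {
        struct model_pa *pa = &grp->pas[i];

        if (pa->free_blocks == 0)
            continue;       /* nothing left to discard */
        if (pa->busy) {
            (*busy)++;      /* report it; do not sleep or retry here */
            continue;
        }
        freed += pa->free_blocks;
        pa->free_blocks = 0;
    }
    return freed;
}

/*
 * Caller-level loop: walk all groups, and repeat the whole traversal at
 * most max_retries times while busy PAs were seen and the request is
 * still not satisfied.
 */
static int discard_all(struct model_group *groups, size_t nr_groups,
                       int needed, int max_retries)
{
    int freed = 0;

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        int busy = 0;

        for (size_t g = 0; g < nr_groups && freed < needed; g++)
            freed += discard_group(&groups[g], &busy);
        if (freed >= needed || busy == 0)
            break;
    }
    return freed;
}
```

    Because the retry bound lives in the caller, a PA that stays busy
    costs a few extra traversals rather than an unbounded wait inside one
    group, which is the trade-off between the two numbered points above
    and the CPU overhead discussed below.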
    
    However, when free space is low and only a few groups have space,
    traversing the groups multiple times may increase CPU overhead. On
    balance, I feel that the overall benefit outweighs the cost.

    Signed-off-by: Chunguang Xu <brookxu@tencent.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/1637630277-23496-1-git-send-email-brookxu.cn@gmail.com
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@kernel.org