    Btrfs: scrub, fix sleep in atomic context · f55985f4
    Filipe Manana authored
    My previous patch "Btrfs: fix scrub race leading to use-after-free"
    made it possible to sleep in atomic context: scrub_pending_bio_dec()
    now locks the scrub_lock mutex, and this function can be called from
    atomic context (in the trace below, from a bio end_io handler running
    in softirq). Chris ran into this in a debug kernel, which gave the
    following trace:
    
    [ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
    [ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
    [ 1928.981324] INFO: lockdep is turned off.
    [ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G        W     3.19.0-rc7-mason+ #41
    [ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4  /A9DRPF-10D, BIOS 1.07 05/10/2012
    [ 1929.022207]  ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
    [ 1929.037267]  ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
    [ 1929.052315]  0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
    [ 1929.067381] Call Trace:
    [ 1929.072344]  <IRQ>  [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
    [ 1929.083968]  [<ffffffff810856c4>] ___might_sleep+0x174/0x230
    [ 1929.095352]  [<ffffffff810857d2>] __might_sleep+0x52/0x90
    [ 1929.106223]  [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
    [ 1929.117951]  [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
    [ 1929.129708]  [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
    [ 1929.143370]  [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
    [ 1929.157191]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
    [ 1929.167382]  [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
    [ 1929.180161]  [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
    [ 1929.194318]  [<ffffffff812fa603>] bio_endio+0x53/0xa0
    [ 1929.204496]  [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
    [ 1929.216569]  [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
    [ 1929.229176]  [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
    [ 1929.240740]  [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
    [ 1929.252654]  [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
    [ 1929.264725]  [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
    [ 1929.276635]  [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
    [ 1929.288014]  [<ffffffff8105d92e>] __do_softirq+0xde/0x600
    [ 1929.298885]  [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
    (...)
    
    Fix this by using a reference count on the scrub context structure
    instead of locking the scrub_lock mutex.
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Chris Mason <clm@fb.com>
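
The fix described above (a reference count on the scrub context instead of the scrub_lock mutex) can be illustrated with a minimal userspace sketch. This is not the code from scrub.c: the names demo_ctx, demo_ctx_get(), demo_ctx_put(), demo_pending_bio_inc() and demo_pending_bio_dec() are hypothetical, and C11 atomics stand in for the kernel's atomic_t helpers. The point it shows is that the decrement path uses only atomic operations, so it never blocks, and the context stays alive until the last holder drops its reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scrub context; not the kernel structure. */
struct demo_ctx {
	atomic_int refs;           /* lifetime reference count */
	atomic_int bios_in_flight; /* I/O still pending on this context */
};

/* Demo-only bookkeeping so a caller can observe when the free happens. */
static int demo_ctx_freed;

static struct demo_ctx *demo_ctx_alloc(void)
{
	struct demo_ctx *ctx = malloc(sizeof(*ctx));

	if (!ctx)
		return NULL;
	atomic_init(&ctx->refs, 1); /* the creator holds the first reference */
	atomic_init(&ctx->bios_in_flight, 0);
	return ctx;
}

static void demo_ctx_get(struct demo_ctx *ctx)
{
	atomic_fetch_add(&ctx->refs, 1);
}

/* Drop one reference; whoever drops the last one frees the context. */
static void demo_ctx_put(struct demo_ctx *ctx)
{
	int old = atomic_fetch_sub(&ctx->refs, 1);

	assert(old > 0);
	if (old == 1) {
		free(ctx);
		demo_ctx_freed++;
	}
}

/* Each in-flight bio pins the context with its own reference. */
static void demo_pending_bio_inc(struct demo_ctx *ctx)
{
	demo_ctx_get(ctx);
	atomic_fetch_add(&ctx->bios_in_flight, 1);
}

/*
 * Completion side: only atomic operations, no mutex, so nothing here
 * can sleep. The context cannot be freed underneath us because the
 * reference taken in demo_pending_bio_inc() is dropped last.
 */
static void demo_pending_bio_dec(struct demo_ctx *ctx)
{
	atomic_fetch_sub(&ctx->bios_in_flight, 1);
	demo_ctx_put(ctx);
}
```

In this sketch, if the owner calls demo_ctx_put() while a bio is still in flight, the context survives until the matching demo_pending_bio_dec() runs, which is the use-after-free protection the mutex was previously providing.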
scrub.c 110 KB