1. 18 Aug, 2023 5 commits
    • md raid1: allow writebehind to work on any leg device set WriteMostly · 6b2460e6
      Heinz Mauelshagen authored
      As the WriteMostly flag can be set on any component device of a RAID1
      array, remove the constraint that it only works if set on the first one.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Tested-by: Xiao Ni <xni@redhat.com>
      Link: https://lore.kernel.org/r/2a9592bf3340f34bf588eec984b23ee219f3985e.1692013451.git.heinzm@redhat.com
      Signed-off-by: Song Liu <song@kernel.org>
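The behaviour change in 6b2460e6 can be illustrated with a minimal sketch. It assumes a hypothetical `struct leg` that stands in for a RAID1 component device and a boolean that stands in for the WriteMostly flag; it is not the raid1.c code, only a model of "first leg only" versus "any leg" as described in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a RAID1 leg; not struct md_rdev. */
struct leg {
	bool write_mostly;	/* stands in for the WriteMostly rdev flag */
};

/* Old behaviour as described: writebehind only honoured the first leg. */
static bool writebehind_first_leg_only(const struct leg *legs, size_t n)
{
	return n > 0 && legs[0].write_mostly;
}

/* New behaviour: WriteMostly on any leg enables writebehind. */
static bool writebehind_any_leg(const struct leg *legs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (legs[i].write_mostly)
			return true;
	return false;
}
```

With WriteMostly set only on the third leg, the first check returns false while the second returns true, which is exactly the constraint the commit removes.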
    • md/raid1: hold the barrier until handle_read_error() finishes · c069da44
      Xueshi Hu authored
      handle_read_error() calls allow_barrier() to match the barrier raised
      earlier. However, allow_barrier() should be called at the end of the
      function to avoid a concurrent raid reshape.
      
      Fixes: 689389a0 ("md/raid1: simplify handle_read_error().")
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
      Link: https://lore.kernel.org/r/20230814135356.1113639-4-xueshi.hu@smartx.com
      Signed-off-by: Song Liu <song@kernel.org>
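Why the barrier must be held until the handler finishes can be shown with a single-threaded sketch. Everything here is hypothetical: `struct conf` is a stand-in for r1conf, `try_reshape()` models raid1_reshape() refusing to run while any I/O holds the barrier, and the mid-function reshape attempt simulates a concurrent one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; not the md code. */
struct conf {
	int nr_pending;	/* I/Os holding the barrier */
	int raid_disks;	/* only a reshape changes this */
};

static void allow_barrier(struct conf *c) { c->nr_pending--; }

/* Stand-in for raid1_reshape(): it cannot run while I/O holds the barrier. */
static bool try_reshape(struct conf *c, int new_disks)
{
	if (c->nr_pending != 0)
		return false;
	c->raid_disks = new_disks;
	return true;
}

/*
 * Model of the fixed handle_read_error(): the barrier taken when the read
 * was submitted is held to the very end, so a reshape attempted mid-way
 * (simulated below) is refused and conf stays stable for the whole handler.
 */
static bool handle_read_error_model(struct conf *c, int reshape_to)
{
	int disks_before = c->raid_disks;
	(void)try_reshape(c, reshape_to);	/* a "concurrent" reshape attempt */
	bool stable = (c->raid_disks == disks_before);
	allow_barrier(c);			/* moved to the end by the fix */
	return stable;
}
```

Had allow_barrier() been called before the simulated reshape attempt, nr_pending would already be zero and raid_disks would change under the handler.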
    • md/raid1: free the r1bio before waiting for blocked rdev · 992db13a
      Xueshi Hu authored
      A raid1 reshape changes the mempool and r1conf::raid_disks, both of which
      are needed to free an r1bio. allow_barrier() makes a concurrent
      raid1_reshape() possible, so free the in-flight r1bio before waiting for
      the blocked rdev.
      
      Fixes: 6bfe0b49 ("md: support blocking writes to an array on device failure")
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
      Link: https://lore.kernel.org/r/20230814135356.1113639-3-xueshi.hu@smartx.com
      Signed-off-by: Song Liu <song@kernel.org>
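The retry pattern behind 992db13a can be sketched as a loop: on finding a blocked device, free the r1bio and drop the barrier before sleeping, then raise the barrier and allocate a fresh r1bio on retry. This is a hypothetical single-threaded model; the counters and `wait_for_blocked_rdev()` are illustrative stand-ins, and the real raid1_write_request() control flow differs in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the "blocked rdev" retry;
 * names mirror md but this is not md code. */
struct conf {
	int nr_pending;		/* I/Os holding the barrier */
	int r1bio_live;		/* r1bios currently allocated */
	bool rdev_blocked;	/* some member device is blocked */
};

static void wait_barrier(struct conf *c)  { c->nr_pending++; }
static void allow_barrier(struct conf *c) { c->nr_pending--; }
static void alloc_r1bio(struct conf *c)   { c->r1bio_live++; }
static void free_r1bio(struct conf *c)    { c->r1bio_live--; }

/* Stand-in for the sleep on a blocked rdev: a reshape may run here, so
 * no barrier reference and no old-geometry r1bio may still be live. */
static void wait_for_blocked_rdev(struct conf *c)
{
	assert(c->nr_pending == 0 && c->r1bio_live == 0);
	c->rdev_blocked = false;	/* pretend the device got unblocked */
}

/* Returns how many attempts were needed before the write could be issued;
 * on return the write is in flight, holding one barrier and one r1bio. */
static int write_request_model(struct conf *c)
{
	int attempts = 0;
	for (;;) {
		attempts++;
		wait_barrier(c);
		alloc_r1bio(c);
		if (!c->rdev_blocked)
			break;			/* proceed with the write */
		free_r1bio(c);			/* the fix: free first ... */
		allow_barrier(c);		/* ... then drop the barrier */
		wait_for_blocked_rdev(c);
	}
	return attempts;
}
```

Starting with a blocked device, the first attempt backs out cleanly and the second one proceeds.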
    • md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() · c5d736f5
      Xueshi Hu authored
      After allow_barrier(), a concurrent raid1_reshape() can replace the old
      mempool and r1conf::raid_disks. Move allow_barrier() to the end of
      raid_end_bio_io() so that the r1bio can be freed safely.
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Xueshi Hu <xueshi.hu@smartx.com>
      Link: https://lore.kernel.org/r/20230814135356.1113639-2-xueshi.hu@smartx.com
      Signed-off-by: Song Liu <song@kernel.org>
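The hazard fixed in c5d736f5 is that an r1bio's size depends on raid_disks, so it must go back to the pool it came from. The sketch below is a hypothetical model, not the md mempool code: each `struct pool` holds objects for one disk count, `try_reshape()` swaps pools once no I/O holds the barrier, and `free_r1bio()` asserts the object matches the current pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: r1bio size depends on raid_disks and each pool only
 * holds objects of one size. Not the md mempool code. */
struct pool  { int disks; int outstanding; };
struct r1bio { int disks; };
struct conf  { int nr_pending; struct pool *pool; };

static struct r1bio alloc_r1bio(struct conf *c)
{
	c->pool->outstanding++;
	return (struct r1bio){ .disks = c->pool->disks };
}

/* Freeing must go to the pool the object came from. */
static void free_r1bio(struct conf *c, struct r1bio *r)
{
	assert(r->disks == c->pool->disks);	/* wrong pool => corruption */
	c->pool->outstanding--;
}

static void allow_barrier(struct conf *c) { c->nr_pending--; }

/* Reshape swaps in a new pool, but only once no I/O holds the barrier. */
static bool try_reshape(struct conf *c, struct pool *newpool)
{
	if (c->nr_pending != 0)
		return false;
	c->pool = newpool;
	return true;
}

/* Fixed ordering in the spirit of raid_end_bio_io(): free, then allow. */
static void end_bio_io_model(struct conf *c, struct r1bio *r)
{
	free_r1bio(c, r);
	allow_barrier(c);
}
```

With the old order (allow_barrier(), then a reshape slipping in, then free_r1bio()), the free would target the new pool and trip the assertion.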
    • blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init · ec14a87e
      Tejun Heo authored
      blk-iocost sometimes causes the following crash:
      
        BUG: kernel NULL pointer dereference, address: 00000000000000e0
        ...
        RIP: 0010:_raw_spin_lock+0x17/0x30
        Code: be 01 02 00 00 e8 79 38 39 ff 31 d2 89 d0 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 65 ff 05 48 d0 34 7e b9 01 00 00 00 31 c0 <f0> 0f b1 0f 75 02 5d c3 89 c6 e8 ea 04 00 00 5d c3 0f 1f 84 00 00
        RSP: 0018:ffffc900023b3d40 EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 00000000000000e0 RCX: 0000000000000001
        RDX: ffffc900023b3d20 RSI: ffffc900023b3cf0 RDI: 00000000000000e0
        RBP: ffffc900023b3d40 R08: ffffc900023b3c10 R09: 0000000000000003
        R10: 0000000000000064 R11: 000000000000000a R12: ffff888102337000
        R13: fffffffffffffff2 R14: ffff88810af408c8 R15: ffff8881070c3600
        FS:  00007faaaf364fc0(0000) GS:ffff88842fdc0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000000000e0 CR3: 00000001097b1000 CR4: 0000000000350ea0
        Call Trace:
         <TASK>
         ioc_weight_write+0x13d/0x410
         cgroup_file_write+0x7a/0x130
         kernfs_fop_write_iter+0xf5/0x170
         vfs_write+0x298/0x370
         ksys_write+0x5f/0xb0
         __x64_sys_write+0x1b/0x20
         do_syscall_64+0x3d/0x80
         entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This happens because iocg->ioc is NULL. The field is initialized by
      ioc_pd_init() and never cleared. The NULL deref is caused by
      blkcg_activate_policy() installing blkg_policy_data before initializing it.
      
      blkcg_activate_policy() was doing the following:
      
      1. Allocate pd's for all existing blkg's and install them in blkg->pd[].
      2. Initialize all pd's.
      3. Online all pd's.
      
      blkcg_activate_policy() only grabs the queue_lock and may release and
      re-acquire the lock as allocation may need to sleep. ioc_weight_write()
      grabs blkcg->lock and iterates all its blkg's. The two can race, and if
      ioc_weight_write() runs during #1 or between #1 and #2, it can encounter a
      pd which is not initialized yet, leading to a crash.
      
      The crash can be reproduced with the following script:
      
        #!/bin/bash
      
        echo +io > /sys/fs/cgroup/cgroup.subtree_control
        systemd-run --unit touch-sda --scope dd if=/dev/sda of=/dev/null bs=1M count=1 iflag=direct
        echo 100 > /sys/fs/cgroup/system.slice/io.weight
        bash -c "echo '8:0 enable=1' > /sys/fs/cgroup/io.cost.qos" &
        sleep .2
        echo 100 > /sys/fs/cgroup/system.slice/io.weight
      
      with the following patch applied:
      
      > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
      > index fc49be622e05..38d671d5e10c 100644
      > --- a/block/blk-cgroup.c
      > +++ b/block/blk-cgroup.c
      > @@ -1553,6 +1553,12 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol)
      > 		pd->online = false;
      > 	}
      >
      > +       if (system_state == SYSTEM_RUNNING) {
      > +               spin_unlock_irq(&q->queue_lock);
      > +               ssleep(1);
      > +               spin_lock_irq(&q->queue_lock);
      > +       }
      > +
      > 	/* all allocated, init in the same order */
      > 	if (pol->pd_init_fn)
      > 		list_for_each_entry_reverse(blkg, &q->blkg_list, q_node)
      
      I don't see a reason why all pd's should be allocated, initialized and
      onlined together. The only ordering requirement is that parent blkgs be
      initialized and onlined before their children, which is guaranteed by the
      walking order. Let's fix the bug by allocating, initializing and onlining
      the pd for each blkg and holding blkcg->lock over initialization and
      onlining. This ensures that an installed blkg is always fully initialized
      and onlined, removing the race window.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Breno Leitao <leitao@debian.org>
      Fixes: 9d179b86 ("blkcg: Fix multiple bugs in blkcg_activate_policy()")
      Link: https://lore.kernel.org/r/ZN0p5_W-Q9mAHBVY@slm.duckdns.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
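The fixed ordering in ec14a87e (per blkg: allocate, then under blkcg->lock install, init and online before moving on, parents first) can be modelled as below. This is a hypothetical single-threaded sketch: `struct pd`, `struct blkg`, `dummy_ioc` and `reader_ok()` are illustrative names, the lock is only marked in comments, and `reader_ok()` plays the role of ioc_weight_write(), which may see no pd but must never see a half-initialized one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the fixed blkcg_activate_policy() ordering;
 * names are illustrative, not the blk-cgroup structures. */
struct pd   { bool initialized; bool online; void *ioc; };
struct blkg { struct blkg *parent; struct pd *pd; };

static int dummy_ioc;	/* stands in for the iocost controller (iocg->ioc) */

/* Reader in the spirit of ioc_weight_write(): a missing pd is fine,
 * but any installed pd must be usable (ioc != NULL). */
static bool reader_ok(const struct blkg *g)
{
	return g->pd == NULL || (g->pd->initialized && g->pd->ioc != NULL);
}

/* blkgs[] is ordered parents first, mirroring the walking order. */
static int activate_policy(struct blkg **blkgs, int n)
{
	for (int i = 0; i < n; i++) {
		struct pd *pd = calloc(1, sizeof(*pd));	/* may sleep in the kernel */
		if (!pd)
			return -1;
		assert(!blkgs[i]->parent || blkgs[i]->parent->pd->online);
		/* --- blkcg->lock held from here: readers cannot look --- */
		blkgs[i]->pd = pd;		/* install */
		pd->ioc = &dummy_ioc;		/* init */
		pd->initialized = true;
		pd->online = true;		/* online */
		/* --- blkcg->lock released --- */
		assert(reader_ok(blkgs[i]));
	}
	return 0;
}
```

Because installation, init and online happen inside one locked section per blkg, a reader never observes the "installed but not initialized" state that produced the NULL iocg->ioc.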
  2. 16 Aug, 2023 1 commit
  3. 15 Aug, 2023 13 commits
  4. 14 Aug, 2023 3 commits
  5. 11 Aug, 2023 1 commit
  6. 10 Aug, 2023 4 commits
  7. 09 Aug, 2023 4 commits
  8. 08 Aug, 2023 7 commits
  9. 04 Aug, 2023 1 commit
    • fs/Kconfig: Fix compile error for romfs · a24c8b51
      Li Zetao authored
      The kernel test robot reported the following build errors:
      
      arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_read':
      storage.c:(.text+0x64): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x9c): undefined reference to `__bread_gfp'
      arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strnlen':
      storage.c:(.text+0x128): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x16c): undefined reference to `__bread_gfp'
      arm-linux-gnueabi-ld: fs/romfs/storage.o: in function `romfs_dev_strcmp':
      storage.c:(.text+0x22c): undefined reference to `__bread_gfp'
      arm-linux-gnueabi-ld: storage.c:(.text+0x27c): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x2a8): undefined reference to `__bread_gfp'
      arm-linux-gnueabi-ld: storage.c:(.text+0x2bc): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x2d4): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x2f4): undefined reference to `__brelse'
      arm-linux-gnueabi-ld: storage.c:(.text+0x304): undefined reference to `__brelse'
      
      The problem is that commit 925c86a1 ("fs: add CONFIG_BUFFER_HEAD")
      added a new config option, CONFIG_BUFFER_HEAD, that controls building
      the buffer_head code. romfs uses the buffer_head API, but its Kconfig
      was not updated to select the new option. Select CONFIG_BUFFER_HEAD in
      the romfs Kconfig to resolve the problem.
      
      Fixes: 925c86a1 ("fs: add CONFIG_BUFFER_HEAD")
      Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202308031810.pQzGmR1v-lkp@intel.com/
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Tested-by: Li Zetao <lizetao1@huawei.com>
      Signed-off-by: Li Zetao <lizetao1@huawei.com>
      [axboe: fold in Christoph's incremental]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 02 Aug, 2023 1 commit