1. 16 Nov, 2022 (13 commits)
2. 15 Nov, 2022 (17 commits)
3. 14 Nov, 2022 (10 commits)
    • Merge branch 'md-next' of... · 5626196a
      Jens Axboe authored
      Merge branch 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into for-6.2/block
      
      Pull MD fixes from Song.
      
      * 'md-next' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md:
        md/raid1: stop mdx_raid1 thread when raid1 array run failed
        md/raid5: use bdev_write_cache instead of open coding it
        md: fix a crash in mempool_free
        md/raid0, raid10: Don't set discard sectors for request queue
        md/bitmap: Fix bitmap chunk size overflow issues
        md: introduce md_ro_state
        md: factor out __md_set_array_info()
        lib/raid6: drop RAID6_USE_EMPTY_ZERO_PAGE
        raid5-cache: use try_cmpxchg in r5l_wake_reclaim
        drivers/md/md-bitmap: check the return value of md_bitmap_get_counter()
      5626196a
    • md/raid1: stop mdx_raid1 thread when raid1 array run failed · b611ad14
      Jiang Li authored
      Running a raid1 array fails when the array is assembled with only an
      inactive disk, but the mdx_raid1 thread is not stopped even though the
      associated resources have been released. This causes a NULL pointer
      dereference at poweroff.
      
      This causes the following Oops:
          [  287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070
          [  287.594762] #PF: supervisor read access in kernel mode
          [  287.599912] #PF: error_code(0x0000) - not-present page
          [  287.605061] PGD 0 P4D 0
          [  287.607612] Oops: 0000 [#1] SMP NOPTI
          [  287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G     U            5.10.146 #0
          [  287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022
          [  287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod]
          [  287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ......
          [  287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202
          [  287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000
          [  287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800
          [  287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff
          [  287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800
          [  287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500
          [  287.692052] FS:  0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000
          [  287.700149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [  287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0
          [  287.713033] Call Trace:
          [  287.715498]  raid1d+0x6c/0xbbb [raid1]
          [  287.719256]  ? __schedule+0x1ff/0x760
          [  287.722930]  ? schedule+0x3b/0xb0
          [  287.726260]  ? schedule_timeout+0x1ed/0x290
          [  287.730456]  ? __switch_to+0x11f/0x400
          [  287.734219]  md_thread+0xe9/0x140 [md_mod]
          [  287.738328]  ? md_thread+0xe9/0x140 [md_mod]
          [  287.742601]  ? wait_woken+0x80/0x80
          [  287.746097]  ? md_register_thread+0xe0/0xe0 [md_mod]
          [  287.751064]  kthread+0x11a/0x140
          [  287.754300]  ? kthread_park+0x90/0x90
          [  287.757974]  ret_from_fork+0x1f/0x30
      
      When running a raid1 array fails, md_unregister_thread() must be
      called before raid1_free().
      Signed-off-by: Jiang Li <jiang.li@ugreen.com>
      Signed-off-by: Song Liu <song@kernel.org>
      b611ad14
    • md/raid5: use bdev_write_cache instead of open coding it · ad831a16
      Christoph Hellwig authored
      Use bdev_write_cache() instead of two equivalent open-coded checks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
      ad831a16
    • md: fix a crash in mempool_free · 341097ee
      Mikulas Patocka authored
      There's a crash in mempool_free when running the lvm test
      shell/lvchange-rebuild-raid.sh.
      
      The reason for the crash is this:
      * super_written calls atomic_dec_and_test(&mddev->pending_writes) and
        wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev)
        and bio_put(bio).
      * so, the process that waited on sb_wait and that is woken up is racing
        with bio_put(bio).
      * if the process wins the race, it calls bioset_exit before bio_put(bio)
        is executed.
      * bio_put(bio) attempts to free a bio into a destroyed bio set - causing
        a crash in mempool_free.
      
      We fix this bug by moving bio_put before atomic_dec_and_test.
      
      We also move rdev_dec_pending before atomic_dec_and_test as suggested by
      Neil Brown.
      
      The function md_end_flush has a similar bug - we must call bio_put before
      we decrement the number of in-progress bios.
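
The ordering argument above can be sketched in user space. This is a single-threaded toy model, not the md code: it assumes the worst-case interleaving, where the woken waiter runs immediately and tears the pool down. All names (model_pool, model_dev, model_put, model_dec_and_wake, end_io_old_order, end_io_new_order) are hypothetical stand-ins for the bio set, mddev, bio_put and the atomic_dec_and_test/wake_up pair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio set backed by a mempool. */
struct model_pool {
	bool alive;           /* false once the waiter has destroyed the pool */
	int freed_ok;         /* frees that reached a live pool */
	int freed_after_exit; /* frees that reached a destroyed pool (crash case) */
};

/* Hypothetical stand-in for the device that owns the pool. */
struct model_dev {
	struct model_pool pool;
	int pending_writes;
};

/* Stand-in for bio_put(): hand the object back to its pool. */
static void model_put(struct model_pool *pool)
{
	if (pool->alive)
		pool->freed_ok++;
	else
		pool->freed_after_exit++;
}

/*
 * Stand-in for atomic_dec_and_test() followed by wake_up(). The model
 * assumes the woken waiter runs at once and destroys the pool.
 */
static void model_dec_and_wake(struct model_dev *dev)
{
	if (--dev->pending_writes == 0)
		dev->pool.alive = false;
}

/* Old ordering: wake the waiter first, put afterwards. */
int end_io_old_order(struct model_dev *dev)
{
	assert(dev != NULL);
	model_dec_and_wake(dev);
	model_put(&dev->pool);
	return dev->pool.freed_after_exit;
}

/* Fixed ordering: put first, then drop the pending count and wake. */
int end_io_new_order(struct model_dev *dev)
{
	assert(dev != NULL);
	model_put(&dev->pool);
	model_dec_and_wake(dev);
	return dev->pool.freed_after_exit;
}
```

In this model the old ordering records one free into a destroyed pool, while the new ordering records none, because the last reference is dropped before anyone can be woken.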
      
       BUG: kernel NULL pointer dereference, address: 0000000000000000
       #PF: supervisor write access in kernel mode
       #PF: error_code(0x0002) - not-present page
       PGD 11557f0067 P4D 11557f0067 PUD 0
       Oops: 0002 [#1] PREEMPT SMP
       CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
       Workqueue: kdelayd flush_expired_bios [dm_delay]
       RIP: 0010:mempool_free+0x47/0x80
       Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00
       RSP: 0018:ffff88910036bda8 EFLAGS: 00010093
       RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001
       RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8
       RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900
       R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000
       R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05
       FS:  0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0
       Call Trace:
        <TASK>
        clone_endio+0xf4/0x1c0 [dm_mod]
        clone_endio+0xf4/0x1c0 [dm_mod]
        __submit_bio+0x76/0x120
        submit_bio_noacct_nocheck+0xb6/0x2a0
        flush_expired_bios+0x28/0x2f [dm_delay]
        process_one_work+0x1b4/0x300
        worker_thread+0x45/0x3e0
        ? rescuer_thread+0x380/0x380
        kthread+0xc2/0x100
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
       Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd]
       CR2: 0000000000000000
       ---[ end trace 0000000000000000 ]---
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Song Liu <song@kernel.org>
      341097ee
    • md/raid0, raid10: Don't set discard sectors for request queue · 8e1a2279
      Xiao Ni authored
      Stacking drivers should rely on disk_stack_limits() to derive a proper
      max_discard_sectors rather than setting a value themselves.

      This also fixes a bug: raid0/raid10 set max_discard_sectors even when
      all member disks are rotational devices. The member devices are not
      SSD/NVMe, yet raid0/raid10 export the wrong value. As a result,
      mkfs.xfs triggers warning messages in __blkdev_issue_discard() like
      this:
      
      [ 4616.022599] ------------[ cut here ]------------
      [ 4616.027779] WARNING: CPU: 4 PID: 99634 at block/blk-lib.c:50 __blkdev_issue_discard+0x16a/0x1a0
      [ 4616.140663] RIP: 0010:__blkdev_issue_discard+0x16a/0x1a0
      [ 4616.146601] Code: 24 4c 89 20 31 c0 e9 fe fe ff ff c1 e8 09 8d 48 ff 4c 89 f0 4c 09 e8 48 85 c1 0f 84 55 ff ff ff b8 ea ff ff ff e9 df fe ff ff <0f> 0b 48 8d 74 24 08 e8 ea d6 00 00 48 c7 c6 20 1e 89 ab 48 c7 c7
      [ 4616.167567] RSP: 0018:ffffaab88cbffca8 EFLAGS: 00010246
      [ 4616.173406] RAX: ffff9ba1f9e44678 RBX: 0000000000000000 RCX: ffff9ba1c9792080
      [ 4616.181376] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ba1c9792080
      [ 4616.189345] RBP: 0000000000000cc0 R08: ffffaab88cbffd10 R09: 0000000000000000
      [ 4616.197317] R10: 0000000000000012 R11: 0000000000000000 R12: 0000000000000000
      [ 4616.205288] R13: 0000000000400000 R14: 0000000000000cc0 R15: ffff9ba1c9792080
      [ 4616.213259] FS:  00007f9a5534e980(0000) GS:ffff9ba1b7c80000(0000) knlGS:0000000000000000
      [ 4616.222298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4616.228719] CR2: 000055a390a4c518 CR3: 0000000123e40006 CR4: 00000000001706e0
      [ 4616.236689] Call Trace:
      [ 4616.239428]  blkdev_issue_discard+0x52/0xb0
      [ 4616.244108]  blkdev_common_ioctl+0x43c/0xa00
      [ 4616.248883]  blkdev_ioctl+0x116/0x280
      [ 4616.252977]  __x64_sys_ioctl+0x8a/0xc0
      [ 4616.257163]  do_syscall_64+0x5c/0x90
      [ 4616.261164]  ? handle_mm_fault+0xc5/0x2a0
      [ 4616.265652]  ? do_user_addr_fault+0x1d8/0x690
      [ 4616.270527]  ? do_syscall_64+0x69/0x90
      [ 4616.274717]  ? exc_page_fault+0x62/0x150
      [ 4616.279097]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [ 4616.284748] RIP: 0033:0x7f9a55398c6b
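
The idea of deriving the discard limit from the member devices can be illustrated with a toy model. This is not the kernel's disk_stack_limits() code. It assumes that a limit of 0 means "discard not supported" and that the stacked value follows the smallest non-zero member limit; the function name stack_max_discard is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of limit stacking (assumption: 0 means "no discard support").
 * Returns the smallest non-zero member limit, or 0 when no member
 * supports discard at all.
 */
uint32_t stack_max_discard(const uint32_t *member_limits, size_t n)
{
	assert(member_limits != NULL || n == 0);

	uint32_t stacked = 0;
	for (size_t i = 0; i < n; i++) {
		uint32_t m = member_limits[i];

		if (m == 0)
			continue; /* this member cannot discard */
		if (stacked == 0 || m < stacked)
			stacked = m;
	}
	return stacked;
}
```

With only rotational members (all limits 0) the model yields 0, so the array would not advertise discard. A driver that overwrites the stacked value with its own number loses that property.
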
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
      8e1a2279
    • md/bitmap: Fix bitmap chunk size overflow issues · 45552111
      Florian-Ewald Mueller authored
      - limit bitmap chunk size internal u64 variable to values not overflowing
        the u32 bitmap superblock structure variable stored on persistent media
      - assign bitmap chunk size internal u64 variable from unsigned values to
        avoid possible sign extension artifacts when assigning from a s32 value
      
      The bug has been there since at least kernel 4.0.
      Steps to reproduce it:
      1. mdadm -C /dev/mdx -l 1 --bitmap=internal --bitmap-chunk=256M -e 1.2 -n2 /dev/rnbd1 /dev/rnbd2
      2. Resize member devices rnbd1 and rnbd2 to 8 TB
      3. mdadm --grow /dev/mdx --size=max

      The bitmap_chunksize will overflow without the patch.
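
The two integer pitfalls named in the message can be shown in plain C. This is a minimal sketch, not the md-bitmap code: the helper names (widen_signed, widen_unsigned, store_chunksize) are hypothetical, and the 32-bit field stands in for the on-disk superblock member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Widening a signed 32-bit value to u64 sign-extends negative inputs. */
uint64_t widen_signed(int32_t v)
{
	return (uint64_t)v;
}

/* Going through an unsigned 32-bit type first avoids the sign extension. */
uint64_t widen_unsigned(int32_t v)
{
	return (uint64_t)(uint32_t)v;
}

/*
 * Store a 64-bit in-memory chunk size into a 32-bit on-disk field.
 * Returns false instead of silently truncating when the value does not fit.
 */
bool store_chunksize(uint64_t chunksize, uint32_t *sb_field)
{
	assert(sb_field != NULL);

	if (chunksize > UINT32_MAX)
		return false;
	*sb_field = (uint32_t)chunksize;
	return true;
}
```

For example, widen_signed(INT32_MIN) gives 0xFFFFFFFF80000000, while widen_unsigned(INT32_MIN) gives 0x80000000. A 256 MiB chunk size fits the 32-bit field, but a value of 1 << 32 is rejected.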
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Florian-Ewald Mueller <florian-ewald.mueller@ionos.com>
      Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
      Signed-off-by: Song Liu <song@kernel.org>
      45552111
    • md: introduce md_ro_state · f97a5528
      Ye Bin authored
      Introduce md_ro_state for mddev->ro so that it is easier to understand.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      f97a5528
    • md: factor out __md_set_array_info() · 2f6d261e
      Ye Bin authored
      Factor out __md_set_array_info(). No functional change.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      2f6d261e
    • lib/raid6: drop RAID6_USE_EMPTY_ZERO_PAGE · 42271ca3
      Giulio Benetti authored
      RAID6_USE_EMPTY_ZERO_PAGE is unused and hardcoded to 0, so let's drop it.
      Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Song Liu <song@kernel.org>
      42271ca3
    • raid5-cache: use try_cmpxchg in r5l_wake_reclaim · 9487a0f6
      Uros Bizjak authored
      Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
      r5l_wake_reclaim. The x86 CMPXCHG instruction returns success in the ZF
      flag, so this change saves a compare after cmpxchg (and the related move
      instruction in front of cmpxchg).
      
      Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
      fails. There is no need to re-read the value in the loop.
      
      Note that the value from *ptr should be read using READ_ONCE to prevent
      the compiler from merging, refetching or reordering the read.
      
      No functional change intended.
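
The loop shape described above can be sketched in user space with C11 atomics, whose atomic_compare_exchange_weak() also writes the current value back into the expected variable on failure. This is an analogy under that assumption, not the kernel code: raise_target is a hypothetical name, and the relaxed initial load stands in for READ_ONCE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Raise *target to new_val unless the current value is already larger.
 * Returns true if new_val was stored, false if the update was skipped.
 */
bool raise_target(_Atomic uint64_t *target, uint64_t new_val)
{
	assert(target != NULL);

	/* Read once before the loop; the CAS refreshes 'old' on failure. */
	uint64_t old = atomic_load_explicit(target, memory_order_relaxed);

	do {
		if (new_val < old)
			return false;
	} while (!atomic_compare_exchange_weak(target, &old, new_val));

	return true;
}
```

Because a failed exchange updates old in place, the loop body never re-reads *target explicitly, which mirrors the point made about try_cmpxchg.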
      
      Cc: Song Liu <song@kernel.org>
      Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
      Signed-off-by: Song Liu <song@kernel.org>
      9487a0f6