1. 28 Apr, 2023 4 commits
    • Eric Blake's avatar
      docs nbd: userspace NBD now favors github over sourceforge · 952aa344
      Eric Blake authored
      While the sourceforge site for userspace NBD still exists, the code
      repository moved to github several years ago.  Then with a recent
      patch[1], the github landing page contains just as much information as
      the sourceforge page, so we might as well point to a single location
      that also provides the code.
      
      [1] https://lists.debian.org/nbd/2023/03/msg00051.htmlSigned-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230410180611.1051618-5-eblake@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      952aa344
    • Eric Blake's avatar
      block nbd: use req.cookie instead of req.handle · bd9e9916
      Eric Blake authored
      The NBD spec was recently changed [1] to refer to the opaque client
      identifier as a 'cookie' rather than a 'handle', but has for a much
      longer time listed it as a 64-bit value, and declares that all values
      in the NBD protocol are sent in network byte order (big-endian).
      
      Because the value is opaque to the server, it doesn't usually matter
      what endianness we send as the client - as long as we are consistent
      that either we byte-swap on both write and read, or on neither, then
      we can match server replies back to our requests.  That said, our
      internal use of the cookie is as a 64-bit number (well, as two 32-bit
      numbers concatenated together), rather than as 8 individual bytes; so
      prior to this commit, we ARE leaking the native endianness of our
      internals as a client out to the server.  We don't know of any server
      that will actually inspect the opaque value and behave differently
      depending on whether a little-endian or big-endian client is sending
      requests, but since we DO log the cookie value, a wireshark capture of
      the network traffic is easier to correlate back to the kernel traffic
      of a big-endian host (where the u64 and char[8] representations are
      the same) than of a little-endian host (where if wireshark honors the
      NBD spec and displays a u64 in network byte order, it is byte-swapped
      from what the kernel logged).
      
      The fix in this patch is thus two-part: it now consistently uses
      network byte order for the opaque value (no difference to a big-endian
      machine, but an extra byteswap on a little-endian machine; probably in
      the noise compared to the overhead of network traffic in general), and
      now uses a 64-bit integer instead of char[8] as its preferred access
      to the opaque value (direct assignment instead of memcpy()).
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230410180611.1051618-4-eblake@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bd9e9916
    • Eric Blake's avatar
      uapi nbd: add cookie alias to handle · 2686eb84
      Eric Blake authored
      The uapi <linux/nbd.h> header declares a 'char handle[8]' per request;
      which is overloaded in English (are you referring to "handle" the
      verb, such as handling a signal or writing a callback handler, or
      "handle" the noun, the value used in a lookup table to correlate a
      response back to the request).  Many user-space NBD implementations
      (both servers and clients) have instead used 'uint64_t cookie' or
      similar, as it is easier to directly assign an integer than to futz
      around with memcpy.  In fact, upstream documentation is now
      encouraging this shift in terminology:
      https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b
      
      Accomplish this by use of an anonymous union to provide the alias for
      anyone getting the definition from the uapi; this does not break
      existing clients, while exposing the nicer name for those who prefer
      it.  Note that block/nbd.c still uses the term handle (in fact, it
      actually combines a 32-bit cookie and a 32-bit tag into the 64-bit
      handle), but that internal usage is not changed by the public uapi,
      since no compliant NBD server has any reason to inspect or alter the
      64 bits sent over the socket.
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230410180611.1051618-3-eblake@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2686eb84
    • Eric Blake's avatar
      uapi nbd: improve doc links to userspace spec · daf376a3
      Eric Blake authored
      The uapi <linux/nbd.h> header intentionally documents only the NBD
      server features that the kernel module will utilize as a client.  But
      while it already had one mention of skipped bits due to userspace
      extensions, it did not actually direct the reader to the canonical
      source to learn about those extensions.
      
      While touching comments, fix an outdated reference that listed only
      READ and WRITE as commands.
      Signed-off-by: default avatarEric Blake <eblake@redhat.com>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230410180611.1051618-2-eblake@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      daf376a3
  2. 27 Apr, 2023 3 commits
  3. 25 Apr, 2023 2 commits
  4. 24 Apr, 2023 1 commit
  5. 20 Apr, 2023 2 commits
    • Zhong Jinghua's avatar
      nbd: fix incomplete validation of ioctl arg · 55793ea5
      Zhong Jinghua authored
      We tested and found an alarm caused by nbd_ioctl arg without verification.
      The UBSAN warning calltrace like below:
      
      UBSAN: Undefined behaviour in fs/buffer.c:1709:35
      signed integer overflow:
      -9223372036854775808 - 1 cannot be represented in type 'long long int'
      CPU: 3 PID: 2523 Comm: syz-executor.0 Not tainted 4.19.90 #1
      Hardware name: linux,dummy-virt (DT)
      Call trace:
       dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
       show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x170/0x1dc lib/dump_stack.c:118
       ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
       handle_overflow+0x188/0x1dc lib/ubsan.c:192
       __ubsan_handle_sub_overflow+0x34/0x44 lib/ubsan.c:206
       __block_write_full_page+0x94c/0xa20 fs/buffer.c:1709
       block_write_full_page+0x1f0/0x280 fs/buffer.c:2934
       blkdev_writepage+0x34/0x40 fs/block_dev.c:607
       __writepage+0x68/0xe8 mm/page-writeback.c:2305
       write_cache_pages+0x44c/0xc70 mm/page-writeback.c:2240
       generic_writepages+0xdc/0x148 mm/page-writeback.c:2329
       blkdev_writepages+0x2c/0x38 fs/block_dev.c:2114
       do_writepages+0xd4/0x250 mm/page-writeback.c:2344
      
      The reason for triggering this warning is __block_write_full_page()
      -> i_size_read(inode) - 1 overflow.
      inode->i_size is assigned in __nbd_ioctl() -> nbd_set_size() -> bytesize.
      We think it is necessary to limit the size of arg to prevent errors.
      
      Moreover, __nbd_ioctl() -> nbd_add_socket(), arg will be cast to int.
      Assuming the value of arg is 0x80000000000000001) (on a 64-bit machine),
      it will become 1 after the coercion, which will return unexpected results.
      
      Fix it by adding checks to prevent passing in too large numbers.
      Signed-off-by: default avatarZhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: default avatarYu Kuai <yukuai3@huawei.com>
      Reviewed-by: default avatarJosef Bacik <josef@toxicpanda.com>
      Link: https://lore.kernel.org/r/20230206145805.2645671-1-zhongjinghua@huawei.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      55793ea5
    • Ming Lei's avatar
      ublk: don't return 0 in case of any failure · 7c75661c
      Ming Lei authored
      Commit 2d786e66 ("block: ublk: switch to ioctl command encoding")
      starts to reset local variable of 'ret' as zero, then if any failure
      happens when handling the three IO commands, 0 can be returned to ublk
      server.
      
      Fix it by returning -EINVAL in case of command handling failure.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 2d786e66 ("block: ublk: switch to ioctl command encoding")
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20230420091104.1092972-1-ming.lei@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      7c75661c
  6. 19 Apr, 2023 3 commits
    • Ondrej Kozina's avatar
      sed-opal: geometry feature reporting command · 9e05a259
      Ondrej Kozina authored
      Locking range start and locking range length
      attributes may be require to satisfy restrictions
      exposed by OPAL2 geometry feature reporting.
      
      Geometry reporting feature is described in TCG OPAL SSC,
      section 3.1.1.4 (ALIGN, LogicalBlockSize, AlignmentGranularity
      and LowestAlignedLBA).
      
      4.3.5.2.1.1 RangeStart Behavior:
      
      [ StartAlignment = (RangeStart modulo AlignmentGranularity) - LowestAlignedLBA ]
      
      When processing a Set method or CreateRow method on the Locking
      table for a non-Global Range row, if:
      
      a) the AlignmentRequired (ALIGN above) column in the LockingInfo
         table is TRUE;
      b) RangeStart is non-zero; and
      c) StartAlignment is non-zero, then the method SHALL fail and
         return an error status code INVALID_PARAMETER.
      
      4.3.5.2.1.2 RangeLength Behavior:
      
      If RangeStart is zero, then
      	[ LengthAlignment = (RangeLength modulo AlignmentGranularity) - LowestAlignedLBA ]
      
      If RangeStart is non-zero, then
      	[ LengthAlignment = (RangeLength modulo AlignmentGranularity) ]
      
      When processing a Set method or CreateRow method on the Locking
      table for a non-Global Range row, if:
      
      a) the AlignmentRequired (ALIGN above) column in the LockingInfo
         table is TRUE;
      b) RangeLength is non-zero; and
      c) LengthAlignment is non-zero, then the method SHALL fail and
         return an error status code INVALID_PARAMETER
      
      In userspace we stuck to logical block size reported by general
      block device (via sysfs or ioctl), but we can not read
      'AlignmentGranularity' or 'LowestAlignedLBA' anywhere else and
      we need to get those values from sed-opal interface otherwise
      we will not be able to report or avoid locking range setup
      INVALID_PARAMETER errors above.
      Signed-off-by: default avatarOndrej Kozina <okozina@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Reviewed-by: default avatarChristian Brauner <brauner@kernel.org>
      Tested-by: default avatarMilan Broz <gmazyland@gmail.com>
      Link: https://lore.kernel.org/r/20230411090931.9193-2-okozina@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9e05a259
    • Chaitanya Kulkarni's avatar
      null_blk: Always check queue mode setting from configfs · 63f8793e
      Chaitanya Kulkarni authored
      Make sure to check device queue mode in the null_validate_conf() and
      return error for NULL_Q_RQ as we don't allow legacy I/O path, without
      this patch we get OOPs when queue mode is set to 1 from configfs,
      following are repro steps :-
      
      modprobe null_blk nr_devices=0
      mkdir config/nullb/nullb0
      echo 1 > config/nullb/nullb0/memory_backed
      echo 4096 > config/nullb/nullb0/blocksize
      echo 20480 > config/nullb/nullb0/size
      echo 1 > config/nullb/nullb0/queue_mode
      echo 1 > config/nullb/nullb0/power
      
      Entering kdb (current=0xffff88810acdd080, pid 2372) on processor 42 Oops: (null)
      due to oops @ 0xffffffffc041c329
      CPU: 42 PID: 2372 Comm: sh Tainted: G           O     N 6.3.0-rc5lblk+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      RIP: 0010:null_add_dev.part.0+0xd9/0x720 [null_blk]
      Code: 01 00 00 85 d2 0f 85 a1 03 00 00 48 83 bb 08 01 00 00 00 0f 85 f7 03 00 00 80 bb 62 01 00 00 00 48 8b 75 20 0f 85 6d 02 00 00 <48> 89 6e 60 48 8b 75 20 bf 06 00 00 00 e8 f5 37 2c c1 48 8b 75 20
      RSP: 0018:ffffc900052cbde0 EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff88811084d800 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff888100042e00
      RBP: ffff8881053d8200 R08: ffffc900052cbd68 R09: ffff888105db2000
      R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
      R13: ffff888104765200 R14: ffff88810eec1748 R15: ffff88810eec1740
      FS:  00007fd445fd1740(0000) GS:ffff8897dfc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000060 CR3: 0000000166a00000 CR4: 0000000000350ee0
      DR0: ffffffff8437a488 DR1: ffffffff8437a489 DR2: ffffffff8437a48a
      DR3: ffffffff8437a48b DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       nullb_device_power_store+0xd1/0x120 [null_blk]
       configfs_write_iter+0xb4/0x120
       vfs_write+0x2ba/0x3c0
       ksys_write+0x5f/0xe0
       do_syscall_64+0x3b/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fd4460c57a7
      Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
      RSP: 002b:00007ffd3792a4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fd4460c57a7
      RDX: 0000000000000002 RSI: 000055b43c02e4c0 RDI: 0000000000000001
      RBP: 000055b43c02e4c0 R08: 000000000000000a R09: 00007fd44615b4e0
      R10: 00007fd44615b3e0 R11: 0000000000000246 R12: 0000000000000002
      R13: 00007fd446198520 R14: 0000000000000002 R15: 00007fd446198700
       </TASK>
      Signed-off-by: default avatarChaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: default avatarDamien Le Moal <dlemoal@kernel.org>
      Reviewed-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarNitesh Shetty <nj.shetty@samsung.com>
      Link: https://lore.kernel.org/r/20230416220339.43845-1-kch@nvidia.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      63f8793e
    • Ming Lei's avatar
      block: ublk: switch to ioctl command encoding · 2d786e66
      Ming Lei authored
      All ublk commands(control, IO) should have taken ioctl command encoding
      from the beginning, because ioctl command encoding defines each code
      uniquely, so driver can figure out wrong command sent from userspace
      easily; 2) it might help security subsystem for audit uring cmd[1].
      
      Unfortunately we didn't do that way, and it could be one lesson for
      ublk driver.
      
      So switch to ioctl command encoding now, we still support commands encoded
      in old way, but they become legacy definition. Any new command should take
      ioctl encoding.
      
      See ublksrv code for switching to ioctl command encoding in [2].
      
      [1] https://lore.kernel.org/io-uring/CAHC9VhSVzujW9LOj5Km80AjU0EfAuukoLrxO6BEfnXeK_s6bAg@mail.gmail.com/
      [2] https://github.com/ming1/ubdsrv/commits/ioctl_cmd_encoding
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ken Kurematsu <k.kurematsu@nskint.co.jp>
      Signed-off-by: default avatarMing Lei <ming.lei@redhat.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230418131810.855959-1-ming.lei@redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      2d786e66
  7. 16 Apr, 2023 5 commits
    • Christoph Hellwig's avatar
      blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush · 26a42b61
      Christoph Hellwig authored
      Commit b12e5c6c accidentally changes blk_kick_flush to do a head
      insert into the requeue list, fix this up.
      
      Fixes: b12e5c6c ("blk-mq: pass a flags argument to blk_mq_add_to_requeue_list")
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20230416073553.966161-1-hch@lst.deSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      26a42b61
    • Colin Ian King's avatar
      block, bfq: Fix division by zero error on zero wsum · e53413f8
      Colin Ian King authored
      When the weighted sum is zero the calculation of limit causes
      a division by zero error. Fix this by continuing to the next level.
      
      This was discovered by running as root:
      
      stress-ng --ioprio 0
      
      Fixes divison by error oops:
      
      [  521.450556] divide error: 0000 [#1] SMP NOPTI
      [  521.450766] CPU: 2 PID: 2684464 Comm: stress-ng-iopri Not tainted 6.2.1-1280.native #1
      [  521.451117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
      [  521.451627] RIP: 0010:bfqq_request_over_limit+0x207/0x400
      [  521.451875] Code: 01 48 8d 0c c8 74 0b 48 8b 82 98 00 00 00 48 8d 0c c8 8b 85 34 ff ff ff 48 89 ca 41 0f af 41 50 48 d1 ea 48 98 48 01 d0 31 d2 <48> f7 f1 41 39 41 48 89 85 34 ff ff ff 0f 8c 7b 01 00 00 49 8b 44
      [  521.452699] RSP: 0018:ffffb1af84eb3948 EFLAGS: 00010046
      [  521.452938] RAX: 000000000000003c RBX: 0000000000000000 RCX: 0000000000000000
      [  521.453262] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb1af84eb3978
      [  521.453584] RBP: ffffb1af84eb3a30 R08: 0000000000000001 R09: ffff8f88ab8a4ba0
      [  521.453905] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8f88ab8a4b18
      [  521.454224] R13: ffff8f8699093000 R14: 0000000000000001 R15: ffffb1af84eb3970
      [  521.454549] FS:  00005640b6b0b580(0000) GS:ffff8f88b3880000(0000) knlGS:0000000000000000
      [  521.454912] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  521.455170] CR2: 00007ffcbcae4e38 CR3: 00000002e46de001 CR4: 0000000000770ee0
      [  521.455491] PKRU: 55555554
      [  521.455619] Call Trace:
      [  521.455736]  <TASK>
      [  521.455837]  ? bfq_request_merge+0x3a/0xc0
      [  521.456027]  ? elv_merge+0x115/0x140
      [  521.456191]  bfq_limit_depth+0xc8/0x240
      [  521.456366]  __blk_mq_alloc_requests+0x21a/0x2c0
      [  521.456577]  blk_mq_submit_bio+0x23c/0x6c0
      [  521.456766]  __submit_bio+0xb8/0x140
      [  521.457236]  submit_bio_noacct_nocheck+0x212/0x300
      [  521.457748]  submit_bio_noacct+0x1a6/0x580
      [  521.458220]  submit_bio+0x43/0x80
      [  521.458660]  ext4_io_submit+0x23/0x80
      [  521.459116]  ext4_do_writepages+0x40a/0xd00
      [  521.459596]  ext4_writepages+0x65/0x100
      [  521.460050]  do_writepages+0xb7/0x1c0
      [  521.460492]  __filemap_fdatawrite_range+0xa6/0x100
      [  521.460979]  file_write_and_wait_range+0xbf/0x140
      [  521.461452]  ext4_sync_file+0x105/0x340
      [  521.461882]  __x64_sys_fsync+0x67/0x100
      [  521.462305]  ? syscall_exit_to_user_mode+0x2c/0x1c0
      [  521.462768]  do_syscall_64+0x3b/0xc0
      [  521.463165]  entry_SYSCALL_64_after_hwframe+0x5a/0xc4
      [  521.463621] RIP: 0033:0x5640b6c56590
      [  521.464006] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 71 70 0e 00 00 74 17 b8 4a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c
      Signed-off-by: default avatarColin Ian King <colin.i.king@gmail.com>
      Link: https://lore.kernel.org/r/20230413133009.1605335-1-colin.i.king@gmail.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      e53413f8
    • Akinobu Mita's avatar
      fault-inject: fix build error when FAULT_INJECTION_CONFIGFS=y and CONFIGFS_FS=m · d325c162
      Akinobu Mita authored
      This fixes a build error when CONFIG_FAULT_INJECTION_CONFIGFS=y and
      CONFIG_CONFIGFS_FS=m.
      
      Since the fault-injection library cannot built as a module, avoid building
      configfs as a module.
      
      Fixes: 4668c7a2 ("fault-inject: allow configuration via configfs")
      Reported-by: default avatarkernel test robot <lkp@intel.com>
      Link: https://lore.kernel.org/oe-kbuild-all/202304150025.K0hczLR4-lkp@intel.com/Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      d325c162
    • Jens Axboe's avatar
      block: store bdev->bd_disk->fops->submit_bio state in bdev · 9f4107b0
      Jens Axboe authored
      We have a long chain of memory dereferencing just to whether or not
      this disk has a special submit_bio helper. As that's not necessarily
      the common case, add a bd_has_submit_bio state in the bdev to avoid
      traversing this memory dependency chain if we don't need to.
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      9f4107b0
    • Jens Axboe's avatar
      block: re-arrange the struct block_device fields for better layout · 3838c406
      Jens Axboe authored
      This moves struct device out-of-line as it's just used at open/close
      time, so we can keep some of the commonly used fields closer together.
      On a standard setup, it also reduces the size from 864 bytes to 848
      bytes. Yes, struct device is a pig...
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3838c406
  8. 14 Apr, 2023 17 commits
  9. 13 Apr, 2023 3 commits