1. 24 Jun, 2020 23 commits
    • nvme-rdma: fix a missing completion with remote invalidation · 7a804c34
      Christoph Hellwig authored
      Revert an incorrect transformation that caused requests using remote
      invalidation to never complete.
      
      Fixes: 421147be863b ("nvme-rdma: factor out a nvme_rdma_end_request helper")
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-iocost: Use struct_size() in kzalloc_node() · f61d6e25
      Gustavo A. R. Silva authored
      Make use of the struct_size() helper instead of an open-coded version
      in order to avoid any potential type mistakes.
      
      This code was detected with the help of Coccinelle, and audited and
      fixed manually.
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
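
      As an illustration of the kind of mistake the helper guards against,
      here is a minimal userspace sketch. It is not the kernel implementation:
      demo_vec, demo_struct_size() and demo_vec_alloc() are hypothetical names,
      and the overflow handling only approximates the kernel helper, which
      saturates to SIZE_MAX so that the allocation fails instead of being
      silently undersized.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel structure with a flexible array member. */
struct demo_vec {
	size_t nr;
	uint64_t items[];
};

/*
 * Userspace approximation of struct_size(): header size plus
 * count * element size, saturating to SIZE_MAX on overflow.
 */
static size_t demo_struct_size(size_t header, size_t elem, size_t count)
{
	if (elem != 0 && count > (SIZE_MAX - header) / elem)
		return SIZE_MAX;
	return header + elem * count;
}

/* Allocate a zeroed demo_vec with room for 'count' items, or NULL. */
static struct demo_vec *demo_vec_alloc(size_t count)
{
	size_t bytes = demo_struct_size(sizeof(struct demo_vec),
					sizeof(uint64_t), count);
	struct demo_vec *v;

	if (bytes == SIZE_MAX)
		return NULL;	/* size computation overflowed */
	v = calloc(1, bytes);
	if (v)
		v->nr = count;
	return v;
}
```

      An open-coded "sizeof(*v) + count * sizeof(v->items[0])" would wrap
      around for a huge count and return a buffer that is too small; the
      sketch refuses the allocation instead.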
    • block: bio: Use struct_size() in kmalloc() · 1f4fe21c
      Gustavo A. R. Silva authored
      Make use of the struct_size() helper instead of an open-coded version
      in order to avoid any potential type mistakes.
      
      This code was detected with the help of Coccinelle, and audited and
      fixed manually.
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Addresses-KSPP-ID: https://github.com/KSPP/linux/issues/83
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: create the request_queue debugfs_dir on registration · 85e0cbbb
      Luis Chamberlain authored
      We were creating the request_queue debugfs_dir only
      for make_request block drivers (multiqueue), but never for
      request-based block drivers. We did this as we were only
      creating non-blktrace additional debugfs files on that directory
      for make_request drivers. However, since blktrace *always* creates
      that directory anyway, we special-case the use of that directory
      on blktrace. Other than this being an eye-sore, this exposes
      request-based block drivers to the same debugfs fragile
      race that used to exist with make_request block drivers
      where if we start adding files onto that directory we can later
      run a race with a double removal of dentries on the directory
      if we don't deal with this carefully on blktrace.
      
      Instead, just simplify things by always creating the request_queue
      debugfs_dir on request_queue registration. Rename the mutex also to
      reflect the fact that this is used outside of the blktrace context.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: ensure our debugfs dir exists · b431ef83
      Luis Chamberlain authored
      We assume that a debugfs directory exists, but since its creation
      can fail, ensure it exists before allowing blktrace setup to
      complete. Otherwise we end up stuffing blktrace files on the debugfs
      root directory. In the worst case scenario this *in theory* can create
      an eventual panic *iff* in the future a similarly named file is created
      prior on the debugfs root directory. This theoretical crash can happen
      due to a recursive removal followed by a specific dentry removal.
      
      This doesn't fix any known crash, however I have seen the files
      go into the main debugfs root directory in cases where the debugfs
      directory was not created due to other internal bugs with blktrace
      now fixed.
      
      blktrace is also completely useless without this directory, so
      this guarantees to userspace that we only set up blktrace if the
      kernel can place the files where they are supposed to go.
      
      debugfs directory creations typically aren't checked for, and we have
      maintainers doing sweep removals of these checks, but since we need this
      check to ensure proper userspace blktrace functionality we make sure
      to annotate the justification for the check.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: fix debugfs use after free · bad8e64f
      Luis Chamberlain authored
      On commit 6ac93117 ("blktrace: use existing disk debugfs directory")
      merged on v4.12 Omar fixed the original blktrace code for request-based
      drivers (multiqueue). This however left in place a possible crash, if you
      happen to abuse blktrace while racing to remove / add a device.
      
      We used to use asynchronous removal of the request_queue, and with that
      the issue was easier to reproduce. Now that we have reverted to
      synchronous removal of the request_queue, the issue is still possible to
      reproduce, though it is a bit more difficult.
      
      We essentially run two instances of break-blktrace in parallel, each of
      which adds/removes a loop device, sets up a blktrace and never tears
      the blktrace down. This is easily reproduced with the
      script run_0004.sh from break-blktrace [0].
      
      We can end up with two types of panics each reflecting where we
      race, one a failed blktrace setup:
      
      [  252.426751] debugfs: Directory 'loop0' with parent 'block' already present!
      [  252.432265] BUG: kernel NULL pointer dereference, address: 00000000000000a0
      [  252.436592] #PF: supervisor write access in kernel mode
      [  252.439822] #PF: error_code(0x0002) - not-present page
      [  252.442967] PGD 0 P4D 0
      [  252.444656] Oops: 0002 [#1] SMP NOPTI
      [  252.446972] CPU: 10 PID: 1153 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
      [  252.452673] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
      [  252.456343] RIP: 0010:down_write+0x15/0x40
      [  252.458146] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
                     cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
                     00 00 <f0> 48 0f b1 55 00 75 0f 48 8b 04 25 c0 8b 01 00 48 89
                     45 08 5d
      [  252.463638] RSP: 0018:ffffa626415abcc8 EFLAGS: 00010246
      [  252.464950] RAX: 0000000000000000 RBX: ffff958c25f0f5c0 RCX: ffffff8100000000
      [  252.466727] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
      [  252.468482] RBP: 00000000000000a0 R08: 0000000000000000 R09: 0000000000000001
      [  252.470014] R10: 0000000000000000 R11: ffff958d1f9227ff R12: 0000000000000000
      [  252.471473] R13: ffff958c25ea5380 R14: ffffffff8cce15f1 R15: 00000000000000a0
      [  252.473346] FS:  00007f2e69dee540(0000) GS:ffff958c2fc80000(0000) knlGS:0000000000000000
      [  252.475225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  252.476267] CR2: 00000000000000a0 CR3: 0000000427d10004 CR4: 0000000000360ee0
      [  252.477526] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  252.478776] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  252.479866] Call Trace:
      [  252.480322]  simple_recursive_removal+0x4e/0x2e0
      [  252.481078]  ? debugfs_remove+0x60/0x60
      [  252.481725]  ? relay_destroy_buf+0x77/0xb0
      [  252.482662]  debugfs_remove+0x40/0x60
      [  252.483518]  blk_remove_buf_file_callback+0x5/0x10
      [  252.484328]  relay_close_buf+0x2e/0x60
      [  252.484930]  relay_open+0x1ce/0x2c0
      [  252.485520]  do_blk_trace_setup+0x14f/0x2b0
      [  252.486187]  __blk_trace_setup+0x54/0xb0
      [  252.486803]  blk_trace_ioctl+0x90/0x140
      [  252.487423]  ? do_sys_openat2+0x1ab/0x2d0
      [  252.488053]  blkdev_ioctl+0x4d/0x260
      [  252.488636]  block_ioctl+0x39/0x40
      [  252.489139]  ksys_ioctl+0x87/0xc0
      [  252.489675]  __x64_sys_ioctl+0x16/0x20
      [  252.490380]  do_syscall_64+0x52/0x180
      [  252.491032]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      And the other on the device removal:
      
      [  128.528940] debugfs: Directory 'loop0' with parent 'block' already present!
      [  128.615325] BUG: kernel NULL pointer dereference, address: 00000000000000a0
      [  128.619537] #PF: supervisor write access in kernel mode
      [  128.622700] #PF: error_code(0x0002) - not-present page
      [  128.625842] PGD 0 P4D 0
      [  128.627585] Oops: 0002 [#1] SMP NOPTI
      [  128.629871] CPU: 12 PID: 544 Comm: break-blktrace Tainted: G            E     5.7.0-rc2-next-20200420+ #164
      [  128.635595] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
      [  128.640471] RIP: 0010:down_write+0x15/0x40
      [  128.643041] Code: eb ca e8 ae 22 8d ff cc cc cc cc cc cc cc cc cc cc cc cc
                     cc cc 0f 1f 44 00 00 55 48 89 fd e8 52 db ff ff 31 c0 ba 01 00
                     00 00 <f0> 48 0f b1 55 00 75 0f 65 48 8b 04 25 c0 8b 01 00 48 89
                     45 08 5d
      [  128.650180] RSP: 0018:ffffa9c3c05ebd78 EFLAGS: 00010246
      [  128.651820] RAX: 0000000000000000 RBX: ffff8ae9a6370240 RCX: ffffff8100000000
      [  128.653942] RDX: 0000000000000001 RSI: ffffff8100000000 RDI: 00000000000000a0
      [  128.655720] RBP: 00000000000000a0 R08: 0000000000000002 R09: ffff8ae9afd2d3d0
      [  128.657400] R10: 0000000000000056 R11: 0000000000000000 R12: 0000000000000000
      [  128.659099] R13: 0000000000000000 R14: 0000000000000003 R15: 00000000000000a0
      [  128.660500] FS:  00007febfd995540(0000) GS:ffff8ae9afd00000(0000) knlGS:0000000000000000
      [  128.662204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  128.663426] CR2: 00000000000000a0 CR3: 0000000420042003 CR4: 0000000000360ee0
      [  128.664776] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  128.666022] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  128.667282] Call Trace:
      [  128.667801]  simple_recursive_removal+0x4e/0x2e0
      [  128.668663]  ? debugfs_remove+0x60/0x60
      [  128.669368]  debugfs_remove+0x40/0x60
      [  128.669985]  blk_trace_free+0xd/0x50
      [  128.670593]  __blk_trace_remove+0x27/0x40
      [  128.671274]  blk_trace_shutdown+0x30/0x40
      [  128.671935]  blk_release_queue+0x95/0xf0
      [  128.672589]  kobject_put+0xa5/0x1b0
      [  128.673188]  disk_release+0xa2/0xc0
      [  128.673786]  device_release+0x28/0x80
      [  128.674376]  kobject_put+0xa5/0x1b0
      [  128.674915]  loop_remove+0x39/0x50 [loop]
      [  128.675511]  loop_control_ioctl+0x113/0x130 [loop]
      [  128.676199]  ksys_ioctl+0x87/0xc0
      [  128.676708]  __x64_sys_ioctl+0x16/0x20
      [  128.677274]  do_syscall_64+0x52/0x180
      [  128.677823]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The common theme here is:
      
      debugfs: Directory 'loop0' with parent 'block' already present
      
      This crash happens because of how blktrace uses the debugfs directory
      where it places its files. Upon init we always create the same directory
      which would be needed by blktrace but we only do this for make_request
      (multiqueue) block drivers. When you race a removal of these
      devices with a blktrace setup you end up in a situation where the
      make_request recursive debugfs removal will sweep away the blktrace
      files and then later blktrace will also try to remove individual
      dentries which are already NULL. The inverse is also possible and hence
      the two types of use after frees.
      
      We don't create the block debugfs directory on init for these types of
      block devices:
      
        * request-based block driver block devices
        * every possible partition
        * scsi-generic
      
      And so, this race should in theory only be possible with make_request
      drivers.
      
      We can fix the UAF by simply re-using the debugfs directory for
      make_request drivers (multiqueue) and only creating the ephemeral
      directory for the other type of block devices. The new clarifications
      on relying on the q->blk_trace_mutex *and* also checking for q->blk_trace
      *prior* to processing a blktrace ensures the debugfs directories are
      only created if no possible directory name clashes are possible.
      
      This was tested with:
      
        o nvme partitions
        o ISCSI with tgt, and blktracing against scsi-generic with:
          o block
          o tape
          o cdrom
          o media changer
        o blktests
      
      This patch is part of the work which disputes the severity of
      CVE-2019-19770: it shows this issue is not a core debugfs issue, but
      a misuse of debugfs within blktrace.
      
      Fixes: 6ac93117 ("blktrace: use existing disk debugfs directory")
      Reported-by: syzbot+603294af2d01acfdd6da@syzkaller.appspotmail.com
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
      Cc: yu kuai <yukuai3@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
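
      The ownership rule described in this commit can be sketched in plain
      userspace C. This is only a model under stated assumptions, not the
      blktrace code: demo_dir, demo_trace, demo_trace_setup() and
      demo_trace_teardown() are hypothetical names, and a boolean stands in
      for an actual debugfs removal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a debugfs directory. */
struct demo_dir {
	bool removed;
};

/* Hypothetical stand-in for the per-queue trace state. */
struct demo_trace {
	struct demo_dir *dir;	/* directory the trace files live in */
	bool owns_dir;		/* true only if the trace created it */
};

/*
 * Reuse the queue's directory when one exists; otherwise fall back to an
 * ephemeral directory and remember that the trace owns it. Refuse setup
 * when no directory is available at all.
 */
static int demo_trace_setup(struct demo_trace *t, struct demo_dir *queue_dir,
			    struct demo_dir *ephemeral)
{
	if (queue_dir) {
		t->dir = queue_dir;
		t->owns_dir = false;
		return 0;
	}
	if (!ephemeral)
		return -1;
	t->dir = ephemeral;
	t->owns_dir = true;
	return 0;
}

/* Tear down: only remove a directory that this trace created itself. */
static void demo_trace_teardown(struct demo_trace *t)
{
	if (t->dir && t->owns_dir)
		t->dir->removed = true;
	t->dir = NULL;
	t->owns_dir = false;
}
```

      Because a reused directory is never removed by the trace, the queue
      teardown and the trace teardown can no longer both remove the same
      entries in this model.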
    • loop: be paranoid on exit and prevent new additions / removals · 200f9337
      Luis Chamberlain authored
      Be pedantic on removal as well and hold the mutex.
      This should prevent new additions while we exit.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: annotate required lock on do_blk_trace_setup() · a67549c8
      Luis Chamberlain authored
      Ensure it is clear which lock is required on do_blk_trace_setup().
      Suggested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: revert back to synchronous request_queue removal · e8c7d14a
      Luis Chamberlain authored
      Commit dc9edc44 ("block: Fix a blk_exit_rl() regression") merged on
      v4.12 moved the work behind blk_release_queue() into a workqueue after a
      splat floated around which indicated some work on blk_release_queue()
      could sleep in blk_exit_rl(). This splat would be possible when a driver
      called blk_put_queue() or blk_cleanup_queue() (which calls blk_put_queue()
      as its final call) from an atomic context.
      
      blk_put_queue() decrements the refcount for the request_queue kobject, and
      upon reaching 0 blk_release_queue() is called. Although blk_exit_rl() is
      now removed through commit db6d9952 ("block: remove request_list code")
      on v5.0, we reserve the right to be able to sleep within
      blk_release_queue() context.
      
      The last reference for the request_queue must not be called from atomic
      context. *When* the last reference to the request_queue reaches 0 varies,
      and so let's take the opportunity to document when that is expected to
      happen and also document the context of the related calls as best as
      possible so we can avoid future issues, and with the hopes that the
      synchronous request_queue removal sticks.
      
      We revert back to synchronous request_queue removal because asynchronous
      removal creates a regression with expected userspace interaction with
      several drivers. An example is when removing the loopback driver, one
      uses ioctls from userspace to do so, but upon return and if successful,
      one expects the device to be removed. Likewise if one races to add another
      device the new one may not be added as it is still being removed. This was
      expected behavior before and it now fails as the device is still present
      and busy still. Moving to asynchronous request_queue removal could have
      broken many scripts which relied on the removal to have been completed if
      there was no error. Document this expectation as well so that this
      doesn't regress userspace again.
      
      Using asynchronous request_queue removal however has helped us find
      other bugs. In the future we can test what could break with this
      arrangement by enabling CONFIG_DEBUG_KOBJECT_RELEASE.
      
      While at it, update the docs with the context expectations for the
      request_queue / gendisk refcount decrement, and make these
      expectations explicit by using might_sleep().
      
      Fixes: dc9edc44 ("block: Fix a blk_exit_rl() regression")
      Suggested-by: Nicolai Stange <nstange@suse.de>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: yu kuai <yukuai3@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
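
      The userspace-visible difference can be sketched with a toy refcount.
      This is a simplified, single-threaded model with hypothetical names
      (demo_queue, demo_queue_get(), demo_queue_put()), not the kobject code:
      with synchronous removal, the release work has finished by the time the
      final put returns, which is why that final put must not run in atomic
      context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted request_queue. */
struct demo_queue {
	int refcount;
	bool released;
};

/* Release work; in the real code this may sleep. */
static void demo_queue_release(struct demo_queue *q)
{
	q->released = true;
}

static void demo_queue_get(struct demo_queue *q)
{
	q->refcount++;
}

/*
 * Drop one reference. The final put runs the release work before
 * returning, so a caller that sees the put return knows the queue
 * is gone.
 */
static void demo_queue_put(struct demo_queue *q)
{
	if (--q->refcount == 0)
		demo_queue_release(q);
}
```

      With an asynchronous release the final put would only schedule the
      work, and a script that removes a device and immediately re-adds it
      could still find the old one present.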
    • block: clarify context for refcount increment helpers · 763b5892
      Luis Chamberlain authored
      Clarify the context in which the helpers that increment the refcount
      for the gendisk and request_queue can be called. We make this explicit
      with might_sleep() in the places where we may sleep.
      
      We don't address the decrement context yet, as that needs some extra
      work and fixes, but will be addressed in the next patch.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: add docs for gendisk / request_queue refcount helpers · b5bd357c
      Luis Chamberlain authored
      This adds documentation for the gendisk / request_queue refcount
      helpers.
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme: use blk_mq_complete_request_remote to avoid an indirect function call · ff029451
      Christoph Hellwig authored
      Use the new blk_mq_complete_request_remote helper to avoid an indirect
      function call in the completion fast path.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-rdma: factor out a nvme_rdma_end_request helper · 8446546c
      Christoph Hellwig authored
      Factor a small snippet of duplicated code into a new helper in
      preparation for making this snippet a little bit less trivial.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: add a new blk_mq_complete_request_remote API · 40d09b53
      Christoph Hellwig authored
      This is a variant of blk_mq_complete_request that only completes
      the request if it needs to be bounced to another CPU or a softirq.  If
      the request can be completed locally the function returns false and lets
      the driver complete it without requiring an indirect function call.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
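
      The calling convention described above can be modelled in userspace as
      follows. All names here (demo_request, demo_complete_remote(),
      demo_driver_complete()) are hypothetical, and the "same CPU" test is a
      deliberate simplification of whatever the block layer actually checks.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a block request. */
struct demo_request {
	int submit_cpu;		/* CPU the request was submitted on */
	bool completed_locally;
	bool bounced;		/* handed off to another CPU or softirq */
};

/*
 * Model of the "remote" helper: returns true if it took care of the
 * completion by bouncing it elsewhere, false if the caller should
 * complete the request itself.
 */
static bool demo_complete_remote(struct demo_request *rq, int this_cpu)
{
	if (rq->submit_cpu == this_cpu)
		return false;
	rq->bounced = true;
	return true;
}

/* Driver side: complete directly when no bounce was needed. */
static void demo_driver_complete(struct demo_request *rq, int this_cpu)
{
	if (!demo_complete_remote(rq, this_cpu))
		rq->completed_locally = true;	/* direct call, no function pointer */
}
```

      In the local case the driver calls its own completion routine by name,
      which is what avoids the indirect call in the fast path.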
    • blk-mq: factor out a blk_mq_complete_need_ipi helper · 96339526
      Christoph Hellwig authored
      Add a helper to decide if we can complete locally or need an IPI.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove the get_cpu/put_cpu pair in blk_mq_complete_request · 4c8fc196
      Christoph Hellwig authored
      We don't really care if we get migrated during the I/O completion.
      In the worst case we either perform an IPI that wasn't required, or
      complete the request on a CPU which we just migrated off.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: move failure injection out of blk_mq_complete_request · 15f73f5b
      Christoph Hellwig authored
      Move the call to blk_should_fake_timeout out of blk_mq_complete_request
      and into the drivers, skipping call sites that are obvious error
      handlers, and remove the now superfluous blk_mq_force_complete_rq helper.
      This ensures we don't keep injecting errors into completions that just
      terminate the Linux request after the hardware has been reset or the
      command has been aborted.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
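
      A rough userspace model of the resulting split between normal
      completion paths and error handlers is shown below. The names
      (demo_queue, demo_rq, demo_should_fake_timeout(), demo_irq_done(),
      demo_abort_done()) are hypothetical, and a plain flag stands in for the
      fault-injection decision.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the queue and request. */
struct demo_queue {
	bool inject_timeouts;	/* models fault injection being armed */
};

struct demo_rq {
	struct demo_queue *q;
	bool completed;
};

/* Models the "should this completion be dropped to fake a timeout" check. */
static bool demo_should_fake_timeout(const struct demo_queue *q)
{
	return q->inject_timeouts;
}

static void demo_complete(struct demo_rq *rq)
{
	rq->completed = true;
}

/* Normal completion path: the driver checks for an injected timeout. */
static void demo_irq_done(struct demo_rq *rq)
{
	if (!demo_should_fake_timeout(rq->q))
		demo_complete(rq);
}

/* Error handler after a reset or abort: always complete, never inject. */
static void demo_abort_done(struct demo_rq *rq)
{
	demo_complete(rq);
}
```

      With the check in the driver rather than in the common completion
      function, the abort path in this model always terminates the request
      even while injection is armed.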
    • blk-mq: merge the softirq vs non-softirq IPI logic · d391a7a3
      Christoph Hellwig authored
      Both the softirq path for single queue devices and the multi-queue
      completion handler share the same logic to figure out if we need an
      IPI for the completion and eventually issue it.  Merge the two
      versions into a single unified code path.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: short cut the IPI path in blk_mq_force_complete_rq for !SMP · d6cc464c
      Christoph Hellwig authored
      Let the compiler optimize out the entire IPI path, given that we are
      obviously not going to use it.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: complete polled requests directly · 6aab1da6
      Christoph Hellwig authored
      Even for single queue devices there is no point in offloading a polled
      completion to the softirq, given that blk_mq_force_complete_rq is called
      from the polling thread in that case and thus there are no starvation
      issues.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove raise_blk_irq · dea6f399
      Christoph Hellwig authored
      By open coding raise_blk_irq in the only caller, and replacing the
      ifdef CONFIG_SMP with an IS_ENABLED check the flow in the caller
      can be significantly simplified.
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: factor out a helper to raise the block softirq · 115243f5
      Christoph Hellwig authored
      Add a helper to deduplicate the logic that raises the block softirq.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: merge blk-softirq.c into blk-mq.c · c3077b5d
      Christoph Hellwig authored
      __blk_complete_request is only called from the blk-mq code, and
      duplicates a lot of code from blk-mq.c.  Move it there to prepare
      for better code sharing and simplifications.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Daniel Wagner <dwagner@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 21 Jun, 2020 10 commits
    • Linux 5.8-rc2 · 48778464
      Linus Torvalds authored
    • Merge tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 817d914d
      Linus Torvalds authored
      Pull SELinux fixes from Paul Moore:
       "Three small patches to fix problems in the SELinux code, all found via
        clang.
      
        Two patches fix potential double-free conditions and one fixes an
        undefined return value"
      
      * tag 'selinux-pr-20200621' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: fix undefined return of cond_evaluate_expr
        selinux: fix a double free in cond_read_node()/cond_read_list()
        selinux: fix double free
    • Merge tag 'pinctrl-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 16f4aa9b
      Linus Torvalds authored
      Pull pin control fixes from Linus Walleij:
       "Some early fixes collected during the first week after the merge
        window, all pretty self-evident, with the details below. The revert is
        the crucial thing.
      
         - Fix a warning on the Qualcomm SPMI GPIO chip being instantiated
           twice without a unique irqchip struct
      
         - Use the noirq variants of the suspend and resume callbacks in the
           Tegra driver
      
         - Clean up the error path on the MCP23s08 driver
      
         - Revert the use of devm_of_iomap() in the Freescale driver as it was
           regressing the platform
      
         - Add some missing pins in the Qualcomm IPQ6018 driver
      
         - Fix a simple documentation bug in the pinctrl-single driver"
      
      * tag 'pinctrl-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: single: fix function name in documentation
        pinctrl: qcom: ipq6018 Add missing pins in qpic pin group
        Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'"
        pinctrl: mcp23s08: Split to three parts: fix ptr_ret.cocci warnings
        pinctrl: tegra: Use noirq suspend/resume callbacks
        pinctrl: qcom: spmi-gpio: fix warning about irq chip reusage
    • Merge tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · be9160a9
      Linus Torvalds authored
      
      Pull Kbuild fixes from Masahiro Yamada:
      
       - fix -gz=zlib compiler option test for CONFIG_DEBUG_INFO_COMPRESSED
      
       - improve cc-option in scripts/Kbuild.include to clean up temp files
      
       - improve cc-option in scripts/Kconfig.include for more reliable
         compile option test
      
       - do not copy modules.builtin by 'make install' because it would break
         existing systems
      
       - use 'userprogs' syntax for watch_queue sample
      
      * tag 'kbuild-fixes-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        samples: watch_queue: build sample program for target architecture
        Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n"
        scripts: Fix typo in headers_install.sh
        kconfig: unify cc-option and as-option
        kbuild: improve cc-option to clean up all temporary files
        Makefile: Improve compressed debug info support detection
    • Merge tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 75613939
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - One fix for the interrupt rework we did last release which broke
         KVM-PR
      
       - Three commits fixing some fallout from the READ_ONCE() changes
         interacting badly with our 8xx 16K pages support, which uses a pte_t
         that is a structure of 4 actual PTEs
      
       - A cleanup of the 8xx pte_update() to use the newly added pmd_off()
      
       - A fix for a crash when handling an oops if CONFIG_DEBUG_VIRTUAL is
         enabled
      
       - A minor fix for the SPU syscall generation
      
      Thanks to Aneesh Kumar K.V, Christian Zigotzky, Christophe Leroy, Mike
      Rapoport, Nicholas Piggin.
      
      * tag 'powerpc-5.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/8xx: Provide ptep_get() with 16k pages
        mm: Allow arches to provide ptep_get()
        mm/gup: Use huge_ptep_get() in gup_hugepte()
        powerpc/syscalls: Use the number when building SPU syscall table
        powerpc/8xx: use pmd_off() to access a PMD entry in pte_update()
        powerpc/64s: Fix KVM interrupt using wrong save area
        powerpc: Fix kernel crash in show_instructions() w/DEBUG_VIRTUAL
    • Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · 93bbca27
      Linus Torvalds authored
      Pull crypto fixes from Herbert Xu:
      
       - NULL dereference in octeontx
      
       - PM reference imbalance in ks-sa
      
       - deadlock in crypto manager
      
       - memory leak in drbg
      
       - missing socket limit check on receive SG list size in algif_skcipher
      
       - typos in caam
      
       - warnings in ccp and hisilicon
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: drbg - always try to free Jitter RNG instance
        crypto: marvell/octeontx - Fix a potential NULL dereference
        crypto: algboss - don't wait during notifier callback
        crypto: caam - fix typos
        crypto: ccp - Fix sparse warnings in sev-dev
        crypto: hisilicon - Cap block size at 2^31
        crypto: algif_skcipher - Cap recv SG list at ctx->used
        hwrng: ks-sa - Fix runtime PM imbalance on error
      93bbca27
    • samples: watch_queue: build sample program for target architecture · 214377e9
      Masahiro Yamada authored
      This userspace program includes UAPI headers exported to usr/include/.
      'make headers' always works for the target architecture (i.e. the same
      architecture as the kernel), so the sample program should be built for
      the target as well. Kbuild now supports 'userprogs' for that.
      
      I also guarded the CONFIG option by 'depends on CC_CAN_LINK' because
      $(CC) may not provide libc.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      214377e9
    • Revert "Makefile: install modules.builtin even if CONFIG_MODULES=n" · 2c6d9636
      Masahiro Yamada authored
      This reverts commit e0b250b5,
      which broke build systems that need to install files to a certain
      path, but do not set INSTALL_MOD_PATH when invoking 'make install'.
      
        $ make INSTALL_PATH=/tmp/destdir install
        mkdir: cannot create directory ‘/lib/modules/5.8.0-rc1+/’: Permission denied
        Makefile:1342: recipe for target '_builtin_inst_' failed
        make: *** [_builtin_inst_] Error 1
      
      While modules.builtin is useful also for CONFIG_MODULES=n, this change
      in the behavior is quite unexpected. Maybe "make modules_install"
      can install modules.builtin irrespective of CONFIG_MODULES as Jonas
      originally suggested.
      
      Anyway, that commit should be reverted ASAP.
      
      Reported-by: Douglas Anderson <dianders@chromium.org>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonas Karlman <jonas@kwiboo.se>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      2c6d9636
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 64677779
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "One minor fix and two patches reworking the ata dma drain for the
        !CONFIG_LIBATA case. The latter is a 5.7 regression fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: Wire up ata_scsi_dma_need_drain for SAS HBA drivers
        scsi: libata: Provide an ata_scsi_dma_need_drain stub for !CONFIG_ATA
        scsi: ufs-bsg: Fix runtime PM imbalance on error
      64677779
    • Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · a5c6a1f0
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
      
       - a small collection of remaining API conversion patches (all acked)
         which finally allow removing the deprecated API
      
       - some documentation fixes and a MAINTAINERS addition
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        MAINTAINERS: Add robert and myself as qcom i2c cci maintainers
        i2c: smbus: Fix spelling mistake in the comments
        Documentation/i2c: SMBus start signal is S not A
        i2c: remove deprecated i2c_new_device API
        Documentation: media: convert to use i2c_new_client_device()
        video: backlight: tosa_lcd: convert to use i2c_new_client_device()
        x86/platform/intel-mid: convert to use i2c_new_client_device()
        drm: encoder_slave: use new I2C API
        drm: encoder_slave: fix refcouting error for modules
      a5c6a1f0
  3. 20 Jun, 2020 7 commits
    • pinctrl: single: fix function name in documentation · 25fae752
      Drew Fustini authored
      Use the correct function name in the documentation for
      "pcs_parse_one_pinctrl_entry()".
      
      "smux_parse_one_pinctrl_entry()" appears to be an artifact from the
      development of a prior patch series ("simple pinmux driver") which
      transformed into pinctrl-single.
      
      Signed-off-by: Drew Fustini <drew@beagleboard.org>
      Link: https://lore.kernel.org/r/20200612112758.GA3407886@x1
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      25fae752
    • Merge tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 8b6ddd10
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
      
       - Have recordmcount work with > 64K sections (to support LTO)
      
       - kprobe RCU fixes
      
       - Correct a kprobe critical section with missing mutex
      
       - Remove redundant arch_disarm_kprobe() call
      
       - Fix lockup when kretprobe triggers within kprobe_flush_task()
      
       - Fix memory leak in fetch_op_data operations
      
       - Fix sleep in atomic in ftrace trace array sample code
      
       - Free up memory on failure in sample trace array code
      
       - Fix incorrect reporting of function_graph fields in format file
      
       - Fix quote within quote parsing in bootconfig
      
       - Fix return value of bootconfig tool
      
       - Add testcases for bootconfig tool
      
       - Fix maybe uninitialized warning in ftrace pid file code
      
       - Remove unused variable in tracing_iter_reset()
      
       - Fix some typos
      
      * tag 'trace-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Fix maybe-uninitialized compiler warning
        tools/bootconfig: Add testcase for show-command and quotes test
        tools/bootconfig: Fix to return 0 if succeeded to show the bootconfig
        tools/bootconfig: Fix to use correct quotes for value
        proc/bootconfig: Fix to use correct quotes for value
        tracing: Remove unused event variable in tracing_iter_reset
        tracing/probe: Fix memleak in fetch_op_data operations
        trace: Fix typo in allocate_ftrace_ops()'s comment
        tracing: Make ftrace packed events have align of 1
        sample-trace-array: Remove trace_array 'sample-instance'
        sample-trace-array: Fix sleeping function called from invalid context
        kretprobe: Prevent triggering kretprobe from within kprobe_flush_task
        kprobes: Remove redundant arch_disarm_kprobe() call
        kprobes: Fix to protect kick_kprobe_optimizer() by kprobe_mutex
        kprobes: Use non RCU traversal APIs on kprobe_tables if possible
        kprobes: Suppress the suspicious RCU warning on kprobes
        recordmcount: support >64k sections
      8b6ddd10
    • Merge tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · eede2b9b
      Linus Torvalds authored
      Pull libnvdimm updates from Dan Williams:
       "A feature (papr_scm health retrieval) and a fix (sysfs attribute
        visibility) for v5.8.
      
        Vaibhav explains in the merge commit below why missing v5.8 would be
        painful and I agreed to try a -rc2 pull because only cosmetics kept
        this out of -rc1 and his initial versions were posted in more than
        enough time for v5.8 consideration:
      
         'These patches are tied to specific features that were committed to
          customers in upcoming distros releases (RHEL and SLES) whose
          time-lines are tied to 5.8 kernel release.
      
          Being able to track the health of an nvdimm is critical for our
          customers that are running workloads leveraging papr-scm nvdimms.
          Missing the 5.8 kernel would mean missing the distro timelines and
          shifting forward the availability of this feature in distro kernels
          by at least 6 months'
      
        Summary:
      
         - Fix the visibility of the region 'align' attribute.
      
           The new unit tests for region alignment handling caught a corner
           case where the alignment cannot be specified if the region is
           converted from static to dynamic provisioning at runtime.
      
         - Add support for device health retrieval for the persistent memory
           supported by the papr_scm driver.
      
           This includes both the standard sysfs "health flags" that the nfit
           persistent memory driver publishes and a mechanism for the ndctl
           tool to retrieve a health-command payload"
      
      * tag 'libnvdimm-for-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nvdimm/region: always show the 'align' attribute
        powerpc/papr_scm: Implement support for PAPR_PDSM_HEALTH
        ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods
        powerpc/papr_scm: Improve error logging and handling papr_scm_ndctl()
        powerpc/papr_scm: Fetch nvdimm health information from PHYP
        seq_buf: Export seq_buf_printf
        powerpc: Document details on H_SCM_HEALTH hcall
      eede2b9b
    • pinctrl: qcom: ipq6018 Add missing pins in qpic pin group · 7f5f4de8
      Sivaprakash Murugesan authored
      The patch adds missing qpic data pins to qpic pingroup. These pins are
      necessary for the qpic nand to work.
      
      Fixes: ef1ea54e ("pinctrl: qcom: Add ipq6018 pinctrl driver")
      Signed-off-by: Sivaprakash Murugesan <sivaprak@codeaurora.org>
      Link: https://lore.kernel.org/r/1592541089-17700-1-git-send-email-sivaprak@codeaurora.org
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      7f5f4de8
    • Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource... · 13f2d25b
      Haibo Chen authored
      Revert "pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'"
      
      This reverts commit ba403242.
      
      After commit 26d8cde5 ("pinctrl: freescale: imx: add shared
      input select reg support"), the i.MX7D has two iomux controllers,
      iomuxc and iomuxc-lpsr, which share the select_input register for
      daisy chain settings.
      With 'devm_of_iomap()', probing iomuxc-lpsr calls
      devm_request_mem_region() for the region <0x30330000-0x3033ffff>
      for the first time. When iomuxc is probed next,
      devm_platform_ioremap_resource() also calls
      devm_request_mem_region() for the same shared region
      <0x30330000-0x3033ffff>, and that request fails, as the log below shows:
      
      [    0.179561] imx7d-pinctrl 302c0000.iomuxc-lpsr: initialized IMX pinctrl driver
      [    0.191742] imx7d-pinctrl 30330000.pinctrl: can't request region for resource [mem 0x30330000-0x3033ffff]
      [    0.191842] imx7d-pinctrl: probe of 30330000.pinctrl failed with error -16
      
      Fixes: ba403242 ("pinctrl: freescale: imx: Use 'devm_of_iomap()' to avoid a resource leak in case of error in 'imx_pinctrl_probe()'")
      Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
      Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
      Link: https://lore.kernel.org/r/1591673223-1680-1-git-send-email-haibo.chen@nxp.com
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      13f2d25b
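
The failure described in the revert above can be illustrated with a toy model. This is only a sketch and not kernel code: `RegionRegistry` and its `request()` method are hypothetical names invented here, standing in for the exclusive-claim behaviour of devm_request_mem_region(). It assumes that a second claim on an overlapping range is refused and that the driver reports this as -EBUSY, which matches the "error -16" in the log.

```python
import errno


class RegionRegistry:
    """Toy model of exclusive memory-region claims (hypothetical, not kernel code)."""

    def __init__(self):
        # Each claim is stored as (start, end, owner), with inclusive bounds.
        self._claims = []

    def request(self, start, end, owner):
        """Claim [start, end] for owner; return 0 on success, -EBUSY on overlap."""
        for claimed_start, claimed_end, _ in self._claims:
            # Two inclusive ranges overlap when each starts before the other ends.
            if start <= claimed_end and claimed_start <= end:
                return -errno.EBUSY
        self._claims.append((start, end, owner))
        return 0


registry = RegionRegistry()
SHARED = (0x30330000, 0x3033FFFF)  # region quoted in the commit message

# iomuxc-lpsr is probed first and claims the shared region.
first = registry.request(*SHARED, owner="iomuxc-lpsr")
# iomuxc is probed next and tries to claim the same region again.
second = registry.request(*SHARED, owner="iomuxc")

print(first, second)
```

In this model the first request succeeds and the second is rejected, which is why mapping the shared register without claiming the region a second time (the behaviour restored by the revert) avoids the probe failure.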
    • Merge tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 1566feea
      Linus Torvalds authored
      Pull s390 fixes from Vasily Gorbik:
      
       - a few ptrace fixes, mostly for findings from strace and the
         seccomp_bpf kernel tests
      
       - cleanup unused pm callbacks in virtio ccw
      
       - replace kmalloc + memset with kzalloc in crypto
      
       - use $(LD) for vDSO linkage to make clang happy
      
       - fix vDSO clock_getres() to preserve the same behaviour as
         posix_get_hrtimer_res()
      
       - fix workqueue cpumask warning when NUMA=n and nr_node_ids=2
      
       - reduce SLSB writes during input processing, improve warnings and
         cleanup qdio_data usage in qdio
      
       - a few fixes to use scnprintf() instead of snprintf()
      
      * tag 's390-5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        s390: fix syscall_get_error for compat processes
        s390/qdio: warn about unexpected SLSB states
        s390/qdio: clean up usage of qdio_data
        s390/numa: let NODES_SHIFT depend on NEED_MULTIPLE_NODES
        s390/vdso: fix vDSO clock_getres()
        s390/vdso: Use $(LD) instead of $(CC) to link vDSO
        s390/protvirt: use scnprintf() instead of snprintf()
        s390: use scnprintf() in sys_##_prefix##_##_name##_show
        s390/crypto: use scnprintf() instead of snprintf()
        s390/zcrypt: use kzalloc
        s390/virtio: remove unused pm callbacks
        s390/qdio: reduce SLSB writes during Input Queue processing
        selftests/seccomp: s390 shares the syscall and return value register
        s390/ptrace: fix setting syscall number
        s390/ptrace: pass invalid syscall numbers to tracing
        s390/ptrace: return -ENOSYS when invalid syscall is supplied
        s390/seccomp: pass syscall arguments via seccomp_data
        s390/qdio: fine-tune SLSB update
      1566feea
    • Merge tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 7fdfbe08
      Linus Torvalds authored
      Pull RISC-V fixes from Palmer Dabbelt:
      
       - a workaround for a compiler surprise related to the "r" inline
         assembly that allows LLVM to boot.
      
       - a fix to avoid WX-only mappings, which the ISA does not allow. While
         this probably manifests in many ways, the bug was found in stress-ng.
      
       - a missing lock in set_direct_map_*(), which due to a recent lockdep
         change started asserting.
      
      * tag 'riscv-for-linus-5.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        RISC-V: Acquire mmap lock before invoking walk_page_range
        RISC-V: Don't allow write+exec only page mapping request in mmap
        riscv/atomic: Fix sign extension for RV64I
      7fdfbe08