1. 30 Nov, 2018 2 commits
    • block: fix single range discard merge · 2a5cf35c
      Ming Lei authored
      There are actually two kinds of discard merge:
      
      - one is the normal discard merge, handled just like a normal
      read/write request; call it single-range discard

      - the other is the multi-range discard, where
      queue_max_discard_segments(rq->q) > 1

      In the former case, queue_max_discard_segments(rq->q) is 1, and this
      kind of discard merge should be handled like a normal read/write
      request.
      
      This patch fixes the following kernel panic[1], which is caused by
      not removing the single-range discard request from the elevator queue.
      
      Guangwu has one raid discard test case, in which this issue is a bit
      easier to trigger, and I verified that this patch can fix the kernel
      panic issue in Guangwu's test case.
      
      [1] kernel panic log from Jens's report
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000148
       PGD 0 P4D 0.
       Oops: 0000 [#1] SMP PTI
       CPU: 37 PID: 763 Comm: kworker/37:1H Not tainted 4.20.0-rc3-00649-ge64d9a554a91-dirty #14
       Hardware name: Wiwynn Leopard-Orv2/Leopard-DDR BW, BIOS LBM08   03/03/2017
       Workqueue: kblockd blk_mq_run_work_fn
       RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
       Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 f6 87 b0 00 00 00 02
       RSP: 0018:ffffc90004aabd30 EFLAGS: 00010246
       RAX: 0000000000000003 RBX: ffff888465ea1300 RCX: ffffc90004aabde8
       RDX: 00000000ffffffff RSI: ffffc90004aabde8 RDI: 0000000000000000
       RBP: 0000000000000000 R08: ffff888465ea1348 R09: 0000000000000000
       R10: 0000000000001000 R11: 00000000ffffffff R12: ffff888465ea1300
       R13: 0000000000000000 R14: ffff888465ea1348 R15: ffff888465d10000
       FS:  0000000000000000(0000) GS:ffff88846f9c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000148 CR3: 000000000220a003 CR4: 00000000003606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        blk_mq_dispatch_rq_list+0xec/0x480
        ? elv_rb_del+0x11/0x30
        blk_mq_do_dispatch_sched+0x6e/0xf0
        blk_mq_sched_dispatch_requests+0xfa/0x170
        __blk_mq_run_hw_queue+0x5f/0xe0
        process_one_work+0x154/0x350
        worker_thread+0x46/0x3c0
        kthread+0xf5/0x130
        ? process_one_work+0x350/0x350
        ? kthread_destroy_worker+0x50/0x50
        ret_from_fork+0x1f/0x30
       Modules linked in: sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm switchtec irqbypass iTCO_wdt iTCO_vendor_support efivars cdc_ether usbnet mii cdc_acm i2c_i801 lpc_ich mfd_core ipmi_si ipmi_devintf ipmi_msghandler acpi_cpufreq button sch_fq_codel nfsd nfs_acl lockd grace auth_rpcgss oid_registry sunrpc nvme nvme_core fuse sg loop efivarfs autofs4
       CR2: 0000000000000148
      
       ---[ end trace 340a1fb996df1b9b ]---
       RIP: 0010:blk_mq_get_driver_tag+0x81/0x120
       Code: 24 10 48 89 7c 24 20 74 21 83 fa ff 0f 95 c0 48 8b 4c 24 28 65 48 33 0c 25 28 00 00 00 0f 85 96 00 00 00 48 83 c4 30 5b 5d c3 <48> 8b 87 48 01 00 00 8b 40 04 39 43 20 72 37 f6 87 b0 00 00 00 02
      
      Fixes: 445251d0 ("blk-mq: fix discard merge with scheduler attached")
      Reported-by: Jens Axboe <axboe@kernel.dk>
      Cc: Guangwu Zhang <guazhang@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • fs: fix lost error code in dio_complete · 41e817bc
      Maximilian Heyne authored
      commit e2592217 ("fs: simplify the
      generic_write_sync prototype") reworked callers of generic_write_sync(),
      and ended up dropping the error return for the directio path. Prior to
      that commit, in dio_complete(), an error would be bubbled up the stack,
      but after that commit, errors passed on to dio_complete were eaten up.
      
      This was reported on the list earlier, and a fix was proposed in
      https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but
      never followed up on. We recently hit this bug in our testing:
      fencing io errors, which previously failed with EIO, were
      being reported as successful operations after this commit.
      
      The fix proposed on the list earlier was a little short -- it would have
      still called generic_write_sync() in case `ret` already contained an
      error. This fix ensures generic_write_sync() is only called when there's
      no pending error in the write. Additionally, transferred is replaced
      with ret to bring this code in line with other callers.
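
      The ordering described above can be sketched in user-space C. This is
      only an illustration under stated assumptions, not the kernel code:
      complete_write_sketch() and the sync_fn callback type are hypothetical
      stand-ins for dio_complete() and generic_write_sync(), and `ret` holds
      either a byte count or a negative errno.

```c
#include <stddef.h>

/* Hypothetical stand-in for generic_write_sync(): takes the byte count,
 * returns it unchanged on success or a negative errno on failure. */
typedef long (*sync_fn)(long bytes);

/* Example callbacks used only to exercise the sketch. */
static long sync_ok(long bytes)   { return bytes; }
static long sync_fail(long bytes) { (void)bytes; return -5; /* -EIO */ }

/* Call the sync step only when no error is pending, so an earlier
 * error in `ret` is returned to the caller instead of being lost. */
static long complete_write_sketch(long ret, sync_fn do_sync)
{
    if (ret > 0 && do_sync != NULL)
        ret = do_sync(ret);
    return ret;
}
```

      With this shape, a pending -EIO passes through untouched, and a sync
      failure replaces a successful byte count with its own error code.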
      
      Fixes: e2592217 ("fs: simplify the generic_write_sync prototype")
      Reported-by: Ravi Nankani <rnankani@amazon.com>
      Signed-off-by: Maximilian Heyne <mheyne@amazon.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: Torsten Mehlan <tomeh@amazon.de>
      CC: Uwe Dannowski <uwed@amazon.de>
      CC: Amit Shah <aams@amazon.de>
      CC: David Woodhouse <dwmw@amazon.co.uk>
      CC: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 21 Nov, 2018 1 commit
  3. 15 Nov, 2018 1 commit
    • nvme-fc: resolve io failures during connect · 4cff280a
      James Smart authored
      If an io error occurs on an io issued while connecting, recovery
      of the io falls flat as the state checking ends up nooping the error
      handler.
      
      Create an err_work work item that is scheduled upon an io error while
      connecting. The work thread terminates all io on all queues and marks
      the queues as not connected. The termination of the io returns
      to the caller, which then backs out of the connection attempt
      and reschedules it if possible.
      
      The changes:
      - in case there are several commands hitting the error handler, a
        state flag is kept so that the error work is only scheduled once,
        on the first error. The subsequent errors can be ignored.
      - The calling sequence to stop keep alive and terminate the queues
        and their io is lifted from the reset routine. Made a small
        service routine used by both reset and err_work.
      - During debugging, found that the teardown path can reference
        an uninitialized pointer, resulting in a NULL pointer oops.
        The aen_ops weren't initialized yet. Add validation on their
        initialization before calling the teardown routine.
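
      The "schedule the error work only once" rule from the first item can
      be sketched with a C11 atomic flag. This is a user-space illustration
      only: struct ctrl_sketch, its fields, and report_io_error() are
      hypothetical names, not the nvme-fc driver's own.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical controller state for the sketch. */
struct ctrl_sketch {
    atomic_flag err_work_active;  /* set once the error work is queued */
    int scheduled;                /* how many times work was queued */
};

static void ctrl_init(struct ctrl_sketch *c)
{
    atomic_flag_clear(&c->err_work_active);
    c->scheduled = 0;
}

/* Called for every failing command. Only the first caller queues the
 * error work; later callers see the flag already set and do nothing. */
static bool report_io_error(struct ctrl_sketch *c)
{
    if (atomic_flag_test_and_set(&c->err_work_active))
        return false;             /* work already scheduled, ignore */
    c->scheduled++;               /* stands in for queueing err_work */
    return true;
}
```

      The test-and-set is a single atomic step, so two commands failing at
      the same time cannot both observe the flag as clear.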
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  4. 14 Nov, 2018 2 commits
    • SCSI: fix queue cleanup race before queue initialization is done · 8dc765d4
      Ming Lei authored
      c2856ae2 ("blk-mq: quiesce queue before freeing queue") has
      already fixed this race, but the implied synchronize_rcu()
      in blk_mq_quiesce_queue() can slow down LUN probe a lot, which
      caused a performance regression.

      Then 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      tried to avoid the unnecessary synchronize_rcu() by quiescing the
      queue only when queue initialization is done, because it is common
      to probe lots of nonexistent LUNs.

      However, it turns out that it isn't safe to quiesce the queue only
      when queue initialization is done. When a SCSI command completes,
      the submitter of the command can be woken up immediately and the
      SCSI device may then be removed while the run queue in
      scsi_end_request() is still in progress, which can cause a kernel
      panic.

      In the Red Hat QE lab, there are several reports of this kind of
      kernel panic triggered during kernel boot.

      This patch tries to address the issue by grabbing a queue usage
      counter while freeing a request and during the following run queue.
      
      Fixes: 1311326c ("blk-mq: avoid to synchronize rcu inside blk_cleanup_queue()")
      Cc: Andrew Jones <drjones@redhat.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
      Cc: stable <stable@vger.kernel.org>
      Cc: jianchao.wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix 32 bit overflow in __blkdev_issue_discard() · 4800bf7b
      Dave Chinner authored
      A discard cleanup merged into 4.20-rc2 causes fstests xfs/259 to
      fall into an endless loop in the discard code. The test is creating
      a device that is exactly 2^32 sectors in size to test mkfs boundary
      conditions around the 32 bit sector overflow region.
      
      mkfs issues a discard for the entire device size by default, and
      hence this throws a sector count of 2^32 into
      blkdev_issue_discard(). It takes the number of sectors to discard as
      a sector_t - a 64 bit value.
      
      The commit ba5d7385 ("block: cleanup __blkdev_issue_discard")
      takes this sector count and casts it to a 32 bit value before
      comparing it against the maximum allowed discard size the device
      has. This truncates away the upper 32 bits, and so if the lower 32
      bits of the sector count are zero, it starts issuing discards of
      length 0. This causes the code to fall into an endless loop, issuing
      zero-length discards over and over again on the same sector.
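
      The truncation can be reproduced in a few lines of user-space C. This
      is a sketch of the arithmetic only, not the kernel code: the function
      names and the 8192-sector limit mentioned below are assumptions made
      for the illustration, and sector_t is modeled as a 64 bit unsigned
      integer as described above.

```c
#include <stdint.h>

typedef uint64_t sector_t;   /* 64 bit sector count, as in the text */

/* Buggy shape: narrow to 32 bits first, then clamp to the limit.
 * A count of exactly 2^32 becomes 0, so the chunk length is 0. */
static uint32_t next_chunk_truncated(sector_t nr_sects, uint32_t max_discard)
{
    uint32_t req = (uint32_t)nr_sects;   /* upper 32 bits are lost */
    return req < max_discard ? req : max_discard;
}

/* Fixed shape: clamp in 64 bits, narrow only the bounded result. */
static uint32_t next_chunk_wide(sector_t nr_sects, uint32_t max_discard)
{
    sector_t req = nr_sects < max_discard ? nr_sects : max_discard;
    return (uint32_t)req;                /* safe: req <= max_discard */
}
```

      For a 2^32 sector request and a limit of 8192 sectors, the first
      variant yields a chunk of 0 sectors, so a loop that subtracts the
      chunk from the remaining count never makes progress. The second
      variant yields 8192.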
      
      Fixes: ba5d7385 ("block: cleanup __blkdev_issue_discard")
      Tested-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      
      Killed pointless WARN_ON().
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 12 Nov, 2018 3 commits
  6. 10 Nov, 2018 1 commit
    • floppy: fix race condition in __floppy_read_block_0() · de7b75d8
      Jens Axboe authored
      LKP recently reported a hang at bootup in the floppy code:
      
      [  245.678853] INFO: task mount:580 blocked for more than 120 seconds.
      [  245.679906]       Tainted: G                T 4.19.0-rc6-00172-ga9f38e1d #1
      [  245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  245.682181] mount           D 6372   580      1 0x00000004
      [  245.683023] Call Trace:
      [  245.683425]  __schedule+0x2df/0x570
      [  245.683975]  schedule+0x2d/0x80
      [  245.684476]  schedule_timeout+0x19d/0x330
      [  245.685090]  ? wait_for_common+0xa5/0x170
      [  245.685735]  wait_for_common+0xac/0x170
      [  245.686339]  ? do_sched_yield+0x90/0x90
      [  245.686935]  wait_for_completion+0x12/0x20
      [  245.687571]  __floppy_read_block_0+0xfb/0x150
      [  245.688244]  ? floppy_resume+0x40/0x40
      [  245.688844]  floppy_revalidate+0x20f/0x240
      [  245.689486]  check_disk_change+0x43/0x60
      [  245.690087]  floppy_open+0x1ea/0x360
      [  245.690653]  __blkdev_get+0xb4/0x4d0
      [  245.691212]  ? blkdev_get+0x1db/0x370
      [  245.691777]  blkdev_get+0x1f3/0x370
      [  245.692351]  ? path_put+0x15/0x20
      [  245.692871]  ? lookup_bdev+0x4b/0x90
      [  245.693539]  blkdev_get_by_path+0x3d/0x80
      [  245.694165]  mount_bdev+0x2a/0x190
      [  245.694695]  squashfs_mount+0x10/0x20
      [  245.695271]  ? squashfs_alloc_inode+0x30/0x30
      [  245.695960]  mount_fs+0xf/0x90
      [  245.696451]  vfs_kern_mount+0x43/0x130
      [  245.697036]  do_mount+0x187/0xc40
      [  245.697563]  ? memdup_user+0x28/0x50
      [  245.698124]  ksys_mount+0x60/0xc0
      [  245.698639]  sys_mount+0x19/0x20
      [  245.699167]  do_int80_syscall_32+0x61/0x130
      [  245.699813]  entry_INT80_32+0xc7/0xc7
      
      showing that we never complete that read request. The reason is that
      the completion setup is racy - it initializes the completion event
      AFTER submitting the IO, which means that the IO could complete
      before/during the init. If it does, we are passing garbage to
      complete() and we may sleep forever waiting for the event to
      occur.
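
      The ordering problem can be modeled in plain C. This is a
      single-threaded sketch, not the floppy driver: every name below is
      hypothetical, and the "I/O" completes synchronously inside the submit
      call to stand in for a request that finishes before the
      initialization runs.

```c
/* Minimal model of a completion; all names are illustrative. */
struct completion_sketch {
    int done;                 /* 1 once the event has been signalled */
};

static void init_completion_sketch(struct completion_sketch *c) { c->done = 0; }
static void complete_sketch(struct completion_sketch *c)        { c->done = 1; }

/* Models an I/O that finishes before submit returns. */
static void submit_io_sketch(struct completion_sketch *c)
{
    complete_sketch(c);
}

/* Racy order: the late init wipes out the signal, so a waiter
 * would never see the event. Returns the final `done` value. */
static int racy_order(void)
{
    struct completion_sketch c;
    submit_io_sketch(&c);
    init_completion_sketch(&c);
    return c.done;
}

/* Fixed order: initialize first, then submit. */
static int fixed_order(void)
{
    struct completion_sketch c;
    init_completion_sketch(&c);
    submit_io_sketch(&c);
    return c.done;
}
```

      In the racy order the event is recorded and then erased, which is
      the "sleep forever" case. Initializing before submitting keeps the
      signal intact.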
      
      Fixes: 7b7b68bb ("floppy: bail out in open() if drive is not responding to block0 read")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 09 Nov, 2018 10 commits
  8. 08 Nov, 2018 12 commits
  9. 07 Nov, 2018 8 commits
    • block: Clear kernel memory before copying to user · f3587d76
      Keith Busch authored
      If the kernel allocates a bounce buffer for user read data, this memory
      needs to be cleared before copying it to the user, otherwise it may leak
      kernel memory to user space.
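
      A user-space sketch of the idea, assuming a device that fills only
      part of the buffer: bounce_read_sketch() and its parameters are
      hypothetical, and calloc() stands in for whatever zeroing allocation
      the kernel path uses.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a "device read" to the user buffer through a zeroed bounce
 * buffer. Only `filled` bytes come from the device; the remainder of
 * the bounce buffer is zero, so no stale memory reaches the user.
 * Returns 0 on success, -1 if the allocation fails. */
static int bounce_read_sketch(unsigned char *user, size_t len,
                              const unsigned char *dev, size_t filled)
{
    unsigned char *bounce = calloc(1, len);   /* zero-filled allocation */
    if (bounce == NULL)
        return -1;
    if (filled > len)
        filled = len;
    memcpy(bounce, dev, filled);   /* short transfer from the device */
    memcpy(user, bounce, len);     /* the whole buffer goes to the user */
    free(bounce);
    return 0;
}
```

      With an uninitialized allocation instead of calloc(), the bytes past
      `filled` would carry whatever the allocator last held there.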
      
      Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • MAINTAINERS: Fix remaining pointers to obsolete libata.git · e31d36b0
      Geert Uytterhoeven authored
      libata.git no longer exists.  Replace the remaining pointers to it by
      pointers to the block tree, which is where all libata development
      happens now.
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • ubd: fix missing lock around request issue · 6961cd4d
      Jens Axboe authored
      We need to hold the device lock (and disable interrupts) while
      writing new commands, or we could be interrupted while that
      is happening and read invalid requests in the completion path.
      
      Fixes: 4e6da0fe ("um: Convert ubd driver to blk-mq")
      Tested-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Documentation: ABI: led-trigger-pattern: Fix typos · 406e7f98
      Geert Uytterhoeven authored
        - Spelling s/brigntess/brightness/,
        - Double "use".
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
    • leds: trigger: Fix sleeping function called from invalid context · 3a40cfe8
      Baolin Wang authored
      We hit the issue below because mutex_lock() is called in interrupt context.
      The mutex lock is used to protect the pattern trigger data, but before users
      change the pattern trigger data (pattern values or repeat value), we always
      cancel the timer first to stop the previous pattern. That means
      there is no race in pattern_trig_timer_function(), so we can drop the mutex
      lock in pattern_trig_timer_function() to avoid this issue.

      Moreover, we can move the timer cancellation under mutex protection, since there
      is no deadlock risk once the mutex lock is removed from pattern_trig_timer_function().
      
      BUG: sleeping function called from invalid context at kernel/locking/mutex.c:254
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted
      4.20.0-rc1-koelsch-00841-ga338c8181013c1a9 #171
      Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
      [<c020f19c>] (unwind_backtrace) from [<c020aecc>] (show_stack+0x10/0x14)
      [<c020aecc>] (show_stack) from [<c07affb8>] (dump_stack+0x7c/0x9c)
      [<c07affb8>] (dump_stack) from [<c02417d4>] (___might_sleep+0xf4/0x158)
      [<c02417d4>] (___might_sleep) from [<c07c92c4>] (mutex_lock+0x18/0x60)
      [<c07c92c4>] (mutex_lock) from [<c067b28c>] (pattern_trig_timer_function+0x1c/0x11c)
      [<c067b28c>] (pattern_trig_timer_function) from [<c027f6fc>] (call_timer_fn+0x1c/0x90)
      [<c027f6fc>] (call_timer_fn) from [<c027f944>] (expire_timers+0x94/0xa4)
      [<c027f944>] (expire_timers) from [<c027fc98>] (run_timer_softirq+0x108/0x15c)
      [<c027fc98>] (run_timer_softirq) from [<c02021cc>] (__do_softirq+0x1d4/0x258)
      [<c02021cc>] (__do_softirq) from [<c0224d24>] (irq_exit+0x64/0xc4)
      [<c0224d24>] (irq_exit) from [<c0268dd0>] (__handle_domain_irq+0x80/0xb4)
      [<c0268dd0>] (__handle_domain_irq) from [<c045e1b0>] (gic_handle_irq+0x58/0x90)
      [<c045e1b0>] (gic_handle_irq) from [<c02019f8>] (__irq_svc+0x58/0x74)
      Exception stack(0xeb483f60 to 0xeb483fa8)
      3f60: 00000000 00000000 eb9afaa0 c0217e80 00000000 ffffe000 00000000 c0e06408
      3f80: 00000002 c0e0647c c0c6a5f0 00000000 c0e04900 eb483fb0 c0207ea8 c0207e98
      3fa0: 60020013 ffffffff
      [<c02019f8>] (__irq_svc) from [<c0207e98>] (arch_cpu_idle+0x1c/0x38)
      [<c0207e98>] (arch_cpu_idle) from [<c0247ca8>] (do_idle+0x138/0x268)
      [<c0247ca8>] (do_idle) from [<c0248050>] (cpu_startup_entry+0x18/0x1c)
      [<c0248050>] (cpu_startup_entry) from [<402022ec>] (0x402022ec)
      
      Fixes: 5fd752b6 ("leds: core: Introduce LED pattern trigger")
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Reported-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Jacek Anaszewski <jacek.anaszewski@gmail.com>
    • block: respect virtual boundary mask in bvecs · df376b2e
      Johannes Thumshirn authored
      With drivers that set a virtual boundary constraint, we are
      seeing a lot of bio splitting and smaller I/Os being submitted to the
      driver.

      This happens because the bio gap detection code does not account for cases
      where PAGE_SIZE - 1 is bigger than queue_virt_boundary() and thus
      splits the bio unnecessarily.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Acked-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge tag 'hwmon-for-v4.20-rc2' of... · 85758777
      Linus Torvalds authored
      Merge tag 'hwmon-for-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
      
      Pull hwmon fixes from Guenter Roeck:
      
       - Remove bogus __init annotations in ibmpowernv driver
      
       - Fix double-free in error handling of __hwmon_device_register()
      
      * tag 'hwmon-for-v4.20-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
        hwmon: (ibmpowernv) Remove bogus __init annotations
        hwmon: (core) Fix double-free in __hwmon_device_register()
    • Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · e09d51ad
      Linus Torvalds authored
      Pull ARM SoC fixes from Olof Johansson:
       "A few more fixes that have come in, and one revert of a previous fix.
      
        I was a bit too trigger happy to enable PREEMPT on multi_v7_defconfig,
        and it ended up regressing at least BeagleBone XM boards. While we get
        that debugged for next merge window, let's disable it again.
      
        Beyond that:
      
         - Stratix change to fix multicast filtering
      
         - Minor DT fixes for Renesas and i.MX
      
         - Ethernet fix for a Renesas board (switching main interfaces)
      
         - Ethernet phy regulator fix for i.MX6SX"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm64: dts: stratix10: fix multicast filtering
        ARM: defconfig: Disable PREEMPT again on multi_v7
        arm64: dts: renesas: condor: switch from EtherAVB to GEther
        dt-bindings: arm: Fix RZ/G2E part number
        arm64: dts: renesas: r8a7795: add missing dma-names on hscif2
        ARM: dts: imx6sx-sdb: Fix enet phy regulator
        ARM: dts: fsl: Fix improperly quoted stdout-path values
        ARM: dts: imx6sll: fix typo for fsl,imx6sll-i2c node