1. 10 Mar, 2016 6 commits
  2. 23 Feb, 2016 15 commits
  3. 22 Feb, 2016 11 commits
    • dm: allocate blk_mq_tag_set rather than embed in mapped_device · 1c357a1e
      Mike Snitzer authored
      The blk_mq_tag_set is only needed for dm-mq support.  There is no point
      wasting space in 'struct mapped_device' for non-dm-mq devices.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
    • dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module params · faad87df
      Mike Snitzer authored
      Allow user to change these values via module params or sysfs.
      
      'dm_mq_nr_hw_queues' defaults to 1 (max 32).
      
      'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
      small under moderate sized workloads -- the dm-multipath device would
      continuously block waiting for tags (requests) to become available).
      The maximum is BLK_MQ_MAX_DEPTH (currently 10240).
      
      Keep in mind the total number of pre-allocated requests per
      request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
      (currently 2048).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
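The pre-allocation arithmetic described in the commit above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the constant `DM_MQ_MAX_NR_HW_QUEUES`, the `clamp` helper, and the assumed policy (non-positive values fall back to the default, large values are capped at the maximum) are hypothetical and chosen for illustration. Only the defaults and maxima come from the commit message.

```python
# Toy model of the arithmetic in the commit message; not kernel code.
DM_MQ_MAX_NR_HW_QUEUES = 32   # "max 32" per the commit message (name is hypothetical)
BLK_MQ_MAX_DEPTH = 10240      # "currently 10240" per the commit message


def clamp(value, default, maximum):
    """Assumed policy: non-positive values use the default, large values are capped."""
    if value <= 0:
        return default
    return min(value, maximum)


def preallocated_requests(nr_hw_queues=1, queue_depth=2048):
    """Total requests pre-allocated per request-based dm-mq device."""
    queues = clamp(nr_hw_queues, 1, DM_MQ_MAX_NR_HW_QUEUES)
    depth = clamp(queue_depth, 2048, BLK_MQ_MAX_DEPTH)
    return queues * depth
```

With the defaults this gives 1 * 2048 = 2048 requests, matching the figure in the commit message; raising either parameter scales the pre-allocation multiplicatively.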
    • dm: optimize dm_request_fn() · c91852ff
      Mike Snitzer authored
      DM multipath is the only request-based DM target, and request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_request_fn().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: optimize dm_mq_queue_rq() · 16f12266
      Mike Snitzer authored
      DM multipath is the only dm-mq target.  But that aside, request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
      the mapped_device when the table was made active.  This saves the need
      to even take the read-side of the SRCU via dm_{get,put}_live_table.
      
      If the active DM table does not have an immutable target (e.g. "error"
      target was swapped in) then fall back to the slow-path where the target
      is looked up from the live table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
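The fast-path/slow-path split described in the commit above can be sketched as a toy model. This is Python, not the kernel implementation: the classes, the `immutable` flag, and the `srcu_read_sections` counter are hypothetical stand-ins for the cached 'immutable_target' and for the dm_{get,put}_live_table read-side section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Target:
    name: str
    immutable: bool = False


@dataclass
class Table:
    targets: List[Target]

    def immutable_target(self) -> Optional[Target]:
        # Request-based tables hold a single target; cache it only if immutable.
        if len(self.targets) == 1 and self.targets[0].immutable:
            return self.targets[0]
        return None


class MappedDevice:
    def __init__(self) -> None:
        self.live_table: Optional[Table] = None
        self.immutable_target: Optional[Target] = None
        self.srcu_read_sections = 0  # counts slow-path table lookups

    def activate(self, table: Table) -> None:
        # The cached target is refreshed whenever a table is made active.
        self.live_table = table
        self.immutable_target = table.immutable_target()

    def target_for_request(self) -> Target:
        if self.immutable_target is not None:
            return self.immutable_target   # fast path: no live-table lookup
        self.srcu_read_sections += 1       # slow path: stands in for dm_{get,put}_live_table
        return self.live_table.targets[0]
```

In this model, activating a table whose single target is immutable never touches the counter, while swapping in a non-immutable target (such as "error") makes every request take the slow path.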
    • dm: set DM_TARGET_WILDCARD feature on "error" target · f083b09b
      Mike Snitzer authored
      The DM_TARGET_WILDCARD feature indicates that the "error" target may
      replace any target; even immutable targets.  This feature will be useful
      to preserve the ability to replace the "multipath" target even once it
      is formally converted over to having the DM_TARGET_IMMUTABLE feature.
      
      Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
      .map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
      in the target_type.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: cleanup dm_any_congested() · e522c039
      Mike Snitzer authored
      The request-based DM support for checking queue congestion doesn't
      require access to the live DM table.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove unused dm_get_rq_mapinfo() · ae6ad75e
      Mike Snitzer authored
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix excessive dm-mq context switching · 6acfe68b
      Mike Snitzer authored
      Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
      than if an underlying null_blk device were used directly.  One of the
      reasons for this drop in performance is that blk_insert_cloned_request()
      was calling blk_mq_insert_request() with @async=true.  This forced the
      use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
      which ushered in ping-ponging between process context (fio in this case)
      and kblockd's kworker to submit the cloned request.  The ftrace
      function_graph tracer showed:
      
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
      
      Fixing blk_insert_cloned_request()'s blk_mq_insert_request() call to
      _not_ use kblockd to submit the cloned requests isn't enough to
      eliminate the observed context switches.
      
      In addition to this dm-mq specific blk-core fix, there are 2 DM core
      fixes to dm-mq that (when paired with the blk-core fix) completely
      eliminate the observed context switching:
      
      1)  don't blk_mq_run_hw_queues in blk-mq request completion
      
          Motivated by desire to reduce overhead of dm-mq, punting to kblockd
          just increases context switches.
      
          In my testing against a really fast null_blk device there was no benefit
          to running blk_mq_run_hw_queues() on completion (and no other blk-mq
          driver does this).  So hopefully this change doesn't induce the need for
          yet another revert like commit 621739b0 !
      
      2)  use blk_mq_complete_request() in dm_complete_request()
      
          blk_complete_request() doesn't offer the traditional q->mq_ops vs
          .request_fn branching pattern that other historic block interfaces
          do (e.g. blk_get_request).  Using blk_mq_complete_request() for
          blk-mq requests is important for performance.  It should be noted
          that, like blk_complete_request(), blk_mq_complete_request() doesn't
          natively handle partial completions -- but the request-based
          DM-multipath target does provide the required partial completion
          support by dm.c:end_clone_bio() triggering requeueing of the request
          via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.
      
      dm-mq fix #2 is _much_ more important than #1 for eliminating the
      context switches.
      Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
      After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472
      
      With these changes multithreaded async read IOPs improved from ~950K
      to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
      IOPs of the underlying null_blk device for the same workload is ~1950K.
      
      Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()")
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Cc: stable@vger.kernel.org # 4.1+
      Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
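The figures quoted in the commit above can be put in relative terms with a short calculation. This is a sketch in Python that uses only the numbers from the commit message; the variable names are illustrative.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Multithreaded async read IOPS for dm-mq stacked on null_blk.
iops_before, iops_after, iops_raw = 950_000, 1_350_000, 1_950_000
iops_gain_pct = percent_change(iops_before, iops_after)

# How close dm-mq gets to the raw null_blk device, before and after.
share_of_raw_before = iops_before / iops_raw
share_of_raw_after = iops_after / iops_raw

# Context switches reported by fio (the ctx= field).
ctx_before, ctx_after = 7_905_181, 2008
ctx_reduction_factor = ctx_before / ctx_after
```

Under these numbers the IOPS gain is roughly 42%, dm-mq moves from about 49% to about 69% of the raw device throughput (the "before" share is consistent with the reported "50% slower"), and context switches drop by a factor of several thousand.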
    • dm: fix sparse "unexpected unlock" warnings in ioctl code · 956a4025
      Mike Snitzer authored
      Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
      do the dm_{get,put}_live_table() rather than split those operations.
      
      The dm_grab_bdev_for_ioctl() callers only care about the block_device
      associated with a singleton DM device so there isn't any need to retain
      a reference to the live DM table.  It is sufficient to:
      1) dm_get_live_table()
      2) bdgrab() the bdev associated with the singleton table's target
      3) dm_put_live_table()
      4) bdput() the bdev
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
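The four-step sequence listed in the commit above can be sketched as a toy reference-counting model. This is Python, not kernel code: the classes, counters, and the `grab_bdev_for_ioctl`/`put_bdev` helpers are hypothetical stand-ins for dm_get_live_table(), bdgrab(), dm_put_live_table(), and bdput().

```python
from contextlib import contextmanager


class BlockDevice:
    def __init__(self, name):
        self.name = name
        self.refs = 0


class MappedDevice:
    def __init__(self, bdev):
        self._bdev = bdev
        self.table_readers = 0

    @contextmanager
    def live_table(self):
        self.table_readers += 1       # stands in for dm_get_live_table()
        try:
            yield [self._bdev]        # singleton table: exactly one target's bdev
        finally:
            self.table_readers -= 1   # stands in for dm_put_live_table()


def grab_bdev_for_ioctl(md):
    """Pin the bdev, then drop the table reference before returning."""
    with md.live_table() as table:
        bdev = table[0]
        bdev.refs += 1                # stands in for bdgrab()
    return bdev                       # the table is no longer held here


def put_bdev(bdev):
    bdev.refs -= 1                    # stands in for bdput()
```

Because the get and put of the live table happen inside one function, the caller only has to balance the bdev reference, which is the property that keeps the locking symmetric from a static checker's point of view.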
    • dm: do not return target from dm_get_live_table_for_ioctl() · 66482026
      Mike Snitzer authored
      None of the callers actually used the returned target.
      Also, just reuse bdev pointer passed to dm_blk_ioctl().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths · 4328daa2
      Mike Snitzer authored
      Using request-based DM mpath configured with the following stacking
      (.request_fn DM mpath on top of scsi-mq paths):
      
      echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
      echo N > /sys/module/dm_mod/parameters/use_blk_mq
      
      'struct dm_rq_target_io' would leak if a request is requeued before a
      blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
      wasn't being called.
      
      kmemleak reported:
      
      unreferenced object 0xffff8800b90b98c0 (size 112):
        comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
        hex dump (first 32 bytes):
          00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
          e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
        backtrace:
          [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
          [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
          [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
          [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
          [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
          [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
          [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
          [<ffffffff812f9585>] blk_delay_work+0x25/0x40
          [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
          [<ffffffff81097715>] worker_thread+0x125/0x4b0
          [<ffffffff8109ce88>] kthread+0xd8/0xf0
          [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      crash> struct -o dm_rq_target_io
      struct dm_rq_target_io {
          ...
      }
      SIZE: 112
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
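The class of leak described in the commit above (a per-request object that is only released together with its clone) can be illustrated with a toy model. This is a sketch in Python, not the actual fix: `TioPool` and `unprep_request` are hypothetical names, and the shape of the cleanup path is an assumption made for illustration.

```python
class TioPool:
    """Tracks how many per-request 'tio' objects are still allocated."""

    def __init__(self):
        self.outstanding = 0

    def alloc(self):
        self.outstanding += 1
        return {"clone": None}   # no clone has been allocated yet

    def free(self, tio):
        self.outstanding -= 1


def unprep_request(pool, tio):
    """Release everything attached to a request before it is requeued."""
    if tio["clone"] is not None:
        tio["clone"] = None      # release the clone, if one was ever allocated
    pool.free(tio)               # always release the tio, clone or no clone
```

The point of the sketch is the unconditional `pool.free(tio)`: if the tio were released only on the branch that also releases a clone, a request requeued before clone allocation would leave `outstanding` above zero, which is the kind of leak kmemleak reported.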
  4. 20 Feb, 2016 8 commits
    • Linux 4.5-rc5 · 81f70ba2
      Linus Torvalds authored
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0389075e
      Linus Torvalds authored
      Pull x86 fixes from Ingo Molnar:
       "This is unusually large, partly due to the EFI fixes that prevent
        accidental deletion of EFI variables through efivarfs that may brick
        machines.  These fixes are somewhat involved to maintain compatibility
        with existing install methods and other usage modes, while trying to
        turn off the 'rm -rf' bricking vector.
      
        Other fixes are for large page ioremap()s and for non-temporal
        user-memcpy()s"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/mm: Fix vmalloc_fault() to handle large pages properly
        hpet: Drop stale URLs
        x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache()
        x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable
        lib/ucs2_string: Correct ucs2 -> utf8 conversion
        efi: Add pstore variables to the deletion whitelist
        efi: Make efivarfs entries immutable by default
        efi: Make our variable validation list include the guid
        efi: Do variable name validation tests in utf8
        efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
        lib/ucs2_string: Add ucs2 -> utf8 helper functions
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 06b74c65
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "A handful of CPU hotplug related fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Plug potential memory leak in CPU_UP_PREPARE
        perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
        perf/core: Remove bogus UP_CANCELED hotplug state
        perf/x86/amd/uncore: Plug reference leak
    • Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · e6a1c1e9
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
       - Fix build error on 32-bit with checkpoint restart from Aneesh Kumar
       - Fix dedotify for binutils >= 2.26 from Andreas Schwab
       - Don't trace hcalls on offline CPUs from Denis Kirjanov
       - eeh: Fix stale cached primary bus from Gavin Shan
       - eeh: Fix stale PE primary bus from Gavin Shan
       - mm: Fix Multi hit ERAT cause by recent THP update from Aneesh Kumar K.V
       - ioda: Set "read" permission when "write" is set from Alexey Kardashevskiy
      
      * tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/ioda: Set "read" permission when "write" is set
        powerpc/mm: Fix Multi hit ERAT cause by recent THP update
        powerpc/powernv: Fix stale PE primary bus
        powerpc/eeh: Fix stale cached primary bus
        powerpc/pseries: Don't trace hcalls on offline CPUs
        powerpc: Fix dedotify for binutils >= 2.26
        powerpc/book3s_32: Fix build error with checkpoint restart
    • Merge tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma · da6b7366
      Linus Torvalds authored
      Pull dmaengine fixes from Vinod Koul:
       "A few fixes for drivers, nothing major here.
      
        Fixes are: ioatdma fix to restart channels, new ID for wildcat PCH,
        residue fix for edma, disable irq for non-cyclic in dw"
      
      * tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
        dmaengine: edma: fix residue race for cyclic
        dmaengine: dw: pci: add ID for WildcatPoint PCH
        dmaengine: IOATDMA: fix timer code that continues to restart channels during idle
    • Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 37aa4dac
      Linus Torvalds authored
      Pull clk driver fixes from Stephen Boyd:
       "An assortment of vendor specific clk drivers fixes, most notably
        fallout from adding Tegra210 and rockchip rk3036/rk3368 drivers this
        cycle.
      
        There's also the random smattering of sparse/checker fixes, a build
        "fix" to get the Tango clk driver to compile because the Kconfig
        symbol was renamed after the fact, and a clk gpio fix for a patch
        mismerge"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (28 commits)
        clk: gpio: Really allow an optional clock= DT property
        Revert "clk: qcom: Specify LE device endianness"
        clk: versatile: mask VCO bits before writing
        clk: tegra: super: Fix sparse warnings for functions not declared as static
        clk: tegra: Fix sparse warnings for functions not declared as static
        clk: tegra: Fix sparse warning for pll_m
        clk: tegra: Use definition for pll_u override bit
        clk: tegra: Fix warning caused by pll_u failing to lock
        clk: tegra: Fix clock sources for Tegra210 EMC
        clk: tegra: Add the APB2APE audio clock on Tegra210
        clk: tegra: Add missing of_node_put()
        clk: tegra: Fix PLLE SS coefficients
        clk: tegra: Fix typos around clearing PLLE bits during enable
        clk: tegra: Do not disable PLLE when under hardware control
        clk: tegra: Fix pllx dyn step calculation
        clk: tegra: pll: Fix potential sleeping-while-atomic
        clk: tegra: Fix the misnaming of nvenc from msenc
        clk: tegra: Fix naming of MISC registers
        clk: tango4: rename ARCH_TANGOX to ARCH_TANGO
        clk: scpi: Fix checking return value of platform_device_register_simple()
        ...
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · a703f42d
      Linus Torvalds authored
      Pull more drm fixes from Dave Airlie:
       "Some more fixes trickled in:
      
        A bunch of VC4 ones since it's a pretty new driver not much chance of
        regressions, and it fixes GPU resets.
      
        Also one atomic fix, one set of fixes for a common bug in TTM cleanup,
        and one i915 hotplug fix"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/nouveau: use post-decrement in error handling
        drm/atomic: Allow for holes in connector state, v2.
        drm/i915: Fix hpd live status bits for g4x
        drm/vc4: Use runtime PM to power cycle the device when the GPU hangs.
        drm/vc4: Enable runtime PM.
        drm/vc4: Fix spurious GPU resets due to BO reuse.
        drm/vc4: Drop error message on seqno wait timeouts.
        drm/vc4: Fix -ERESTARTSYS error return from BO waits.
        drm/vc4: Return an ERR_PTR from BO creation instead of NULL.
        drm/vc4: Fix the clear color for the first tile rendered.
        drm/vc4: Validate that WAIT_BO padding is cleared.
        drm/radeon: use post-decrement in error handling
        drm/amdgpu: use post-decrement in error handling
    • kernel/resource.c: fix muxed resource handling in __request_region() · 59ceeaaf
      Simon Guinot authored
      In __request_region(), if a conflict with a BUSY and MUXED resource is
      detected, then the caller goes to sleep and waits for the resource to be
      released.  A pointer to the conflicting resource is kept.  At wake-up
      this pointer is used as a parent to retry the region request.
      
      A first problem is that this pointer might well be invalid (if for
      example the conflicting resource has already been freed).  Another
      problem is that the next call to __request_region() fails to detect a
      remaining conflict.  The previously conflicting resource is passed as a
      parameter and __request_region() will look for a conflict among the
      children of this resource and not at the resource itself.  It is likely
      to succeed anyway, even if there is still a conflict.
      
      Instead, the parent of the conflicting resource should be passed to
      __request_region().
      
      As a fix, this patch doesn't update the parent resource pointer in the
      case we have to wait for a muxed region right after.
      Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
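The second problem described in the commit above (retrying with the conflicting resource as the parent hides the conflict) can be reproduced in a toy resource tree. This is a sketch in Python, not the kernel's resource code: the `Resource` class and `find_conflict` helper are hypothetical, and the port range used in the usage note is an arbitrary example.

```python
class Resource:
    def __init__(self, name, start, end):
        self.name = name
        self.start = start
        self.end = end
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child


def find_conflict(parent, start, end):
    """Return the first child of `parent` overlapping [start, end], else None.

    Only the children are inspected; `parent` itself is never a candidate.
    """
    for child in parent.children:
        if child.start <= end and start <= child.end:
            return child
    return None
```

For example, with a root resource covering 0x0000-0xffff and a busy muxed child at 0x2e-0x2f, `find_conflict(root, 0x2e, 0x2f)` returns the muxed child. Retrying as `find_conflict(muxed, 0x2e, 0x2f)` returns `None` because the muxed resource has no children, so the request would appear to succeed even though the region is still taken. Retrying against the original parent keeps the conflict visible.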