1. 10 Mar, 2017 4 commits
    • Merge tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 9db61d6f
      Linus Torvalds authored
      Pull xfs fixes from Darrick Wong:
       "Here are some bug fixes for -rc2 to clean up the copy on write
        handling and to remove a cause of hangs.
      
         - Fix various iomap bugs
      
         - Fix overly aggressive CoW preallocation garbage collection
      
         - Fixes to CoW endio error handling
      
         - Fix some incorrect geometry calculations
      
         - Remove a potential system hang in bulkstat
      
         - Try to allocate blocks more aggressively to reduce ENOSPC errors"
      
      * tag 'xfs-4.11-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: try any AG when allocating the first btree block when reflinking
        xfs: use iomap new flag for newly allocated delalloc blocks
        xfs: remove kmem_zalloc_greedy
        xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask
        xfs: fix and streamline error handling in xfs_end_io
        xfs: only reclaim unwritten COW extents periodically
        iomap: invalidate page caches should be after iomap_dio_complete() in direct write
    • Merge tag 'gcc-plugins-v4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 794fe789
      Linus Torvalds authored
      Pull gcc-plugins fix from Kees Cook:
       "Fixes a typo in sancov plugin, exposed in earlier compiler versions"
      
      * tag 'gcc-plugins-v4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        gcc-plugins: fix sancov_plugin for gcc-5
    • Merge tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · c1aa905a
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix several issues in the intel_pstate driver and one issue in
        the schedutil cpufreq governor, clean up that governor a bit and hook
        up existing code for disabling cpufreq to a new kernel command line
        option.
      
        Specifics:
      
         - Three fixes for intel_pstate problems related to the passive mode
           (in which it acts as a regular cpufreq scaling driver), two for the
           handling of global P-state limits and one for the handling of the
           cpu_frequency tracepoint in that mode (Rafael Wysocki).
      
         - Three fixes for the handling of P-state limits in intel_pstate in
           the active mode (Rafael Wysocki).
      
         - Introduction of a new cpufreq.off=1 kernel command line argument
           that will disable cpufreq entirely if passed to the kernel and is
           simply hooked up to the existing code used by Xen (Len Brown).
      
         - Fix for the schedutil cpufreq governor to prevent it from using
           stale raw frequency values in configurations with multiple CPUs
           sharing one policy object and a cleanup for it reducing its
           overhead slightly (Viresh Kumar)"
      
      * tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
        cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
        cpufreq: intel_pstate: Fix global settings in active mode
        cpufreq: Add the "cpufreq.off=1" cmdline option
        cpufreq: schedutil: Pass sg_policy to get_next_freq()
        cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
        cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
        cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
        cpufreq: intel_pstate: Do not use performance_limits in passive mode
    • Merge tag 'pci-v4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 144c7666
      Linus Torvalds authored
      Pull PCI fixes from Bjorn Helgaas:
       "PCI fixes:
      
         - fix NULL pointer dereference in Exynos driver
      
         - fix NULL pointer dereference in ASPM with pre-1.1 PCIe devices
      
         - blacklist QLogic ISP2722 to prevent panics while reading VPD"
      
      * tag 'pci-v4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/ASPM: Always set link->downstream to avoid NULL dereference on remove
        PCI: Prevent VPD access for QLogic ISP2722
        PCI: exynos: Initialize elbi_base even when using PHY framework
  2. 09 Mar, 2017 5 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 34bbce9e
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
       "Sending this a bit sooner than I otherwise would have, as a fix in the
        merge window had some unfortunate issues and side effects for some
        folks.
      
        This contains:
      
         - Fixes from Jan for the bdi registration/unregistration. These have
           been tested by the various parties reporting issues, and should be
           solid at this point.
      
         - Also from Jan, fix for axonram gendisk registration.
      
         - A stable fix for zram from Johannes.
      
         - A small series from Ming, fixing up some long standing issues with
           blk-mq hardware queue kobject initialization and registration.
      
         - A fix for sed opal from Jon, fixing a nonsensical range check and
           some set-but-not-used variables.
      
         - A fix from Neil for a long standing deadlock issue for stacking
           device drivers. With this in place, dm/md don't have to work around
           the issue anymore, and can be properly fixed up"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        axonram: Fix gendisk handling
        blk: improve order of bio handling in generic_make_request()
        Revert "scsi, block: fix duplicate bdi name registration crashes"
        block: Make del_gendisk() safer for disks without queues
        bdi: Fix use-after-free in wb_congested_put()
        block: Allow bdi re-registration
        block/sed: Fix opal user range check and unused variables
        zram: set physical queue limits to avoid array out of bounds accesses
        blk-mq: free hctx->cpumask in release handler of hctx's kobject
        blk-mq: make lifetime consistent between hctx and its kobject
        blk-mq: make lifetime consitent between q/ctx and its kobject
        blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
    • Merge tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · bb61ce54
      Linus Torvalds authored
      Pull media fixes from Mauro Carvalho Chehab:
       "Media regression fixes:
      
         - serial_ir: fix a Kernel crash during boot on Kernel 4.11-rc1, due
           to an IRQ code called too early
      
         - other IR regression fixes at lirc and at the raw IR decoding
      
         - a deadlock fix at the RC nuvoton driver
      
         - fix another issue with DMA on stack at dw2102 driver
      
        There's an extra patch there that changes a driver interface for the
        SoC VSP1 driver, which is shared between the DRM and V4L2 drivers. The
        patch itself is trivial, and was acked by David Airlie"
      
      * tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure
        [media] dw2102: don't do DMA on stack
        [media] rc: protocol is not set on register for raw IR devices
        [media] rc: raw decoder for keymap protocol is not loaded on register
        [media] rc: nuvoton: fix deadlock in nvt_write_wakeup_codes
        [media] lirc: fix dead lock between open and wakeup_filter
        [media] serial_ir: ensure we're ready to receive interrupts
    • Merge tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · cb2113cb
      Linus Torvalds authored
      Pull xen fix and cleanup from Juergen Gross:
       "This contains one fix for MSIX handling under Xen and a trivial
        cleanup patch"
      
      * tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xenbus: Remove duplicate inclusion of linux/init.h
        xen: do not re-use pirq number cached in pci device msi msg data
    • Merge branch 'pm-cpufreq-sched' · 32d3b06a
      Rafael J. Wysocki authored
      * pm-cpufreq-sched:
        cpufreq: schedutil: Pass sg_policy to get_next_freq()
        cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
    • Merge branch 'pm-cpufreq' · fd8e57d5
      Rafael J. Wysocki authored
      * pm-cpufreq:
        cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
        cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
        cpufreq: intel_pstate: Fix global settings in active mode
        cpufreq: Add the "cpufreq.off=1" cmdline option
        cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
        cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
        cpufreq: intel_pstate: Do not use performance_limits in passive mode
  3. 08 Mar, 2017 26 commits
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea6200e8
      Linus Torvalds authored
      Pull sched.h split-up fixes for MIPS from Ingo Molnar:
       "These are the fixes for MIPS build failures due to the sched.h
        split-up, from Arnd Bergmann"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MIPS: Add missing include files
    • mm, page_alloc: Add missing check for memory holes · b4fb8f66
      Tony Luck authored
      Commit 13ad59df ("mm, page_alloc: avoid page_to_pfn() when merging
      buddies") moved the check for memory holes out of page_is_buddy() and
      had the callers do the check.
      
      But this wasn't done correctly in one place which caused ia64 to crash
      very early in boot.
      
      Update to fix that and make ia64 boot again.
      
      [ v2: Vlastimil pointed out we don't need to call page_to_pfn()
            since we already have the result of that in "buddy_pfn" ]
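The idea of checking for a hole before looking at the buddy can be modelled in a few lines. This is a sketch only, not the kernel code: `find_buddy_pfn`, `can_merge`, `valid_pfns` and `free_orders` are hypothetical names, and the XOR rule for locating a buddy is the usual buddy-allocator convention, assumed here rather than taken from the commit message.

```python
def find_buddy_pfn(pfn, order):
    """Buddy of a 2**order-page block: flip bit `order` of the pfn."""
    return pfn ^ (1 << order)


def can_merge(pfn, order, valid_pfns, free_orders):
    """True if the buddy exists in memory and is a free block of this order.

    valid_pfns  -- set of pfns backed by memory (anything else is a hole)
    free_orders -- dict mapping the first pfn of each free block to its order
    """
    buddy_pfn = find_buddy_pfn(pfn, order)
    # The hole check comes first: state recorded for a pfn that sits in a
    # hole must never be examined.
    if buddy_pfn not in valid_pfns:
        return False
    return free_orders.get(buddy_pfn) == order


valid = set(range(12))        # toy layout: pfns 12 and up are a hole
free = {4: 2, 12: 2}          # pretend stale state claims pfn 12 is free
print(can_merge(0, 2, valid, free))   # buddy is pfn 4: present and free
print(can_merge(8, 2, valid, free))   # buddy is pfn 12: in the hole, no merge
```

Because the buddy pfn is already computed for the lookup, the hole check can reuse it, which is the point of the v2 note above.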
      
      Fixes: 13ad59df ("avoid page_to_pfn() when merging buddies")
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · 8557b8e4
      Linus Torvalds authored
      Pull ktest fixes from Steven Rostedt:
       "Greg Kroah-Hartman reported to me that the ktest of v4.11-rc1 locked
        up in an infinite loop while doing the make mrproper.
      
        Looking into the cause I noticed that a recent update to the function
        run_command (used for running all shell commands, including "make
        mrproper") changed the internal loop to use the function
        wait_for_input.
      
        The wait_for_input function uses select to look at two file
        descriptors. One is the file descriptor of the command it is running,
        the other is STDIN. The STDIN check was not checking the return status
        of the sysread call, and was also just writing a lot of data into
        syswrite without regard to the size of the data read.
      
        Changing the code to check the return status of sysread, and also to
        still process the passed in descriptor data without looping back to
        the select fixed Greg's problem.
      
        While looking at this code I also realized that the loop did not honor
        the timeout if STDIN always had input (or for some reason returned an
        error). This could prevent wait_for_input from timing out on the file
        descriptor it is supposed to be waiting for. That is fixed too"
      
      * tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Make sure wait_for_input does honor the timeout
        ktest: Fix while loop in wait_for_input
    • overlayfs: remove now unnecessary header file include · 04bb94b1
      Linus Torvalds authored
      This removes the extra include header file that was added in commit
      e58bc927 "Pull overlayfs updates from Miklos Szeredi" now that it
      is no longer needed.
      
      There are probably other such includes that got added during the
      scheduler header splitup series, but this is the one that annoyed me
      personally and I know about.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xfs: try any AG when allocating the first btree block when reflinking · 2fcc319d
      Christoph Hellwig authored
      When a reflink operation causes the bmap code to allocate a btree block
      we're currently doing single-AG allocations due to having ->firstblock
      set and then try any higher AG due to a little reflink quirk we've put in
      when adding the reflink code.  But given that we do not have a minleft
      reservation of any kind in this AG we can still not have any space in
      the same or higher AG even if the file system has enough free space.
      To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back
      path instead.
      
      [And yes, we need to redo this properly instead of piling hacks over
       hacks.  I'm working on that, but it's not going to be a small series.
       In the meantime this fixes the customer reported issue]
      
      Also add a warning for failing allocations to make it easier to debug.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Linus Torvalds authored
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games, we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xfs: use iomap new flag for newly allocated delalloc blocks · f65e6fad
      Brian Foster authored
      Commit fa7f138a ("xfs: clear delalloc and cache on buffered write
      failure") fixed one regression in the iomap error handling code and
      exposed another. The fundamental problem is that if a buffered write
      is a rewrite of preexisting delalloc blocks and the write fails, the
      failure handling code can punch out preexisting blocks with valid
      file data.
      
      This was reproduced directly by sub-block writes in the LTP
      kernel/syscalls/write/write03 test. A first 100 byte write allocates
      a single block in a file. A subsequent 100 byte write fails and
      punches out the block, including the data successfully written by
      the previous write.
      
      To address this problem, update the ->iomap_begin() handler to
      distinguish newly allocated delalloc blocks from preexisting
      delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the
      ->iomap_end() handler to decide when a failed or short write should
      punch out delalloc blocks.
      
      This introduces the subtle requirement that ->iomap_begin() should
      never combine newly allocated delalloc blocks with existing blocks
      in the resulting iomap descriptor. This can occur when a new
      delalloc reservation merges with a neighboring extent that is part
      of the current write, for example. Therefore, drop the
      post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and
      just return the record inserted into the fork. This ensures only new
      blocks are returned and thus that preexisting delalloc blocks are
      always handled as "found" blocks and not punched out on a failed
      rewrite.
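The effect of the flag can be illustrated with a toy model. This is a sketch under stated assumptions, not the XFS implementation: `ToyFile`, `buffered_write` and the one-block-per-write simplification are hypothetical, and only the name `IOMAP_F_NEW` and the roles of the begin/end handlers come from the text above.

```python
IOMAP_F_NEW = 0x01  # flag name taken from the commit; the value is illustrative


class ToyFile:
    """Hypothetical stand-in for an inode: tracks delalloc blocks and data."""

    def __init__(self):
        self.delalloc = set()   # block numbers with a delalloc reservation
        self.data = {}          # block number -> bytes written so far


def iomap_begin(f, block):
    """Return mapping flags; IOMAP_F_NEW only if this call reserved the block."""
    if block in f.delalloc:
        return 0                # preexisting ("found") delalloc block
    f.delalloc.add(block)
    return IOMAP_F_NEW


def iomap_end(f, block, flags, length, written):
    """Punch out the block after a failed or short write, but only if new."""
    if written < length and (flags & IOMAP_F_NEW):
        f.delalloc.discard(block)
        f.data.pop(block, None)


def buffered_write(f, block, payload, fail=False):
    """Write payload into one block; fail=True simulates a failed copy."""
    flags = iomap_begin(f, block)
    written = 0 if fail else len(payload)
    if written:
        f.data[block] = f.data.get(block, b"") + payload
    iomap_end(f, block, flags, len(payload), written)
    return written


f = ToyFile()
buffered_write(f, 0, b"x" * 100)              # first write reserves block 0
buffered_write(f, 0, b"y" * 100, fail=True)   # failed rewrite of block 0
print(0 in f.delalloc, len(f.data[0]))        # block 0 and its data survive
```

The second write sees block 0 as a found block, so its failure leaves the earlier 100 bytes alone; a failed write to a block that the same call reserved is still punched out.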
      Reported-by: Xiong Zhou <xzhou@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • axonram: Fix gendisk handling · 672a2c87
      Jan Kara authored
      It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
      handling in axon_ram_probe() to avoid doing that.
      
      Also del_gendisk() does not drop a reference to gendisk allocated by
      alloc_disk(). That has to be done by put_disk(). Add that call where
      needed.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk: improve order of bio handling in generic_make_request() · 79bd9959
      NeilBrown authored
      To avoid recursion on the kernel stack when stacked block devices
      are in use, generic_make_request() will, when called recursively,
      queue new requests for later handling.  They will be handled when the
      make_request_fn for the current bio completes.
      
      If any bios are submitted by a make_request_fn, these will ultimately
      be handled sequentially.  If the handling of one of those generates
      further requests, they will be added to the end of the queue.
      
      This strict first-in-first-out behaviour can lead to deadlocks in
      various ways, normally because a request might need to wait for a
      previous request to the same device to complete.  This can happen when
      they share a mempool, and can happen due to interdependencies
      particular to the device.  Both md and dm have examples where this happens.
      
      These deadlocks can be eradicated by more selective ordering of bios.
      Specifically by handling them in depth-first order.  That is: when the
      handling of one bio generates one or more further bios, they are
      handled immediately after the parent, before any siblings of the
      parent.  That way, when generic_make_request() calls make_request_fn
      for some particular device, we can be certain that all previously
      submitted requests for that device have been completely handled and are
      not waiting for anything in the queue of requests maintained in
      generic_make_request().
      
      An easy way to achieve this would be to use a last-in-first-out stack
      instead of a queue.  However this will change the order of consecutive
      bios submitted by a make_request_fn, which could have unexpected
      consequences.  Instead we take a slightly more complex approach.  A
      fresh queue is created for each call to a make_request_fn.  After it
      completes, any bios for a different device are placed on the front of
      the main queue, followed by any bios for the same device, followed by
      all bios that were already on the queue before the make_request_fn was
      called.  This provides the depth-first approach without reordering bios
      on the same level.
      
      This, by itself, is not enough to remove all deadlocks.  It just makes
      it possible for drivers to take the extra step required themselves.
      
      To avoid deadlocks, drivers must never risk waiting for a request
      after submitting one to generic_make_request.  This includes never
      allocating from a mempool twice in the one call to a make_request_fn.
      
      A common pattern in drivers is to call bio_split() in a loop, handling
      the first part and then looping around to possibly split the next part.
      Instead, a driver that finds it needs to split a bio should queue
      (with generic_make_request) the second part, handle the first part,
      and then return.  The new code in generic_make_request will ensure the
      requests to underlying bios are processed first, then the second bio
      that was split off.  If it splits again, the same process happens.  In
      each case one bio will be completely handled before the next one is attempted.
      
      With this in place, it should be possible to disable the
      punt_bios_to_rescuer() recovery thread for many block devices, and
      eventually it may be possible to remove it completely.
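The queue ordering described in this message can be modelled outside the kernel. The following is an illustration only, not the kernel implementation: bios are plain (device, label) tuples, and `run_bios`, `demo_make_request_fn` and the device names "top", "mid" and "low" are hypothetical names made up for the example.

```python
from collections import deque


def run_bios(first_bio, make_request_fn):
    """Handle bios iteratively, lowest level first.

    A bio is modelled as a (device, label) tuple. make_request_fn(bio)
    returns the bios submitted while handling `bio`, in submission order.
    Returns the order in which bios were handled.
    """
    handled = []
    queue = deque([first_bio])
    while queue:
        bio = queue.popleft()
        handled.append(bio)
        # Fresh list for everything submitted by this make_request_fn call.
        fresh = list(make_request_fn(bio))
        # Split into bios for a different (lower) device and the same device,
        # keeping submission order within each group.
        lower = [b for b in fresh if b[0] != bio[0]]
        same = [b for b in fresh if b[0] == bio[0]]
        # Different-device bios first, then same-device bios, then whatever
        # was already queued before this call.
        queue = deque(lower + same + list(queue))
    return handled


def demo_make_request_fn(bio):
    """Hypothetical three-level stack: top -> mid -> low."""
    device, label = bio
    if device == "top":
        # The top device turns one bio into two bios for "mid".
        return [("mid", label + "1"), ("mid", label + "2")]
    if device == "mid":
        # Each "mid" bio is remapped to a single bio for "low".
        return [("low", label)]
    return []  # "low" is a leaf and submits nothing further


order = run_bios(("top", "A"), demo_make_request_fn)
print(order)
```

With a plain first-in-first-out queue both "mid" bios would be handled before either "low" bio. In this model each "mid" bio is followed immediately by the "low" bio it generated, while the two "mid" bios keep their submission order.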
      
      Ref: http://www.spinics.net/lists/raid/msg54680.html
      Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com>
      Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • Revert "scsi, block: fix duplicate bdi name registration crashes" · c01228db
      Jan Kara authored
      This reverts commit 0dba1314. It causes
      leaking of device numbers for SCSI when SCSI registers multiple gendisks
      for one request_queue in succession. It can be easily reproduced using
      Omar's script [1] on kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE.
      Furthermore the protection provided by this commit is not needed anymore
      as the problem it was fixing got also fixed by commit 165a5e22
      "block: Move bdi_unregister() to del_gendisk()".
      
      [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2

      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Make del_gendisk() safer for disks without queues · 90f16fdd
      Jan Kara authored
      Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()"
      added disk->queue dereference to del_gendisk(). Although del_gendisk()
      is not supposed to be called without disk->queue valid and
      blk_unregister_queue() warns in that case, this change will make it oops
      instead. Return to the old more robust behavior of just warning when
      del_gendisk() gets called for gendisk with disk->queue being NULL.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Fix use-after-free in wb_congested_put() · df23de55
      Jan Kara authored
      bdi_writeback_congested structures get created for each blkcg and bdi
      regardless of whether the bdi is registered or not. When they are created
      in an unregistered bdi and the request queue (and thus the bdi) is then
      destroyed while a blkg still holds a reference to the
      bdi_writeback_congested structure, this structure will be referencing a
      freed bdi and the last wb_congested_put() will try to remove the
      structure from the already freed bdi.
      
      With commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()", SCSI started to destroy bdis without calling
      bdi_unregister() first (previously it was calling bdi_unregister() even
      for unregistered bdis) and thus the code detaching
      bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
      started hitting this use-after-free bug. It is enough to boot a KVM
      instance with virtio-scsi device to trigger this behavior.
      
      Fix the problem by detaching bdi_writeback_congested structures in
      bdi_exit() instead of bdi_unregister(). This is also more logical as
      they can get attached to a bdi regardless of whether it ever got
      registered or not.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Allow bdi re-registration · b6f8fec4
      Jan Kara authored
      SCSI can call device_add_disk() several times for one request queue when
      a device is unbound and bound, creating a new gendisk each time. This will
      lead to bdi being repeatedly registered and unregistered. This was not a
      big problem until commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()" since bdi was only registered repeatedly (bdi_register()
      handles repeated calls fine, only we ended up leaking reference to
      gendisk due to overwriting bdi->owner) but unregistered only in
      blk_cleanup_queue() which didn't get called repeatedly. After
      165a5e22 we were doing correct bdi_register() - bdi_unregister()
      cycles however bdi_unregister() is not prepared for it. So make sure
      bdi_unregister() cleans up bdi in such a way that it is prepared for
      a possible following bdi_register() call.
      
      An easy way to provoke this behavior is to enable
      CONFIG_DEBUG_TEST_DRIVER_REMOVE and use scsi_debug driver to create a
      scsi disk which immediately hangs without this fix.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block/sed: Fix opal user range check and unused variables · b0bfdfc2
      Jon Derrick authored
      Fixes check that the opal user is within the range, and cleans up unused
      method variables.
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Reviewed-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • zram: set physical queue limits to avoid array out of bounds accesses · 0bc31538
      Johannes Thumshirn authored
      zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When using
      the NVMe over Fabrics loopback target which potentially sends a huge bulk of
      pages attached to the bio's bvec this results in a kernel panic because of
      array out of bounds accesses in zram_decompress_page().
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: free hctx->cpumask in release handler of hctx's kobject · 01388df3
      Ming Lei authored
      It is obvious that hctx->cpumask is per hctx and that both share the
      same lifetime, so this patch moves the freeing of hctx->cpumask into
      the release handler of hctx's kobject.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: make lifetime consistent between hctx and its kobject · 6c8b232e
      Ming Lei authored
      This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(),
      and tries to keep lifetime consistent between hctx and hctx's kobject.
      
      Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become
      totally symmetrical, and kobject's refcounter drops to zero just
      when the hctx is freed.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: make lifetime consitent between q/ctx and its kobject · 7ea5fe31
      Ming Lei authored
      Currently, from the kobject view, both q->mq_kobj and ctx->kobj can
      be released during one cycle of blk_mq_register_dev() and
      blk_mq_unregister_dev(). Actually, a sw queue's lifetime is the
      same as its request queue's, which is covered by request_queue->kobj.
      
      So we don't need to call kobject_put() for the two kinds of
      kobject in __blk_mq_unregister_dev(), instead we do that
      in release handler of request queue.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() · 737f98cf
      Ming Lei authored
      Both q->mq_kobj and sw queues' kobjects should be initialized
      once, instead of doing that in each add_disk context.

      Also this patch removes clearing of ctx in blk_mq_init_cpu_queues()
      because the percpu allocator zero-fills allocated variables.
      
      This patch fixes one issue[1] reported by Omar.

      [1] kernel warning when doing unbind/bind on one scsi-mq device
      
      [   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
      [   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
      [   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
      [   19.350920] Workqueue: events_unbound async_run_entry_fn
      [   19.350920] Call Trace:
      [   19.350920]  dump_stack+0x63/0x83
      [   19.350920]  kobject_init+0x77/0x90
      [   19.350920]  blk_mq_register_dev+0x40/0x130
      [   19.350920]  blk_register_queue+0xb6/0x190
      [   19.350920]  device_add_disk+0x1ec/0x4b0
      [   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
      [   19.350920]  async_run_entry_fn+0x48/0x150
      [   19.350920]  process_one_work+0x1d0/0x480
      [   19.350920]  worker_thread+0x48/0x4e0
      [   19.350920]  kthread+0x101/0x140
      [   19.350920]  ? process_one_work+0x480/0x480
      [   19.350920]  ? kthread_create_on_node+0x60/0x60
      [   19.350920]  ret_from_fork+0x2c/0x40
      
      Cc: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      737f98cf
    • Steven Rostedt (VMware)'s avatar
      ktest: Make sure wait_for_input does honor the timeout · f7c6401f
      Steven Rostedt (VMware) authored
      The function wait_for_input takes a timeout, and even has a default
      timeout. But if for some reason the STDIN descriptor keeps sending in data,
      the function will never time out. The timeout is meant to bound the wait
      for data from the passed-in file descriptor, not from STDIN. Adding a
      check, in the case where there's no data from the passed-in file
      descriptor, of whether the timeout has passed ensures that the function
      times out properly even if there's input on STDIN.
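      
      A minimal sketch of the idea, written in Python rather than ktest's
      Perl. The helper below is a hypothetical stand-in, not the ktest code:
      its signature, the 4096-byte read size, and the choice to discard STDIN
      data are all assumptions made for illustration. The point it shows is
      that the deadline is checked whenever the monitored descriptor has no
      data, so a permanently readable STDIN cannot keep the loop alive.

```python
import os
import select
import time


def wait_for_input(fd, stdin_fd, timeout=5.0):
    """Return one chunk read from fd, or None if fd stays silent too long.

    Hypothetical stand-in for ktest's Perl helper; the names and defaults
    are assumptions, not the real interface.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = max(0.0, deadline - time.monotonic())
        ready, _, _ = select.select([fd, stdin_fd], [], [], remaining)

        if fd in ready:
            # Data (or EOF) on the descriptor we actually care about.
            return os.read(fd, 4096)

        if stdin_fd in ready:
            # Consume STDIN so it does not pile up; this sketch discards it.
            os.read(stdin_fd, 4096)

        # No data from fd on this pass: enforce the deadline even though
        # STDIN may have been readable and may stay readable.
        if time.monotonic() >= deadline:
            return None
```

      Without the final deadline check, a STDIN descriptor that is always
      ready would make select() return immediately on every pass and the
      loop would never end.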
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      f7c6401f
    • Steven Rostedt (VMware)'s avatar
      ktest: Fix while loop in wait_for_input · 99c014a8
      Steven Rostedt (VMware) authored
      The run_command function was changed to use the wait_for_input function to
      allow having a timeout if the command to run takes too much time. There was
      a bug in wait_for_input where it could end up going into an infinite
      loop. There are two issues here. One is that the return value of the sysread
      wasn't used for the write (to write a proper size), and the other is that
      it should continue processing the passed-in file descriptor too even if
      there was input. There was also no check for errors: if for some reason
      STDIN returned an error, the function would go into an infinite loop and
      never exit.
      Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Fixes: 6e98d1b4 ("ktest: Add timeout to ssh command")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      99c014a8
    • Arnd Bergmann's avatar
      MIPS: Add missing include files · fc69910f
      Arnd Bergmann authored
      After the split of linux/sched.h, several platforms in arch/mips stopped building.
      
      Add the respective additional #include statements to fix the problem. I first
      tried adding these into asm/processor.h, but ran into circular header
      dependencies that I could not figure out.
      
      The commit I listed as causing the problem is the branch merge, as there is
      likely a combination of multiple patches in that branch.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: ralf@linux-mips.org
      Fixes: 1827adb1 ("Merge branch 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
      Link: http://lkml.kernel.org/r/20170308072931.3836696-1-arnd@arndb.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fc69910f
    • Darrick J. Wong's avatar
      xfs: remove kmem_zalloc_greedy · 08b005f1
      Darrick J. Wong authored
      The sole remaining caller of kmem_zalloc_greedy is bulkstat, which uses
      it to grab 1-4 pages for staging of inobt records.  The infinite loop in
      the greedy allocation function is causing hangs[1] in generic/269, so
      just get rid of the greedy allocator in favor of kmem_zalloc_large.
      This makes bulkstat somewhat more likely to fail with ENOMEM if there
      really are no pages to spare, but eliminates a source of hangs.
      
      [1] http://lkml.kernel.org/r/20170301044634.rgidgdqqiiwsmfpj%40XZHOUW.usersys.redhat.com
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      ---
      v2: remove single-page fallback
      08b005f1
    • Chandan Rajendra's avatar
      xfs: Use xfs_icluster_size_fsb() to calculate inode alignment mask · d5825712
      Chandan Rajendra authored
      When block size is larger than inode cluster size, the call to
      XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs
      would have set xfs_sb->sb_inoalignmt to 0. Hence in
      xfs_set_inoalignment(), xfs_mount->m_inoalign_mask gets initialized to
      -1 instead of 0. However, xfs_mount->m_sinoalign would get correctly
      initialized to 0 because for every positive value of xfs_mount->m_dalign,
      the condition "!(mp->m_dalign & mp->m_inoalign_mask)" would evaluate to
      false.
      
      Also, xfs_imap() worked fine even with xfs_mount->m_inoalign_mask having
      -1 as the value because blks_per_cluster variable would have the value 1
      and hence we would never have a need to use xfs_mount->m_inoalign_mask
      to compute the inode chunk's agbno and offset within the chunk.
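      
      The arithmetic can be illustrated with a small Python model. It is a
      simplified sketch, not the kernel code: it assumes the mask is set to
      sb_inoalignmt - 1 whenever sb_inoalignmt is at least the cluster size
      in filesystem blocks, as the description above implies, and it ignores
      any other conditions the real xfs_set_inoalignment() checks. The
      Python function names and the 64 KiB block / 8 KiB cluster geometry
      are illustrative assumptions.

```python
def b_to_fsbt(nbytes, blocklog):
    """Bytes to filesystem blocks, truncating (models XFS_B_TO_FSBT)."""
    return nbytes >> blocklog


def icluster_size_fsb(cluster_bytes, blocksize, blocklog):
    """Cluster size in blocks, never below 1 (models xfs_icluster_size_fsb)."""
    if blocksize >= cluster_bytes:
        return 1
    return cluster_bytes >> blocklog


def inoalign_mask(sb_inoalignmt, cluster_fsb):
    """Simplified model of the mask computation in xfs_set_inoalignment()."""
    if sb_inoalignmt >= cluster_fsb:
        return sb_inoalignmt - 1
    return 0


# Assumed geometry: block size larger than the inode cluster size,
# with sb_inoalignmt == 0 as mkfs.xfs would have set it.
blocklog, blocksize, cluster_bytes = 16, 65536, 8192

# Old calculation: the truncating conversion yields 0 blocks, so 0 >= 0
# holds and the mask becomes 0 - 1.
old_mask = inoalign_mask(0, b_to_fsbt(cluster_bytes, blocklog))

# New calculation: the cluster size is clamped to 1 block, so 0 >= 1 fails
# and the mask stays 0.
new_mask = inoalign_mask(0, icluster_size_fsb(cluster_bytes, blocksize, blocklog))
```

      In this model old_mask is -1 and new_mask is 0, matching the values
      described in the commit message.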
      Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      d5825712
    • Christoph Hellwig's avatar
      xfs: fix and streamline error handling in xfs_end_io · 787eb485
      Christoph Hellwig authored
      There are two different cases of buffered I/O errors:
      
       - first we can have an already shut down fs.  In that case we should skip
         any on-disk operations and just clean up the append transaction if
         present and destroy the ioend
       - a real I/O error.  In that case we should clean up any lingering COW
         blocks.  This gets skipped in the current code and is fixed by this
         patch.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      787eb485
    • Christoph Hellwig's avatar
      xfs: only reclaim unwritten COW extents periodically · 3802a345
      Christoph Hellwig authored
      We only want to reclaim preallocations from our periodic work item.
      Currently this is achieved by looking for a dirty inode, but that check
      is rather fragile.  Instead add a flag to xfs_reflink_cancel_cow_* so
      that the caller can ask for just cancelling unwritten extents in the COW
      fork.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix typos in commit message]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      3802a345
  4. 07 Mar, 2017 5 commits