1. 16 Jul, 2014 1 commit
    • Tejun Heo's avatar
      blkcg: don't call into policy draining if root_blkg is already gone · 2a1b4cf2
      Tejun Heo authored
      While a queue is being destroyed, all the blkgs are destroyed and its
      ->root_blkg pointer is set to NULL.  If someone else starts to drain
      while the queue is in this state, the following oops happens.
      
        NULL pointer dereference at 0000000000000028
        IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        PGD e4a1067 PUD b773067 PMD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
        CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
        RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
        RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
        RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
        RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
        R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
        R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
        FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
        Stack:
         ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
         ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
         ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
        Call Trace:
         [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
         [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
         [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
         [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
         [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
         [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
         [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff81427505>] blk_put_queue+0x15/0x20
         [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
         [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
         [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
         [<ffffffff817930e2>] device_release+0x32/0xa0
         [<ffffffff81454968>] kobject_cleanup+0x38/0x70
         [<ffffffff81454848>] kobject_put+0x28/0x60
         [<ffffffff817934d7>] put_device+0x17/0x20
         [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
         [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
         [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
         [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
         [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
         [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
         [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
         [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
         [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b
      
      776687bc ("block, blk-mq: draining can't be skipped even if
      bypass_depth was non-zero") made it easier to trigger this bug by
      making blk_queue_bypass_start() drain even when it loses the first
      bypass test to blk_cleanup_queue(); however, the bug has always been
      there even before the commit as blk_queue_bypass_start() could race
      against queue destruction, win the initial bypass test but perform the
      actual draining after blk_cleanup_queue() already destroyed all blkgs.
      
      Fix it by skippping calling into policy draining if all the blkgs are
      already gone.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Reported-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Reported-by: default avatarJet Chen <jet.chen@intel.com>
      Cc: stable@vger.kernel.org
      Tested-by: default avatarShirish Pargaonkar <spargaonkar@suse.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      2a1b4cf2
  2. 14 Jul, 2014 1 commit
  3. 01 Jul, 2014 14 commits
    • Maurizio Lombardi's avatar
      bio: modify __bio_add_page() to accept pages that don't start a new segment · 254c4407
      Maurizio Lombardi authored
      The original behaviour is to refuse to add a new page if the maximum
      number of segments has been reached, regardless of the fact the page we
      are going to add can be merged into the last segment or not.
      
      Unfortunately, when the system runs under heavy memory fragmentation
      conditions, a driver may try to add multiple pages to the last segment.
      The original code won't accept them and EBUSY will be reported to
      userspace.
      
      This patch modifies the function so it refuses to add a page only in case
      the latter starts a new segment and the maximum number of segments has
      already been reached.
      
      The bug can be easily reproduced with the st driver:
      
      1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE  to 16
      2) modprobe st buffer_kbs=1024
      3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
         dd: error writing `/dev/st0': Device or resource busy
      
      [ming.lei@canonical.com: update bi_iter.bi_size before recounting segments]
      Signed-off-by: default avatarMaurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: default avatarMing Lei <ming.lei@canonical.com>
      Tested-by: default avatarDongsu Park <dongsu.park@profitbricks.com>
      Tested-by: default avatarJet Chen <jet.chen@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      254c4407
    • Akinobu Mita's avatar
      block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge · 9b4231bf
      Akinobu Mita authored
      SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved
      buffer in bytes as int type.  The value needs to be capped at the request
      queue's max_sectors.  But integer overflow is not correctly handled in
      the calculation when converting max_sectors from sectors to bytes.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: linux-scsi@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      9b4231bf
    • Akinobu Mita's avatar
      block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX · 63f26496
      Akinobu Mita authored
      BLKSECTGET ioctl loads the request queue's max_sectors as unsigned
      short value to the argument pointer.  So if the max_sector is greater
      than USHRT_MAX, the upper 16 bits of that is just discarded.
      
      In such case, USHRT_MAX is more preferable than the lower 16 bits of
      max_sectors.
      Signed-off-by: default avatarAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Douglas Gilbert <dgilbert@interlog.com>
      Cc: linux-scsi@vger.kernel.org
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      63f26496
    • Fabian Frederick's avatar
      block/partitions/efi.c: kerneldoc fixing · 16e15565
      Fabian Frederick authored
      Adding function documentation and fixing kerneldoc warnings
      ('field: description' uniformization).
      
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      16e15565
    • Fabian Frederick's avatar
      block/partitions/msdos.c: code clean-up · dce14c23
      Fabian Frederick authored
      checkpatch fixing:
      WARNING: Missing a blank line after declarations
      WARNING: space prohibited between function name and open parenthesis '('
      ERROR: spaces required around that '<' (ctx:VxV)
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      dce14c23
    • Fabian Frederick's avatar
      block/partitions/amiga.c: replace nolevel printk by pr_err · 600ffc5e
      Fabian Frederick authored
      Also add no prefix pr_fmt to avoid any future default format update
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      600ffc5e
    • Fabian Frederick's avatar
      block/partitions/aix.c: replace count*size kzalloc by kcalloc · 472d5e2a
      Fabian Frederick authored
      kcalloc manages count*sizeof overflow.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarFabian Frederick <fabf@skynet.be>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      472d5e2a
    • Gu Zheng's avatar
      bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload · cbcd1054
      Gu Zheng authored
      Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
      Martin introduces the function bip_integrity_vecs(get the useful vectors)
      to fix the issue about nr_vecs for inline integrity vectors that reported
      by David Milburn.
      
      But it seems that bip_integrity_vecs() will return the wrong number if the
      bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
      because in that case, the bip_inline_vecs[0] is malloced directly.  So
      here we add the bip_max_vcnt to record the count of vector slots, and
      cleanup the function bip_integrity_vecs().
      Signed-off-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      cbcd1054
    • Tejun Heo's avatar
      blk-mq: use percpu_ref for mq usage count · add703fd
      Tejun Heo authored
      Currently, blk-mq uses a percpu_counter to keep track of how many
      usages are in flight.  The percpu_counter is drained while freezing to
      ensure that no usage is left in-flight after freezing is complete.
      blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
      per-cpu gating mechanism.
      
      This type of code has relatively high chance of subtle bugs which are
      extremely difficult to trigger and it's way too hairy to be open coded
      in blk-mq.  percpu_ref can serve the same purpose after the recent
      changes.  This patch replaces the open-coded per-cpu usage counting
      and draining mechanism with percpu_ref.
      
      blk_mq_queue_enter() performs tryget_live on the ref and exit()
      performs put.  blk_mq_freeze_queue() kills the ref and waits until the
      reference count reaches zero.  blk_mq_unfreeze_queue() revives the ref
      and wakes up the waiters.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      add703fd
    • Tejun Heo's avatar
      blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue() · 72d6f02a
      Tejun Heo authored
      Keeping __blk_mq_drain_queue() as a separate function doesn't buy us
      anything and it's gonna be further simplified.  Let's flatten it into
      its caller.
      
      This patch doesn't make any functional change.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      72d6f02a
    • Tejun Heo's avatar
      blk-mq: decouble blk-mq freezing from generic bypassing · 780db207
      Tejun Heo authored
      blk_mq freezing is entangled with generic bypassing which bypasses
      blkcg and io scheduler and lets IO requests fall through the block
      layer to the drivers in FIFO order.  This allows forward progress on
      IOs with the advanced features disabled so that those features can be
      configured or altered without worrying about stalling IO which may
      lead to deadlock through memory allocation.
      
      However, generic bypassing doesn't quite fit blk-mq.  blk-mq currently
      doesn't make use of blkcg or ioscheds and it maps bypssing to
      freezing, which blocks request processing and drains all the in-flight
      ones.  This causes problems as bypassing assumes that request
      processing is online.  blk-mq works around this by conditionally
      allowing request processing for the problem case - during queue
      initialization.
      
      Another weirdity is that except for during queue cleanup, bypassing
      started on the generic side prevents blk-mq from processing new
      requests but doesn't drain the in-flight ones.  This shouldn't break
      anything but again highlights that something isn't quite right here.
      
      The root cause is conflating blk-mq freezing and generic bypassing
      which are two different mechanisms.  The only intersecting purpose
      that they serve is during queue cleanup.  Let's properly separate
      blk-mq freezing from generic bypassing and simply use it where
      necessary.
      
      * request_queue->mq_freeze_depth is added and
        blk_mq_[un]freeze_queue() now operate on this counter instead of
        ->bypass_depth.  The replacement for QUEUE_FLAG_BYPASS isn't added
        but the counter is tested directly.  This will be further updated by
        later changes.
      
      * blk_mq_drain_queue() is dropped and "__" prefix is dropped from
        blk_mq_freeze_queue().  Queue cleanup path now calls
        blk_mq_freeze_queue() directly.
      
      * blk_queue_enter()'s fast path condition is simplified to simply
        check @q->mq_freeze_depth.  Previously, the condition was
      
      	!blk_queue_dying(q) &&
      	    (!blk_queue_bypass(q) || !blk_queue_init_done(q))
      
        mq_freeze_depth is incremented right after dying is set and
        blk_queue_init_done() exception isn't necessary as blk-mq doesn't
        start frozen, which only leaves the blk_queue_bypass() test which
        can be replaced by @q->mq_freeze_depth test.
      
      This change simplifies the code and reduces confusion in the area.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      780db207
    • Tejun Heo's avatar
      block, blk-mq: draining can't be skipped even if bypass_depth was non-zero · 776687bc
      Tejun Heo authored
      Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue()
      skip queue draining if bypass_depth was already above zero.  The
      assumption is that the one which bumped the bypass_depth should have
      performed draining already; however, there's nothing which prevents a
      new instance of bypassing/freezing from starting before the previous
      one finishes draining.  The current code may allow the later
      bypassing/freezing instances to complete while there still are
      in-flight requests which haven't finished draining.
      
      Fix it by draining regardless of bypass_depth.  We still skip draining
      from blk_queue_bypass_start() while the queue is initializing to avoid
      introducing excessive delays during boot.  INIT_DONE setting is moved
      above the initial blk_queue_bypass_end() so that bypassing attempts
      can't slip inbetween.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      776687bc
    • Tejun Heo's avatar
      blk-mq: fix a memory ordering bug in blk_mq_queue_enter() · 531ed626
      Tejun Heo authored
      blk-mq uses a percpu_counter to keep track of how many usages are in
      flight.  The percpu_counter is drained while freezing to ensure that
      no usage is left in-flight after freezing is complete.
      
      blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
      per-cpu gating mechanism; unfortunately, it contains a subtle bug -
      smp_wmb() in blk_mq_queue_enter() doesn't prevent prevent the cpu from
      fetching @q->bypass_depth before incrementing @q->mq_usage_counter and
      if freezing happens inbetween the caller can slip through and freezing
      can be complete while there are active users.
      
      Use smp_mb() instead so that bypass_depth and mq_usage_counter
      modifications and tests are properly interlocked.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: default avatarJens Axboe <axboe@fb.com>
      531ed626
    • Jens Axboe's avatar
      Merge branch 'for-3.17' of... · 17737d3b
      Jens Axboe authored
      Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.17/core
      
      Merge the percpu_ref changes from Tejun, he says they are stable now.
      17737d3b
  4. 29 Jun, 2014 10 commits
  5. 28 Jun, 2014 14 commits
    • Linus Torvalds's avatar
      Merge tag 'spi-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · 24b414d5
      Linus Torvalds authored
      Pull spi fixes from Mark Brown:
       "A few driver specific fixes, the biggest one being a fix for the newly
        added Qualcomm SPI controller driver to make it not use its internal
        chip select due to hardware bugs, replacing it with GPIOs"
      
      * tag 'spi-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
        spi: qup: Remove chip select function
        spi: qup: Fix order of spi_register_master
        spi: sh-sci: fix use-after-free in sh_sci_spi_remove()
        spi/pxa2xx: fix incorrect SW mode chipselect setting for BayTrail LPSS SPI
      24b414d5
    • Linus Torvalds's avatar
      Merge tag 'regulator-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 4194976b
      Linus Torvalds authored
      Pull regulator fixes from Mark Brown:
       "Several driver specific fixes here, the palmas fixes being especially
        important for a range of boards - the recent updates to support new
        devices have introduced several regressions"
      
      * tag 'regulator-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: tps65218: Correct the the config register for LDO1
        regulator: tps65218: Add the missing of_node assignment in probe
        regulator: palmas: fix typo in enable_reg calculation
        regulator: bcm590xx: fix vbus name
        regulator: palmas: Fix SMPS enable/disable/is_enabled
      4194976b
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · eb477e03
      Linus Torvalds authored
      Pull SCSI target fixes from Nicholas Bellinger:
       "Mostly minor fixes this time around.  The highlights include:
      
         - iscsi-target CHAP authentication fixes to enforce explicit key
           values (Tejas Vaykole + rahul.rane)
         - fix a long-standing OOPs in target-core when a alua configfs
           attribute is accessed after port symlink has been removed.
           (Sebastian Herbszt)
         - fix a v3.10.y iscsi-target regression causing the login reject
           status class/detail to be ignored (Christoph Vu-Brugier)
         - fix a v3.10.y iscsi-target regression to avoid rejecting an
           existing ITT during Data-Out when data-direction is wrong (Santosh
           Kulkarni + Arshad Hussain)
         - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
           Patocka)
         - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        iscsi-target: fix iscsit_del_np deadlock on unload
        iovec: move memcpy_from/toiovecend to lib/iovec.c
        iscsi-target: Avoid rejecting incorrect ITT for Data-Out
        tcm_loop: Fix memory leak in tcm_loop_submission_work error path
        iscsi-target: Explicily clear login response PDU in exception path
        target: Fix left-over se_lun->lun_sep pointer OOPs
        iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
        iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
      eb477e03
    • Mark Brown's avatar
    • Mark Brown's avatar
      Merge remote-tracking branches 'regulator/fix/bcm590xx',... · 11767484
      Mark Brown authored
      Merge remote-tracking branches 'regulator/fix/bcm590xx', 'regulator/fix/palmas' and 'regulator/fix/tps65218' into regulator-linus
      11767484
    • Tejun Heo's avatar
      percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() · 2d722782
      Tejun Heo authored
      Now that explicit invocation of percpu_ref_exit() is necessary to free
      the percpu counter, we can implement percpu_ref_reinit() which
      reinitializes a released percpu_ref.  This can be used implement
      scalable gating switch which can be drained and then re-opened without
      worrying about memory allocation failures.
      
      percpu_ref_is_zero() is added to be used in a sanity check in
      percpu_ref_exit().  As this function will be useful for other purposes
      too, make it a public interface.
      
      v2: Use smp_read_barrier_depends() instead of smp_load_acquire().  We
          only need data dep barrier and smp_load_acquire() is stronger and
          heavier on some archs.  Spotted by Lai Jiangshan.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      2d722782
    • Tejun Heo's avatar
      percpu-refcount: require percpu_ref to be exited explicitly · 9a1049da
      Tejun Heo authored
      Currently, a percpu_ref undoes percpu_ref_init() automatically by
      freeing the allocated percpu area when the percpu_ref is killed.
      While seemingly convenient, this has the following niggles.
      
      * It's impossible to re-init a released reference counter without
        going through re-allocation.
      
      * In the similar vein, it's impossible to initialize a percpu_ref
        count with static percpu variables.
      
      * We need and have an explicit destructor anyway for failure paths -
        percpu_ref_cancel_init().
      
      This patch removes the automatic percpu counter freeing in
      percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
      generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
      is considered but it gets confusing with percpu_ref_kill() while
      "exit" clearly indicates that it's the counterpart of
      percpu_ref_init().
      
      All percpu_ref_cancel_init() users are updated to invoke
      percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
      added to the destruction path of all percpu_ref users.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Li Zefan <lizefan@huawei.com>
      9a1049da
    • Tejun Heo's avatar
      percpu-refcount: use unsigned long for pcpu_count pointer · 7d742075
      Tejun Heo authored
      percpu_ref->pcpu_count is a percpu pointer with a status flag in its
      lowest bit.  As such, it always goes through arithmetic operations
      which is very cumbersome to do on a pointer.  It has to be first
      casted to unsigned long and then back.
      
      Let's just make the field unsigned long so that we can skip the first
      casts.  While at it, rename it to pcpu_counter_ptr to clarify that
      it's a pointer value.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      7d742075
    • Tejun Heo's avatar
      percpu-refcount: add helpers for ->percpu_count accesses · eae7975d
      Tejun Heo authored
      * All four percpu_ref_*() operations implemented in the header file
        perform the same operation to determine whether the percpu_ref is
        alive and extract the percpu pointer.  Factor out the common logic
        into __pcpu_ref_alive().  This doesn't change the generated code.
      
      * There are a couple places in percpu-refcount.c which masks out
        PCPU_REF_DEAD to obtain the percpu pointer.  Factor it out into
        pcpu_count_ptr().
      
      * The above changes make the WARN_ON_ONCE() conditional at the top of
        percpu_ref_kill_and_confirm() the only user of REF_STATUS().  Test
        PCPU_REF_DEAD directly and remove REF_STATUS().
      
      This patch doesn't introduce any functional change.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      eae7975d
    • Tejun Heo's avatar
      percpu-refcount: one bit is enough for REF_STATUS · d630dc4c
      Tejun Heo authored
      percpu-refcount currently reserves two lowest bits of its percpu
      pointer to indicate its state; however, only one bit is used for
      PCPU_REF_DEAD.
      
      Simplify it by removing PCPU_STATUS_BITS/MASK and testing
      PCPU_REF_DEAD directly.  This also allows the compiler to choose a
      more efficient instruction depending on the architecture.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      d630dc4c
    • Tejun Heo's avatar
      percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() · 55c6c814
      Tejun Heo authored
      ioctx_alloc() reaches inside percpu_ref and directly frees
      ->pcpu_count in its failure path, which is quite gross.  percpu_ref
      has been providing a proper interface to do this,
      percpu_ref_cancel_init(), for quite some time now.  Let's use that
      instead.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarBenjamin LaHaise <bcrl@kvack.org>
      Cc: Kent Overstreet <kmo@daterainc.com>
      55c6c814
    • Mikulas Patocka's avatar
      iscsi-target: fix iscsit_del_np deadlock on unload · 81a9c5e7
      Mikulas Patocka authored
      On uniprocessor preemptible kernel, target core deadlocks on unload. The
      following events happen:
      * iscsit_del_np is called
      * it calls send_sig(SIGINT, np->np_thread, 1);
      * the scheduler switches to the np_thread
      * the np_thread is woken up, it sees that kthread_should_stop() returns
        false, so it doesn't terminate
      * the np_thread clears signals with flush_signals(current); and goes back
        to sleep in iscsit_accept_np
      * the scheduler switches back to iscsit_del_np
      * iscsit_del_np calls kthread_stop(np->np_thread);
      * the np_thread is waiting in iscsit_accept_np and it doesn't respond to
        kthread_stop
      
      The deadlock could be resolved if the administrator sends SIGINT signal to
      the np_thread with killall -INT iscsi_np
      
      The reproducible deadlock was introduced in commit
      db6077fd, but the thread-stopping code was
      racy even before.
      
      This patch fixes the problem. Using kthread_should_stop to stop the
      np_thread is unreliable, so we test np_thread_state instead. If
      np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNicholas Bellinger <nab@linux-iscsi.org>
      81a9c5e7
    • Linus Torvalds's avatar
      Merge tag 'iommu-fixes-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 3e7b256c
      Linus Torvalds authored
      Pull IOMMU fixes from Joerg Roedel:
      
       - fix VT-d regression with handling multiple RMRR entries per device
      
       - fix a small race that was left in the mmu_notifier handling in the
         AMD IOMMUv2 driver
      
      * tag 'iommu-fixes-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Fix small race between invalidate_range_end/start
        iommu/vt-d: fix bug in handling multiple RMRRs for the same PCI device
      3e7b256c
    • Linus Torvalds's avatar
      Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d1fc98ba
      Linus Torvalds authored
      Pull x86 fixes from Peter Anvin:
       "A pile of fixes related to the VDSO, EFI and 32-bit badsys handling.
      
        It turns out that removing the section headers from the VDSO breaks
        gdb, so this puts back most of them.  A very simple typo broke
        rt_sigreturn on some versions of glibc, with obviously disastrous
        results.  The rest is pretty much fixes for the corresponding fallout.
      
        The EFI fixes fixes an arithmetic overflow on 32-bit systems and
        quiets some build warnings.
      
        Finally, when invoking an invalid system call number on x86-32, we
        bypass a bunch of handling, which can make the audit code oops"
      
      * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        efi-pstore: Fix an overflow on 32-bit builds
        x86/vdso: Error out in vdso2c if DT_RELA is present
        x86/vdso: Move DISABLE_BRANCH_PROFILING into the vdso makefile
        x86_32, signal: Fix vdso rt_sigreturn
        x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)
        x86/vdso: Create .build-id links for unstripped vdso files
        x86/vdso: Remove some redundant in-memory section headers
        x86/vdso: Improve the fake section headers
        x86/vdso2c: Use better macros for ELF bitness
        x86/vdso: Discard the __bug_table section
        efi: Fix compiler warnings (unused, const, type)
      d1fc98ba