1. 06 Mar, 2012 18 commits
    • blkcg: skip blkg printing if q isn't associated with disk · 92616b5b
      Vivek Goyal authored
      blk-cgroup printing code currently assumes that there is a device/disk
      associated with every queue in the system, but modules like floppy
      can instantiate request queues without registering a disk, which can
      lead to an oops.
      
      Skip any queue/blkg that doesn't have a dev/disk associated with it.
      
      -tj: Factored out backing_dev_info check into blkg_dev_name().
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: kill the mind-bending blkg->dev · 7a4dd281
      Tejun Heo authored
      blkg->dev is dev_t recording the device number of the block device for
      the associated request_queue.  It is used to identify the associated
      block device when printing out configuration or stats.
      
      This is redundant to begin with.  A blkg is an association between a
      cgroup and a request_queue and it of course is possible to reach
      request_queue from blkg and synchronization conventions are in place
      for safe q dereferencing, so this shouldn't be necessary from the
      beginning.  Furthermore, it's initialized by sscanf()ing the device
      name of backing_dev_info.  The mind boggles.
      
      Anyways, if blkg is visible under rcu lock, we *know* that the
      associated request_queue hasn't gone away yet and its bdi is
      registered and alive - blkg can't be created for request_queue which
      hasn't been fully initialized and it can't go away before blkg is
      removed.
      
      Let stat and conf read functions get device name from
      blkg->q->backing_dev_info.dev and pass it down to printing functions
      and remove blkg->dev.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: kill blkio_policy_node · 4bfd482e
      Tejun Heo authored
      Now that blkcg configuration lives in blkg's, blkio_policy_node is no
      longer necessary.  Kill it.
      
      blkio_policy_parse_and_set() now fails if invoked for missing device
      and functions to print out configurations are updated to print from
      blkg's.
      
      cftype_blkg_same_policy() is dropped along with other policy functions
      for consistency.  Its one line is open coded in the only user -
      blkio_read_blkg_stats().
      
      -v2: Update to reflect the retry-on-bypass logic change of the
           previous patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: don't allow or retain configuration of missing devices · e56da7e2
      Tejun Heo authored
      blkcg is very peculiar in that it allows setting and remembering
      configurations for non-existent devices by maintaining separate data
      structures for configuration.
      
      This behavior is completely out of the usual norms and outright
      confusing; furthermore, it uses dev_t number to match the
      configuration to devices, which is unpredictable to begin with and
      becomes completely unusable if EXT_DEVT is fully used.
      
      It is wholly unnecessary - we already have fully functional userland
      mechanism to program devices being hotplugged which has full access to
      device identification, connection topology and filesystem information.
      
      Add a new struct blkio_group_conf which contains all blkcg
      configurations to blkio_group and let blkio_group, which can be
      created iff the associated device exists and is removed when the
      associated device goes away, carry all configurations.
      
      Note that, after this patch, all newly created blkg's will always have
      the default configuration (unlimited for throttling and blkcg's weight
      for propio).
      
      This patch makes blkio_policy_node meaningless but doesn't remove it.
      The next patch will.
      
      -v2: Updated to retry after short sleep if blkg lookup/creation failed
           due to the queue being temporarily bypassed as indicated by
           -EBUSY return.  Pointed out by Vivek.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: factor out blkio_group creation · cd1604fa
      Tejun Heo authored
      Currently both blk-throttle and cfq-iosched implement their own
      blkio_group creation code in throtl_get_tg() and cfq_get_cfqg().  This
      patch factors out the common code into blkg_lookup_create(), which
      returns ERR_PTR value so that transitional failures due to queue
      bypass can be distinguished from other failures.
      
      * New blkio_policy_ops methods blkio_alloc_group_fn() and
        blkio_link_group_fn() are added.  Both are transitional and will be
        removed once the blkg management code is fully moved into
        blk-cgroup.c.
      
      * blkio_alloc_group_fn() allocates policy-specific blkg which is
        usually a larger data structure with blkg as the first entry and
        initializes it.  Note that initialization of blkg proper, including
        percpu stats, is the responsibility of blk-cgroup proper.
      
        Note that default config (weight, bps...) initialization is done
        from this method; otherwise, we end up violating locking order
        between blkcg and q locks via blkcg_get_CONF() functions.
      
      * blkio_link_group_fn() is called under queue_lock and is responsible
        for linking the blkg to the queue.  The blkcg side is handled by
        blk-cgroup proper.
      
      * The common blkg creation function is named blkg_lookup_create() and
        blkiocg_lookup_group() is renamed to blkg_lookup() for consistency.
        Also, throtl / cfq related functions are similarly [re]named for
        consistency.
      
      This simplifies blkcg policy implementations and enables further
      cleanup.
      
      -v2: Vivek noticed that blkg_lookup_create() incorrectly tested
           blk_queue_dead() instead of blk_queue_bypass(), which let a user
           of the function end up creating a new blkg on a bypassing queue.
           This is a bug introduced while relocating bypass patches before
           this one.  Fixed.
      
      -v3: ERR_PTR patch folded into this one.  @for_root added to
           blkg_lookup_create() to allow creating root group on a bypassed
           queue during elevator switch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: use the usual get blkg path for root blkio_group · f51b802c
      Tejun Heo authored
      For root blkg, blk_throtl_init() was using throtl_alloc_tg()
      explicitly and cfq_init_queue() was manually initializing embedded
      cfqd->root_group, adding unnecessarily different code paths to blkg
      handling.
      
      Make both use the usual blkio_group get functions - throtl_get_tg()
      and cfq_get_cfqg() - for the root blkio_group too.  Note that
      blk_throtl_init() callsite is pushed downwards in
      blk_alloc_queue_node() so that @q is sufficiently initialized for
      throtl_get_tg().
      
      This simplifies root blkg handling noticeably for cfq and will allow
      further modularization of blkcg API.
      
      -v2: Vivek pointed out that using cfq_get_cfqg() won't work if
           CONFIG_CFQ_GROUP_IOSCHED is disabled.  Fix it by factoring out
           initialization of base part of cfqg into cfq_init_cfqg_base() and
           alloc/init/free explicitly if !CONFIG_CFQ_GROUP_IOSCHED.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: add blkio_policy[] array and allow one policy per policy ID · 035d10b2
      Tejun Heo authored
      Block cgroup policies are maintained in a linked list and,
      theoretically, multiple policies sharing the same policy ID are
      allowed.
      
      This patch temporarily allows only one policy per plid and adds
      blkio_policy[] array which indexes registered policy types by plid.
      Both the restriction and blkio_policy[] array are transitional and
      will be removed once API cleanup is complete.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: use q and plid instead of opaque void * for blkio_group association · ca32aefc
      Tejun Heo authored
      blkio_group is an association between a block cgroup and a queue for a
      given policy.  Using opaque void * for association makes things
      confusing and hinders factoring of common code.  Use request_queue *
      and, if necessary, policy id instead.
      
      This will help block cgroup API cleanup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: update blkg get functions take blkio_cgroup as parameter · 0a5a7d0e
      Tejun Heo authored
      In both blkg get functions - throtl_get_tg() and cfq_get_cfqg(),
      instead of obtaining blkcg of %current explicitly, let the caller
      specify the blkcg to use as parameter and make both functions hold on
      to the blkcg.
      
      This is part of block cgroup interface cleanup and will help make
      blkcg API more modular.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: move rcu_read_lock() outside of blkio_group get functions · 2a7f1244
      Tejun Heo authored
      rcu_read_lock() in throtl_get_tg() and cfq_get_cfqg() holds onto
      @blkcg while looking up blkg.  For API cleanup, the next patch will
      make the caller responsible for determining @blkcg to look up blkg
      from and let them specify it as a parameter.  Move rcu read locking
      out to the callers to prepare for the change.
      
      -v2: Originally this patch was described as a fix for RCU read locking
           bug around @blkg, which Vivek pointed out to be incorrect.  It
           was from misunderstanding the role of rcu locking as protecting
           @blkg not @blkcg.  Patch description updated.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: shoot down blkio_groups on elevator switch · 72e06c25
      Tejun Heo authored
      Elevator switch may involve changes to blkcg policies.  Implement
      shoot down of blkio_groups.
      
      Combined with the previous bypass updates, the end goal is updating
      blkcg core such that it can ensure that blkcg's being affected become
      quiescent and don't have any per-blkg data hanging around before
      commencing any policy updates.  Until queues are made aware of the
      policies that apply to them, as an interim step, all per-policy blkg
      data will be shot down.
      
      * blk-throtl doesn't need this change as it can't be disabled for a
        live queue; however, update it anyway as the scheduled blkg
        unification requires this behavior change.  This means that
        blk-throtl configuration will be unnecessarily lost over elevator
        switch.  This oddity will be removed after blkcg learns to associate
        individual policies with request_queues.
      
      * blk-throtl doesn't shoot down root_tg.  This is to ease transition.
        Unified blkg will always have persistent root group and not shooting
        down root_tg for now eases transition to that point by avoiding
        having to update td->root_tg and is safe as blk-throtl can never be
        disabled.
      
      -v2: Vivek pointed out that group list is not guaranteed to be empty
           on return from clear function if it raced cgroup removal and
           lost.  Fix it by waiting a bit and retrying.  This kludge will
           soon be removed once locking is updated such that blkg is never
           in limbo state between blkcg and request_queue locks.
      
           blk-throtl no longer shoots down root_tg to avoid breaking
           td->root_tg.
      
           Also, nest queue_lock inside blkio_list_lock not the other way
           around to avoid introducing a possible deadlock via blkcg lock.
      
      -v3: blkcg_clear_queue() repositioned and renamed to
           blkg_destroy_all() to increase consistency with later changes.
           cfq_clear_queue() updated to check q->elevator before
           dereferencing it to avoid NULL dereference on not fully
           initialized queues (used by later change).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: extend queue bypassing to cover blkcg policies · 6ecf23af
      Tejun Heo authored
      Extend queue bypassing such that dying queue is always bypassing and
      blk-throttle is drained on bypass.  With blkcg policies updated to
      test blk_queue_bypass() instead of blk_queue_dead(), this ensures that
      no bio or request is held by or going through blkcg policies on a
      bypassing queue.
      
      This will be used to implement blkg cleanup on elevator switches and
      policy changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: implement blk_queue_bypass_start/end() · d732580b
      Tejun Heo authored
      Rename and extend elv_quiesce_start/end() to
      blk_queue_bypass_start/end() which are exported and support nesting
      via @q->bypass_depth.  Also add blk_queue_bypass() to test bypass
      state.
      
      This will be further extended and used for blkio_group management.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • elevator: make elevator_init_fn() return 0/-errno · b2fab5ac
      Tejun Heo authored
      elevator_ops->elevator_init_fn() has a weird return value.  It returns
      a void * which the caller should assign to q->elevator->elevator_data
      and %NULL return denotes init failure.
      
      Update such that it returns integer 0/-errno and sets elevator_data
      directly as necessary.
      
      This makes the interface more conventional and eases further cleanup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • elevator: clear auxiliary data earlier during elevator switch · 5a5bafdc
      Tejun Heo authored
      Elevator switch tries hard to keep as much context as possible until
      the new elevator is ready so that it can revert to the original state
      if initializing the new elevator fails for some reason.  Unfortunately,
      with more auxiliary contexts to manage, this makes elevator init and
      exit paths too complex and fragile.
      
      This patch makes elevator_switch() unregister the current elevator and
      flush icq's before starting to initialize the new one.  As we still
      keep the old elevator itself, the only difference is that we lose
      icq's on rare occasions of switching failure, which isn't critical at
      all.
      
      Note that this makes explicit elevator parameter to
      elevator_init_queue() and __elv_register_queue() unnecessary as they
      always can use the current elevator.
      
      This patch enables block cgroup cleanups.
      
      -v2: blk_add_trace_msg() prints elevator name from @new_e instead of
           @e->type as the local variable no longer exists.  This caused
           build failure on CONFIG_BLK_DEV_IO_TRACE.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cfq: don't register propio policy if !CONFIG_CFQ_GROUP_IOSCHED · b95ada55
      Tejun Heo authored
      cfq has been registering zeroed blkio_policy_cfq if CFQ_GROUP_IOSCHED
      is disabled.  This fortunately doesn't collide with blk-throtl as
      BLKIO_POLICY_PROP is zero but is unnecessary and risky.  Just don't
      register it if not enabled.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: make CONFIG_BLK_CGROUP bool · 32e380ae
      Tejun Heo authored
      Block cgroup core can be built as module; however, it isn't too useful
      as blk-throttle can only be built-in and cfq-iosched is usually the
      default built-in scheduler.  Scheduled blkcg cleanup requires calling
      into blkcg from block core.  To simplify that, disallow building blkcg
      as module by making CONFIG_BLK_CGROUP bool.
      
      If building blkcg core as module really matters, which I doubt, we can
      revisit it after blkcg API cleanup.
      
      -v2: Vivek pointed out that IOSCHED_CFQ was incorrectly updated to
           depend on BLK_CGROUP.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: blk-throttle should be drained regardless of q->elevator · b855b04a
      Tejun Heo authored
      Currently, blk_cleanup_queue() doesn't call elv_drain_elevator() if
      q->elevator doesn't exist; however, bio based drivers don't have
      elevator initialized but can still use blk-throttle.  This patch moves
      q->elevator test inside blk_drain_queue() such that only
      elv_drain_elevator() is skipped if !q->elevator.
      
      -v2: loop can have registered queue which has NULL request_fn.  Make
           sure we don't call into __blk_run_queue() in such cases.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Vivek Goyal <vgoyal@redhat.com>
      
      Fold in bug fix from Vivek.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 03 Mar, 2012 1 commit
  3. 02 Mar, 2012 6 commits
    • Block: use a freezable workqueue for disk-event polling · 62d3c543
      Alan Stern authored
      This patch (as1519) fixes a bug in the block layer's disk-events
      polling.  The polling is done by a work routine queued on the
      system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
      polling continues even in the middle of a system sleep transition.
      
      Obviously, polling a suspended drive for media changes and such isn't
      a good thing to do; in the case of USB mass-storage devices it can
      lead to real problems requiring device resets and even re-enumeration.
      
      The patch fixes things by creating a new system-wide, non-reentrant,
      freezable workqueue and using it for disk-events polling.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      CC: <stable@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drivers/block/DAC960: fix -Wuninitialized warning · cecd353a
      Danny Kukawka authored
      Initialize CommandMailbox with memset() before using it.  Fix for:
      
      drivers/block/DAC960.c: In function ‘DAC960_V1_EnableMemoryMailboxInterface’:
      arch/x86/include/asm/io.h:61:1: warning: ‘CommandMailbox.Bytes[12]’
       may be used uninitialized in this function [-Wuninitialized]
      drivers/block/DAC960.c:1175:30: note: ‘CommandMailbox.Bytes[12]’
       was declared here
      Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • drivers/block/DAC960: fix DAC960_V2_IOCTL_Opcode_T -Wenum-compare warning · bca505f1
      Danny Kukawka authored
      Fixed compiler warning:
      
      comparison between ‘DAC960_V2_IOCTL_Opcode_T’ and ‘enum <anonymous>’
      
      Renamed enum, added a new enum for SCSI_10.CommandOpcode in
      DAC960_V2_ProcessCompletedCommand().
      Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix __blkdev_get and add_disk race condition · 9f53d2fe
      Stanislaw Gruszka authored
      The following situation might occur:
      
      __blkdev_get:			add_disk:
      
      				register_disk()
      get_gendisk()
      
      disk_block_events()
      	disk->ev == NULL
      
      				disk_add_events()
      
      __disk_unblock_events()
      	disk->ev != NULL
      	--ev->block
      
      Then we unblock events when they are supposed to be blocked.  This can
      trigger events-related block/genhd.c warnings, but can also crash in
      sd_check_events() or other places.
      
      I'm able to reproduce crashes with the following script (with a
      connected usb dongle as sdb disk).
      
      <snip>
      DEV=/dev/sdb
      ENABLE=/sys/bus/usb/devices/1-2/bConfigurationValue
      
      function stop_me()
      {
      	for i in `jobs -p` ; do kill $i 2> /dev/null ; done
      	exit
      }
      
      trap stop_me SIGHUP SIGINT SIGTERM
      
      for ((i = 0; i < 10; i++)) ; do
      	while true; do fdisk -l $DEV  2>&1 > /dev/null ; done &
      done
      
      while true ; do
      echo 1 > $ENABLE
      sleep 1
      echo 0 > $ENABLE
      done
      </snip>
      
      I use the script to verify the patch fixing the oops in sd_revalidate_disk
      http://marc.info/?l=linux-scsi&m=132935572512352&w=2
      Without Jun'ichi Nomura's patch titled "Fix NULL pointer dereference in
      sd_revalidate_disk" or this one, the script easily crashes the kernel
      within a few seconds.  With both patches applied I do not observe a
      crash.  Unfortunately, after some time (a dozen or so minutes), the
      script hangs in:
      
      [ 1563.906432]  [<c08354f5>] schedule_timeout_uninterruptible+0x15/0x20
      [ 1563.906437]  [<c04532d5>] msleep+0x15/0x20
      [ 1563.906443]  [<c05d60b2>] blk_drain_queue+0x32/0xd0
      [ 1563.906447]  [<c05d6e00>] blk_cleanup_queue+0xd0/0x170
      [ 1563.906454]  [<c06d278f>] scsi_free_queue+0x3f/0x60
      [ 1563.906459]  [<c06d7e6e>] __scsi_remove_device+0x6e/0xb0
      [ 1563.906463]  [<c06d4aff>] scsi_forget_host+0x4f/0x60
      [ 1563.906468]  [<c06cd84a>] scsi_remove_host+0x5a/0xf0
      [ 1563.906482]  [<f7f030fb>] quiesce_and_remove_host+0x5b/0xa0 [usb_storage]
      [ 1563.906490]  [<f7f03203>] usb_stor_disconnect+0x13/0x20 [usb_storage]
      
      Anyway, I think this patch is a step forward.
      
      As a drawback, I do not tear down on sysfs file creation error, because
      I do not know how to nullify disk->ev (since it can be in use).  However,
      add_disk error handling practically does not exist either, and things
      will work without this sysfs file, except that events will not be
      exported to user space.
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Fix setting bio flags in drivers (sd_dif/floppy) · 12ebffd1
      Muthukumar R authored
      Fix setting bio flags in drivers (sd_dif/floppy).
      Signed-off-by: Muthukumar R <muthur@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Fix NULL pointer dereference in sd_revalidate_disk · fe316bf2
      Jun'ichi Nomura authored
      Since 2.6.39 (1196f8b8), when a driver returns -ENOMEDIUM for open(),
      __blkdev_get() calls rescan_partitions() to remove
      in-kernel partition structures and raise KOBJ_CHANGE uevent.
      
      However, it ends up calling the driver's revalidate_disk without open
      and could cause an oops.
      
      In the case of SCSI:
      
        process A                  process B
        ----------------------------------------------
        sys_open
          __blkdev_get
            sd_open
              returns -ENOMEDIUM
                                   scsi_remove_device
                                     <scsi_device torn down>
            rescan_partitions
              sd_revalidate_disk
                <oops>
      Oopses are reported here:
      http://marc.info/?l=linux-scsi&m=132388619710052
      
      This patch separates the partition invalidation from rescan_partitions()
      and uses it for the -ENOMEDIUM case.
      Reported-by: Huajun Li <huajun.li.lee@gmail.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 15 Feb, 2012 3 commits
    • block: exit_io_context() should call elevator_exit_icq_fn() · 621032ad
      Tejun Heo authored
      While updating locking, b2efa052 "block, cfq: unlink
      cfq_io_context's immediately" moved elevator_exit_icq_fn() invocation
      from exit_io_context() to the final ioc put.  While this doesn't cause
      catastrophic failure, it effectively removes task exit notification to
      elevator and causes noticeable IO performance degradation with CFQ.
      
      On task exit, CFQ used to immediately expire the slice if it was being
      used by the exiting task as no more IO would be issued by the task;
      however, after b2efa052, the notification is lost and disk could sit
      idle needlessly, leading to noticeable IO performance degradation for
      certain workloads.
      
      This patch renames ioc_exit_icq() to ioc_destroy_icq(), separates
      elevator_exit_icq_fn() invocation into ioc_exit_icq() and invokes it
      from exit_io_context().  ICQ_EXITED flag is added to avoid invoking
      the callback more than once for the same icq.
      
      Walking icq_list from ioc side and invoking elevator callback requires
      reverse double locking.  This may be better implemented using RCU;
      unfortunately, using RCU isn't trivial.  e.g. RCU protection would
      need to cover request_queue and queue_lock switch on cleanup makes
      grabbing queue_lock from RCU unsafe.  Reverse double locking should
      do, at least for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-bisected-by: Shaohua Li <shli@kernel.org>
      LKML-Reference: <CANejiEVzs=pUhQSTvUppkDcc2TNZyfohBRLygW5zFmXyk5A-xQ@mail.gmail.com>
      Tested-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: simplify ioc_release_fn() · 2274b029
      Tejun Heo authored
      Reverse double lock dancing in ioc_release_fn() can be simplified by
      just using trylock on the queue_lock and backing out from ioc lock on
      trylock failure.  Simplify it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Tested-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: replace icq->changed with icq->flags · d705ae6b
      Tejun Heo authored
      icq->changed was used for ICQ_*_CHANGED bits.  Rename it to flags and
      access it under ioc->lock instead of using atomic bitops.
      ioc_get_changed() is added so that the changed part can be fetched and
      cleared as before.
      
      icq->flags will be used to carry other flags.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Tested-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 14 Feb, 2012 12 commits
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · 7ada1dd6
      Linus Torvalds authored
      One small bug fix from Axel plus a fix for a build failure in unrealistic
      but commonly built configs which for some reason manage to survive for
      an awfully long time in -next without any reports.
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
        regulator: Fix getting voltage in max8649_enable_time()
        regulator: Fix mc13xxx regulator modular build (again)
    • Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc · ebf4bcbd
      Linus Torvalds authored
      Quoth BenH:
       "Here are a few powerpc fixes for 3.3, all pretty trivial.  I also
        added the patch to define GET_IP/SET_IP so we can use some more
        asm-generic goodness."
      
      * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
        powerpc/pseries/eeh: Fix crash when error happens during device probe
        powerpc/pseries: Fix partition migration hang in stop_topology_update
        powerpc/powernv: Disable interrupts while taking phb->lock
        powerpc: Fix WARN_ON in decrementer_check_overflow
        powerpc/wsp: Fix IRQ affinity setting
        powerpc: Implement GET_IP/SET_IP
        powerpc/wsp: Permanently enable PCI class code workaround
    • Merge tag 'mmc-fixes-for-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc · 8b36ac50
      Linus Torvalds authored
      MMC fixes for 3.3-rc4:
       * The most visible fix here is against a regression introduced in 3.3-rc1
         that ran cards in Ultra High Speed mode even when they failed to initialize
         in that mode, leading to lower-speed cards failing to mount.
       * A lockdep warning introduced in 3.3-rc1 is fixed.
       * Various other small driver fixes, most notably for a NULL dereference
         when using highmem with dw_mmc.
      
      * tag 'mmc-fixes-for-3.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
        mmc: dw_mmc: Fix PIO mode with support of highmem
        mmc: atmel-mci: save and restore sdioirq when soft reset is performed
        mmc: block: Init ro_lock sysfs attr to fix lockdep warnings
        mmc: sh_mmcif: fix late delayed work initialisation
        mmc: tmio_mmc: fix card eject during IO with DMA
        mmc: core: Fix comparison issue in mmc_compare_ext_csds
        mmc: core: Fix PowerOff Notify suspend/resume
        mmc: sdhci-pci: set Medfield SDIO as non-removable
        mmc: core: add the capability for broken voltage
        mmc: core: Fix low speed mmc card detection failure
        mmc: esdhc: set the timeout to the max value
        mmc: esdhc: add PIO mode support
        mmc: core: Ensure clocks are always enabled before host interaction
        mmc: of_mmc_spi: fix little endian support
        mmc: core: UHS sdio card that fails should not exceed 50MHz
        mmc: esdhc: fix errors when booting kernel on Freescale eSDHC version 2.3
      8b36ac50
    • Linus Torvalds's avatar
      Merge tag 'stable/for-linus-fixes-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen · 694ce18e
      Linus Torvalds authored
      Two fixes for VCPU offlining; One to fix the string format exposed
      by the xen-pci[front|back] to conform to the one used in majority of
      PCI drivers; Two fixes to make the code more resilient to invalid
      configurations.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      
      * tag 'stable/for-linus-fixes-3.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
        xenbus_dev: add missing error check to watch handling
        xen/pci[front|back]: Use %d instead of %1x for displaying PCI devfn.
        xen pvhvm: do not remap pirqs onto evtchns if !xen_have_vector_callback
        xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic.
        xen/bootup: During bootup suppress XENBUS: Unable to read cpu state
      694ce18e
    • Linus Torvalds's avatar
      Merge tag 'sound-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 13d26193
      Linus Torvalds authored
      sound fixes for 3.3-rc4
      
      Basically all small fixes suited as rc4: a few HD-audio regression fixes,
      a stable fix for an old Dell laptop with intel8x0, and a simple fix for
      ASoC fsi.
      
      * tag 'sound-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: intel8x0: Fix default inaudible sound on Gateway M520
        ALSA: hda - Fix silent speaker output on Acer Aspire 6935
        ALSA: hda - Fix initialization of secondary capture source on VT1705
        ASoC: fsi: fixup fsi_pointer() calculation method
        ALSA: hda - Fix mute-LED VREF value for new HP laptops
      13d26193
    • Daniel T Chen's avatar
      ALSA: intel8x0: Fix default inaudible sound on Gateway M520 · 27c3afe6
      Daniel T Chen authored
      BugLink: https://bugs.launchpad.net/bugs/930842
      
      The reporter states that audio is inaudible by default without muting
      'External Amplifier'. Add a quirk to handle his SSID so that changing
      the control is not necessary.
      Reported-and-tested-by: Benjamin Carlson <elderbubba0810@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Daniel T Chen <crimsun@ubuntu.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      27c3afe6
    • Takashi Iwai's avatar
      Merge tag 'asoc-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus · 675c67af
      Takashi Iwai authored
      A simple fix from Morimoto-san for the pointer() operation in the FSI
      driver.
      675c67af
    • Linus Torvalds's avatar
      Merge git://git.samba.org/sfrench/cifs-2.6 · ce5afed9
      Linus Torvalds authored
      * git://git.samba.org/sfrench/cifs-2.6:
        cifs: don't return error from standard_receive3 after marking response malformed
        cifs: request oplock when doing open on lookup
        cifs: fix error handling when cifscreds key payload is an error
      ce5afed9
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · ca81a621
      Linus Torvalds authored
      This updates the sha512 fix so that it doesn't cause excessive stack
      usage on i386.  This is done by reverting to the original code, and
      avoiding the W duplication by moving its initialisation into the loop.
      
      As the underlying code is in fact the one that we have used for years,
      I'm pushing this now instead of postponing to the next cycle.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: sha512 - Avoid stack bloat on i386
        crypto: sha512 - Use binary and instead of modulus
      ca81a621
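The second commit title above refers to a common indexing trick: for an unsigned index and a power-of-two table size, `i % 16` and `i & 15` select the same slot. The following is a minimal userspace sketch of that equivalence, assuming a 16-entry circular buffer. The function names are hypothetical and are not taken from the kernel's sha512 code.

```c
#include <assert.h>

/* Hypothetical helpers: slot of round i in a 16-entry circular buffer. */
static unsigned slot_mod(unsigned i) { return i % 16; }
static unsigned slot_and(unsigned i) { return i & 15; }

/*
 * Asserts that both forms agree for the first `rounds` indices
 * and returns the number of indices checked.
 */
static unsigned check_slots(unsigned rounds)
{
    unsigned i;

    for (i = 0; i < rounds; i++)
        assert(slot_mod(i) == slot_and(i));
    return rounds;
}
```

The equivalence holds only for unsigned (or known non-negative) indices and power-of-two sizes; with a signed index the two expressions can differ for negative values.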
    • Thadeu Lima de Souza Cascardo's avatar
      powerpc/pseries/eeh: Fix crash when error happens during device probe · 778a785f
      Thadeu Lima de Souza Cascardo authored
      EEH may happen during a PCI driver probe. If the driver is trying to
      access some register in a loop, the EEH code will try to print the
      driver name. But the driver pointer in struct pci_dev is not set until
      probe returns successfully.
      
      Use a function to test if the device and the driver pointer is NULL
      before accessing the driver's name.
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      778a785f
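The guard described in 778a785f can be sketched in plain C roughly as follows. This is a userspace illustration only, not the kernel code: `demo_driver`, `demo_device`, and `demo_driver_name()` are hypothetical stand-ins for the driver structure, the PCI device structure, and the helper function the commit message mentions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a driver with a printable name. */
struct demo_driver {
    const char *name;
};

/* Hypothetical stand-in for a device; driver stays NULL until probe succeeds. */
struct demo_device {
    struct demo_driver *driver;
};

/*
 * Returns the bound driver's name, or an empty string when either
 * the device or its driver pointer is NULL (e.g. while probe is running).
 */
static const char *demo_driver_name(const struct demo_device *dev)
{
    if (dev && dev->driver)
        return dev->driver->name;
    return "";
}
```

Returning an empty string rather than NULL lets callers pass the result straight to a format string without a second check.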
    • Brian King's avatar
      powerpc/pseries: Fix partition migration hang in stop_topology_update · 444080d1
      Brian King authored
      This fixes a hang that was observed during live partition migration.
      Since stop_topology_update must not be called from an interrupt
      context, call it earlier in the migration process. The hang observed
      can be seen below:
      
      WARNING: at kernel/timer.c:1011
      Modules linked in: ip6t_LOG xt_tcpudp xt_pkttype ipt_LOG xt_limit ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 fuse loop ibmveth sg ext3 jbd mbcache raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 scsi_dh_alua scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc dm_round_robin dm_multipath scsi_dh sd_mod crc_t10dif ibmvfc scsi_transport_fc scsi_tgt scsi_mod dm_snapshot dm_mod
      NIP: c0000000000c52d8 LR: c00000000004be28 CTR: 0000000000000000
      REGS: c00000005ffd77d0 TRAP: 0700   Not tainted  (3.2.0-git-00001-g07d106d0)
      MSR: 8000000000021032 <ME,CE,IR,DR>  CR: 48000084  XER: 00000001
      CFAR: c00000000004be20
      TASK = c00000005ec78860[0] 'swapper/3' THREAD: c00000005ec98000 CPU: 3
      GPR00: 0000000000000001 c00000005ffd7a50 c000000000fbbc98 c000000000ec8340
      GPR04: 00000000282a0020 0000000000000000 0000000000004000 0000000000000101
      GPR08: 0000000000000012 c00000005ffd4000 0000000000000020 c000000000f3ba88
      GPR12: 0000000000000000 c000000007f40900 0000000000000001 0000000000000004
      GPR16: 0000000000000001 0000000000000000 0000000000000000 c000000001022310
      GPR20: 0000000000000001 0000000000000000 0000000000200200 c000000001029e14
      GPR24: 0000000000000000 0000000000000001 0000000000000040 c00000003f74bc80
      GPR28: c00000003f74bc84 c000000000f38038 c000000000f16b58 c000000000ec8340
      NIP [c0000000000c52d8] .del_timer_sync+0x28/0x60
      LR [c00000000004be28] .stop_topology_update+0x20/0x38
      Call Trace:
      [c00000005ffd7a50] [c00000005ec78860] 0xc00000005ec78860 (unreliable)
      [c00000005ffd7ad0] [c00000000004be28] .stop_topology_update+0x20/0x38
      [c00000005ffd7b40] [c000000000028378] .__rtas_suspend_last_cpu+0x58/0x260
      [c00000005ffd7bf0] [c0000000000fa230] .generic_smp_call_function_interrupt+0x160/0x358
      [c00000005ffd7cf0] [c000000000036ec8] .smp_ipi_demux+0x88/0x100
      [c00000005ffd7d80] [c00000000005c154] .icp_hv_ipi_action+0x5c/0x80
      [c00000005ffd7e00] [c00000000012a088] .handle_irq_event_percpu+0x100/0x318
      [c00000005ffd7f00] [c00000000012e774] .handle_percpu_irq+0x84/0xd0
      [c00000005ffd7f90] [c000000000022ba8] .call_handle_irq+0x1c/0x2c
      [c00000005ec9ba20] [c00000000001157c] .do_IRQ+0x22c/0x2a8
      [c00000005ec9bae0] [c0000000000054bc] hardware_interrupt_entry+0x18/0x1c
      Exception: 501 at .cpu_idle+0x194/0x2f8
          LR = .cpu_idle+0x194/0x2f8
      [c00000005ec9bdd0] [c000000000017e58] .cpu_idle+0x188/0x2f8 (unreliable)
      [c00000005ec9be90] [c00000000067ec18] .start_secondary+0x3e4/0x524
      [c00000005ec9bf90] [c0000000000093e8] .start_secondary_prolog+0x10/0x14
      Instruction dump:
      ebe1fff8 4e800020 fbe1fff8 7c0802a6 f8010010 7c7f1b78 f821ff81 78290464
      80090014 5400019e 7c0000d0 78000fe0 <0b000000> 4800000c 7c210b78 7c421378
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      444080d1
    • Michael Ellerman's avatar
      powerpc/powernv: Disable interrupts while taking phb->lock · f1c853b5
      Michael Ellerman authored
      We need to disable interrupts when taking the phb->lock. Otherwise
      we could deadlock with pci_lock taken from an interrupt.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      f1c853b5