1. 12 Aug, 2010 9 commits
    • Kiyoshi Ueda's avatar
      dm: separate device deletion from dm_put · 3f77316d
      Kiyoshi Ueda authored
      This patch separates the device deletion code from dm_put()
      to make sure the deletion happens in the process context.
      
      By this patch, device deletion always occurs in an ioctl (process)
      context and dm_put() can be called in interrupt context.
      As a result, the request-based dm's bad dm_put() usage pointed out
      by Mikulas below disappears.
          http://marc.info/?l=dm-devel&m=126699981019735&w=2
      
      Without this patch, I confirmed there is a case to crash the system:
          dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())
      
      Some more backgrounds and details:
      In request-based dm, a device opener can remove a mapped_device
      while the last request is still completing, because bios in the last
      request complete first and then the device opener can close and remove
      the mapped_device before the last request completes:
        CPU0                                          CPU1
        =================================================================
        <<INTERRUPT>>
        blk_end_request_all(clone_rq)
          blk_update_request(clone_rq)
            bio_endio(clone_bio) == end_clone_bio
              blk_update_request(orig_rq)
                bio_endio(orig_bio)
                                                      <<I/O completed>>
                                                      dm_blk_close()
                                                      dev_remove()
                                                        dm_put(md)
                                                          <<Free md>>
         blk_finish_request(clone_rq)
           ....
           dm_end_request(clone_rq)
             free_rq_clone(clone_rq)
             blk_end_request_all(orig_rq)
             rq_completed(md)
      
      So request-based dm used dm_get()/dm_put() to hold md for each I/O
      until its request completion handling is fully done.
      However, the final dm_put() can call the device deletion code which
      must not be run in interrupt context and may cause kernel panic.
      
      To solve the problem, this patch moves the device deletion code,
      dm_destroy(), to predetermined places that is actually deleting
      the mapped_device in ioctl (process) context, and changes dm_put()
      just to decrement the reference count of the mapped_device.
      By this change, dm_put() can be used in any context and the symmetric
      model below is introduced:
          dm_create():  create a mapped_device
          dm_destroy(): destroy a mapped_device
          dm_get():     increment the reference count of a mapped_device
          dm_put():     decrement the reference count of a mapped_device
      
      dm_destroy() waits for all references of the mapped_device to disappear,
      then deletes the mapped_device.
      
      dm_destroy() uses active waiting with msleep(1), since deleting
      the mapped_device isn't performance-critical task.
      And since at this point, nobody opens the mapped_device and no new
      reference will be taken, the pending counts are just for racing
      completing activity and will eventually decrease to zero.
      
      For the unlikely case of the forced module unload, dm_destroy_immediate(),
      which doesn't wait and forcibly deletes the mapped_device, is also
      introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
      may be stuck and never return.
      And now, because the mapped_device is deleted at this point, subsequent
      accesses to the mapped_device may cause NULL pointer references.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      3f77316d
    • Kiyoshi Ueda's avatar
      dm ioctl: release _hash_lock between devices in remove_all · 98f33285
      Kiyoshi Ueda authored
      This patch changes dm_hash_remove_all() to release _hash_lock when
      removing a device.  After removing the device, dm_hash_remove_all()
      takes _hash_lock and searches the hash from scratch again.
      
      This patch is a preparation for the next patch, which changes device
      deletion code to wait for md reference to be 0.  Without this patch,
      the wait in the next patch may cause AB-BA deadlock:
        CPU0                                CPU1
        -----------------------------------------------------------------------
        dm_hash_remove_all()
          down_write(_hash_lock)
                                            table_status()
                                              md = find_device()
                                                     dm_get(md)
                                                       <increment md->holders>
                                              dm_get_live_or_inactive_table()
                                                dm_get_inactive_table()
                                                  down_write(_hash_lock)
          <in the md deletion code>
            <wait for md->holders to be 0>
      Signed-off-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      98f33285
    • Kiyoshi Ueda's avatar
      dm: prevent access to md being deleted · abdc568b
      Kiyoshi Ueda authored
      This patch prevents access to mapped_device which is being deleted.
      
      Currently, even after a mapped_device has been removed from the hash,
      it could be accessed through idr_find() using minor number.
      That could cause a race and NULL pointer reference below:
        CPU0                          CPU1
        ------------------------------------------------------------------
        dev_remove(param)
          down_write(_hash_lock)
          dm_lock_for_deletion(md)
            spin_lock(_minor_lock)
            set_bit(DMF_DELETING)
            spin_unlock(_minor_lock)
          __hash_remove(hc)
          up_write(_hash_lock)
                                      dev_status(param)
                                        md = find_device(param)
                                               down_read(_hash_lock)
                                               __find_device_hash_cell(param)
                                                 dm_get_md(param->dev)
                                                   md = dm_find_md(dev)
                                                          spin_lock(_minor_lock)
                                                          md = idr_find(MINOR(dev))
                                                          spin_unlock(_minor_lock)
          dm_put(md)
            free_dev(md)
                                                   dm_get(md)
                                               up_read(_hash_lock)
                                        __dev_status(md, param)
                                        dm_put(md)
      
      This patch fixes such problems.
      Signed-off-by: default avatarKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      abdc568b
    • Peter Rajnoha's avatar
      dm ioctl: return uevent flag after rename · 856a6f1d
      Peter Rajnoha authored
      All the dm ioctls that generate uevents set the DM_UEVENT_GENERATED flag so
      that userspace knows whether or not to wait for a uevent to be processed
      before continuing,
      
      The dm rename ioctl sets this flag but was not structured to return it
      to userspace.  This patch restructures the rename ioctl processing to
      behave like the other ioctls that return data and so fix this.
      Signed-off-by: default avatarPeter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      856a6f1d
    • Alasdair G Kergon's avatar
      dm ioctl: make __dev_status void · 094ea9a0
      Alasdair G Kergon authored
      __dev_status() cannot fail so make it void and simplify callers.
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      094ea9a0
    • Peter Rajnoha's avatar
      dm ioctl: remove __dev_status from geometry and target message · 6be54494
      Peter Rajnoha authored
      Remove useless __dev_status call while processing an ioctl that sets up
      device geometry and target message.  The data is not returned to
      userspace so there is no point collecting it and in the case of
      target_message it is collected before processing the message so if it
      did return it might be stale.
      Signed-off-by: default avatarPeter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      6be54494
    • Mikulas Patocka's avatar
      dm snapshot: test chunk size against both origin and snapshot · c2411045
      Mikulas Patocka authored
      Validate chunk size against both origin and snapshot sector size
      
      Don't allow chunk size smaller than either origin or snapshot logical
      sector size. Reading or writing data not aligned to sector size is not
      allowed and causes immediate errors.
      
      This requires us to open the origin before initialising the
      exception store and to export dm_snap_origin.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      c2411045
    • Mikulas Patocka's avatar
      dm snapshot: iterate origin and cow devices · 1e5554c8
      Mikulas Patocka authored
      Iterate both origin and snapshot devices
      
      iterate_devices method should call the callback for all the devices where
      the bio may be remapped. Thus, snapshot_iterate_devices should call the callback
      for both snapshot and origin underlying devices because it remaps some bios
      to the snapshot and some to the origin.
      
      snapshot_iterate_devices called the callback only for the origin device.
      This led to badly calculated device limits if snapshot and origin were placed
      on different types of disks.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      1e5554c8
    • Alasdair G Kergon's avatar
      dm mpath: fix NULL pointer dereference when path parameters missing · 6bbf79a1
      Alasdair G Kergon authored
      multipath_ctr() forgets to return an error after detecting
      missing path parameters.  Fix this.
      Signed-off-by: default avatarPatrick LoPresti <lopresti@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      6bbf79a1
  2. 11 Aug, 2010 31 commits