1. 02 Apr, 2009 9 commits
    • Alasdair G Kergon's avatar
      dm table: fix upgrade mode race · 570b9d96
      Alasdair G Kergon authored
      upgrade_mode() sets bdev to NULL temporarily, and does not have any
      locking to exclude anything from seeing that NULL.
      
      In dm_table_any_congested() bdev_get_queue() can dereference that NULL and
      cause a reported oops.
      
      Fix this by not changing that field during the mode upgrade.
      
      Cc: stable@kernel.org
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      570b9d96
    • Jun'ichi Nomura's avatar
      dm: path selector use module refcount directly · aea90588
      Jun'ichi Nomura authored
      Fix refcount corruption in dm-path-selector
      
      Refcounting with non-atomic ops under shared lock will corrupt the counter
      in multi-processor system and may trigger BUG_ON().
      Use module refcount.
      # same approach as dm-target-use-module-refcount-directly.patch here
      # https://www.redhat.com/archives/dm-devel/2008-December/msg00075.html
      
      Typical oops:
        kernel BUG at linux-2.6.29-rc3/drivers/md/dm-path-selector.c:90!
        Pid: 11148, comm: dmsetup Not tainted 2.6.29-rc3-nm #1
        dm_put_path_selector+0x4d/0x61 [dm_multipath]
        Call Trace:
         [<ffffffffa031d3f9>] free_priority_group+0x33/0xb3 [dm_multipath]
         [<ffffffffa031d4aa>] free_multipath+0x31/0x67 [dm_multipath]
         [<ffffffffa031d50d>] multipath_dtr+0x2d/0x32 [dm_multipath]
         [<ffffffffa015d6c2>] dm_table_destroy+0x64/0xd8 [dm_mod]
         [<ffffffffa015b73a>] __unbind+0x46/0x4b [dm_mod]
         [<ffffffffa015b79f>] dm_swap_table+0x60/0x14d [dm_mod]
         [<ffffffffa015f963>] dev_suspend+0xfd/0x177 [dm_mod]
         [<ffffffffa0160250>] dm_ctl_ioctl+0x24c/0x29c [dm_mod]
         [<ffffffff80288cd3>] ? get_page_from_freelist+0x49c/0x61d
         [<ffffffffa015f866>] ? dev_suspend+0x0/0x177 [dm_mod]
         [<ffffffff802bf05c>] vfs_ioctl+0x2a/0x77
         [<ffffffff802bf4f1>] do_vfs_ioctl+0x448/0x4a0
         [<ffffffff802bf5a0>] sys_ioctl+0x57/0x7a
         [<ffffffff8020c05b>] system_call_fastpath+0x16/0x1b
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      aea90588
    • Cheng Renquan's avatar
      dm target: use module refcount directly · 5642b8a6
      Cheng Renquan authored
      The tt_internal's 'use' field is superfluous: the module's refcount can do
      the work properly.  An acceptable side-effect is that this increases the
      reference counts reported by 'lsmod'.
      
      Remove the superfluous test when removing a target module.
      
      [Crash possible without this on SMP - agk]
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarCheng Renquan <crquan@gmail.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Reviewed-by: default avatarAlasdair G Kergon <agk@redhat.com>
      Reviewed-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      5642b8a6
    • Mikulas Patocka's avatar
      dm snapshot: avoid having two exceptions for the same chunk · 35bf659b
      Mikulas Patocka authored
      We need to check if the exception was completed after dropping the lock.
      
      After regaining the lock, __find_pending_exception checks if the exception
      was already placed into &s->pending hash.
      
      But we don't check if the exception was already completed and placed into
      &s->complete hash. If the process waiting in alloc_pending_exception was
      delayed at this point because of a scheduling latency and the exception
      was meanwhile completed, we'd miss that and allocate another pending
      exception for already completed chunk.
      
      It would lead to a situation where two records for the same chunk exist
      and potential data corruption because multiple snapshot I/Os to the
      affected chunk could be redirected to different locations in the
      snapshot.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      35bf659b
    • Mikulas Patocka's avatar
      dm snapshot: avoid dropping lock in __find_pending_exception · c6621392
      Mikulas Patocka authored
      It is uncommon and bug-prone to drop a lock in a function that is called with
      the lock held, so this is moved to the caller.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      c6621392
    • Mikulas Patocka's avatar
      dm snapshot: refactor __find_pending_exception · 2913808e
      Mikulas Patocka authored
      Move looking-up of a pending exception from __find_pending_exception to another
      function.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      2913808e
    • Mikulas Patocka's avatar
      dm io: make sync_io uninterruptible · b64b6bf4
      Mikulas Patocka authored
      If someone sends signal to a process performing synchronous dm-io call,
      the kernel may crash.
      
      The function sync_io attempts to exit with -EINTR if it has pending signal,
      however the structure "io" is allocated on stack, so already submitted io
      requests end up touching unallocated stack space and corrupting kernel memory.
      
      sync_io sets its state to TASK_UNINTERRUPTIBLE, so the signal can't break out
      of io_schedule() --- however, if the signal was pending before sync_io entered
      while (1) loop, the corruption of kernel memory will happen.
      
      There is no way to cancel in-progress IOs, so the best solution is to ignore
      signals at this point.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      b64b6bf4
    • Mikulas Patocka's avatar
      dm raid1: switch read_record from kmalloc to slab to save memory · 95f8fac8
      Mikulas Patocka authored
      With my previous patch to save bi_io_vec, the size of dm_raid1_read_record
      is significantly increased (the vector list takes 3072 bytes on 32-bit machines
      and 4096 bytes on 64-bit machines).
      
      The structure dm_raid1_read_record used to be allocated with kmalloc,
      but kmalloc aligns the size on the next power-of-two so an object
      slightly greater than 4096 will allocate 8192 bytes of memory and half of
      that memory will be wasted.
      
      This patch turns kmalloc into a slab cache which doesn't have this
      padding so it will reduce the memory consumed.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      95f8fac8
    • Mikulas Patocka's avatar
      dm: preserve bi_io_vec when resubmitting bios · a920f6b3
      Mikulas Patocka authored
      Device mapper saves and restores various fields in the bio, but it doesn't save
      bi_io_vec.  If the device driver modifies this after a partially successful
      request, dm-raid1 and dm-multipath may attempt to resubmit a bio that has
      bi_size inconsistent with the size of vector.
      
      To make requests resubmittable in dm-raid1 and dm-multipath, we must save
      and restore the bio vector as well.
      
      To reduce the memory overhead involved in this, we do not save the pages in a
      vector and use a 16-bit field size if the page size is less than 65536.
      
      Cc: stable@kernel.org
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      a920f6b3
  2. 01 Apr, 2009 31 commits