1. 17 Oct, 2014 1 commit
    • Mikulas Patocka's avatar
      dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks · 9d28eb12
      Mikulas Patocka authored
      The shrinker uses gfp flags to indicate what kind of operation can the
      driver wait for. If __GFP_IO flag is present, the driver can wait for
      block I/O operations, if __GFP_FS flag is present, the driver can wait on
      operations involving the filesystem.
      
      dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
      device that makes calls into the filesystem. If __GFP_IO is present and
      __GFP_FS isn't, dm-bufio could still block on filesystem operations if it
      runs on a loop block device.
      
      The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
      unreproducible) deadlock involving dm-bufio and loop device.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9d28eb12
  2. 11 Oct, 2014 1 commit
  3. 06 Oct, 2014 9 commits
    • Alexey Khoroshilov's avatar
      dm log userspace: fix memory leak in dm_ulog_tfr_init failure path · 56ec16cb
      Alexey Khoroshilov authored
      If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
      deallocate prealloced memory but calls cn_del_callback().
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: default avatarAlexey Khoroshilov <khoroshilov@ispras.ru>
      Reviewed-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      56ec16cb
    • Mikulas Patocka's avatar
      dm bufio: when done scanning return from __scan immediately · 0e825862
      Mikulas Patocka authored
      When __scan frees the required number of buffer entries that the
      shrinker requested (nr_to_scan becomes zero) it must return.  Before
      this fix the __scan code exited only the inner loop and continued in the
      outer loop -- which could result in reduced performance due to extra
      buffers being freed (e.g. unnecessarily evicted thinp metadata needing
      to be synchronously re-read into bufio's cache).
      
      Also, move dm_bufio_cond_resched to __scan's inner loop, so that
      iterating the bufio client's lru lists doesn't result in scheduling
      latency.
      Reported-by: default avatarJoe Thornber <thornber@redhat.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.2+
      0e825862
    • Joe Thornber's avatar
      dm bufio: update last_accessed when relinking a buffer · eb76faf5
      Joe Thornber authored
      The 'last_accessed' member of the dm_buffer structure was only set when
      the the buffer was created.  This led to each buffer being discarded
      after dm_bufio_max_age time even if it was used recently.  In practice
      this resulted in all thinp metadata being evicted soon after being read
      -- this is particularly problematic for metadata intensive workloads
      like multithreaded small random IO.
      
      'last_accessed' is now updated each time the buffer is moved to the head
      of the LRU list, so the buffer is now properly discarded if it was not
      used in dm_bufio_max_age time.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.2+
      eb76faf5
    • Heinz Mauelshagen's avatar
      dm raid: add discard support for RAID levels 4, 5 and 6 · 48cf06bc
      Heinz Mauelshagen authored
      In case of RAID levels 4, 5 and 6 we have to verify each RAID members'
      ability to zero data on discards to avoid stripe data corruption -- if
      discard_zeroes_data is not set for each RAID member discard support must
      be disabled.  But given the uncertainty of whether or not a RAID member
      properly supports zeroing data on discard we require the user to
      explicitly allow discard support on RAID levels 4, 5, and 6 by setting
      a dm-raid module paramter, e.g.: dm-raid.devices_handle_discard_safely=Y
      Otherwise, discards could cause data corruption on RAID4/5/6.
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      48cf06bc
    • Heinz Mauelshagen's avatar
      dm raid: add discard support for RAID levels 1 and 10 · 75b8e04b
      Heinz Mauelshagen authored
      Discard support is not enabled for RAID levels 4, 5, and 6 at this time
      due to concerns about unreliable discard_zeroes_data support on some
      hardware.  Otherwise, discards could cause stripe data corruption
      (classic example of bad apples spoiling the bunch).
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      75b8e04b
    • Benjamin Marzinski's avatar
      dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski authored
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were unusable
      (failed).  For example: if all paths have failed and you add a new,
      working, path to the table; you can't use it since the table load would
      fail due to it still containing failed paths.  Now a re-load of a
      multipath table can include failed devices and when those devices become
      active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the tables_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's tables_devices list requires a
      mutex (tables_devices_lock), so that tables cannot be created and
      destroyed concurrently.
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      86f1152b
    • Benjamin Marzinski's avatar
      dm mpath: stop queueing IO when no valid paths exist · 1f271972
      Benjamin Marzinski authored
      'queue_io' is set so that IO is queued while paths are being
      initialized.  Clear queue_io in __choose_pgpath if there are no valid
      paths, since there are obviously no paths that can be initialized.
      Otherwise IOs to the device will back up.
      Signed-off-by: default avatarBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      1f271972
    • Junichi Nomura's avatar
      dm: use bioset_create_nobvec() · 3d8aab2d
      Junichi Nomura authored
      Since DM core uses bio_clone_fast() for both bio-based and request-based
      DM devices there is no need for DM's bioset to have a bvec mempool.
      
      With this patch, on arch with 4KB page for example, memory usage will be
      reduced by 64KB for each bio-based DM device and 1MB for each
      request-based DM device.
      
      For example, when you create 10,000 bio-based DM devices and 1,000
      request-based DM devices, memory usage of biovec under no load is:
        # grep biovec /proc/slabinfo
      
        biovec-256        418068 418068   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      With this patch series applied, the usage becomes:
        # grep biovec /proc/slabinfo
      
        biovec-256           116    116   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      So 4096 * (418068 - 116) = 1.6GB of memory is saved in this example.
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      3d8aab2d
    • Junichi Nomura's avatar
      dm: remove nr_iovecs parameter from alloc_tio() · 99778273
      Junichi Nomura authored
      alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio.
      alloc_tio() takes the number of bvecs to allocate for the clone-bio.
      However, with v3.14's immutable biovec changes DM now uses
      __bio_clone_fast() and no longer needs to allocate bvecs.
      
      In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively
      0.  __clone_and_map_simple_bio() looked like it was passing non-zero
      nr_iovecs, but its value was always within the range of inline bvecs and
      no allocation actually happened.  If allocation happened, the BUG_ON() in
      __bio_clone_fast() would've triggered.
      
      Remove the nr_iovecs parameter from alloc_tio() to prevent possible
      future bio_alloc_bioset() mis-use of a new bioset interface that will no
      longer allow bvecs to be allocated.
      
      Also fix extra whitespace before the __bio_clone_fast() call in
      __clone_and_map_simple_bio().
      Signed-off-by: default avatarJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      99778273
  4. 03 Oct, 2014 2 commits
  5. 01 Oct, 2014 1 commit
  6. 30 Sep, 2014 1 commit
  7. 27 Sep, 2014 14 commits
  8. 25 Sep, 2014 10 commits
  9. 22 Sep, 2014 1 commit