Commit b0e5c294 authored by Linus Torvalds

Merge tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - A couple stable fixes for the DM writecache target.

 - A stable fix for the DM cache target that fixes the potential for
   data corruption after an unclean shutdown of a cache device using
   writeback mode.

 - Update DM integrity target to allow the metadata to be stored on a
   separate device from data.

 - Fix DM kcopyd and the snapshot target to cond_resched() where
   appropriate and be more efficient with processing completed work.

 - A few fixes and improvements for DM crypt.

 - Add DM delay target feature to configure delay of flushes independent
   of writes.

 - Update DM thin-provisioning target to include metadata_low_watermark
   threshold in pool status.

 - Fix stale DM thin-provisioning Documentation.

* tag 'for-4.19/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (26 commits)
  dm writecache: fix a crash due to reading past end of dirty_bitmap
  dm crypt: don't decrease device limits
  dm cache metadata: set dirty on all cache blocks after a crash
  dm snapshot: remove stale FIXME in snapshot_map()
  dm snapshot: improve performance by switching out_of_order_list to rbtree
  dm kcopyd: avoid softlockup in run_complete_job
  dm cache metadata: save in-core policy_hint_size to on-disk superblock
  dm thin: stop no_space_timeout worker when switching to write-mode
  dm kcopyd: return void from dm_kcopyd_copy()
  dm thin: include metadata_low_watermark threshold in pool status
  dm writecache: report start_sector in status line
  dm crypt: convert essiv from ahash to shash
  dm crypt: use wake_up_process() instead of a wait queue
  dm integrity: recalculate checksums on creation
  dm integrity: flush journal on suspend when using separate metadata device
  dm integrity: use version 2 for separate metadata
  dm integrity: allow separate metadata device
  dm integrity: add ic->start in get_data_sector()
  dm integrity: report provided data sectors in the status
  dm integrity: implement fair range locks
  ...
parents 2645b9d1 1e1132ea
Documentation/device-mapper/delay.txt
@@ -5,7 +5,8 @@ Device-Mapper's "delay" target delays reads and/or writes
 and maps them to different devices.
 
 Parameters:
-    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>]
+    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>
+     [<flush_device> <flush_offset> <flush_delay>]]
 
 With separate write parameters, the first set is only used for reads.
 Offsets are specified in sectors.
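For reference, the nine-argument form documented above gives each I/O class its own device, offset, and delay. A minimal usage sketch follows; the device path /dev/sdX and the delay values are hypothetical placeholders, not taken from the commit:

    # 500 ms reads, undelayed writes, 2 s flushes, all on one backing device
    echo "0 $(blockdev --getsz /dev/sdX) delay /dev/sdX 0 500 /dev/sdX 0 0 /dev/sdX 0 2000" \
        | dmsetup create delayed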
Documentation/device-mapper/dm-integrity.txt
@@ -113,6 +113,10 @@ internal_hash:algorithm(:key)	(the key is optional)
 	from an upper layer target, such as dm-crypt. The upper layer
 	target should check the validity of the integrity tags.
 
+recalculate
+	Recalculate the integrity tags automatically. It is only valid
+	when using internal hash.
+
 journal_crypt:algorithm(:key)	(the key is optional)
 	Encrypt the journal using given algorithm to make sure that the
 	attacker can't read the journal. You can use a block cipher here
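A hedged sketch of a table line enabling the new option (the device path, length, and hash choice are placeholders, and integritysetup(8) is the more common front end; the two optional arguments here are an internal hash plus the new flag):

    dmsetup create integ --table \
        "0 1953024 integrity /dev/sdX 0 32 J 2 internal_hash:sha256 recalculate"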
Documentation/device-mapper/thin-provisioning.txt
@@ -28,17 +28,18 @@ administrator some freedom, for example to:
 Status
 ======
 
-These targets are very much still in the EXPERIMENTAL state. Please
-do not yet rely on them in production. But do experiment and offer us
-feedback. Different use cases will have different performance
-characteristics, for example due to fragmentation of the data volume.
+These targets are considered safe for production use. But different use
+cases will have different performance characteristics, for example due
+to fragmentation of the data volume.
 
 If you find this software is not performing as expected please mail
 dm-devel@redhat.com with details and we'll try our best to improve
 things for you.
 
-Userspace tools for checking and repairing the metadata are under
-development.
+Userspace tools for checking and repairing the metadata have been fully
+developed and are available as 'thin_check' and 'thin_repair'. The name
+of the package that provides these utilities varies by distribution (on
+a Red Hat distribution it is named 'device-mapper-persistent-data').
 
 Cookbook
 ========
@@ -280,7 +281,7 @@ ii) Status
     <transaction id> <used metadata blocks>/<total metadata blocks>
     <used data blocks>/<total data blocks> <held metadata root>
     ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space
-    needs_check|-
+    needs_check|- metadata_low_watermark
 
     transaction id:
 	A 64-bit number used by userspace to help synchronise with metadata
@@ -327,6 +328,11 @@ ii) Status
 	thin-pool can be made fully operational again. '-' indicates
 	needs_check is not set.
 
+    metadata_low_watermark:
+	Value of metadata low watermark in blocks. The kernel sets this
+	value internally but userspace needs to know this value to
+	determine if an event was caused by crossing this threshold.
+
 iii) Messages
 
     create_thin <dev id>
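Illustrative only (every number below is made up): with this change a pool's status line as printed by dmsetup gains the watermark as its final field:

    # dmsetup status pool
    0 20971520 thin-pool 1 329/4096 1024/409600 - rw discard_passdown queue_if_no_space - 1024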
drivers/md/dm-cache-metadata.c
@@ -363,7 +363,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->version = cpu_to_le32(cmd->version);
 	memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
 	memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version));
-	disk_super->policy_hint_size = 0;
+	disk_super->policy_hint_size = cpu_to_le32(0);
 
 	__copy_sm_root(cmd, disk_super);
 
@@ -701,6 +701,7 @@ static int __commit_transaction(struct dm_cache_metadata *cmd,
 	disk_super->policy_version[0] = cpu_to_le32(cmd->policy_version[0]);
 	disk_super->policy_version[1] = cpu_to_le32(cmd->policy_version[1]);
 	disk_super->policy_version[2] = cpu_to_le32(cmd->policy_version[2]);
+	disk_super->policy_hint_size = cpu_to_le32(cmd->policy_hint_size);
 
 	disk_super->read_hits = cpu_to_le32(cmd->stats.read_hits);
 	disk_super->read_misses = cpu_to_le32(cmd->stats.read_misses);
@@ -1322,6 +1323,7 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
 	dm_oblock_t oblock;
 	unsigned flags;
+	bool dirty = true;
 
 	dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
 	memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1332,8 +1334,10 @@ static int __load_mapping_v1(struct dm_cache_metadata *cmd,
 		dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
 		memcpy(&hint, hint_value_le, sizeof(hint));
 	}
+	if (cmd->clean_when_opened)
+		dirty = flags & M_DIRTY;
 
-	r = fn(context, oblock, to_cblock(cb), flags & M_DIRTY,
+	r = fn(context, oblock, to_cblock(cb), dirty,
 	       le32_to_cpu(hint), hints_valid);
 	if (r) {
 		DMERR("policy couldn't load cache block %llu",
@@ -1361,7 +1365,7 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
 	dm_oblock_t oblock;
 	unsigned flags;
-	bool dirty;
+	bool dirty = true;
 
 	dm_array_cursor_get_value(mapping_cursor, (void **) &mapping_value_le);
 	memcpy(&mapping, mapping_value_le, sizeof(mapping));
@@ -1372,8 +1376,9 @@ static int __load_mapping_v2(struct dm_cache_metadata *cmd,
 		dm_array_cursor_get_value(hint_cursor, (void **) &hint_value_le);
 		memcpy(&hint, hint_value_le, sizeof(hint));
 	}
 
-	dirty = dm_bitset_cursor_get_value(dirty_cursor);
+	if (cmd->clean_when_opened)
+		dirty = dm_bitset_cursor_get_value(dirty_cursor);
 	r = fn(context, oblock, to_cblock(cb), dirty,
 	       le32_to_cpu(hint), hints_valid);
 	if (r) {
drivers/md/dm-cache-target.c
@@ -1188,9 +1188,8 @@ static void copy_complete(int read_err, unsigned long write_err, void *context)
 	queue_continuation(mg->cache->wq, &mg->k);
 }
 
-static int copy(struct dm_cache_migration *mg, bool promote)
+static void copy(struct dm_cache_migration *mg, bool promote)
 {
-	int r;
 	struct dm_io_region o_region, c_region;
 	struct cache *cache = mg->cache;
 
@@ -1203,11 +1202,9 @@ static void copy(struct dm_cache_migration *mg, bool promote)
 	c_region.count = cache->sectors_per_block;
 
 	if (promote)
-		r = dm_kcopyd_copy(cache->copier, &o_region, 1, &c_region, 0, copy_complete, &mg->k);
+		dm_kcopyd_copy(cache->copier, &o_region, 1, &c_region, 0, copy_complete, &mg->k);
 	else
-		r = dm_kcopyd_copy(cache->copier, &c_region, 1, &o_region, 0, copy_complete, &mg->k);
-
-	return r;
+		dm_kcopyd_copy(cache->copier, &c_region, 1, &o_region, 0, copy_complete, &mg->k);
 }
 
 static void bio_drop_shared_lock(struct cache *cache, struct bio *bio)
@@ -1449,12 +1446,7 @@ static void mg_full_copy(struct work_struct *ws)
 	}
 
 	init_continuation(&mg->k, mg_upgrade_lock);
-
-	if (copy(mg, is_policy_promote)) {
-		DMERR_LIMIT("%s: migration copy failed", cache_device_name(cache));
-		mg->k.input = BLK_STS_IOERR;
-		mg_complete(mg, false);
-	}
+	copy(mg, is_policy_promote);
 }
 
 static void mg_copy(struct work_struct *ws)
@@ -2250,7 +2242,7 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		{0, 2, "Invalid number of cache feature arguments"},
 	};
 
-	int r;
+	int r, mode_ctr = 0;
 	unsigned argc;
 	const char *arg;
 	struct cache_features *cf = &ca->features;
@@ -2264,14 +2256,20 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 	while (argc--) {
 		arg = dm_shift_arg(as);
 
-		if (!strcasecmp(arg, "writeback"))
+		if (!strcasecmp(arg, "writeback")) {
 			cf->io_mode = CM_IO_WRITEBACK;
+			mode_ctr++;
+		}
 
-		else if (!strcasecmp(arg, "writethrough"))
+		else if (!strcasecmp(arg, "writethrough")) {
 			cf->io_mode = CM_IO_WRITETHROUGH;
+			mode_ctr++;
+		}
 
-		else if (!strcasecmp(arg, "passthrough"))
+		else if (!strcasecmp(arg, "passthrough")) {
 			cf->io_mode = CM_IO_PASSTHROUGH;
+			mode_ctr++;
+		}
 
 		else if (!strcasecmp(arg, "metadata2"))
 			cf->metadata_version = 2;
@@ -2282,6 +2280,11 @@ static int parse_features(struct cache_args *ca, struct dm_arg_set *as,
 		}
 	}
 
+	if (mode_ctr > 1) {
+		*error = "Duplicate cache io_mode features requested";
+		return -EINVAL;
+	}
+
 	return 0;
 }
drivers/md/dm-crypt.c
@@ -99,7 +99,7 @@ struct crypt_iv_operations {
 };
 
 struct iv_essiv_private {
-	struct crypto_ahash *hash_tfm;
+	struct crypto_shash *hash_tfm;
 	u8 *salt;
 };
 
@@ -144,7 +144,7 @@ struct crypt_config {
 	struct workqueue_struct *io_queue;
 	struct workqueue_struct *crypt_queue;
 
-	wait_queue_head_t write_thread_wait;
+	spinlock_t write_thread_lock;
 	struct task_struct *write_thread;
 	struct rb_root write_tree;
 
@@ -327,25 +327,22 @@ static int crypt_iv_plain64be_gen(struct crypt_config *cc, u8 *iv,
 static int crypt_iv_essiv_init(struct crypt_config *cc)
 {
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
-	AHASH_REQUEST_ON_STACK(req, essiv->hash_tfm);
-	struct scatterlist sg;
+	SHASH_DESC_ON_STACK(desc, essiv->hash_tfm);
 	struct crypto_cipher *essiv_tfm;
 	int err;
 
-	sg_init_one(&sg, cc->key, cc->key_size);
-	ahash_request_set_tfm(req, essiv->hash_tfm);
-	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL);
-	ahash_request_set_crypt(req, &sg, essiv->salt, cc->key_size);
+	desc->tfm = essiv->hash_tfm;
+	desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
 
-	err = crypto_ahash_digest(req);
-	ahash_request_zero(req);
+	err = crypto_shash_digest(desc, cc->key, cc->key_size, essiv->salt);
+	shash_desc_zero(desc);
 	if (err)
 		return err;
 
 	essiv_tfm = cc->iv_private;
 
 	err = crypto_cipher_setkey(essiv_tfm, essiv->salt,
-				   crypto_ahash_digestsize(essiv->hash_tfm));
+				   crypto_shash_digestsize(essiv->hash_tfm));
 	if (err)
 		return err;
 
@@ -356,7 +353,7 @@ static int crypt_iv_essiv_init(struct crypt_config *cc)
 static int crypt_iv_essiv_wipe(struct crypt_config *cc)
 {
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
-	unsigned salt_size = crypto_ahash_digestsize(essiv->hash_tfm);
+	unsigned salt_size = crypto_shash_digestsize(essiv->hash_tfm);
 	struct crypto_cipher *essiv_tfm;
 	int r, err = 0;
 
@@ -408,7 +405,7 @@ static void crypt_iv_essiv_dtr(struct crypt_config *cc)
 	struct crypto_cipher *essiv_tfm;
 	struct iv_essiv_private *essiv = &cc->iv_gen_private.essiv;
 
-	crypto_free_ahash(essiv->hash_tfm);
+	crypto_free_shash(essiv->hash_tfm);
 	essiv->hash_tfm = NULL;
 
 	kzfree(essiv->salt);
@@ -426,7 +423,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 			      const char *opts)
 {
 	struct crypto_cipher *essiv_tfm = NULL;
-	struct crypto_ahash *hash_tfm = NULL;
+	struct crypto_shash *hash_tfm = NULL;
 	u8 *salt = NULL;
 	int err;
 
@@ -436,14 +433,14 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 	}
 
 	/* Allocate hash algorithm */
-	hash_tfm = crypto_alloc_ahash(opts, 0, CRYPTO_ALG_ASYNC);
+	hash_tfm = crypto_alloc_shash(opts, 0, 0);
 	if (IS_ERR(hash_tfm)) {
 		ti->error = "Error initializing ESSIV hash";
 		err = PTR_ERR(hash_tfm);
 		goto bad;
 	}
 
-	salt = kzalloc(crypto_ahash_digestsize(hash_tfm), GFP_KERNEL);
+	salt = kzalloc(crypto_shash_digestsize(hash_tfm), GFP_KERNEL);
 	if (!salt) {
 		ti->error = "Error kmallocing salt storage in ESSIV";
 		err = -ENOMEM;
@@ -454,7 +451,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 	cc->iv_gen_private.essiv.hash_tfm = hash_tfm;
 
 	essiv_tfm = alloc_essiv_cipher(cc, ti, salt,
-				       crypto_ahash_digestsize(hash_tfm));
+				       crypto_shash_digestsize(hash_tfm));
 	if (IS_ERR(essiv_tfm)) {
 		crypt_iv_essiv_dtr(cc);
 		return PTR_ERR(essiv_tfm);
@@ -465,7 +462,7 @@ static int crypt_iv_essiv_ctr(struct crypt_config *cc, struct dm_target *ti,
 bad:
 	if (hash_tfm && !IS_ERR(hash_tfm))
-		crypto_free_ahash(hash_tfm);
+		crypto_free_shash(hash_tfm);
 	kfree(salt);
 	return err;
 }
@@ -1620,36 +1617,31 @@ static int dmcrypt_write(void *data)
 		struct rb_root write_tree;
 		struct blk_plug plug;
-		DECLARE_WAITQUEUE(wait, current);
 
-		spin_lock_irq(&cc->write_thread_wait.lock);
+		spin_lock_irq(&cc->write_thread_lock);
 continue_locked:
 
 		if (!RB_EMPTY_ROOT(&cc->write_tree))
 			goto pop_from_list;
 
 		set_current_state(TASK_INTERRUPTIBLE);
-		__add_wait_queue(&cc->write_thread_wait, &wait);
 
-		spin_unlock_irq(&cc->write_thread_wait.lock);
+		spin_unlock_irq(&cc->write_thread_lock);
 
 		if (unlikely(kthread_should_stop())) {
 			set_current_state(TASK_RUNNING);
-			remove_wait_queue(&cc->write_thread_wait, &wait);
 			break;
 		}
 
 		schedule();
 
 		set_current_state(TASK_RUNNING);
-		spin_lock_irq(&cc->write_thread_wait.lock);
-		__remove_wait_queue(&cc->write_thread_wait, &wait);
+		spin_lock_irq(&cc->write_thread_lock);
 		goto continue_locked;
 
 pop_from_list:
 		write_tree = cc->write_tree;
 		cc->write_tree = RB_ROOT;
-		spin_unlock_irq(&cc->write_thread_wait.lock);
+		spin_unlock_irq(&cc->write_thread_lock);
 
 		BUG_ON(rb_parent(write_tree.rb_node));
 
@@ -1693,7 +1685,9 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 		return;
 	}
 
-	spin_lock_irqsave(&cc->write_thread_wait.lock, flags);
+	spin_lock_irqsave(&cc->write_thread_lock, flags);
+	if (RB_EMPTY_ROOT(&cc->write_tree))
+		wake_up_process(cc->write_thread);
 	rbp = &cc->write_tree.rb_node;
 	parent = NULL;
 	sector = io->sector;
@@ -1706,9 +1700,7 @@ static void kcryptd_crypt_write_io_submit(struct dm_crypt_io *io, int async)
 	}
 	rb_link_node(&io->rb_node, parent, rbp);
 	rb_insert_color(&io->rb_node, &cc->write_tree);
-
-	wake_up_locked(&cc->write_thread_wait);
-	spin_unlock_irqrestore(&cc->write_thread_wait.lock, flags);
+	spin_unlock_irqrestore(&cc->write_thread_lock, flags);
 }
 
 static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
@@ -2831,7 +2823,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		goto bad;
 	}
 
-	init_waitqueue_head(&cc->write_thread_wait);
+	spin_lock_init(&cc->write_thread_lock);
 	cc->write_tree = RB_ROOT;
 
 	cc->write_thread = kthread_create(dmcrypt_write, cc, "dmcrypt_write");
@@ -3069,11 +3061,11 @@ static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
 	 */
 	limits->max_segment_size = PAGE_SIZE;
 
-	if (cc->sector_size != (1 << SECTOR_SHIFT)) {
-		limits->logical_block_size = cc->sector_size;
-		limits->physical_block_size = cc->sector_size;
-		blk_limits_io_min(limits, cc->sector_size);
-	}
+	limits->logical_block_size =
+		max_t(unsigned short, limits->logical_block_size, cc->sector_size);
+	limits->physical_block_size =
+		max_t(unsigned, limits->physical_block_size, cc->sector_size);
+	limits->io_min = max_t(unsigned, limits->io_min, cc->sector_size);
 }
 
 static struct target_type crypt_target = {
drivers/md/dm-delay.c
@@ -17,6 +17,13 @@
 
 #define DM_MSG_PREFIX "delay"
 
+struct delay_class {
+	struct dm_dev *dev;
+	sector_t start;
+	unsigned delay;
+	unsigned ops;
+};
+
 struct delay_c {
 	struct timer_list delay_timer;
 	struct mutex timer_lock;
@@ -25,19 +32,16 @@ struct delay_c {
 	struct list_head delayed_bios;
 	atomic_t may_delay;
 
-	struct dm_dev *dev_read;
-	sector_t start_read;
-	unsigned read_delay;
-	unsigned reads;
+	struct delay_class read;
+	struct delay_class write;
+	struct delay_class flush;
 
-	struct dm_dev *dev_write;
-	sector_t start_write;
-	unsigned write_delay;
-	unsigned writes;
+	int argc;
 };
 
 struct dm_delay_info {
 	struct delay_c *context;
+	struct delay_class *class;
 	struct list_head list;
 	unsigned long expires;
 };
@@ -77,7 +81,7 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 {
 	struct dm_delay_info *delayed, *next;
 	unsigned long next_expires = 0;
-	int start_timer = 0;
+	unsigned long start_timer = 0;
 	struct bio_list flush_bios = { };
 
 	mutex_lock(&delayed_bios_lock);
@@ -87,10 +91,7 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 				  sizeof(struct dm_delay_info));
 			list_del(&delayed->list);
 			bio_list_add(&flush_bios, bio);
-			if ((bio_data_dir(bio) == WRITE))
-				delayed->context->writes--;
-			else
-				delayed->context->reads--;
+			delayed->class->ops--;
 			continue;
 		}
@@ -100,7 +101,6 @@ static struct bio *flush_delayed_bios(struct delay_c *dc, int flush_all)
 		} else
 			next_expires = min(next_expires, delayed->expires);
 	}
-
 	mutex_unlock(&delayed_bios_lock);
 
 	if (start_timer)
@@ -117,6 +117,50 @@ static void flush_expired_bios(struct work_struct *work)
 	flush_bios(flush_delayed_bios(dc, 0));
 }
 
+static void delay_dtr(struct dm_target *ti)
+{
+	struct delay_c *dc = ti->private;
+
+	destroy_workqueue(dc->kdelayd_wq);
+
+	if (dc->read.dev)
+		dm_put_device(ti, dc->read.dev);
+	if (dc->write.dev)
+		dm_put_device(ti, dc->write.dev);
+	if (dc->flush.dev)
+		dm_put_device(ti, dc->flush.dev);
+
+	mutex_destroy(&dc->timer_lock);
+
+	kfree(dc);
+}
+
+static int delay_class_ctr(struct dm_target *ti, struct delay_class *c, char **argv)
+{
+	int ret;
+	unsigned long long tmpll;
+	char dummy;
+
+	if (sscanf(argv[1], "%llu%c", &tmpll, &dummy) != 1) {
+		ti->error = "Invalid device sector";
+		return -EINVAL;
+	}
+	c->start = tmpll;
+
+	if (sscanf(argv[2], "%u%c", &c->delay, &dummy) != 1) {
+		ti->error = "Invalid delay";
+		return -EINVAL;
+	}
+
+	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table), &c->dev);
+	if (ret) {
+		ti->error = "Device lookup failed";
+		return ret;
+	}
+
+	return 0;
+}
+
 /*
  * Mapping parameters:
  *    <device> <offset> <delay> [<write_device> <write_offset> <write_delay>]
@@ -128,134 +172,89 @@ static void flush_expired_bios(struct work_struct *work)
 static int delay_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 {
 	struct delay_c *dc;
-	unsigned long long tmpll;
-	char dummy;
 	int ret;
 
-	if (argc != 3 && argc != 6) {
-		ti->error = "Requires exactly 3 or 6 arguments";
+	if (argc != 3 && argc != 6 && argc != 9) {
+		ti->error = "Requires exactly 3, 6 or 9 arguments";
 		return -EINVAL;
 	}
 
-	dc = kmalloc(sizeof(*dc), GFP_KERNEL);
+	dc = kzalloc(sizeof(*dc), GFP_KERNEL);
 	if (!dc) {
 		ti->error = "Cannot allocate context";
 		return -ENOMEM;
 	}
 
-	dc->reads = dc->writes = 0;
+	ti->private = dc;
+	timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
+	INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
+	INIT_LIST_HEAD(&dc->delayed_bios);
+	mutex_init(&dc->timer_lock);
+	atomic_set(&dc->may_delay, 1);
+	dc->argc = argc;
 
-	ret = -EINVAL;
-	if (sscanf(argv[1], "%llu%c", &tmpll, &dummy) != 1) {
-		ti->error = "Invalid device sector";
+	ret = delay_class_ctr(ti, &dc->read, argv);
+	if (ret)
 		goto bad;
-	}
-	dc->start_read = tmpll;
 
-	if (sscanf(argv[2], "%u%c", &dc->read_delay, &dummy) != 1) {
-		ti->error = "Invalid delay";
-		goto bad;
-	}
+	if (argc == 3) {
+		ret = delay_class_ctr(ti, &dc->write, argv);
+		if (ret)
+			goto bad;
+		ret = delay_class_ctr(ti, &dc->flush, argv);
+		if (ret)
+			goto bad;
+		goto out;
+	}
 
-	ret = dm_get_device(ti, argv[0], dm_table_get_mode(ti->table),
-			    &dc->dev_read);
-	if (ret) {
-		ti->error = "Device lookup failed";
-		goto bad;
-	}
+	ret = delay_class_ctr(ti, &dc->write, argv + 3);
+	if (ret)
+		goto bad;
+	if (argc == 6) {
+		ret = delay_class_ctr(ti, &dc->flush, argv + 3);
+		if (ret)
+			goto bad;
+		goto out;
+	}
 
-	ret = -EINVAL;
-	dc->dev_write = NULL;
-	if (argc == 3)
-		goto out;
-
-	if (sscanf(argv[4], "%llu%c", &tmpll, &dummy) != 1) {
-		ti->error = "Invalid write device sector";
-		goto bad_dev_read;
-	}
-	dc->start_write = tmpll;
-
-	if (sscanf(argv[5], "%u%c", &dc->write_delay, &dummy) != 1) {
-		ti->error = "Invalid write delay";
-		goto bad_dev_read;
-	}
-
-	ret = dm_get_device(ti, argv[3], dm_table_get_mode(ti->table),
-			    &dc->dev_write);
-	if (ret) {
-		ti->error = "Write device lookup failed";
-		goto bad_dev_read;
-	}
+	ret = delay_class_ctr(ti, &dc->flush, argv + 6);
+	if (ret)
+		goto bad;
 
 out:
-	ret = -EINVAL;
 	dc->kdelayd_wq = alloc_workqueue("kdelayd", WQ_MEM_RECLAIM, 0);
 	if (!dc->kdelayd_wq) {
+		ret = -EINVAL;
 		DMERR("Couldn't start kdelayd");
-		goto bad_queue;
+		goto bad;
 	}
 
-	timer_setup(&dc->delay_timer, handle_delayed_timer, 0);
-
-	INIT_WORK(&dc->flush_expired_bios, flush_expired_bios);
-	INIT_LIST_HEAD(&dc->delayed_bios);
-	mutex_init(&dc->timer_lock);
-	atomic_set(&dc->may_delay, 1);
-
 	ti->num_flush_bios = 1;
 	ti->num_discard_bios = 1;
 	ti->per_io_data_size = sizeof(struct dm_delay_info);
-	ti->private = dc;
 	return 0;
 
-bad_queue:
-	if (dc->dev_write)
-		dm_put_device(ti, dc->dev_write);
-bad_dev_read:
-	dm_put_device(ti, dc->dev_read);
 bad:
-	kfree(dc);
+	delay_dtr(ti);
 	return ret;
 }
 
-static void delay_dtr(struct dm_target *ti)
-{
-	struct delay_c *dc = ti->private;
-
-	destroy_workqueue(dc->kdelayd_wq);
-
-	dm_put_device(ti, dc->dev_read);
-	if (dc->dev_write)
-		dm_put_device(ti, dc->dev_write);
-
-	mutex_destroy(&dc->timer_lock);
-
-	kfree(dc);
-}
-
-static int delay_bio(struct delay_c *dc, int delay, struct bio *bio)
+static int delay_bio(struct delay_c *dc, struct delay_class *c, struct bio *bio)
 {
 	struct dm_delay_info *delayed;
 	unsigned long expires = 0;
 
-	if (!delay || !atomic_read(&dc->may_delay))
+	if (!c->delay || !atomic_read(&dc->may_delay))
 		return DM_MAPIO_REMAPPED;
 
 	delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
 
 	delayed->context = dc;
-	delayed->expires = expires = jiffies + msecs_to_jiffies(delay);
+	delayed->expires = expires = jiffies + msecs_to_jiffies(c->delay);
 
 	mutex_lock(&delayed_bios_lock);
-
-	if (bio_data_dir(bio) == WRITE)
-		dc->writes++;
-	else
-		dc->reads++;
-
+	c->ops++;
 	list_add_tail(&delayed->list, &dc->delayed_bios);
-
 	mutex_unlock(&delayed_bios_lock);
 
 	queue_timeout(dc, expires);
@@ -282,23 +281,28 @@ static void delay_resume(struct dm_target *ti)
 static int delay_map(struct dm_target *ti, struct bio *bio)
 {
 	struct delay_c *dc = ti->private;
+	struct delay_class *c;
+	struct dm_delay_info *delayed = dm_per_bio_data(bio, sizeof(struct dm_delay_info));
 
-	if ((bio_data_dir(bio) == WRITE) && (dc->dev_write)) {
-		bio_set_dev(bio, dc->dev_write->bdev);
-		if (bio_sectors(bio))
-			bio->bi_iter.bi_sector = dc->start_write +
-						 dm_target_offset(ti, bio->bi_iter.bi_sector);
-
-		return delay_bio(dc, dc->write_delay, bio);
+	if (bio_data_dir(bio) == WRITE) {
+		if (unlikely(bio->bi_opf & REQ_PREFLUSH))
+			c = &dc->flush;
+		else
+			c = &dc->write;
+	} else {
+		c = &dc->read;
 	}
+	delayed->class = c;
+	bio_set_dev(bio, c->dev->bdev);
+	if (bio_sectors(bio))
+		bio->bi_iter.bi_sector = c->start + dm_target_offset(ti, bio->bi_iter.bi_sector);
 
-	bio_set_dev(bio, dc->dev_read->bdev);
-	bio->bi_iter.bi_sector = dc->start_read +
-				 dm_target_offset(ti, bio->bi_iter.bi_sector);
-
-	return delay_bio(dc, dc->read_delay, bio);
+	return delay_bio(dc, c, bio);
 }
 
+#define DMEMIT_DELAY_CLASS(c) \
+	DMEMIT("%s %llu %u", (c)->dev->name, (unsigned long long)(c)->start, (c)->delay)
+
 static void delay_status(struct dm_target *ti, status_type_t type,
 			 unsigned status_flags, char *result, unsigned maxlen)
 {
@@ -307,17 +311,19 @@ static void delay_status(struct dm_target *ti, status_type_t type,
 	switch (type) {
 	case STATUSTYPE_INFO:
-		DMEMIT("%u %u", dc->reads, dc->writes);
+		DMEMIT("%u %u %u", dc->read.ops, dc->write.ops, dc->flush.ops);
 		break;
 
 	case STATUSTYPE_TABLE:
-		DMEMIT("%s %llu %u", dc->dev_read->name,
-		       (unsigned long long) dc->start_read,
-		       dc->read_delay);
-		if (dc->dev_write)
-			DMEMIT(" %s %llu %u", dc->dev_write->name,
-			       (unsigned long long) dc->start_write,
-			       dc->write_delay);
+		DMEMIT_DELAY_CLASS(&dc->read);
+		if (dc->argc >= 6) {
+			DMEMIT(" ");
+			DMEMIT_DELAY_CLASS(&dc->write);
+		}
+		if (dc->argc >= 9) {
+			DMEMIT(" ");
+			DMEMIT_DELAY_CLASS(&dc->flush);
+		}
 		break;
 	}
 }
@@ -328,12 +334,15 @@ static int delay_iterate_devices(struct dm_target *ti,
 	struct delay_c *dc = ti->private;
 	int ret = 0;
 
-	ret = fn(ti, dc->dev_read, dc->start_read, ti->len, data);
+	ret = fn(ti, dc->read.dev, dc->read.start, ti->len, data);
+	if (ret)
+		goto out;
+	ret = fn(ti, dc->write.dev, dc->write.start, ti->len, data);
+	if (ret)
+		goto out;
+	ret = fn(ti, dc->flush.dev, dc->flush.start, ti->len, data);
 	if (ret)
 		goto out;
 
-	if (dc->dev_write)
-		ret = fn(ti, dc->dev_write, dc->start_write, ti->len, data);
-
 out:
 	return ret;
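Illustrative only (counters and length are made up): after this rework the target's INFO status reports three per-class operation counts, in read/write/flush order, matching the DMEMIT("%u %u %u", ...) above:

    # dmsetup status delayed
    0 2097152 delay 42 17 3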
drivers/md/dm-integrity.c
@@ -31,6 +31,8 @@
 #define MIN_LOG2_INTERLEAVE_SECTORS	3
 #define MAX_LOG2_INTERLEAVE_SECTORS	31
 #define METADATA_WORKQUEUE_MAX_ACTIVE	16
+#define RECALC_SECTORS			8192
+#define RECALC_WRITE_SUPER		16
 
 /*
  * Warning - DEBUG_PRINT prints security-sensitive data to the log,
@@ -44,7 +46,8 @@
  */
 
 #define SB_MAGIC			"integrt"
-#define SB_VERSION			1
+#define SB_VERSION_1			1
+#define SB_VERSION_2			2
 #define SB_SECTORS			8
 #define MAX_SECTORS_PER_BLOCK		8
 
@@ -57,9 +60,12 @@ struct superblock {
 	__u64 provided_data_sectors;	/* userspace uses this value */
 	__u32 flags;
 	__u8 log2_sectors_per_block;
+	__u8 pad[3];
+	__u64 recalc_sector;
 };
 
 #define SB_FLAG_HAVE_JOURNAL_MAC	0x1
+#define SB_FLAG_RECALCULATING		0x2
 
 #define	JOURNAL_ENTRY_ROUNDUP		8
 
@@ -139,6 +145,7 @@ struct alg_spec {
 struct dm_integrity_c {
 	struct dm_dev *dev;
+	struct dm_dev *meta_dev;
 	unsigned tag_size;
 	__s8 log2_tag_size;
 	sector_t start;
@@ -170,7 +177,8 @@ struct dm_integrity_c {
 	unsigned short journal_section_sectors;
 	unsigned journal_sections;
 	unsigned journal_entries;
-	sector_t device_sectors;
+	sector_t data_device_sectors;
+	sector_t meta_device_sectors;
 	unsigned initial_sectors;
 	unsigned metadata_run;
 	__s8 log2_metadata_run;
@@ -178,7 +186,7 @@ struct dm_integrity_c {
 	__u8 sectors_per_block;
 
 	unsigned char mode;
-	bool suspending;
+	int suspending;
 
 	int failed;
 
@@ -186,6 +194,7 @@ struct dm_integrity_c {
 	/* these variables are locked with endio_wait.lock */
 
 	struct rb_root in_progress;
+	struct list_head wait_list;
 	wait_queue_head_t endio_wait;
 	struct workqueue_struct *wait_wq;
 
@@ -210,6 +219,11 @@ struct dm_integrity_c {
 	struct workqueue_struct *writer_wq;
 	struct work_struct writer_work;
 
+	struct workqueue_struct *recalc_wq;
+	struct work_struct recalc_work;
+	u8 *recalc_buffer;
+	u8 *recalc_tags;
+
 	struct bio_list flush_bio_list;
 
 	unsigned long autocommit_jiffies;
@@ -233,7 +247,14 @@ struct dm_integrity_c {
 struct dm_integrity_range {
 	sector_t logical_sector;
 	unsigned n_sectors;
-	struct rb_node node;
+	bool waiting;
+	union {
+		struct rb_node node;
+		struct {
+			struct task_struct *task;
+			struct list_head wait_entry;
+		};
+	};
 };
 
 struct dm_integrity_io {
@@ -337,10 +358,14 @@ static commit_id_t dm_integrity_commit_id(struct dm_integrity_c *ic, unsigned i,
 static void get_area_and_offset(struct dm_integrity_c *ic, sector_t data_sector,
 				sector_t *area, sector_t *offset)
 {
-	__u8 log2_interleave_sectors = ic->sb->log2_interleave_sectors;
-
-	*area = data_sector >> log2_interleave_sectors;
-	*offset = (unsigned)data_sector & ((1U << log2_interleave_sectors) - 1);
+	if (!ic->meta_dev) {
+		__u8 log2_interleave_sectors = ic->sb->log2_interleave_sectors;
+		*area = data_sector >> log2_interleave_sectors;
+		*offset = (unsigned)data_sector & ((1U << log2_interleave_sectors) - 1);
+	} else {
+		*area = 0;
+		*offset = data_sector;
+	}
 }
 
 #define sector_to_block(ic, n)						\
@@ -379,6 +404,9 @@ static sector_t get_data_sector(struct dm_integrity_c *ic, sector_t area, sector_t offset)
 {
 	sector_t result;
 
+	if (ic->meta_dev)
+		return offset;
+
 	result = area << ic->sb->log2_interleave_sectors;
 	if (likely(ic->log2_metadata_run >= 0))
 		result += (area + 1) << ic->log2_metadata_run;
@@ -386,6 +414,8 @@ static sector_t get_data_sector(struct dm_integrity_c *ic, sector_t area, sector_t offset)
 		result += (area + 1) * ic->metadata_run;
 
 	result += (sector_t)ic->initial_sectors + offset;
+	result += ic->start;
+
 	return result;
 }
 
@@ -395,6 +425,14 @@ static void wraparound_section(struct dm_integrity_c *ic, unsigned *sec_ptr)
 		*sec_ptr -= ic->journal_sections;
 }
 
+static void sb_set_version(struct dm_integrity_c *ic)
+{
+	if (ic->meta_dev || ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))
+		ic->sb->version = SB_VERSION_2;
+	else
+		ic->sb->version = SB_VERSION_1;
+}
+
 static int sync_rw_sb(struct dm_integrity_c *ic, int op, int op_flags)
 {
 	struct dm_io_request io_req;
@@ -406,7 +444,7 @@ static int sync_rw_sb(struct dm_integrity_c *ic, int op, int op_flags)
 	io_req.mem.ptr.addr = ic->sb;
 	io_req.notify.fn = NULL;
 	io_req.client = ic->io;
-	io_loc.bdev = ic->dev->bdev;
+	io_loc.bdev = ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev;
 	io_loc.sector = ic->start;
 	io_loc.count = SB_SECTORS;
 
@@ -753,7 +791,7 @@ static void rw_journal(struct dm_integrity_c *ic, int op, int op_flags, unsigned
 		io_req.notify.fn = NULL;
 	}
 	io_req.client = ic->io;
-	io_loc.bdev = ic->dev->bdev;
+	io_loc.bdev = ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev;
 	io_loc.sector = ic->start + SB_SECTORS + sector;
 	io_loc.count = n_sectors;
 
@@ -857,7 +895,7 @@ static void copy_from_journal(struct dm_integrity_c *ic, unsigned section, unsigned
 	io_req.notify.context = data;
 	io_req.client = ic->io;
 	io_loc.bdev = ic->dev->bdev;
-	io_loc.sector = ic->start + target;
+	io_loc.sector = target;
 	io_loc.count = n_sectors;
 
 	r = dm_io(&io_req, 1, &io_loc, NULL);
@@ -867,13 +905,27 @@ static void copy_from_journal(struct dm_integrity_c *ic, unsigned section, unsigned
 	}
 }
 
-static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range)
+static bool ranges_overlap(struct dm_integrity_range *range1, struct dm_integrity_range *range2)
+{
+	return range1->logical_sector < range2->logical_sector + range2->n_sectors &&
+	       range2->logical_sector < range1->logical_sector + range1->n_sectors;
+}
+
+static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range, bool check_waiting)
 {
 	struct rb_node **n = &ic->in_progress.rb_node;
 	struct rb_node *parent;
 
 	BUG_ON((new_range->logical_sector | new_range->n_sectors) & (unsigned)(ic->sectors_per_block - 1));
 
+	if (likely(check_waiting)) {
+		struct dm_integrity_range *range;
+		list_for_each_entry(range, &ic->wait_list, wait_entry) {
+			if (unlikely(ranges_overlap(range, new_range)))
+				return false;
+		}
+	}
+
 	parent = NULL;
 
 	while (*n) {
@@ -898,7 +950,22 @@ static bool add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range
 static void remove_range_unlocked(struct dm_integrity_c *ic, struct dm_integrity_range *range)
 {
 	rb_erase(&range->node, &ic->in_progress);
-	wake_up_locked(&ic->endio_wait);
+	while (unlikely(!list_empty(&ic->wait_list))) {
+		struct dm_integrity_range *last_range =
+			list_first_entry(&ic->wait_list, struct dm_integrity_range, wait_entry);
+		struct task_struct *last_range_task;
+		if (!ranges_overlap(range, last_range))
+			break;
+		last_range_task = last_range->task;
+		list_del(&last_range->wait_entry);
+		if (!add_new_range(ic, last_range, false)) {
+			last_range->task = last_range_task;
+			list_add(&last_range->wait_entry, &ic->wait_list);
+			break;
+		}
+		last_range->waiting = false;
+		wake_up_process(last_range_task);
+	}
 }
 
 static void remove_range(struct dm_integrity_c *ic, struct dm_integrity_range *range)
@@ -910,6 +977,19 @@ static void remove_range(struct dm_integrity_c *ic, struct dm_integrity_range *range)
 	spin_unlock_irqrestore(&ic->endio_wait.lock, flags);
 }
 
+static void wait_and_add_new_range(struct dm_integrity_c *ic, struct dm_integrity_range *new_range)
+{
+	new_range->waiting = true;
+	list_add_tail(&new_range->wait_entry, &ic->wait_list);
+	new_range->task = current;
+	do {
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		spin_unlock_irq(&ic->endio_wait.lock);
+		io_schedule();
+		spin_lock_irq(&ic->endio_wait.lock);
+	} while (unlikely(new_range->waiting));
+}
+
 static void init_journal_node(struct journal_node *node)
 {
 	RB_CLEAR_NODE(&node->node);
@@ -1599,8 +1679,12 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map)
 		dio->range.n_sectors = min(dio->range.n_sectors,
 					   ic->free_sectors << ic->sb->log2_sectors_per_block);
-		if (unlikely(!dio->range.n_sectors))
-			goto sleep;
+		if (unlikely(!dio->range.n_sectors)) {
+			if (from_map)
+				goto offload_to_thread;
+			sleep_on_endio_wait(ic);
+			goto retry;
+		}
 		range_sectors = dio->range.n_sectors >> ic->sb->log2_sectors_per_block;
 		ic->free_sectors -= range_sectors;
 		journal_section = ic->free_section;
@@ -1654,22 +1738,20 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map)
 			}
 		}
 	}
-	if (unlikely(!add_new_range(ic, &dio->range))) {
+	if (unlikely(!add_new_range(ic, &dio->range, true))) {
 		/*
 		 * We must not sleep in the request routine because it could
 		 * stall bios on current->bio_list.
 		 * So, we offload the bio to a workqueue if we have to sleep.
 		 */
-sleep:
 		if (from_map) {
+offload_to_thread:
 			spin_unlock_irq(&ic->endio_wait.lock);
 			INIT_WORK(&dio->work, integrity_bio_wait);
 			queue_work(ic->wait_wq, &dio->work);
 			return;
-		} else {
-			sleep_on_endio_wait(ic);
-			goto retry;
 		}
+		wait_and_add_new_range(ic, &dio->range);
 	}
 	spin_unlock_irq(&ic->endio_wait.lock);
 
@@ -1701,14 +1783,18 @@ static void dm_integrity_map_continue(struct dm_integrity_io *dio, bool from_map)
 	bio->bi_end_io = integrity_end_io;
 	bio->bi_iter.bi_size = dio->range.n_sectors << SECTOR_SHIFT;
-	bio->bi_iter.bi_sector += ic->start;
 	generic_make_request(bio);
 
 	if (need_sync_io) {
 		wait_for_completion_io(&read_comp);
+		if (unlikely(ic->recalc_wq != NULL) &&
+		    ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING) &&
+		    dio->range.logical_sector + dio->range.n_sectors > le64_to_cpu(ic->sb->recalc_sector))
+			goto skip_check;
 		if (likely(!bio->bi_status))
 			integrity_metadata(&dio->work);
 		else
+skip_check:
 			dec_in_flight(dio);
 
 	} else {
@@ -1892,8 +1978,8 @@ static void do_journal_write(struct dm_integrity_c *ic, unsigned write_start,
 		io->range.n_sectors = (k - j) << ic->sb->log2_sectors_per_block;
 
 		spin_lock_irq(&ic->endio_wait.lock);
-		while (unlikely(!add_new_range(ic, &io->range)))
-			sleep_on_endio_wait(ic);
+		if (unlikely(!add_new_range(ic, &io->range, true)))
+			wait_and_add_new_range(ic, &io->range);
 
 		if (likely(!from_replay)) {
 			struct journal_node *section_node = &ic->journal_tree[i * ic->journal_section_entries];
@@ -1981,7 +2067,7 @@ static void integrity_writer(struct work_struct *w)
 	unsigned prev_free_sectors;
 
 	/* the following test is not needed, but it tests the replay code */
-	if (READ_ONCE(ic->suspending))
+	if (READ_ONCE(ic->suspending) && !ic->meta_dev)
 		return;
 
 	spin_lock_irq(&ic->endio_wait.lock);
@@ -2008,6 +2094,108 @@ static void integrity_writer(struct work_struct *w)
 	spin_unlock_irq(&ic->endio_wait.lock);
 }
+static void recalc_write_super(struct dm_integrity_c *ic)
+{
+	int r;
+
+	dm_integrity_flush_buffers(ic);
+	if (dm_integrity_failed(ic))
+		return;
+
+	sb_set_version(ic);
+	r = sync_rw_sb(ic, REQ_OP_WRITE, 0);
+	if (unlikely(r))
+		dm_integrity_io_error(ic, "writing superblock", r);
+}
+
+static void integrity_recalc(struct work_struct *w)
+{
+	struct dm_integrity_c *ic = container_of(w, struct dm_integrity_c, recalc_work);
+	struct dm_integrity_range range;
+	struct dm_io_request io_req;
+	struct dm_io_region io_loc;
+	sector_t area, offset;
+	sector_t metadata_block;
+	unsigned metadata_offset;
+	__u8 *t;
+	unsigned i;
+	int r;
+	unsigned super_counter = 0;
+
+	spin_lock_irq(&ic->endio_wait.lock);
+
+next_chunk:
+
+	if (unlikely(READ_ONCE(ic->suspending)))
+		goto unlock_ret;
+
+	range.logical_sector = le64_to_cpu(ic->sb->recalc_sector);
+	if (unlikely(range.logical_sector >= ic->provided_data_sectors))
+		goto unlock_ret;
+
+	get_area_and_offset(ic, range.logical_sector, &area, &offset);
+	range.n_sectors = min((sector_t)RECALC_SECTORS, ic->provided_data_sectors - range.logical_sector);
+	if (!ic->meta_dev)
+		range.n_sectors = min(range.n_sectors, (1U << ic->sb->log2_interleave_sectors) - (unsigned)offset);
+
+	if (unlikely(!add_new_range(ic, &range, true)))
+		wait_and_add_new_range(ic, &range);
+
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	if (unlikely(++super_counter == RECALC_WRITE_SUPER)) {
+		recalc_write_super(ic);
+		super_counter = 0;
+	}
+
+	if (unlikely(dm_integrity_failed(ic)))
+		goto err;
+
+	io_req.bi_op = REQ_OP_READ;
+	io_req.bi_op_flags = 0;
+	io_req.mem.type = DM_IO_VMA;
+	io_req.mem.ptr.addr = ic->recalc_buffer;
+	io_req.notify.fn = NULL;
+	io_req.client = ic->io;
+	io_loc.bdev = ic->dev->bdev;
+	io_loc.sector = get_data_sector(ic, area, offset);
+	io_loc.count = range.n_sectors;
+
+	r = dm_io(&io_req, 1, &io_loc, NULL);
+	if (unlikely(r)) {
+		dm_integrity_io_error(ic, "reading data", r);
+		goto err;
+	}
+
+	t = ic->recalc_tags;
+	for (i = 0; i < range.n_sectors; i += ic->sectors_per_block) {
+		integrity_sector_checksum(ic, range.logical_sector + i, ic->recalc_buffer + (i << SECTOR_SHIFT), t);
+		t += ic->tag_size;
+	}
+
+	metadata_block = get_metadata_sector_and_offset(ic, area, offset, &metadata_offset);
+
+	r = dm_integrity_rw_tag(ic, ic->recalc_tags, &metadata_block, &metadata_offset, t - ic->recalc_tags, TAG_WRITE);
+	if (unlikely(r)) {
+		dm_integrity_io_error(ic, "writing tags", r);
+		goto err;
+	}
+
+	spin_lock_irq(&ic->endio_wait.lock);
+	remove_range_unlocked(ic, &range);
+	ic->sb->recalc_sector = cpu_to_le64(range.logical_sector + range.n_sectors);
+	goto next_chunk;
+
+err:
+	remove_range(ic, &range);
+	return;
+
+unlock_ret:
+	spin_unlock_irq(&ic->endio_wait.lock);
+
+	recalc_write_super(ic);
+}
+
static void init_journal(struct dm_integrity_c *ic, unsigned start_section, static void init_journal(struct dm_integrity_c *ic, unsigned start_section,
unsigned n_sections, unsigned char commit_seq) unsigned n_sections, unsigned char commit_seq)
{ {
...@@ -2210,17 +2398,22 @@ static void dm_integrity_postsuspend(struct dm_target *ti) ...@@ -2210,17 +2398,22 @@ static void dm_integrity_postsuspend(struct dm_target *ti)
del_timer_sync(&ic->autocommit_timer); del_timer_sync(&ic->autocommit_timer);
ic->suspending = true; WRITE_ONCE(ic->suspending, 1);
if (ic->recalc_wq)
drain_workqueue(ic->recalc_wq);
queue_work(ic->commit_wq, &ic->commit_work); queue_work(ic->commit_wq, &ic->commit_work);
drain_workqueue(ic->commit_wq); drain_workqueue(ic->commit_wq);
if (ic->mode == 'J') { if (ic->mode == 'J') {
if (ic->meta_dev)
queue_work(ic->writer_wq, &ic->writer_work);
drain_workqueue(ic->writer_wq); drain_workqueue(ic->writer_wq);
dm_integrity_flush_buffers(ic); dm_integrity_flush_buffers(ic);
} }
ic->suspending = false; WRITE_ONCE(ic->suspending, 0);
BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress)); BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress));
...@@ -2232,6 +2425,16 @@ static void dm_integrity_resume(struct dm_target *ti) ...@@ -2232,6 +2425,16 @@ static void dm_integrity_resume(struct dm_target *ti)
struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private; struct dm_integrity_c *ic = (struct dm_integrity_c *)ti->private;
replay_journal(ic); replay_journal(ic);
if (ic->recalc_wq && ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING)) {
__u64 recalc_pos = le64_to_cpu(ic->sb->recalc_sector);
if (recalc_pos < ic->provided_data_sectors) {
queue_work(ic->recalc_wq, &ic->recalc_work);
} else if (recalc_pos > ic->provided_data_sectors) {
ic->sb->recalc_sector = cpu_to_le64(ic->provided_data_sectors);
recalc_write_super(ic);
}
}
} }
static void dm_integrity_status(struct dm_target *ti, status_type_t type, static void dm_integrity_status(struct dm_target *ti, status_type_t type,
...@@ -2243,7 +2446,13 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type, ...@@ -2243,7 +2446,13 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
switch (type) { switch (type) {
case STATUSTYPE_INFO: case STATUSTYPE_INFO:
DMEMIT("%llu", (unsigned long long)atomic64_read(&ic->number_of_mismatches)); DMEMIT("%llu %llu",
(unsigned long long)atomic64_read(&ic->number_of_mismatches),
(unsigned long long)ic->provided_data_sectors);
if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))
DMEMIT(" %llu", (unsigned long long)le64_to_cpu(ic->sb->recalc_sector));
else
DMEMIT(" -");
break; break;
case STATUSTYPE_TABLE: { case STATUSTYPE_TABLE: {
...@@ -2251,19 +2460,25 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type, ...@@ -2251,19 +2460,25 @@ static void dm_integrity_status(struct dm_target *ti, status_type_t type,
watermark_percentage += ic->journal_entries / 2; watermark_percentage += ic->journal_entries / 2;
do_div(watermark_percentage, ic->journal_entries); do_div(watermark_percentage, ic->journal_entries);
arg_count = 5; arg_count = 5;
arg_count += !!ic->meta_dev;
arg_count += ic->sectors_per_block != 1; arg_count += ic->sectors_per_block != 1;
arg_count += !!(ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING));
arg_count += !!ic->internal_hash_alg.alg_string; arg_count += !!ic->internal_hash_alg.alg_string;
arg_count += !!ic->journal_crypt_alg.alg_string; arg_count += !!ic->journal_crypt_alg.alg_string;
arg_count += !!ic->journal_mac_alg.alg_string; arg_count += !!ic->journal_mac_alg.alg_string;
DMEMIT("%s %llu %u %c %u", ic->dev->name, (unsigned long long)ic->start, DMEMIT("%s %llu %u %c %u", ic->dev->name, (unsigned long long)ic->start,
ic->tag_size, ic->mode, arg_count); ic->tag_size, ic->mode, arg_count);
if (ic->meta_dev)
DMEMIT(" meta_device:%s", ic->meta_dev->name);
if (ic->sectors_per_block != 1)
DMEMIT(" block_size:%u", ic->sectors_per_block << SECTOR_SHIFT);
if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))
DMEMIT(" recalculate");
DMEMIT(" journal_sectors:%u", ic->initial_sectors - SB_SECTORS); DMEMIT(" journal_sectors:%u", ic->initial_sectors - SB_SECTORS);
DMEMIT(" interleave_sectors:%u", 1U << ic->sb->log2_interleave_sectors); DMEMIT(" interleave_sectors:%u", 1U << ic->sb->log2_interleave_sectors);
DMEMIT(" buffer_sectors:%u", 1U << ic->log2_buffer_sectors); DMEMIT(" buffer_sectors:%u", 1U << ic->log2_buffer_sectors);
DMEMIT(" journal_watermark:%u", (unsigned)watermark_percentage); DMEMIT(" journal_watermark:%u", (unsigned)watermark_percentage);
DMEMIT(" commit_time:%u", ic->autocommit_msec); DMEMIT(" commit_time:%u", ic->autocommit_msec);
if (ic->sectors_per_block != 1)
DMEMIT(" block_size:%u", ic->sectors_per_block << SECTOR_SHIFT);
#define EMIT_ALG(a, n) \ #define EMIT_ALG(a, n) \
do { \ do { \
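
With this change the INFO status line carries three fields instead of one: the mismatch count, the provided data sectors, and the recalculation position (or "-" when no recalculation is in progress). A minimal userspace sketch of reading that line, assuming only the format visible in the DMEMIT calls above (the helper name is ours):

	#include <stdio.h>

	/* Parse a dm-integrity STATUSTYPE_INFO line such as
	 * "0 1953792 12345" (recalculating) or "0 1953792 -" (idle). */
	static int parse_integrity_info(const char *line)
	{
		unsigned long long mismatches, provided, recalc_pos;

		if (sscanf(line, "%llu %llu %llu",
			   &mismatches, &provided, &recalc_pos) == 3)
			printf("mismatches=%llu provided=%llu recalculated up to %llu\n",
			       mismatches, provided, recalc_pos);
		else if (sscanf(line, "%llu %llu", &mismatches, &provided) == 2)
			printf("mismatches=%llu provided=%llu (not recalculating)\n",
			       mismatches, provided);
		else
			return -1;
		return 0;
	}

	int main(void)
	{
		parse_integrity_info("0 1953792 12345");
		parse_integrity_info("0 1953792 -");
		return 0;
	}
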
@@ -2286,7 +2501,10 @@ static int dm_integrity_iterate_devices(struct dm_target *ti,
 {
 	struct dm_integrity_c *ic = ti->private;

-	return fn(ti, ic->dev, ic->start + ic->initial_sectors + ic->metadata_run, ti->len, data);
+	if (!ic->meta_dev)
+		return fn(ti, ic->dev, ic->start + ic->initial_sectors + ic->metadata_run, ti->len, data);
+	else
+		return fn(ti, ic->dev, 0, ti->len, data);
 }

 static void dm_integrity_io_hints(struct dm_target *ti, struct queue_limits *limits)
@@ -2319,14 +2537,16 @@ static void calculate_journal_section_size(struct dm_integrity_c *ic)
 static int calculate_device_limits(struct dm_integrity_c *ic)
 {
 	__u64 initial_sectors;
-	sector_t last_sector, last_area, last_offset;

 	calculate_journal_section_size(ic);
 	initial_sectors = SB_SECTORS + (__u64)ic->journal_section_sectors * ic->journal_sections;
-	if (initial_sectors + METADATA_PADDING_SECTORS >= ic->device_sectors || initial_sectors > UINT_MAX)
+	if (initial_sectors + METADATA_PADDING_SECTORS >= ic->meta_device_sectors || initial_sectors > UINT_MAX)
 		return -EINVAL;
 	ic->initial_sectors = initial_sectors;

+	if (!ic->meta_dev) {
+		sector_t last_sector, last_area, last_offset;
 	ic->metadata_run = roundup((__u64)ic->tag_size << (ic->sb->log2_interleave_sectors - ic->sb->log2_sectors_per_block),
				   (__u64)(1 << SECTOR_SHIFT << METADATA_PADDING_SECTORS)) >> SECTOR_SHIFT;
 	if (!(ic->metadata_run & (ic->metadata_run - 1)))
@@ -2336,9 +2556,19 @@ static int calculate_device_limits(struct dm_integrity_c *ic)
 	get_area_and_offset(ic, ic->provided_data_sectors - 1, &last_area, &last_offset);
 	last_sector = get_data_sector(ic, last_area, last_offset);
-	if (ic->start + last_sector < last_sector || ic->start + last_sector >= ic->device_sectors)
+	if (last_sector < ic->start || last_sector >= ic->meta_device_sectors)
 		return -EINVAL;
+	} else {
+		__u64 meta_size = ic->provided_data_sectors * ic->tag_size;
+		meta_size = (meta_size + ((1U << (ic->log2_buffer_sectors + SECTOR_SHIFT)) - 1))
+				>> (ic->log2_buffer_sectors + SECTOR_SHIFT);
+		meta_size <<= ic->log2_buffer_sectors;
+		if (ic->initial_sectors + meta_size < ic->initial_sectors ||
+		    ic->initial_sectors + meta_size > ic->meta_device_sectors)
+			return -EINVAL;
+		ic->metadata_run = 1;
+		ic->log2_metadata_run = 0;
+	}

 	return 0;
 }
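
On a separate metadata device there is no interleaving, so the tag area is simply provided_data_sectors * tag_size bytes, rounded up to whole bufio blocks of (1 << log2_buffer_sectors) sectors. A standalone sketch of that arithmetic, mirroring the else-branch above (the constants in main() are illustrative):

	#include <stdint.h>
	#include <stdio.h>

	#define SECTOR_SHIFT 9

	/* Sectors of tag metadata needed for data_sectors of data, rounded
	 * up to whole buffer-sized blocks. */
	static uint64_t meta_sectors(uint64_t data_sectors, unsigned tag_size,
				     unsigned log2_buffer_sectors)
	{
		uint64_t meta_size = data_sectors * tag_size;	/* bytes of tags */

		/* round up to buffer blocks of (1 << log2_buffer_sectors) sectors */
		meta_size = (meta_size + ((1U << (log2_buffer_sectors + SECTOR_SHIFT)) - 1))
				>> (log2_buffer_sectors + SECTOR_SHIFT);
		return meta_size << log2_buffer_sectors;	/* back to sectors */
	}

	int main(void)
	{
		/* e.g. 1 GiB of data (2097152 sectors), 4-byte tags, 128-sector buffers */
		printf("%llu metadata sectors\n",
		       (unsigned long long)meta_sectors(2097152, 4, 7));
		return 0;
	}
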
@@ -2350,7 +2580,6 @@ static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sec
 	memset(ic->sb, 0, SB_SECTORS << SECTOR_SHIFT);
 	memcpy(ic->sb->magic, SB_MAGIC, 8);
-	ic->sb->version = SB_VERSION;
 	ic->sb->integrity_tag_size = cpu_to_le16(ic->tag_size);
 	ic->sb->log2_sectors_per_block = __ffs(ic->sectors_per_block);
 	if (ic->journal_mac_alg.alg_string)
@@ -2360,8 +2589,9 @@ static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sec
 	journal_sections = journal_sectors / ic->journal_section_sectors;
 	if (!journal_sections)
 		journal_sections = 1;
-	ic->sb->journal_sections = cpu_to_le32(journal_sections);

+	if (!ic->meta_dev) {
+		ic->sb->journal_sections = cpu_to_le32(journal_sections);
 	if (!interleave_sectors)
 		interleave_sectors = DEFAULT_INTERLEAVE_SECTORS;
 	ic->sb->log2_interleave_sectors = __fls(interleave_sectors);
@@ -2369,19 +2599,45 @@ static int initialize_superblock(struct dm_integrity_c *ic, unsigned journal_sec
 	ic->sb->log2_interleave_sectors = min((__u8)MAX_LOG2_INTERLEAVE_SECTORS, ic->sb->log2_interleave_sectors);

 	ic->provided_data_sectors = 0;
-	for (test_bit = fls64(ic->device_sectors) - 1; test_bit >= 3; test_bit--) {
+	for (test_bit = fls64(ic->meta_device_sectors) - 1; test_bit >= 3; test_bit--) {
 		__u64 prev_data_sectors = ic->provided_data_sectors;

 		ic->provided_data_sectors |= (sector_t)1 << test_bit;
 		if (calculate_device_limits(ic))
 			ic->provided_data_sectors = prev_data_sectors;
 	}
 	if (!ic->provided_data_sectors)
 		return -EINVAL;
+	} else {
+		ic->sb->log2_interleave_sectors = 0;
+		ic->provided_data_sectors = ic->data_device_sectors;
+		ic->provided_data_sectors &= ~(sector_t)(ic->sectors_per_block - 1);
+
+try_smaller_buffer:
+		ic->sb->journal_sections = cpu_to_le32(0);
+		for (test_bit = fls(journal_sections) - 1; test_bit >= 0; test_bit--) {
+			__u32 prev_journal_sections = le32_to_cpu(ic->sb->journal_sections);
+			__u32 test_journal_sections = prev_journal_sections | (1U << test_bit);
+			if (test_journal_sections > journal_sections)
+				continue;
+			ic->sb->journal_sections = cpu_to_le32(test_journal_sections);
+			if (calculate_device_limits(ic))
+				ic->sb->journal_sections = cpu_to_le32(prev_journal_sections);
+		}
+		if (!le32_to_cpu(ic->sb->journal_sections)) {
+			if (ic->log2_buffer_sectors > 3) {
+				ic->log2_buffer_sectors--;
+				goto try_smaller_buffer;
+			}
+			return -EINVAL;
+		}
+	}

 	ic->sb->provided_data_sectors = cpu_to_le64(ic->provided_data_sectors);
+
+	sb_set_version(ic);

 	return 0;
 }
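
Both branches size a quantity by the same greedy bit-descent: try setting each bit from the highest down, keeping it only if the limit check still passes, which converges on the largest value the device can hold. A small self-contained illustration of the pattern (fits() is a toy stand-in for calculate_device_limits(); it only needs to be monotonic):

	#include <stdint.h>
	#include <stdio.h>

	/* Toy constraint: each unit of value costs 3 units of budget. */
	static int fits(uint64_t v, uint64_t budget)
	{
		return v * 3 <= budget;
	}

	/* Greedy bit descent: largest v with fits(v, budget), assuming
	 * fits() is monotonic (once too big, always too big). */
	static uint64_t maximize(uint64_t budget)
	{
		uint64_t v = 0;
		int bit;

		for (bit = 63; bit >= 0; bit--) {
			uint64_t candidate = v | ((uint64_t)1 << bit);
			if (fits(candidate, budget))
				v = candidate;
		}
		return v;
	}

	int main(void)
	{
		printf("%llu\n", (unsigned long long)maximize(1000));	/* prints 333 */
		return 0;
	}
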
@@ -2828,6 +3084,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		{0, 9, "Invalid number of feature args"},
 	};
 	unsigned journal_sectors, interleave_sectors, buffer_sectors, journal_watermark, sync_msec;
+	bool recalculate;
 	bool should_write_sb;
 	__u64 threshold;
 	unsigned long long start;
@@ -2848,6 +3105,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 	ti->per_io_data_size = sizeof(struct dm_integrity_io);

 	ic->in_progress = RB_ROOT;
+	INIT_LIST_HEAD(&ic->wait_list);
 	init_waitqueue_head(&ic->endio_wait);
 	bio_list_init(&ic->flush_bio_list);
 	init_waitqueue_head(&ic->copy_to_journal_wait);
@@ -2883,13 +3141,12 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		goto bad;
 	}

-	ic->device_sectors = i_size_read(ic->dev->bdev->bd_inode) >> SECTOR_SHIFT;
-	journal_sectors = min((sector_t)DEFAULT_MAX_JOURNAL_SECTORS,
-			      ic->device_sectors >> DEFAULT_JOURNAL_SIZE_FACTOR);
+	journal_sectors = 0;
 	interleave_sectors = DEFAULT_INTERLEAVE_SECTORS;
 	buffer_sectors = DEFAULT_BUFFER_SECTORS;
 	journal_watermark = DEFAULT_JOURNAL_WATERMARK;
 	sync_msec = DEFAULT_SYNC_MSEC;
+	recalculate = false;
 	ic->sectors_per_block = 1;

 	as.argc = argc - DIRECT_ARGUMENTS;
@@ -2908,7 +3165,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 			goto bad;
 		}
 		if (sscanf(opt_string, "journal_sectors:%u%c", &val, &dummy) == 1)
-			journal_sectors = val;
+			journal_sectors = val ? val : 1;
 		else if (sscanf(opt_string, "interleave_sectors:%u%c", &val, &dummy) == 1)
 			interleave_sectors = val;
 		else if (sscanf(opt_string, "buffer_sectors:%u%c", &val, &dummy) == 1)
@@ -2917,7 +3174,17 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 			journal_watermark = val;
 		else if (sscanf(opt_string, "commit_time:%u%c", &val, &dummy) == 1)
 			sync_msec = val;
-		else if (sscanf(opt_string, "block_size:%u%c", &val, &dummy) == 1) {
+		else if (!memcmp(opt_string, "meta_device:", strlen("meta_device:"))) {
+			if (ic->meta_dev) {
+				dm_put_device(ti, ic->meta_dev);
+				ic->meta_dev = NULL;
+			}
+			r = dm_get_device(ti, strchr(opt_string, ':') + 1, dm_table_get_mode(ti->table), &ic->meta_dev);
+			if (r) {
+				ti->error = "Device lookup failed";
+				goto bad;
+			}
+		} else if (sscanf(opt_string, "block_size:%u%c", &val, &dummy) == 1) {
 			if (val < 1 << SECTOR_SHIFT ||
 			    val > MAX_SECTORS_PER_BLOCK << SECTOR_SHIFT ||
 			    (val & (val -1))) {
@@ -2941,6 +3208,8 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 				    "Invalid journal_mac argument");
 			if (r)
 				goto bad;
+		} else if (!strcmp(opt_string, "recalculate")) {
+			recalculate = true;
 		} else {
 			r = -EINVAL;
 			ti->error = "Invalid argument";
@@ -2948,6 +3217,21 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		}
 	}

+	ic->data_device_sectors = i_size_read(ic->dev->bdev->bd_inode) >> SECTOR_SHIFT;
+	if (!ic->meta_dev)
+		ic->meta_device_sectors = ic->data_device_sectors;
+	else
+		ic->meta_device_sectors = i_size_read(ic->meta_dev->bdev->bd_inode) >> SECTOR_SHIFT;
+
+	if (!journal_sectors) {
+		journal_sectors = min((sector_t)DEFAULT_MAX_JOURNAL_SECTORS,
+				      ic->data_device_sectors >> DEFAULT_JOURNAL_SIZE_FACTOR);
+	}
+
+	if (!buffer_sectors)
+		buffer_sectors = 1;
+	ic->log2_buffer_sectors = min((int)__fls(buffer_sectors), 31 - SECTOR_SHIFT);
+
 	r = get_mac(&ic->internal_hash, &ic->internal_hash_alg, &ti->error,
 		    "Invalid internal hash", "Error setting internal hash key");
 	if (r)
@@ -3062,7 +3346,7 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		should_write_sb = true;
 	}

-	if (ic->sb->version != SB_VERSION) {
+	if (!ic->sb->version || ic->sb->version > SB_VERSION_2) {
 		r = -EINVAL;
 		ti->error = "Unknown version";
 		goto bad;
@@ -3083,12 +3367,20 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		goto bad;
 	}
 	/* make sure that ti->max_io_len doesn't overflow */
+	if (!ic->meta_dev) {
 	if (ic->sb->log2_interleave_sectors < MIN_LOG2_INTERLEAVE_SECTORS ||
 	    ic->sb->log2_interleave_sectors > MAX_LOG2_INTERLEAVE_SECTORS) {
 		r = -EINVAL;
 		ti->error = "Invalid interleave_sectors in the superblock";
 		goto bad;
 	}
+	} else {
+		if (ic->sb->log2_interleave_sectors) {
+			r = -EINVAL;
+			ti->error = "Invalid interleave_sectors in the superblock";
+			goto bad;
+		}
+	}
 	ic->provided_data_sectors = le64_to_cpu(ic->sb->provided_data_sectors);
 	if (ic->provided_data_sectors != le64_to_cpu(ic->sb->provided_data_sectors)) {
 		/* test for overflow */
@@ -3101,20 +3393,28 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		ti->error = "Journal mac mismatch";
 		goto bad;
 	}

+try_smaller_buffer:
 	r = calculate_device_limits(ic);
 	if (r) {
+		if (ic->meta_dev) {
+			if (ic->log2_buffer_sectors > 3) {
+				ic->log2_buffer_sectors--;
+				goto try_smaller_buffer;
+			}
+		}
 		ti->error = "The device is too small";
 		goto bad;
 	}
+	if (!ic->meta_dev)
+		ic->log2_buffer_sectors = min(ic->log2_buffer_sectors, (__u8)__ffs(ic->metadata_run));

 	if (ti->len > ic->provided_data_sectors) {
 		r = -EINVAL;
 		ti->error = "Not enough provided sectors for requested mapping size";
 		goto bad;
 	}

-	if (!buffer_sectors)
-		buffer_sectors = 1;
-	ic->log2_buffer_sectors = min3((int)__fls(buffer_sectors), (int)__ffs(ic->metadata_run), 31 - SECTOR_SHIFT);

 	threshold = (__u64)ic->journal_entries * (100 - journal_watermark);
 	threshold += 50;
@@ -3138,8 +3438,40 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		  (unsigned long long)ic->provided_data_sectors);
 	DEBUG_print("	log2_buffer_sectors %u\n", ic->log2_buffer_sectors);

-	ic->bufio = dm_bufio_client_create(ic->dev->bdev, 1U << (SECTOR_SHIFT + ic->log2_buffer_sectors),
-			1, 0, NULL, NULL);
+	if (recalculate && !(ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING))) {
+		ic->sb->flags |= cpu_to_le32(SB_FLAG_RECALCULATING);
+		ic->sb->recalc_sector = cpu_to_le64(0);
+	}
+	if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING)) {
+		if (!ic->internal_hash) {
+			r = -EINVAL;
+			ti->error = "Recalculate is only valid with internal hash";
+			goto bad;
+		}
+		ic->recalc_wq = alloc_workqueue("dm-intergrity-recalc", WQ_MEM_RECLAIM, 1);
+		if (!ic->recalc_wq ) {
+			ti->error = "Cannot allocate workqueue";
+			r = -ENOMEM;
+			goto bad;
+		}
+		INIT_WORK(&ic->recalc_work, integrity_recalc);
+		ic->recalc_buffer = vmalloc(RECALC_SECTORS << SECTOR_SHIFT);
+		if (!ic->recalc_buffer) {
+			ti->error = "Cannot allocate buffer for recalculating";
+			r = -ENOMEM;
+			goto bad;
+		}
+		ic->recalc_tags = kvmalloc((RECALC_SECTORS >> ic->sb->log2_sectors_per_block) * ic->tag_size, GFP_KERNEL);
+		if (!ic->recalc_tags) {
+			ti->error = "Cannot allocate tags for recalculating";
+			r = -ENOMEM;
+			goto bad;
+		}
+	}
+
+	ic->bufio = dm_bufio_client_create(ic->meta_dev ? ic->meta_dev->bdev : ic->dev->bdev,
+			1U << (SECTOR_SHIFT + ic->log2_buffer_sectors), 1, 0, NULL, NULL);
 	if (IS_ERR(ic->bufio)) {
 		r = PTR_ERR(ic->bufio);
 		ti->error = "Cannot initialize dm-bufio";
@@ -3171,9 +3503,11 @@ static int dm_integrity_ctr(struct dm_target *ti, unsigned argc, char **argv)
 		ic->just_formatted = true;
 	}

+	if (!ic->meta_dev) {
 	r = dm_set_target_max_io_len(ti, 1U << ic->sb->log2_interleave_sectors);
 	if (r)
 		goto bad;
+	}

 	if (!ic->internal_hash)
 		dm_integrity_set(ti, ic);
@@ -3192,6 +3526,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
 	struct dm_integrity_c *ic = ti->private;

 	BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress));
+	BUG_ON(!list_empty(&ic->wait_list));

 	if (ic->metadata_wq)
 		destroy_workqueue(ic->metadata_wq);
@@ -3201,6 +3536,12 @@ static void dm_integrity_dtr(struct dm_target *ti)
 		destroy_workqueue(ic->commit_wq);
 	if (ic->writer_wq)
 		destroy_workqueue(ic->writer_wq);
+	if (ic->recalc_wq)
+		destroy_workqueue(ic->recalc_wq);
+	if (ic->recalc_buffer)
+		vfree(ic->recalc_buffer);
+	if (ic->recalc_tags)
+		kvfree(ic->recalc_tags);
 	if (ic->bufio)
 		dm_bufio_client_destroy(ic->bufio);
 	mempool_exit(&ic->journal_io_mempool);
@@ -3208,6 +3549,8 @@ static void dm_integrity_dtr(struct dm_target *ti)
 		dm_io_client_destroy(ic->io);
 	if (ic->dev)
 		dm_put_device(ti, ic->dev);
+	if (ic->meta_dev)
+		dm_put_device(ti, ic->meta_dev);
 	dm_integrity_free_page_list(ic, ic->journal);
 	dm_integrity_free_page_list(ic, ic->journal_io);
 	dm_integrity_free_page_list(ic, ic->journal_xor);
@@ -3248,7 +3591,7 @@ static void dm_integrity_dtr(struct dm_target *ti)
 static struct target_type integrity_target = {
 	.name			= "integrity",
-	.version		= {1, 1, 0},
+	.version		= {1, 2, 0},
 	.module			= THIS_MODULE,
 	.features		= DM_TARGET_SINGLETON | DM_TARGET_INTEGRITY,
 	.ctr			= dm_integrity_ctr,
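
For illustration, a table line exercising both new features might look like the one below (device names and the sector count are hypothetical; the argument syntax follows the STATUSTYPE_TABLE output above, and recalculate requires an internal hash, here crc32c with its matching 4-byte tag size):

	dmsetup create integ --table \
	  "0 1953792 integrity /dev/sdb 0 4 J 3 meta_device:/dev/sdc recalculate internal_hash:crc32c"
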
......
@@ -487,6 +487,8 @@ static int run_complete_job(struct kcopyd_job *job)
 	if (atomic_dec_and_test(&kc->nr_jobs))
 		wake_up(&kc->destroyq);

+	cond_resched();
+
 	return 0;
 }

@@ -741,7 +743,7 @@ static void split_job(struct kcopyd_job *master_job)
 	}
 }

-int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
+void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		   unsigned int num_dests, struct dm_io_region *dests,
 		   unsigned int flags, dm_kcopyd_notify_fn fn, void *context)
 {
@@ -818,16 +820,14 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		job->progress = 0;
 		split_job(job);
 	}
-
-	return 0;
 }
 EXPORT_SYMBOL(dm_kcopyd_copy);

-int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
+void dm_kcopyd_zero(struct dm_kcopyd_client *kc,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context)
 {
-	return dm_kcopyd_copy(kc, NULL, num_dests, dests, flags, fn, context);
+	dm_kcopyd_copy(kc, NULL, num_dests, dests, flags, fn, context);
 }
 EXPORT_SYMBOL(dm_kcopyd_zero);
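
After this conversion dm_kcopyd_copy() cannot fail synchronously; every outcome, including submission problems, is delivered through the notify callback. A kernel-style sketch of the resulting caller pattern (struct my_request and its helper are hypothetical; only dm_kcopyd_copy() and the callback type come from the header above):

	struct my_request;
	static void my_request_finish(struct my_request *req, int err);

	/* All failures, including submission problems, surface here. */
	static void my_copy_done(int read_err, unsigned long write_err, void *context)
	{
		struct my_request *req = context;

		my_request_finish(req, (read_err || write_err) ? -EIO : 0);
	}

	static void my_start_copy(struct dm_kcopyd_client *kc, struct my_request *req,
				  struct dm_io_region *from, struct dm_io_region *to)
	{
		/* No return value to check: hand the job off and wait for
		 * my_copy_done() to run. */
		dm_kcopyd_copy(kc, from, 1, to, 0, my_copy_done, req);
	}
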
......
@@ -326,9 +326,8 @@ static void recovery_complete(int read_err, unsigned long write_err,
 	dm_rh_recovery_end(reg, !(read_err || write_err));
 }

-static int recover(struct mirror_set *ms, struct dm_region *reg)
+static void recover(struct mirror_set *ms, struct dm_region *reg)
 {
-	int r;
 	unsigned i;
 	struct dm_io_region from, to[DM_KCOPYD_MAX_REGIONS], *dest;
 	struct mirror *m;
@@ -367,10 +366,8 @@ static int recover(struct mirror_set *ms, struct dm_region *reg)
 	if (!errors_handled(ms))
 		set_bit(DM_KCOPYD_IGNORE_ERROR, &flags);

-	r = dm_kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1, to,
-			   flags, recovery_complete, reg);
-
-	return r;
+	dm_kcopyd_copy(ms->kcopyd_client, &from, ms->nr_mirrors - 1, to,
+		       flags, recovery_complete, reg);
 }

 static void reset_ms_flags(struct mirror_set *ms)
@@ -388,7 +385,6 @@ static void do_recovery(struct mirror_set *ms)
 {
 	struct dm_region *reg;
 	struct dm_dirty_log *log = dm_rh_dirty_log(ms->rh);
-	int r;

 	/*
 	 * Start quiescing some regions.
@@ -398,11 +394,8 @@ static void do_recovery(struct mirror_set *ms)
 	/*
 	 * Copy any already quiesced regions.
 	 */
-	while ((reg = dm_rh_recovery_start(ms->rh))) {
-		r = recover(ms, reg);
-		if (r)
-			dm_rh_recovery_end(reg, 0);
-	}
+	while ((reg = dm_rh_recovery_start(ms->rh)))
+		recover(ms, reg);

 	/*
 	 * Update the in sync flag.
......
@@ -85,7 +85,7 @@ struct dm_snapshot {
 	 * A list of pending exceptions that completed out of order.
 	 * Protected by kcopyd single-threaded callback.
 	 */
-	struct list_head out_of_order_list;
+	struct rb_root out_of_order_tree;

 	mempool_t pending_pool;

@@ -200,7 +200,7 @@ struct dm_snap_pending_exception {
 	/* A sequence number, it is used for in-order completion. */
 	sector_t exception_sequence;

-	struct list_head out_of_order_entry;
+	struct rb_node out_of_order_node;

 	/*
 	 * For writing a complete chunk, bypassing the copy.
@@ -1173,7 +1173,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	atomic_set(&s->pending_exceptions_count, 0);
 	s->exception_start_sequence = 0;
 	s->exception_complete_sequence = 0;
-	INIT_LIST_HEAD(&s->out_of_order_list);
+	s->out_of_order_tree = RB_ROOT;
 	mutex_init(&s->lock);
 	INIT_LIST_HEAD(&s->list);
 	spin_lock_init(&s->pe_lock);
@@ -1539,28 +1539,41 @@ static void copy_callback(int read_err, unsigned long write_err, void *context)
 	pe->copy_error = read_err || write_err;

 	if (pe->exception_sequence == s->exception_complete_sequence) {
+		struct rb_node *next;
+
 		s->exception_complete_sequence++;
 		complete_exception(pe);

-		while (!list_empty(&s->out_of_order_list)) {
-			pe = list_entry(s->out_of_order_list.next,
-					struct dm_snap_pending_exception, out_of_order_entry);
+		next = rb_first(&s->out_of_order_tree);
+		while (next) {
+			pe = rb_entry(next, struct dm_snap_pending_exception,
+					out_of_order_node);
 			if (pe->exception_sequence != s->exception_complete_sequence)
 				break;
+			next = rb_next(next);
 			s->exception_complete_sequence++;
-			list_del(&pe->out_of_order_entry);
+			rb_erase(&pe->out_of_order_node, &s->out_of_order_tree);
 			complete_exception(pe);
+			cond_resched();
 		}
 	} else {
-		struct list_head *lh;
+		struct rb_node *parent = NULL;
+		struct rb_node **p = &s->out_of_order_tree.rb_node;
 		struct dm_snap_pending_exception *pe2;

-		list_for_each_prev(lh, &s->out_of_order_list) {
-			pe2 = list_entry(lh, struct dm_snap_pending_exception, out_of_order_entry);
-			if (pe2->exception_sequence < pe->exception_sequence)
-				break;
+		while (*p) {
+			pe2 = rb_entry(*p, struct dm_snap_pending_exception, out_of_order_node);
+			parent = *p;
+
+			BUG_ON(pe->exception_sequence == pe2->exception_sequence);
+			if (pe->exception_sequence < pe2->exception_sequence)
+				p = &((*p)->rb_left);
+			else
+				p = &((*p)->rb_right);
 		}
-		list_add(&pe->out_of_order_entry, lh);
+
+		rb_link_node(&pe->out_of_order_node, parent, p);
+		rb_insert_color(&pe->out_of_order_node, &s->out_of_order_tree);
 	}
 }
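
The rbtree keeps pending exceptions sorted by sequence number, so draining in-order completions is just rb_first()/rb_next(), and insertion is the usual kernel descend-and-link idiom. The same idiom in a generic kernel-style sketch for any struct keyed by a u64 (the struct and helper names are illustrative, not from dm-snap):

	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct seq_item {
		u64 seq;		/* sort key */
		struct rb_node node;
	};

	/* Insert item into a tree ordered by ->seq. */
	static void seq_item_insert(struct rb_root *root, struct seq_item *item)
	{
		struct rb_node **p = &root->rb_node, *parent = NULL;

		while (*p) {
			struct seq_item *cur = rb_entry(*p, struct seq_item, node);

			parent = *p;
			if (item->seq < cur->seq)
				p = &(*p)->rb_left;
			else
				p = &(*p)->rb_right;
		}
		rb_link_node(&item->node, parent, p);	/* link as a leaf ... */
		rb_insert_color(&item->node, root);	/* ... then rebalance */
	}

	/* Pop the smallest-keyed item, or NULL if the tree is empty. */
	static struct seq_item *seq_item_pop_min(struct rb_root *root)
	{
		struct rb_node *first = rb_first(root);
		struct seq_item *item;

		if (!first)
			return NULL;
		item = rb_entry(first, struct seq_item, node);
		rb_erase(first, root);
		return item;
	}
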
@@ -1694,8 +1707,6 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	if (!s->valid)
 		return DM_MAPIO_KILL;

-	/* FIXME: should only take write lock if we need
-	 * to copy an exception */
 	mutex_lock(&s->lock);

 	if (!s->valid || (unlikely(s->snapshot_overflowed) &&
......
@@ -1220,18 +1220,13 @@ static struct dm_thin_new_mapping *get_next_mapping(struct pool *pool)
 static void ll_zero(struct thin_c *tc, struct dm_thin_new_mapping *m,
 		    sector_t begin, sector_t end)
 {
-	int r;
 	struct dm_io_region to;

 	to.bdev = tc->pool_dev->bdev;
 	to.sector = begin;
 	to.count = end - begin;

-	r = dm_kcopyd_zero(tc->pool->copier, 1, &to, 0, copy_complete, m);
-	if (r < 0) {
-		DMERR_LIMIT("dm_kcopyd_zero() failed");
-		copy_complete(1, 1, m);
-	}
+	dm_kcopyd_zero(tc->pool->copier, 1, &to, 0, copy_complete, m);
 }

 static void remap_and_issue_overwrite(struct thin_c *tc, struct bio *bio,
@@ -1257,7 +1252,6 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 			  struct dm_bio_prison_cell *cell, struct bio *bio,
 			  sector_t len)
 {
-	int r;
 	struct pool *pool = tc->pool;
 	struct dm_thin_new_mapping *m = get_next_mapping(pool);

@@ -1296,19 +1290,8 @@ static void schedule_copy(struct thin_c *tc, dm_block_t virt_block,
 		to.sector = data_dest * pool->sectors_per_block;
 		to.count = len;

-		r = dm_kcopyd_copy(pool->copier, &from, 1, &to,
-				   0, copy_complete, m);
-		if (r < 0) {
-			DMERR_LIMIT("dm_kcopyd_copy() failed");
-			copy_complete(1, 1, m);
-
-			/*
-			 * We allow the zero to be issued, to simplify the
-			 * error path. Otherwise we'd need to start
-			 * worrying about decrementing the prepare_actions
-			 * counter.
-			 */
-		}
+		dm_kcopyd_copy(pool->copier, &from, 1, &to,
+			       0, copy_complete, m);

 		/*
 		 * Do we need to zero a tail region?
@@ -2520,6 +2503,8 @@ static void set_pool_mode(struct pool *pool, enum pool_mode new_mode)
 	case PM_WRITE:
 		if (old_mode != new_mode)
 			notify_of_pool_mode_change(pool, "write");
+		if (old_mode == PM_OUT_OF_DATA_SPACE)
+			cancel_delayed_work_sync(&pool->no_space_timeout);
 		pool->out_of_data_space = false;
 		pool->pf.error_if_no_space = pt->requested_pf.error_if_no_space;
 		dm_pool_metadata_read_write(pool->pmd);
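
When the pool leaves PM_OUT_OF_DATA_SPACE, a previously armed no_space_timeout must not fire afterwards, so it is cancelled synchronously on the transition back to write mode. The same state-exit pattern in a generic kernel-style sketch (all names here are illustrative, not dm-thin's):

	#include <linux/workqueue.h>

	struct my_pool {
		enum { MY_OK, MY_NO_SPACE } mode;
		struct delayed_work no_space_timeout;
	};

	static void my_set_mode(struct my_pool *pool, int new_mode)
	{
		int old_mode = pool->mode;

		/* arm a deadline on entering the degraded state */
		if (new_mode == MY_NO_SPACE && old_mode != MY_NO_SPACE)
			schedule_delayed_work(&pool->no_space_timeout, 60 * HZ);

		/* _sync: also waits out a concurrently-running handler,
		 * so it cannot observe the state after we have left it */
		if (new_mode == MY_OK && old_mode == MY_NO_SPACE)
			cancel_delayed_work_sync(&pool->no_space_timeout);

		pool->mode = new_mode;
	}
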
@@ -3890,6 +3875,8 @@ static void pool_status(struct dm_target *ti, status_type_t type,
 		else
 			DMEMIT("- ");

+		DMEMIT("%llu ", (unsigned long long)calc_metadata_threshold(pt));
+
 		break;

 	case STATUSTYPE_TABLE:
@@ -3979,7 +3966,7 @@ static struct target_type pool_target = {
 	.name = "thin-pool",
 	.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
 		    DM_TARGET_IMMUTABLE,
-	.version = {1, 19, 0},
+	.version = {1, 20, 0},
 	.module = THIS_MODULE,
 	.ctr = pool_ctr,
 	.dtr = pool_dtr,
@@ -4353,7 +4340,7 @@ static void thin_io_hints(struct dm_target *ti, struct queue_limits *limits)
 static struct target_type thin_target = {
 	.name = "thin",
-	.version = {1, 19, 0},
+	.version = {1, 20, 0},
 	.module = THIS_MODULE,
 	.ctr = thin_ctr,
 	.dtr = thin_dtr,
......
@@ -457,7 +457,7 @@ static void ssd_commit_flushed(struct dm_writecache *wc)
 		COMPLETION_INITIALIZER_ONSTACK(endio.c),
 		ATOMIC_INIT(1),
 	};
-	unsigned bitmap_bits = wc->dirty_bitmap_size * BITS_PER_LONG;
+	unsigned bitmap_bits = wc->dirty_bitmap_size * 8;
 	unsigned i = 0;

 	while (1) {
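
As the fix implies, dirty_bitmap_size counts bytes, so the number of valid bits is size * 8; multiplying by BITS_PER_LONG (64 on 64-bit kernels) overstated the bit count eightfold and let the scan read past the end of the bitmap, the crash named in the merge summary. A standalone sketch of the arithmetic (the size value is hypothetical):

	#include <stdio.h>

	int main(void)
	{
		unsigned long dirty_bitmap_size = 64;	/* hypothetical size in bytes */
		unsigned bits_per_long = 8 * sizeof(unsigned long);

		printf("valid bits:   %lu\n", dirty_bitmap_size * 8);
		printf("old estimate: %lu (%ux too many on 64-bit)\n",
		       dirty_bitmap_size * bits_per_long, bits_per_long / 8);
		return 0;
	}
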
@@ -2240,6 +2240,8 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 		DMEMIT("%c %s %s %u ", WC_MODE_PMEM(wc) ? 'p' : 's',
		       wc->dev->name, wc->ssd_dev->name, wc->block_size);
 		extra_args = 0;
+		if (wc->start_sector)
+			extra_args += 2;
 		if (wc->high_wm_percent_set)
 			extra_args += 2;
 		if (wc->low_wm_percent_set)
@@ -2254,6 +2256,8 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 			extra_args++;

 		DMEMIT("%u", extra_args);
+		if (wc->start_sector)
+			DMEMIT(" start_sector %llu", (unsigned long long)wc->start_sector);
 		if (wc->high_wm_percent_set) {
 			x = (uint64_t)wc->freelist_high_watermark * 100;
 			x += wc->n_blocks / 2;
@@ -2280,7 +2284,7 @@ static void writecache_status(struct dm_target *ti, status_type_t type,
 static struct target_type writecache_target = {
 	.name			= "writecache",
-	.version		= {1, 1, 0},
+	.version		= {1, 1, 1},
 	.module			= THIS_MODULE,
 	.ctr			= writecache_ctr,
 	.dtr			= writecache_dtr,
......
@@ -161,10 +161,8 @@ static int dmz_reclaim_copy(struct dmz_reclaim *zrc,
 		/* Copy the valid region */
 		set_bit(DMZ_RECLAIM_KCOPY, &zrc->flags);
-		ret = dm_kcopyd_copy(zrc->kc, &src, 1, &dst, flags,
-				     dmz_reclaim_kcopy_end, zrc);
-		if (ret)
-			return ret;
+		dm_kcopyd_copy(zrc->kc, &src, 1, &dst, flags,
+			       dmz_reclaim_kcopy_end, zrc);

 		/* Wait for copy to complete */
 		wait_on_bit_io(&zrc->flags, DMZ_RECLAIM_KCOPY,
......
@@ -62,7 +62,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc);
 typedef void (*dm_kcopyd_notify_fn)(int read_err, unsigned long write_err,
				     void *context);

-int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
+void dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context);
@@ -81,7 +81,7 @@ void *dm_kcopyd_prepare_callback(struct dm_kcopyd_client *kc,
				  dm_kcopyd_notify_fn fn, void *context);
 void dm_kcopyd_do_callback(void *job, int read_err, unsigned long write_err);

-int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
+void dm_kcopyd_zero(struct dm_kcopyd_client *kc,
 		   unsigned num_dests, struct dm_io_region *dests,
 		   unsigned flags, dm_kcopyd_notify_fn fn, void *context);
......