Commit 7f2dc5c4 authored by Linus Torvalds

Merge tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper changes from Mike Snitzer:
 "A set of device-mapper changes for 3.13.

  Improve reliability of buffer allocations for dm messages with a small
  number of arguments, a couple path group initialization fixes for dm
  multipath, a fix for resizing a dm array, various fixes and
  optimizations for dm cache, a fix for device mapper's Kconfig menu
  indentation.

  Features added include:
   - dm crypt support for activating legacy CBC TrueCrypt containers
     (useful for forensics of these old TCRYPT containers)
   - reduced dm-cache memory requirements for each block in the cache
   - basic support for shrinking a dm-cache's cache (fast) device
   - most notably, dm-cache support for managing cache coherency when
     deploying dm-cache with sophisticated origin volumes (that support
     hardware snapshots and/or clustering): these changes come in the
     form of a new passthrough operation mode and a cache block
     invalidation interface"

* tag 'dm-3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (32 commits)
  dm cache: resolve small nits and improve Documentation
  dm cache: add cache block invalidation support
  dm cache: add remove_cblock method to policy interface
  dm cache policy mq: reduce memory requirements
  dm cache metadata: check the metadata version when reading the superblock
  dm cache: add passthrough mode
  dm cache: cache shrinking support
  dm cache: promotion optimisation for writes
  dm cache: be much more aggressive about promoting writes to discarded blocks
  dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty()
  dm cache: optimize commit_if_needed
  dm space map disk: optimise sm_disk_dec_block
  MAINTAINERS: add reference to device-mapper's linux-dm.git tree
  dm: fix Kconfig menu indentation
  dm: allow remove to be deferred
  dm table: print error on preresume failure
  dm crypt: add TCW IV mode for old CBC TCRYPT containers
  dm crypt: properly handle extra key string in initialization
  dm cache: log error message if dm_kcopyd_copy() fails
  dm cache: use cell_defer() boolean argument consistently
  ...
parents 82cb6ace 7b6b2bc9
@@ -30,8 +30,10 @@ multiqueue
 This policy is the default.
-The multiqueue policy has two sets of 16 queues: one set for entries
-waiting for the cache and another one for those in the cache.
+The multiqueue policy has three sets of 16 queues: one set for entries
+waiting for the cache and another two for those in the cache (a set for
+clean entries and a set for dirty entries).
 Cache entries in the queues are aged based on logical time. Entry into
 the cache is based on variable thresholds and queue selection is based
 on hit count on entry. The policy aims to take different cache miss
......
@@ -68,10 +68,11 @@ So large block sizes are bad because they waste cache space. And small
 block sizes are bad because they increase the amount of metadata (both
 in core and on disk).
-Writeback/writethrough
-----------------------
-The cache has two modes, writeback and writethrough.
+Cache operating modes
+---------------------
+The cache has three operating modes: writeback, writethrough and
+passthrough.
 If writeback, the default, is selected then a write to a block that is
 cached will go only to the cache and the block will be marked dirty in
@@ -81,8 +82,31 @@ If writethrough is selected then a write to a cached block will not
 complete until it has hit both the origin and cache devices. Clean
 blocks should remain clean.
+If passthrough is selected, useful when the cache contents are not known
+to be coherent with the origin device, then all reads are served from
+the origin device (all reads miss the cache) and all writes are
+forwarded to the origin device; additionally, write hits cause cache
+block invalidates. To enable passthrough mode the cache must be clean.
+Passthrough mode allows a cache device to be activated without having to
+worry about coherency. Coherency that exists is maintained, although
+the cache will gradually cool as writes take place. If the coherency of
+the cache can later be verified, or established through use of the
+"invalidate_cblocks" message, the cache device can be transitioned to
+writethrough or writeback mode while still warm. Otherwise, the cache
+contents can be discarded prior to transitioning to the desired
+operating mode.
 A simple cleaner policy is provided, which will clean (write back) all
-dirty blocks in a cache. Useful for decommissioning a cache.
+dirty blocks in a cache. Useful for decommissioning a cache or when
+shrinking a cache. Shrinking the cache's fast device requires all cache
+blocks, in the area of the cache being removed, to be clean. If the
+area being removed from the cache still contains dirty blocks the resize
+will fail. Care must be taken to never reduce the volume used for the
+cache's fast device until the cache is clean. This is of particular
+importance if writeback mode is used. Writethrough and passthrough
+modes already maintain a clean cache. Future support to partially clean
+the cache, above a specified threshold, will allow for keeping the cache
+warm and in writeback mode during resize.
 Migration throttling
 --------------------
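The passthrough rules described in the documentation hunk above can be modelled in a few lines of userspace C. This is only an illustrative sketch and not the dm-cache implementation: `toy_cache`, `toy_read`, `toy_write` and `TOY_NR_BLOCKS` are hypothetical names, and the cache is reduced to one "valid" flag per origin block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_BLOCKS 8  /* hypothetical size, for illustration only */

enum toy_target { TOY_ORIGIN, TOY_CACHE };

struct toy_cache {
	bool valid[TOY_NR_BLOCKS];  /* true if the origin block is cached */
};

/* Passthrough: every read is served from the origin, even on a cache hit. */
enum toy_target toy_read(const struct toy_cache *c, size_t oblock)
{
	assert(oblock < TOY_NR_BLOCKS);
	(void)c;  /* the cache contents are deliberately ignored */
	return TOY_ORIGIN;
}

/*
 * Passthrough: every write is forwarded to the origin. A write hit
 * additionally invalidates the cached copy, so the cache cools over time.
 */
enum toy_target toy_write(struct toy_cache *c, size_t oblock)
{
	assert(oblock < TOY_NR_BLOCKS);
	if (c->valid[oblock])
		c->valid[oblock] = false;
	return TOY_ORIGIN;
}
```

Because reads never consult the cache and writes only ever remove entries, whatever coherency existed when the mode was enabled cannot get worse, which is the property the documentation relies on.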
@@ -161,7 +185,7 @@ Constructor
 block size : cache unit size in sectors
 #feature args : number of feature arguments passed
-feature args : writethrough. (The default is writeback.)
+feature args : writethrough or passthrough (The default is writeback.)
 policy : the replacement policy to use
 #policy args : an even number of arguments corresponding to
@@ -177,6 +201,13 @@ Optional feature arguments are:
 back cache block contents later for performance reasons,
 so they may differ from the corresponding origin blocks.
+passthrough : a degraded mode useful for various cache coherency
+situations (e.g., rolling back snapshots of
+underlying storage). Reads and writes always go to
+the origin. If a write goes to a cached origin
+block, then the cache block is invalidated.
+To enable passthrough mode the cache must be clean.
 A policy called 'default' is always registered. This is an alias for
 the policy we currently think is giving best all round performance.
@@ -231,12 +262,26 @@ The message format is:
 E.g.
 dmsetup message my_cache 0 sequential_threshold 1024
+Invalidation is removing an entry from the cache without writing it
+back. Cache blocks can be invalidated via the invalidate_cblocks
+message, which takes an arbitrary number of cblock ranges. Each cblock
+must be expressed as a decimal value, in the future a variant message
+that takes cblock ranges expressed in hexadecimal may be needed to
+better support efficient invalidation of larger caches. The cache must
+be in passthrough mode when invalidate_cblocks is used.
+invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
+E.g.
+dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
 Examples
 ========
 The test suite can be found here:
-https://github.com/jthornber/thinp-test-suite
+https://github.com/jthornber/device-mapper-test-suite
 dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
 /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
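The `invalidate_cblocks` argument syntax shown above, `<cblock>` or `<cblock begin>-<cblock end>` in decimal, could be parsed along these lines. This is a userspace sketch under stated assumptions, not the kernel's parser: `struct cblock_arg` and `parse_cblock_arg` are hypothetical names, and since the documentation text does not say whether the end bound is inclusive, the sketch only records both numbers without interpreting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct cblock_arg {
	unsigned long long begin;
	unsigned long long end;   /* equals begin when is_range is false */
	bool is_range;
};

/*
 * Parse one argument of the form "<cblock>" or "<begin>-<end>", both in
 * decimal. Returns 0 on success and -1 on malformed input.
 */
int parse_cblock_arg(const char *str, struct cblock_arg *out)
{
	char dummy;
	unsigned long long b, e;

	assert(str && out);

	/* The trailing %c detects garbage after an otherwise valid number. */
	if (sscanf(str, "%llu-%llu%c", &b, &e, &dummy) == 2) {
		out->begin = b;
		out->end = e;
		out->is_range = true;
		return 0;
	}
	if (sscanf(str, "%llu%c", &b, &dummy) == 1) {
		out->begin = b;
		out->end = b;
		out->is_range = false;
		return 0;
	}
	return -1;
}
```

A caller would apply this to each whitespace-separated word after `invalidate_cblocks`, for example `2345`, `3456-4567` and `5678-6789` from the `dmsetup message` line above, and reject the whole message if any word fails to parse.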
......
@@ -4,12 +4,15 @@ dm-crypt
 Device-Mapper's "crypt" target provides transparent encryption of block devices
 using the kernel crypto API.
+For a more detailed description of supported parameters see:
+http://code.google.com/p/cryptsetup/wiki/DMCrypt
 Parameters: <cipher> <key> <iv_offset> <device path> \
 <offset> [<#opt_params> <opt_params>]
 <cipher>
 Encryption cipher and an optional IV generation mode.
-(In format cipher[:keycount]-chainmode-ivopts:ivmode).
+(In format cipher[:keycount]-chainmode-ivmode[:ivopts]).
 Examples:
 des
 aes-cbc-essiv:sha256
@@ -19,7 +22,11 @@ Parameters: <cipher> <key> <iv_offset> <device path> \
 <key>
 Key used for encryption. It is encoded as a hexadecimal number.
-You can only use key sizes that are valid for the selected cipher.
+You can only use key sizes that are valid for the selected cipher
+in combination with the selected iv mode.
+Note that for some iv modes the key string can contain additional
+keys (for example IV seed) so the key contains more parts concatenated
+into a single string.
 <keycount>
 Multi-key compatibility mode. You can define <keycount> keys and
......
@@ -2647,6 +2647,7 @@ M: dm-devel@redhat.com
 L: dm-devel@redhat.com
 W: http://sources.redhat.com/dm
 Q: http://patchwork.kernel.org/project/dm-devel/list/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git
 T: quilt http://people.redhat.com/agk/patches/linux/editing/
 S: Maintained
 F: Documentation/device-mapper/
......
@@ -297,6 +297,17 @@ config DM_MIRROR
 	  Allow volume managers to mirror logical volumes, also
 	  needed for live data migration tools such as 'pvmove'.
+config DM_LOG_USERSPACE
+	tristate "Mirror userspace logging"
+	depends on DM_MIRROR && NET
+	select CONNECTOR
+	---help---
+	  The userspace logging module provides a mechanism for
+	  relaying the dm-dirty-log API to userspace. Log designs
+	  which are more suited to userspace implementation (e.g.
+	  shared storage logs) or experimental logs can be implemented
+	  by leveraging this framework.
 config DM_RAID
 	tristate "RAID 1/4/5/6/10 target"
 	depends on BLK_DEV_DM
@@ -323,17 +334,6 @@ config DM_RAID
 	  RAID-5, RAID-6 distributes the syndromes across the drives
 	  in one of the available parity distribution methods.
-config DM_LOG_USERSPACE
-	tristate "Mirror userspace logging"
-	depends on DM_MIRROR && NET
-	select CONNECTOR
-	---help---
-	  The userspace logging module provides a mechanism for
-	  relaying the dm-dirty-log API to userspace. Log designs
-	  which are more suited to userspace implementation (e.g.
-	  shared storage logs) or experimental logs can be implemented
-	  by leveraging this framework.
 config DM_ZERO
 	tristate "Zero target"
 	depends on BLK_DEV_DM
......
@@ -20,7 +20,13 @@
 #define CACHE_SUPERBLOCK_MAGIC 06142003
 #define CACHE_SUPERBLOCK_LOCATION 0
-#define CACHE_VERSION 1
+/*
+ * defines a range of metadata versions that this module can handle.
+ */
+#define MIN_CACHE_VERSION 1
+#define MAX_CACHE_VERSION 1
 #define CACHE_METADATA_CACHE_SIZE 64
 /*
@@ -134,6 +140,18 @@ static void sb_prepare_for_write(struct dm_block_validator *v,
 						      SUPERBLOCK_CSUM_XOR));
 }
+static int check_metadata_version(struct cache_disk_superblock *disk_super)
+{
+	uint32_t metadata_version = le32_to_cpu(disk_super->version);
+	if (metadata_version < MIN_CACHE_VERSION || metadata_version > MAX_CACHE_VERSION) {
+		DMERR("Cache metadata version %u found, but only versions between %u and %u supported.",
+		      metadata_version, MIN_CACHE_VERSION, MAX_CACHE_VERSION);
+		return -EINVAL;
+	}
+	return 0;
+}
 static int sb_check(struct dm_block_validator *v,
 		    struct dm_block *b,
 		    size_t sb_block_size)
@@ -164,7 +182,7 @@ static int sb_check(struct dm_block_validator *v,
 		return -EILSEQ;
 	}
-	return 0;
+	return check_metadata_version(disk_super);
 }
 static struct dm_block_validator sb_validator = {
@@ -198,7 +216,7 @@ static int superblock_lock(struct dm_cache_metadata *cmd,
 /*----------------------------------------------------------------*/
-static int __superblock_all_zeroes(struct dm_block_manager *bm, int *result)
+static int __superblock_all_zeroes(struct dm_block_manager *bm, bool *result)
 {
 	int r;
 	unsigned i;
@@ -214,10 +232,10 @@ static int __superblock_all_zeroes(struct dm_block_manager *bm, int *result)
 		return r;
 	data_le = dm_block_data(b);
-	*result = 1;
+	*result = true;
 	for (i = 0; i < sb_block_size; i++) {
 		if (data_le[i] != zero) {
-			*result = 0;
+			*result = false;
 			break;
 		}
 	}
@@ -270,7 +288,7 @@ static int __write_initial_superblock(struct dm_cache_metadata *cmd)
 	disk_super->flags = 0;
 	memset(disk_super->uuid, 0, sizeof(disk_super->uuid));
 	disk_super->magic = cpu_to_le64(CACHE_SUPERBLOCK_MAGIC);
-	disk_super->version = cpu_to_le32(CACHE_VERSION);
+	disk_super->version = cpu_to_le32(MAX_CACHE_VERSION);
 	memset(disk_super->policy_name, 0, sizeof(disk_super->policy_name));
 	memset(disk_super->policy_version, 0, sizeof(disk_super->policy_version));
 	disk_super->policy_hint_size = 0;
@@ -411,7 +429,8 @@ static int __open_metadata(struct dm_cache_metadata *cmd)
 static int __open_or_format_metadata(struct dm_cache_metadata *cmd,
 				     bool format_device)
 {
-	int r, unformatted;
+	int r;
+	bool unformatted = false;
 	r = __superblock_all_zeroes(cmd->bm, &unformatted);
 	if (r)
@@ -666,19 +685,85 @@ void dm_cache_metadata_close(struct dm_cache_metadata *cmd)
 	kfree(cmd);
 }
+/*
+ * Checks that the given cache block is either unmapped or clean.
+ */
+static int block_unmapped_or_clean(struct dm_cache_metadata *cmd, dm_cblock_t b,
+				   bool *result)
+{
+	int r;
+	__le64 value;
+	dm_oblock_t ob;
+	unsigned flags;
+	r = dm_array_get_value(&cmd->info, cmd->root, from_cblock(b), &value);
+	if (r) {
+		DMERR("block_unmapped_or_clean failed");
+		return r;
+	}
+	unpack_value(value, &ob, &flags);
+	*result = !((flags & M_VALID) && (flags & M_DIRTY));
+	return 0;
+}
+static int blocks_are_unmapped_or_clean(struct dm_cache_metadata *cmd,
+					dm_cblock_t begin, dm_cblock_t end,
+					bool *result)
+{
+	int r;
+	*result = true;
+	while (begin != end) {
+		r = block_unmapped_or_clean(cmd, begin, result);
+		if (r)
+			return r;
+		if (!*result) {
+			DMERR("cache block %llu is dirty",
+			      (unsigned long long) from_cblock(begin));
+			return 0;
+		}
+		begin = to_cblock(from_cblock(begin) + 1);
+	}
+	return 0;
+}
 int dm_cache_resize(struct dm_cache_metadata *cmd, dm_cblock_t new_cache_size)
 {
 	int r;
+	bool clean;
 	__le64 null_mapping = pack_value(0, 0);
 	down_write(&cmd->root_lock);
 	__dm_bless_for_disk(&null_mapping);
+	if (from_cblock(new_cache_size) < from_cblock(cmd->cache_blocks)) {
+		r = blocks_are_unmapped_or_clean(cmd, new_cache_size, cmd->cache_blocks, &clean);
+		if (r) {
+			__dm_unbless_for_disk(&null_mapping);
+			goto out;
+		}
+		if (!clean) {
+			DMERR("unable to shrink cache due to dirty blocks");
+			r = -EINVAL;
+			__dm_unbless_for_disk(&null_mapping);
+			goto out;
+		}
+	}
 	r = dm_array_resize(&cmd->info, cmd->root, from_cblock(cmd->cache_blocks),
 			    from_cblock(new_cache_size),
 			    &null_mapping, &cmd->root);
 	if (!r)
 		cmd->cache_blocks = new_cache_size;
 	cmd->changed = true;
+out:
 	up_write(&cmd->root_lock);
 	return r;
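The shrink check added in the hunk above reduces to a simple predicate over per-block flags. The following userspace sketch mirrors that logic on a plain array; it is not the kernel code. The `toy_` names are hypothetical, and the numeric values given to `M_VALID` and `M_DIRTY` are assumptions made only so the example compiles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values are assumptions chosen for this sketch. */
#define M_VALID 1u
#define M_DIRTY 2u

/* A block may be dropped unless it is both mapped (valid) and dirty. */
bool toy_block_unmapped_or_clean(unsigned flags)
{
	return !((flags & M_VALID) && (flags & M_DIRTY));
}

/*
 * Scan the half-open range [begin, end). Returns true if every block may
 * be dropped; otherwise stores the first offending index in *first_dirty
 * (when the pointer is non-NULL) and returns false.
 */
bool toy_range_unmapped_or_clean(const unsigned *flags, size_t begin,
				 size_t end, size_t *first_dirty)
{
	size_t i;

	assert(begin <= end);
	for (i = begin; i < end; i++) {
		if (!toy_block_unmapped_or_clean(flags[i])) {
			if (first_dirty)
				*first_dirty = i;
			return false;
		}
	}
	return true;
}

/* Shrinking is allowed only if the tail being removed holds no dirty data. */
bool toy_can_shrink(const unsigned *flags, size_t old_size, size_t new_size)
{
	if (new_size >= old_size)
		return true;  /* growing or no-op: nothing is discarded */
	return toy_range_unmapped_or_clean(flags, new_size, old_size, NULL);
}
```

Only the blocks between the new and the old size are inspected, which matches the documentation's statement that the area being removed, rather than the whole cache, must be clean.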
@@ -1182,3 +1267,8 @@ int dm_cache_save_hint(struct dm_cache_metadata *cmd, dm_cblock_t cblock,
 	return r;
 }
+int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result)
+{
+	return blocks_are_unmapped_or_clean(cmd, 0, cmd->cache_blocks, result);
+}
@@ -137,6 +137,11 @@ int dm_cache_begin_hints(struct dm_cache_metadata *cmd, struct dm_cache_policy *
 int dm_cache_save_hint(struct dm_cache_metadata *cmd,
 		       dm_cblock_t cblock, uint32_t hint);
+/*
+ * Query method. Are all the blocks in the cache clean?
+ */
+int dm_cache_metadata_all_clean(struct dm_cache_metadata *cmd, bool *result);
 /*----------------------------------------------------------------*/
 #endif /* DM_CACHE_METADATA_H */
@@ -61,7 +61,12 @@ static inline int policy_writeback_work(struct dm_cache_policy *p,
 static inline void policy_remove_mapping(struct dm_cache_policy *p, dm_oblock_t oblock)
 {
-	return p->remove_mapping(p, oblock);
+	p->remove_mapping(p, oblock);
+}
+static inline int policy_remove_cblock(struct dm_cache_policy *p, dm_cblock_t cblock)
+{
+	return p->remove_cblock(p, cblock);
 }
 static inline void policy_force_mapping(struct dm_cache_policy *p,
......
@@ -119,13 +119,13 @@ struct dm_cache_policy *dm_cache_policy_create(const char *name,
 	type = get_policy(name);
 	if (!type) {
 		DMWARN("unknown policy type");
-		return NULL;
+		return ERR_PTR(-EINVAL);
 	}
 	p = type->create(cache_size, origin_size, cache_block_size);
 	if (!p) {
 		put_policy(type);
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 	}
 	p->private = type;
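The hunk above switches `dm_cache_policy_create()` from returning `NULL` to returning an error encoded in the pointer, so callers can distinguish an unknown policy from an allocation failure. The idiom can be sketched in userspace as follows. This is an approximation for illustration, not the kernel's `ERR_PTR`/`IS_ERR`/`PTR_ERR` macros: every `toy_` name is hypothetical, and the sketch assumes a platform where small negative integers cast to pointers land in the top 4095 addresses and never collide with a real allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the range reserved for error codes. */
int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_policy {
	const char *name;
};

/* Hypothetical factory that only knows a policy called "default". */
struct toy_policy *toy_policy_create(const char *name)
{
	struct toy_policy *p;

	assert(name);
	if (strcmp(name, "default") != 0)
		return toy_err_ptr(-EINVAL);  /* unknown policy type */

	p = malloc(sizeof(*p));
	if (!p)
		return toy_err_ptr(-ENOMEM);  /* allocation failed */
	p->name = "default";
	return p;
}
```

A caller tests the result with `toy_is_err()` before dereferencing it and propagates `toy_ptr_err()` upward, instead of collapsing every failure into a single `NULL` check.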
......
@@ -135,9 +135,6 @@ struct dm_cache_policy {
 	 */
 	int (*lookup)(struct dm_cache_policy *p, dm_oblock_t oblock, dm_cblock_t *cblock);
-	/*
-	 * oblock must be a mapped block. Must not block.
-	 */
 	void (*set_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
 	void (*clear_dirty)(struct dm_cache_policy *p, dm_oblock_t oblock);
@@ -159,8 +156,24 @@ struct dm_cache_policy {
 	void (*force_mapping)(struct dm_cache_policy *p, dm_oblock_t current_oblock,
 			      dm_oblock_t new_oblock);
-	int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
+	/*
+	 * This is called via the invalidate_cblocks message. It is
+	 * possible the particular cblock has already been removed due to a
+	 * write io in passthrough mode. In which case this should return
+	 * -ENODATA.
+	 */
+	int (*remove_cblock)(struct dm_cache_policy *p, dm_cblock_t cblock);
+	/*
+	 * Provide a dirty block to be written back by the core target.
+	 *
+	 * Returns:
+	 *
+	 * 0 and @cblock,@oblock: block to write back provided
+	 *
+	 * -ENODATA: no dirty blocks available
+	 */
+	int (*writeback_work)(struct dm_cache_policy *p, dm_oblock_t *oblock, dm_cblock_t *cblock);
 	/*
 	 * How full is the cache?
......
@@ -2,6 +2,7 @@
  * Copyright (C) 2003 Christophe Saout <christophe@saout.de>
  * Copyright (C) 2004 Clemens Fruhwirth <clemens@endorphin.org>
  * Copyright (C) 2006-2009 Red Hat, Inc. All rights reserved.
+ * Copyright (C) 2013 Milan Broz <gmazyland@gmail.com>
  *
  * This file is released under the GPL.
  */
@@ -98,6 +99,13 @@ struct iv_lmk_private {
 	u8 *seed;
 };
+#define TCW_WHITENING_SIZE 16
+struct iv_tcw_private {
+	struct crypto_shash *crc32_tfm;
+	u8 *iv_seed;
+	u8 *whitening;
+};
 /*
  * Crypt: maps a linear range of a block device
  * and encrypts / decrypts at the same time.
@@ -139,6 +147,7 @@ struct crypt_config {
 		struct iv_essiv_private essiv;
 		struct iv_benbi_private benbi;
 		struct iv_lmk_private lmk;
+		struct iv_tcw_private tcw;
 	} iv_gen_private;
 	sector_t iv_offset;
 	unsigned int iv_size;
@@ -171,7 +180,8 @@ struct crypt_config {
 	unsigned long flags;
 	unsigned int key_size;
-	unsigned int key_parts;
+	unsigned int key_parts; /* independent parts in key buffer */
+	unsigned int key_extra_size; /* additional keys length */
 	u8 key[0];
 };
@@ -230,6 +240,16 @@ static struct crypto_ablkcipher *any_tfm(struct crypt_config *cc)
  * version 3: the same as version 2 with additional IV seed
  * (it uses 65 keys, last key is used as IV seed)
  *
+ * tcw: Compatible implementation of the block chaining mode used
+ * by the TrueCrypt device encryption system (prior to version 4.1).
+ * For more info see: http://www.truecrypt.org
+ * It operates on full 512 byte sectors and uses CBC
+ * with an IV derived from initial key and the sector number.
+ * In addition, whitening value is applied on every sector, whitening
+ * is calculated from initial key, sector number and mixed using CRC32.
+ * Note that this encryption scheme is vulnerable to watermarking attacks
+ * and should be used for old compatible containers access only.
+ *
  * plumb: unimplemented, see:
  * http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/454
  */
@@ -530,7 +550,7 @@ static int crypt_iv_lmk_one(struct crypt_config *cc, u8 *iv,
 		char ctx[crypto_shash_descsize(lmk->hash_tfm)];
 	} sdesc;
 	struct md5_state md5state;
-	u32 buf[4];
+	__le32 buf[4];
 	int i, r;
 	sdesc.desc.tfm = lmk->hash_tfm;
@@ -608,6 +628,153 @@ static int crypt_iv_lmk_post(struct crypt_config *cc, u8 *iv,
 	return r;
 }
+static void crypt_iv_tcw_dtr(struct crypt_config *cc)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	kzfree(tcw->iv_seed);
+	tcw->iv_seed = NULL;
+	kzfree(tcw->whitening);
+	tcw->whitening = NULL;
+	if (tcw->crc32_tfm && !IS_ERR(tcw->crc32_tfm))
+		crypto_free_shash(tcw->crc32_tfm);
+	tcw->crc32_tfm = NULL;
+}
+static int crypt_iv_tcw_ctr(struct crypt_config *cc, struct dm_target *ti,
+			    const char *opts)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	if (cc->key_size <= (cc->iv_size + TCW_WHITENING_SIZE)) {
+		ti->error = "Wrong key size for TCW";
+		return -EINVAL;
+	}
+	tcw->crc32_tfm = crypto_alloc_shash("crc32", 0, 0);
+	if (IS_ERR(tcw->crc32_tfm)) {
+		ti->error = "Error initializing CRC32 in TCW";
+		return PTR_ERR(tcw->crc32_tfm);
+	}
+	tcw->iv_seed = kzalloc(cc->iv_size, GFP_KERNEL);
+	tcw->whitening = kzalloc(TCW_WHITENING_SIZE, GFP_KERNEL);
+	if (!tcw->iv_seed || !tcw->whitening) {
+		crypt_iv_tcw_dtr(cc);
+		ti->error = "Error allocating seed storage in TCW";
+		return -ENOMEM;
+	}
+	return 0;
+}
+static int crypt_iv_tcw_init(struct crypt_config *cc)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	int key_offset = cc->key_size - cc->iv_size - TCW_WHITENING_SIZE;
+	memcpy(tcw->iv_seed, &cc->key[key_offset], cc->iv_size);
+	memcpy(tcw->whitening, &cc->key[key_offset + cc->iv_size],
+	       TCW_WHITENING_SIZE);
+	return 0;
+}
+static int crypt_iv_tcw_wipe(struct crypt_config *cc)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	memset(tcw->iv_seed, 0, cc->iv_size);
+	memset(tcw->whitening, 0, TCW_WHITENING_SIZE);
+	return 0;
+}
+static int crypt_iv_tcw_whitening(struct crypt_config *cc,
+				  struct dm_crypt_request *dmreq,
+				  u8 *data)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	u64 sector = cpu_to_le64((u64)dmreq->iv_sector);
+	u8 buf[TCW_WHITENING_SIZE];
+	struct {
+		struct shash_desc desc;
+		char ctx[crypto_shash_descsize(tcw->crc32_tfm)];
+	} sdesc;
+	int i, r;
+	/* xor whitening with sector number */
+	memcpy(buf, tcw->whitening, TCW_WHITENING_SIZE);
+	crypto_xor(buf, (u8 *)&sector, 8);
+	crypto_xor(&buf[8], (u8 *)&sector, 8);
+	/* calculate crc32 for every 32bit part and xor it */
+	sdesc.desc.tfm = tcw->crc32_tfm;
+	sdesc.desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
+	for (i = 0; i < 4; i++) {
+		r = crypto_shash_init(&sdesc.desc);
+		if (r)
+			goto out;
+		r = crypto_shash_update(&sdesc.desc, &buf[i * 4], 4);
+		if (r)
+			goto out;
+		r = crypto_shash_final(&sdesc.desc, &buf[i * 4]);
+		if (r)
+			goto out;
+	}
+	crypto_xor(&buf[0], &buf[12], 4);
+	crypto_xor(&buf[4], &buf[8], 4);
+	/* apply whitening (8 bytes) to whole sector */
+	for (i = 0; i < ((1 << SECTOR_SHIFT) / 8); i++)
+		crypto_xor(data + i * 8, buf, 8);
+out:
+	memset(buf, 0, sizeof(buf));
+	return r;
+}
+static int crypt_iv_tcw_gen(struct crypt_config *cc, u8 *iv,
+			    struct dm_crypt_request *dmreq)
+{
+	struct iv_tcw_private *tcw = &cc->iv_gen_private.tcw;
+	u64 sector = cpu_to_le64((u64)dmreq->iv_sector);
+	u8 *src;
+	int r = 0;
+	/* Remove whitening from ciphertext */
+	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE) {
+		src = kmap_atomic(sg_page(&dmreq->sg_in));
+		r = crypt_iv_tcw_whitening(cc, dmreq, src + dmreq->sg_in.offset);
+		kunmap_atomic(src);
+	}
+	/* Calculate IV */
+	memcpy(iv, tcw->iv_seed, cc->iv_size);
+	crypto_xor(iv, (u8 *)&sector, 8);
+	if (cc->iv_size > 8)
+		crypto_xor(&iv[8], (u8 *)&sector, cc->iv_size - 8);
+	return r;
+}
+static int crypt_iv_tcw_post(struct crypt_config *cc, u8 *iv,
+			     struct dm_crypt_request *dmreq)
+{
+	u8 *dst;
+	int r;
+	if (bio_data_dir(dmreq->ctx->bio_in) != WRITE)
+		return 0;
+	/* Apply whitening on ciphertext */
+	dst = kmap_atomic(sg_page(&dmreq->sg_out));
+	r = crypt_iv_tcw_whitening(cc, dmreq, dst + dmreq->sg_out.offset);
+	kunmap_atomic(dst);
+	return r;
+}
 static struct crypt_iv_operations crypt_iv_plain_ops = {
 	.generator = crypt_iv_plain_gen
 };
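The "Calculate IV" step of `crypt_iv_tcw_gen()` above XORs the IV seed with the little-endian sector number, once over the first 8 bytes and again over the remaining bytes. A userspace sketch of just that step follows; the CRC32 whitening is left out. `put_le64` and `toy_tcw_iv` are hypothetical helper names, and the restriction of `iv_size` to 8..16 bytes is an assumption of this sketch rather than something the kernel code states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value as 8 little-endian bytes, independent of host order. */
void put_le64(uint8_t out[8], uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/*
 * iv = iv_seed XOR sector: the little-endian sector number is applied to
 * the first 8 bytes and again to the following (iv_size - 8) bytes.
 */
void toy_tcw_iv(uint8_t *iv, const uint8_t *iv_seed, size_t iv_size,
		uint64_t sector)
{
	uint8_t s[8];
	size_t i;

	assert(iv_size >= 8 && iv_size <= 16);  /* assumption of this sketch */
	put_le64(s, sector);
	memcpy(iv, iv_seed, iv_size);
	for (i = 0; i < iv_size; i++)
		iv[i] ^= s[i % 8];
}
```

Since the IV depends only on a fixed seed and the sector number, identical plaintext at a known sector yields predictable structure, which is consistent with the watermarking warning in the source comment.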
@@ -643,6 +810,15 @@ static struct crypt_iv_operations crypt_iv_lmk_ops = {
 	.post = crypt_iv_lmk_post
 };
+static struct crypt_iv_operations crypt_iv_tcw_ops = {
+	.ctr = crypt_iv_tcw_ctr,
+	.dtr = crypt_iv_tcw_dtr,
+	.init = crypt_iv_tcw_init,
+	.wipe = crypt_iv_tcw_wipe,
+	.generator = crypt_iv_tcw_gen,
+	.post = crypt_iv_tcw_post
+};
 static void crypt_convert_init(struct crypt_config *cc,
 			       struct convert_context *ctx,
 			       struct bio *bio_out, struct bio *bio_in,
@@ -1274,9 +1450,12 @@ static int crypt_alloc_tfms(struct crypt_config *cc, char *ciphermode)
 static int crypt_setkey_allcpus(struct crypt_config *cc)
 {
-	unsigned subkey_size = cc->key_size >> ilog2(cc->tfms_count);
+	unsigned subkey_size;
 	int err = 0, i, r;
+	/* Ignore extra keys (which are used for IV etc) */
+	subkey_size = (cc->key_size - cc->key_extra_size) >> ilog2(cc->tfms_count);
 	for (i = 0; i < cc->tfms_count; i++) {
 		r = crypto_ablkcipher_setkey(cc->tfms[i],
 					     cc->key + (i * subkey_size),
@@ -1409,6 +1588,7 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		return -EINVAL;
 	}
 	cc->key_parts = cc->tfms_count;
+	cc->key_extra_size = 0;

 	cc->cipher = kstrdup(cipher, GFP_KERNEL);
 	if (!cc->cipher)
@@ -1460,13 +1640,6 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		goto bad;
 	}

-	/* Initialize and set key */
-	ret = crypt_set_key(cc, key);
-	if (ret < 0) {
-		ti->error = "Error decoding and setting key";
-		goto bad;
-	}
-
 	/* Initialize IV */
 	cc->iv_size = crypto_ablkcipher_ivsize(any_tfm(cc));
 	if (cc->iv_size)
@@ -1493,18 +1666,33 @@ static int crypt_ctr_cipher(struct dm_target *ti,
 		cc->iv_gen_ops = &crypt_iv_null_ops;
 	else if (strcmp(ivmode, "lmk") == 0) {
 		cc->iv_gen_ops = &crypt_iv_lmk_ops;
-		/* Version 2 and 3 is recognised according
+		/*
+		 * Version 2 and 3 is recognised according
 		 * to length of provided multi-key string.
 		 * If present (version 3), last key is used as IV seed.
+		 * All keys (including IV seed) are always the same size.
 		 */
-		if (cc->key_size % cc->key_parts)
+		if (cc->key_size % cc->key_parts) {
 			cc->key_parts++;
+			cc->key_extra_size = cc->key_size / cc->key_parts;
+		}
+	} else if (strcmp(ivmode, "tcw") == 0) {
+		cc->iv_gen_ops = &crypt_iv_tcw_ops;
+		cc->key_parts += 2; /* IV + whitening */
+		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
 	} else {
 		ret = -EINVAL;
 		ti->error = "Invalid IV mode";
 		goto bad;
 	}

+	/* Initialize and set key */
+	ret = crypt_set_key(cc, key);
+	if (ret < 0) {
+		ti->error = "Error decoding and setting key";
+		goto bad;
+	}
+
 	/* Allocate IV */
 	if (cc->iv_gen_ops && cc->iv_gen_ops->ctr) {
 		ret = cc->iv_gen_ops->ctr(cc, ti, ivopts);
@@ -1817,7 +2005,7 @@ static int crypt_iterate_devices(struct dm_target *ti,

 static struct target_type crypt_target = {
 	.name = "crypt",
-	.version = {1, 12, 1},
+	.version = {1, 13, 0},
 	.module = THIS_MODULE,
 	.ctr = crypt_ctr,
 	.dtr = crypt_dtr,
......
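The key-splitting arithmetic changed in `crypt_setkey_allcpus()` can be checked outside the kernel. The following is a minimal user-space sketch, not the kernel code: `ilog2_u()` and `subkey_size()` are hypothetical helper names, and `tfms_count` is assumed to be a power of two, as the shift in the patch implies.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the kernel's ilog2(). */
static unsigned ilog2_u(unsigned v)
{
	unsigned log = 0;

	assert(v != 0);
	while (v >>= 1)
		log++;
	return log;
}

/*
 * Mirrors the patched computation: the extra key material (IV seed,
 * whitening) is subtracted before the remainder is divided among the
 * cipher transforms. tfms_count is assumed to be a power of two.
 */
static unsigned subkey_size(unsigned key_size, unsigned key_extra_size,
			    unsigned tfms_count)
{
	assert(key_extra_size <= key_size);
	return (key_size - key_extra_size) >> ilog2_u(tfms_count);
}
```

For instance, an "lmk" version 3 key string with 64 cipher keys plus one IV seed, all 16 bytes long, has `key_size` 1040 and `key_extra_size` 16, so each transform still receives a 16-byte key.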
@@ -57,7 +57,7 @@ struct vers_iter {
 static struct list_head _name_buckets[NUM_BUCKETS];
 static struct list_head _uuid_buckets[NUM_BUCKETS];

-static void dm_hash_remove_all(int keep_open_devices);
+static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool only_deferred);

 /*
  * Guards access to both hash tables.
@@ -86,7 +86,7 @@ static int dm_hash_init(void)

 static void dm_hash_exit(void)
 {
-	dm_hash_remove_all(0);
+	dm_hash_remove_all(false, false, false);
 }

 /*-----------------------------------------------------------------
@@ -276,7 +276,7 @@ static struct dm_table *__hash_remove(struct hash_cell *hc)
 	return table;
 }

-static void dm_hash_remove_all(int keep_open_devices)
+static void dm_hash_remove_all(bool keep_open_devices, bool mark_deferred, bool only_deferred)
 {
 	int i, dev_skipped;
 	struct hash_cell *hc;
@@ -293,7 +293,8 @@ static void dm_hash_remove_all(int keep_open_devices)
 			md = hc->md;
 			dm_get(md);

-			if (keep_open_devices && dm_lock_for_deletion(md)) {
+			if (keep_open_devices &&
+			    dm_lock_for_deletion(md, mark_deferred, only_deferred)) {
 				dm_put(md);
 				dev_skipped++;
 				continue;
@@ -450,6 +451,11 @@ static struct mapped_device *dm_hash_rename(struct dm_ioctl *param,
 	return md;
 }

+void dm_deferred_remove(void)
+{
+	dm_hash_remove_all(true, false, true);
+}
+
 /*-----------------------------------------------------------------
  * Implementation of the ioctl commands
  *---------------------------------------------------------------*/
@@ -461,7 +467,7 @@ typedef int (*ioctl_fn)(struct dm_ioctl *param, size_t param_size);

 static int remove_all(struct dm_ioctl *param, size_t param_size)
 {
-	dm_hash_remove_all(1);
+	dm_hash_remove_all(true, !!(param->flags & DM_DEFERRED_REMOVE), false);
 	param->data_size = 0;
 	return 0;
 }
@@ -683,6 +689,9 @@ static void __dev_status(struct mapped_device *md, struct dm_ioctl *param)
 	if (dm_suspended_md(md))
 		param->flags |= DM_SUSPEND_FLAG;

+	if (dm_test_deferred_remove_flag(md))
+		param->flags |= DM_DEFERRED_REMOVE;
+
 	param->dev = huge_encode_dev(disk_devt(disk));

 	/*
@@ -832,8 +841,13 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
 	/*
 	 * Ensure the device is not open and nothing further can open it.
 	 */
-	r = dm_lock_for_deletion(md);
+	r = dm_lock_for_deletion(md, !!(param->flags & DM_DEFERRED_REMOVE), false);
 	if (r) {
+		if (r == -EBUSY && param->flags & DM_DEFERRED_REMOVE) {
+			up_write(&_hash_lock);
+			dm_put(md);
+			return 0;
+		}
 		DMDEBUG_LIMIT("unable to remove open device %s", hc->name);
 		up_write(&_hash_lock);
 		dm_put(md);
@@ -848,6 +862,8 @@ static int dev_remove(struct dm_ioctl *param, size_t param_size)
 		dm_table_destroy(t);
 	}

+	param->flags &= ~DM_DEFERRED_REMOVE;
+
 	if (!dm_kobject_uevent(md, KOBJ_REMOVE, param->event_nr))
 		param->flags |= DM_UEVENT_GENERATED_FLAG;

@@ -1469,6 +1485,14 @@ static int message_for_md(struct mapped_device *md, unsigned argc, char **argv,
 	if (**argv != '@')
 		return 2; /* no '@' prefix, deliver to target */

+	if (!strcasecmp(argv[0], "@cancel_deferred_remove")) {
+		if (argc != 1) {
+			DMERR("Invalid arguments for @cancel_deferred_remove");
+			return -EINVAL;
+		}
+		return dm_cancel_deferred_remove(md);
+	}
+
 	r = dm_stats_message(md, argc, argv, result, maxlen);
 	if (r < 2)
 		return r;
......
@@ -87,6 +87,7 @@ struct multipath {
 	unsigned queue_if_no_path:1; /* Queue I/O if last path fails? */
 	unsigned saved_queue_if_no_path:1; /* Saved state during suspension */
 	unsigned retain_attached_hw_handler:1; /* If there's already a hw_handler present, don't change it. */
+	unsigned pg_init_disabled:1; /* pg_init is not currently allowed */

 	unsigned pg_init_retries; /* Number of times to retry pg_init */
 	unsigned pg_init_count; /* Number of times pg_init called */
@@ -390,13 +391,16 @@ static int map_io(struct multipath *m, struct request *clone,
 	if (was_queued)
 		m->queue_size--;

-	if ((pgpath && m->queue_io) ||
+	if (m->pg_init_required) {
+		if (!m->pg_init_in_progress)
+			queue_work(kmultipathd, &m->process_queued_ios);
+		r = DM_MAPIO_REQUEUE;
+	} else if ((pgpath && m->queue_io) ||
 	    (!pgpath && m->queue_if_no_path)) {
 		/* Queue for the daemon to resubmit */
 		list_add_tail(&clone->queuelist, &m->queued_ios);
 		m->queue_size++;
-		if ((m->pg_init_required && !m->pg_init_in_progress) ||
-		    !m->queue_io)
+		if (!m->queue_io)
 			queue_work(kmultipathd, &m->process_queued_ios);
 		pgpath = NULL;
 		r = DM_MAPIO_SUBMITTED;
@@ -497,7 +501,8 @@ static void process_queued_ios(struct work_struct *work)
 	    (!pgpath && !m->queue_if_no_path))
 		must_queue = 0;

-	if (m->pg_init_required && !m->pg_init_in_progress && pgpath)
+	if (m->pg_init_required && !m->pg_init_in_progress && pgpath &&
+	    !m->pg_init_disabled)
 		__pg_init_all_paths(m);

 	spin_unlock_irqrestore(&m->lock, flags);
@@ -942,10 +947,20 @@ static void multipath_wait_for_pg_init_completion(struct multipath *m)

 static void flush_multipath_work(struct multipath *m)
 {
+	unsigned long flags;
+
+	spin_lock_irqsave(&m->lock, flags);
+	m->pg_init_disabled = 1;
+	spin_unlock_irqrestore(&m->lock, flags);
+
 	flush_workqueue(kmpath_handlerd);
 	multipath_wait_for_pg_init_completion(m);
 	flush_workqueue(kmultipathd);
 	flush_work(&m->trigger_event);
+
+	spin_lock_irqsave(&m->lock, flags);
+	m->pg_init_disabled = 0;
+	spin_unlock_irqrestore(&m->lock, flags);
 }

 static void multipath_dtr(struct dm_target *ti)
@@ -1164,7 +1179,7 @@ static int pg_init_limit_reached(struct multipath *m, struct pgpath *pgpath)

 	spin_lock_irqsave(&m->lock, flags);

-	if (m->pg_init_count <= m->pg_init_retries)
+	if (m->pg_init_count <= m->pg_init_retries && !m->pg_init_disabled)
 		m->pg_init_required = 1;
 	else
 		limit_reached = 1;
@@ -1665,6 +1680,11 @@ static int multipath_busy(struct dm_target *ti)

 	spin_lock_irqsave(&m->lock, flags);

+	/* pg_init in progress, requeue until done */
+	if (m->pg_init_in_progress) {
+		busy = 1;
+		goto out;
+	}
 	/* Guess which priority_group will be used at next mapping time */
 	if (unlikely(!m->current_pgpath && m->next_pg))
 		pg = m->next_pg;
@@ -1714,7 +1734,7 @@ static int multipath_busy(struct dm_target *ti)
  *---------------------------------------------------------------*/
 static struct target_type multipath_target = {
 	.name = "multipath",
-	.version = {1, 5, 1},
+	.version = {1, 6, 0},
 	.module = THIS_MODULE,
 	.ctr = multipath_ctr,
 	.dtr = multipath_dtr,
......
@@ -545,14 +545,28 @@ static int adjoin(struct dm_table *table, struct dm_target *ti)

 /*
  * Used to dynamically allocate the arg array.
+ *
+ * We do first allocation with GFP_NOIO because dm-mpath and dm-thin must
+ * process messages even if some device is suspended. These messages have a
+ * small fixed number of arguments.
+ *
+ * On the other hand, dm-switch needs to process bulk data using messages and
+ * excessive use of GFP_NOIO could cause trouble.
  */
 static char **realloc_argv(unsigned *array_size, char **old_argv)
 {
 	char **argv;
 	unsigned new_size;
+	gfp_t gfp;

-	new_size = *array_size ? *array_size * 2 : 64;
-	argv = kmalloc(new_size * sizeof(*argv), GFP_KERNEL);
+	if (*array_size) {
+		new_size = *array_size * 2;
+		gfp = GFP_KERNEL;
+	} else {
+		new_size = 8;
+		gfp = GFP_NOIO;
+	}
+	argv = kmalloc(new_size * sizeof(*argv), gfp);
 	if (argv) {
 		memcpy(argv, old_argv, *array_size * sizeof(*argv));
 		*array_size = new_size;
@@ -1548,9 +1562,12 @@ int dm_table_resume_targets(struct dm_table *t)
 			continue;

 		r = ti->type->preresume(ti);
-		if (r)
+		if (r) {
+			DMERR("%s: %s: preresume failed, error = %d",
+			      dm_device_name(t->md), ti->type->name, r);
 			return r;
+		}
 	}

 	for (i = 0; i < t->num_targets; i++) {
 		struct dm_target *ti = t->targets + i;
......
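The growth policy described in the `realloc_argv()` comment (a small first allocation of 8 slots under the restrictive flag, then doubling under the ordinary one) can be modelled in user space. This is only a sketch under stated assumptions: `grow_argv()` and `enum alloc_kind` are hypothetical names, `malloc()` stands in for `kmalloc()`, and the enum merely records which gfp flag the kernel code would have chosen.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for GFP_NOIO / GFP_KERNEL; user space has no gfp flags. */
enum alloc_kind { ALLOC_NOIO, ALLOC_KERNEL };

/*
 * Grow the argument array: 8 slots on the first call, double afterwards.
 * *kind reports which allocation class the kernel version would use.
 * The old array is always released, as the kernel caller expects.
 */
static char **grow_argv(unsigned *array_size, char **old_argv,
			enum alloc_kind *kind)
{
	unsigned new_size;
	char **argv;

	assert(array_size && kind);

	if (*array_size) {
		new_size = *array_size * 2;
		*kind = ALLOC_KERNEL;
	} else {
		new_size = 8;
		*kind = ALLOC_NOIO;
	}

	argv = malloc(new_size * sizeof(*argv));
	if (argv) {
		/* Guarded: calling memcpy with a NULL source is undefined. */
		if (*array_size)
			memcpy(argv, old_argv, *array_size * sizeof(*argv));
		*array_size = new_size;
	}

	free(old_argv);
	return argv;
}
```

Messages with up to 8 arguments therefore never leave the first allocation class, which is the case the comment singles out for dm-mpath and dm-thin.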
@@ -49,6 +49,11 @@ static unsigned int _major = 0;
 static DEFINE_IDR(_minor_idr);

 static DEFINE_SPINLOCK(_minor_lock);
+
+static void do_deferred_remove(struct work_struct *w);
+
+static DECLARE_WORK(deferred_remove_work, do_deferred_remove);
+
 /*
  * For bio-based dm.
  * One of these is allocated per bio.
@@ -116,6 +121,7 @@ EXPORT_SYMBOL_GPL(dm_get_rq_mapinfo);
 #define DMF_DELETING 4
 #define DMF_NOFLUSH_SUSPENDING 5
 #define DMF_MERGE_IS_OPTIONAL 6
+#define DMF_DEFERRED_REMOVE 7

 /*
  * A dummy definition to make RCU happy.
@@ -299,6 +305,8 @@ static int __init local_init(void)

 static void local_exit(void)
 {
+	flush_scheduled_work();
+
 	kmem_cache_destroy(_rq_tio_cache);
 	kmem_cache_destroy(_io_cache);
 	unregister_blkdev(_major, _name);
@@ -404,7 +412,10 @@ static void dm_blk_close(struct gendisk *disk, fmode_t mode)

 	spin_lock(&_minor_lock);

-	atomic_dec(&md->open_count);
+	if (atomic_dec_and_test(&md->open_count) &&
+	    (test_bit(DMF_DEFERRED_REMOVE, &md->flags)))
+		schedule_work(&deferred_remove_work);
+
 	dm_put(md);

 	spin_unlock(&_minor_lock);
@@ -418,14 +429,18 @@ int dm_open_count(struct mapped_device *md)
 /*
  * Guarantees nothing is using the device before it's deleted.
  */
-int dm_lock_for_deletion(struct mapped_device *md)
+int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only_deferred)
 {
 	int r = 0;

 	spin_lock(&_minor_lock);

-	if (dm_open_count(md))
+	if (dm_open_count(md)) {
 		r = -EBUSY;
+		if (mark_deferred)
+			set_bit(DMF_DEFERRED_REMOVE, &md->flags);
+	} else if (only_deferred && !test_bit(DMF_DEFERRED_REMOVE, &md->flags))
+		r = -EEXIST;
 	else
 		set_bit(DMF_DELETING, &md->flags);

@@ -434,6 +449,27 @@ int dm_lock_for_deletion(struct mapped_device *md)
 	return r;
 }

+int dm_cancel_deferred_remove(struct mapped_device *md)
+{
+	int r = 0;
+
+	spin_lock(&_minor_lock);
+
+	if (test_bit(DMF_DELETING, &md->flags))
+		r = -EBUSY;
+	else
+		clear_bit(DMF_DEFERRED_REMOVE, &md->flags);
+
+	spin_unlock(&_minor_lock);
+
+	return r;
+}
+
+static void do_deferred_remove(struct work_struct *w)
+{
+	dm_deferred_remove();
+}
+
 sector_t dm_get_size(struct mapped_device *md)
 {
 	return get_capacity(md->disk);
@@ -2894,6 +2930,11 @@ int dm_suspended_md(struct mapped_device *md)
 	return test_bit(DMF_SUSPENDED, &md->flags);
 }

+int dm_test_deferred_remove_flag(struct mapped_device *md)
+{
+	return test_bit(DMF_DEFERRED_REMOVE, &md->flags);
+}
+
 int dm_suspended(struct dm_target *ti)
 {
 	return dm_suspended_md(dm_table_get_md(ti->table));
......
@@ -128,6 +128,16 @@ int dm_deleting_md(struct mapped_device *md);
  */
 int dm_suspended_md(struct mapped_device *md);

+/*
+ * Test if the device is scheduled for deferred remove.
+ */
+int dm_test_deferred_remove_flag(struct mapped_device *md);
+
+/*
+ * Try to remove devices marked for deferred removal.
+ */
+void dm_deferred_remove(void);
+
 /*
  * The device-mapper can be driven through one of two interfaces;
  * ioctl or filesystem, depending which patch you have applied.
@@ -158,7 +168,8 @@ void dm_stripe_exit(void);
 void dm_destroy(struct mapped_device *md);
 void dm_destroy_immediate(struct mapped_device *md);
 int dm_open_count(struct mapped_device *md);
-int dm_lock_for_deletion(struct mapped_device *md);
+int dm_lock_for_deletion(struct mapped_device *md, bool mark_deferred, bool only_deferred);
+int dm_cancel_deferred_remove(struct mapped_device *md);
 int dm_request_based(struct mapped_device *md);
 sector_t dm_get_size(struct mapped_device *md);
 struct dm_stats *dm_get_stats(struct mapped_device *md);
......
@@ -509,15 +509,18 @@ static int grow_add_tail_block(struct resize *resize)

 static int grow_needs_more_blocks(struct resize *resize)
 {
 	int r;
+	unsigned old_nr_blocks = resize->old_nr_full_blocks;

 	if (resize->old_nr_entries_in_last_block > 0) {
+		old_nr_blocks++;
+
 		r = grow_extend_tail_block(resize, resize->max_entries);
 		if (r)
 			return r;
 	}

 	r = insert_full_ablocks(resize->info, resize->size_of_block,
-				resize->old_nr_full_blocks,
+				old_nr_blocks,
 				resize->new_nr_full_blocks,
 				resize->max_entries, resize->value,
 				&resize->root);
......
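The array-resize fix above changes where insertion of new full blocks begins. A minimal sketch of that index calculation follows; `first_new_block()` is a hypothetical helper, and the reading that a partially filled tail block already occupies the slot after the full blocks is an interpretation of the hunk, not something the diff states.

```c
#include <assert.h>

/*
 * Index at which brand-new full blocks start being inserted when the
 * array grows. If the old array ended in a partially filled block, that
 * block is extended in place and already sits at index
 * old_nr_full_blocks, so insertion has to start one slot later.
 */
static unsigned first_new_block(unsigned old_nr_full_blocks,
				unsigned old_nr_entries_in_last_block)
{
	unsigned old_nr_blocks = old_nr_full_blocks;

	if (old_nr_entries_in_last_block > 0)
		old_nr_blocks++;

	return old_nr_blocks;
}
```

With three full blocks and a partial fourth, insertion starts at index 4 rather than 3, so the extended tail block is not overwritten.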
@@ -140,26 +140,10 @@ static int sm_disk_inc_block(struct dm_space_map *sm, dm_block_t b)

 static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b)
 {
-	int r;
-	uint32_t old_count;
 	enum allocation_event ev;
 	struct sm_disk *smd = container_of(sm, struct sm_disk, sm);

-	r = sm_ll_dec(&smd->ll, b, &ev);
-	if (!r && (ev == SM_FREE)) {
-		/*
-		 * It's only free if it's also free in the last
-		 * transaction.
-		 */
-		r = sm_ll_lookup(&smd->old_ll, b, &old_count);
-		if (r)
-			return r;
-
-		if (!old_count)
-			smd->nr_allocated_this_transaction--;
-	}
-
-	return r;
+	return sm_ll_dec(&smd->ll, b, &ev);
 }

 static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b)
......
@@ -267,9 +267,9 @@ enum {
 #define DM_DEV_SET_GEOMETRY _IOWR(DM_IOCTL, DM_DEV_SET_GEOMETRY_CMD, struct dm_ioctl)

 #define DM_VERSION_MAJOR 4
-#define DM_VERSION_MINOR 26
+#define DM_VERSION_MINOR 27
 #define DM_VERSION_PATCHLEVEL 0
-#define DM_VERSION_EXTRA "-ioctl (2013-08-15)"
+#define DM_VERSION_EXTRA "-ioctl (2013-10-30)"

 /* Status bits */
 #define DM_READONLY_FLAG (1 << 0) /* In/Out */
@@ -341,4 +341,15 @@ enum {
  */
 #define DM_DATA_OUT_FLAG (1 << 16) /* Out */

+/*
+ * If set with DM_DEV_REMOVE or DM_REMOVE_ALL this indicates that if
+ * the device cannot be removed immediately because it is still in use
+ * it should instead be scheduled for removal when it gets closed.
+ *
+ * On return from DM_DEV_REMOVE, DM_DEV_STATUS or other ioctls, this
+ * flag indicates that the device is scheduled to be removed when it
+ * gets closed.
+ */
+#define DM_DEFERRED_REMOVE (1 << 17) /* In/Out */
+
 #endif /* _LINUX_DM_IOCTL_H */
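The deferred-remove behaviour described in the `DM_DEFERRED_REMOVE` comment, together with the `dm_lock_for_deletion()` and `dm_blk_close()` hunks, amounts to a small state machine. The sketch below models it in user space with no locking or workqueues. Every name in it (`struct model_dev`, `model_lock_for_deletion()`, `model_close()`, `model_cancel_deferred_remove()`) is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the per-device state touched by the patch. */
struct model_dev {
	int open_count;
	bool deferred_remove;	/* models DMF_DEFERRED_REMOVE */
	bool deleting;		/* models DMF_DELETING */
};

/* Same decision order as the patched dm_lock_for_deletion(). */
static int model_lock_for_deletion(struct model_dev *md, bool mark_deferred,
				   bool only_deferred)
{
	if (md->open_count) {
		if (mark_deferred)
			md->deferred_remove = true;
		return -EBUSY;
	}
	if (only_deferred && !md->deferred_remove)
		return -EEXIST;

	md->deleting = true;
	return 0;
}

/* Returns true when this close should schedule a deferred-removal pass. */
static bool model_close(struct model_dev *md)
{
	assert(md->open_count > 0);
	return --md->open_count == 0 && md->deferred_remove;
}

/* Cancelling is refused once deletion has actually begun. */
static int model_cancel_deferred_remove(struct model_dev *md)
{
	if (md->deleting)
		return -EBUSY;

	md->deferred_remove = false;
	return 0;
}
```

In this model a remove request against an open device returns `-EBUSY` but leaves the device marked; the last close then reports that a removal pass is due, and that pass (the `only_deferred` case) skips unmarked devices with `-EEXIST`.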