Commit 43672a07 authored by Linus Torvalds

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/linux-dm:
  dm raid: fix device status indicator when array initializing
  dm log userspace: add log device dependency
  dm log userspace: fix comment hyphens
  dm: add thin provisioning target
  dm: add persistent data library
  dm: add bufio
  dm: export dm get md
  dm table: add immutable feature
  dm table: add always writeable feature
  dm table: add singleton feature
  dm kcopyd: add dm_kcopyd_zero to zero an area
  dm: remove superfluous smp_mb
  dm: use local printk ratelimit
  dm table: propagate non rotational flag
parents 2380078c 2e727c3c
@@ -48,7 +48,7 @@ kernel and userspace, 'connector' is used as the interface for
 communication.
 
 There are currently two userspace log implementations that leverage this
-framework - "clustered_disk" and "clustered_core". These implementations
+framework - "clustered-disk" and "clustered-core". These implementations
 provide a cluster-coherent log for shared-storage. Device-mapper mirroring
 can be used in a shared-storage environment when the cluster log implementations
 are employed.
Introduction
============
The more-sophisticated device-mapper targets require complex metadata
that is managed in kernel. In late 2010 we were seeing that various
different targets were rolling their own data structures, for example:
- Mikulas Patocka's multisnap implementation
- Heinz Mauelshagen's thin provisioning target
- Another btree-based caching target posted to dm-devel
- Another multi-snapshot target based on a design of Daniel Phillips
Maintaining these data structures takes a lot of work, so if possible
we'd like to reduce the number.
The persistent-data library is an attempt to provide a re-usable
framework for people who want to store metadata in device-mapper
targets. It's currently used by the thin-provisioning target and an
upcoming hierarchical storage target.
Overview
========
The main documentation is in the header files which can all be found
under drivers/md/persistent-data.
The block manager
-----------------
dm-block-manager.[hc]
This provides access to the data on disk in fixed-sized blocks. There
is a read/write locking interface to prevent concurrent accesses, and
to keep data that is being used in the cache.
Clients of persistent-data are unlikely to use this directly.
The transaction manager
-----------------------
dm-transaction-manager.[hc]
This restricts access to blocks and enforces copy-on-write semantics.
The only way you can get hold of a writable block through the
transaction manager is by shadowing an existing block (i.e. doing
copy-on-write) or allocating a fresh one. Shadowing is elided within
the same transaction so performance is reasonable. The commit method
ensures that all data is flushed before it writes the superblock.
On power failure your metadata will be as it was when last committed.
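
To make the copy-on-write flow concrete, here is a minimal sketch of a
shadow-and-commit cycle. It assumes the dm_tm_shadow_block()/
dm_tm_unlock()/dm_tm_commit() interface declared in
dm-transaction-manager.h; consult that header for the authoritative
signatures.

	struct dm_block *sb, *b;
	int inc_children;
	int r;

	/* Get a writable (shadowed) copy of a block within the
	 * current transaction. */
	r = dm_tm_shadow_block(tm, orig_block, &validator, &b, &inc_children);
	if (r)
		return r;

	/* ... modify dm_block_data(b) ... */
	dm_tm_unlock(tm, b);

	/* Flush all dirty blocks, then write the superblock last. */
	r = dm_tm_commit(tm, sb);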
The Space Maps
--------------
dm-space-map.h
dm-space-map-metadata.[hc]
dm-space-map-disk.[hc]
On-disk data structures that keep track of reference counts of blocks.
Also acts as the allocator of new blocks. Currently two
implementations: a simpler one for managing blocks on a different
device (e.g. thinly-provisioned data blocks); and one for managing
the metadata space. The latter is complicated by the need to store
its own data within the space it's managing.
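
As a rough illustration, allocating a block and adjusting reference
counts might look as follows (a sketch assuming the dm_sm_* wrappers
declared in dm-space-map.h):

	dm_block_t b;
	uint32_t count;
	int r;

	r = dm_sm_new_block(sm, &b);		/* allocate; refcount starts at 1 */
	r = dm_sm_inc_block(sm, b);		/* e.g. a snapshot now shares it */
	r = dm_sm_get_count(sm, b, &count);	/* count is now 2 */
	r = dm_sm_dec_block(sm, b);		/* drop one reference */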
The data structures
-------------------
dm-btree.[hc]
dm-btree-remove.c
dm-btree-spine.c
dm-btree-internal.h
Currently there is only one data structure, a hierarchical btree.
There are plans to add more. For example, something with an
array-like interface would see a lot of use.
The btree is 'hierarchical' in that you can define it to be composed
of nested btrees, and take multiple keys. For example, the
thin-provisioning target uses a btree with two levels of nesting.
The first maps a device id to a mapping tree, and that in turn maps a
virtual block to a physical block.
Values stored in the btrees can have arbitrary size. Keys are always
64 bits, although nesting allows you to use multiple keys.
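
For example, the two-level lookup described above might be issued like
this (a sketch assuming dm_btree_lookup() from dm-btree.h, with info
configured for two levels of 64-bit keys):

	uint64_t keys[2] = { dev_id, virt_block };
	__le64 value;
	int r;

	r = dm_btree_lookup(&info, root, keys, &value);
	if (!r)
		phys_block = le64_to_cpu(value);	/* mapping found */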
...
@@ -208,6 +208,16 @@ config DM_DEBUG
 	  If unsure, say N.
 
+config DM_BUFIO
+       tristate
+       depends on BLK_DEV_DM && EXPERIMENTAL
+       ---help---
+	 This interface allows you to do buffered I/O on a device and acts
+	 as a cache, holding recently-read blocks in memory and performing
+	 delayed writes.
+
+source "drivers/md/persistent-data/Kconfig"
+
 config DM_CRYPT
 	tristate "Crypt target support"
 	depends on BLK_DEV_DM

@@ -233,6 +243,32 @@ config DM_SNAPSHOT
 	---help---
 	  Allow volume managers to take writable snapshots of a device.
 
+config DM_THIN_PROVISIONING
+	tristate "Thin provisioning target (EXPERIMENTAL)"
+	depends on BLK_DEV_DM && EXPERIMENTAL
+	select DM_PERSISTENT_DATA
+	---help---
+	  Provides thin provisioning and snapshots that share a data store.
+
+config DM_DEBUG_BLOCK_STACK_TRACING
+	boolean "Keep stack trace of thin provisioning block lock holders"
+	depends on STACKTRACE_SUPPORT && DM_THIN_PROVISIONING
+	select STACKTRACE
+	---help---
+	  Enable this for messages that may help debug problems with the
+	  block manager locking used by thin provisioning.
+
+	  If unsure, say N.
+
+config DM_DEBUG_SPACE_MAPS
+	boolean "Extra validation for thin provisioning space maps"
+	depends on DM_THIN_PROVISIONING
+	---help---
+	  Enable this for messages that may help debug problems with the
+	  space maps used by thin provisioning.
+
+	  If unsure, say N.
+
 config DM_MIRROR
 	tristate "Mirror target"
 	depends on BLK_DEV_DM
......
@@ -10,6 +10,7 @@ dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \
 dm-mirror-y	+= dm-raid1.o
 dm-log-userspace-y \
		+= dm-log-userspace-base.o dm-log-userspace-transfer.o
+dm-thin-pool-y	+= dm-thin.o dm-thin-metadata.o
 md-mod-y	+= md.o bitmap.o
 raid456-y	+= raid5.o

@@ -27,6 +28,7 @@ obj-$(CONFIG_MD_MULTIPATH) += multipath.o
 obj-$(CONFIG_MD_FAULTY)		+= faulty.o
 obj-$(CONFIG_BLK_DEV_MD)	+= md-mod.o
 obj-$(CONFIG_BLK_DEV_DM)	+= dm-mod.o
+obj-$(CONFIG_DM_BUFIO)		+= dm-bufio.o
 obj-$(CONFIG_DM_CRYPT)		+= dm-crypt.o
 obj-$(CONFIG_DM_DELAY)		+= dm-delay.o
 obj-$(CONFIG_DM_FLAKEY)	+= dm-flakey.o

@@ -34,10 +36,12 @@ obj-$(CONFIG_DM_MULTIPATH) += dm-multipath.o dm-round-robin.o
 obj-$(CONFIG_DM_MULTIPATH_QL)	+= dm-queue-length.o
 obj-$(CONFIG_DM_MULTIPATH_ST)	+= dm-service-time.o
 obj-$(CONFIG_DM_SNAPSHOT)	+= dm-snapshot.o
+obj-$(CONFIG_DM_PERSISTENT_DATA)	+= persistent-data/
 obj-$(CONFIG_DM_MIRROR)		+= dm-mirror.o dm-log.o dm-region-hash.o
 obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 obj-$(CONFIG_DM_RAID)	+= dm-raid.o
+obj-$(CONFIG_DM_THIN_PROVISIONING)	+= dm-thin-pool.o
 
 ifeq ($(CONFIG_DM_UEVENT),y)
 dm-mod-objs			+= dm-uevent.o
......
...
/*
* Copyright (C) 2009-2011 Red Hat, Inc.
*
* Author: Mikulas Patocka <mpatocka@redhat.com>
*
* This file is released under the GPL.
*/
#ifndef DM_BUFIO_H
#define DM_BUFIO_H
#include <linux/blkdev.h>
#include <linux/types.h>
/*----------------------------------------------------------------*/
struct dm_bufio_client;
struct dm_buffer;
/*
* Create a buffered IO cache on a given device
*/
struct dm_bufio_client *
dm_bufio_client_create(struct block_device *bdev, unsigned block_size,
		       unsigned reserved_buffers, unsigned aux_size,
		       void (*alloc_callback)(struct dm_buffer *),
		       void (*write_callback)(struct dm_buffer *));
/*
* Release a buffered IO cache.
*/
void dm_bufio_client_destroy(struct dm_bufio_client *c);
/*
 * WARNING: to avoid deadlocks, these conditions must be observed:
 *
 * - At most one thread can hold up to "reserved_buffers" buffers
 *   simultaneously.
 * - Every other thread can hold at most one buffer.
 * - Threads which call only dm_bufio_get can hold an unlimited number of
 *   buffers.
 */
/*
 * Read a given block from disk. Returns a pointer to the data. Also
 * returns, through *bp, a pointer to the dm_buffer that can be used to
 * release the buffer or to mark it dirty.
 */
void *dm_bufio_read(struct dm_bufio_client *c, sector_t block,
struct dm_buffer **bp);
/*
 * Like dm_bufio_read, but returns the buffer from the cache without
 * reading it. If the buffer is not in the cache, returns NULL.
 */
void *dm_bufio_get(struct dm_bufio_client *c, sector_t block,
struct dm_buffer **bp);
/*
* Like dm_bufio_read, but don't read anything from the disk. It is
* expected that the caller initializes the buffer and marks it dirty.
*/
void *dm_bufio_new(struct dm_bufio_client *c, sector_t block,
struct dm_buffer **bp);
/*
 * Release a reference obtained with dm_bufio_{read,get,new}. The data
 * pointer and the dm_buffer pointer are no longer valid after this call.
 */
void dm_bufio_release(struct dm_buffer *b);
/*
 * Mark a buffer dirty. It should be called after the buffer is modified.
 *
 * Under memory pressure the buffer may be written out after
 * dm_bufio_mark_buffer_dirty but before dm_bufio_write_dirty_buffers is
 * called. dm_bufio_write_dirty_buffers only guarantees that the buffer is
 * on disk by the time it returns; the actual write may occur earlier.
 */
void dm_bufio_mark_buffer_dirty(struct dm_buffer *b);
/*
* Initiate writing of dirty buffers, without waiting for completion.
*/
void dm_bufio_write_dirty_buffers_async(struct dm_bufio_client *c);
/*
* Write all dirty buffers. Guarantees that all dirty buffers created prior
* to this call are on disk when this call exits.
*/
int dm_bufio_write_dirty_buffers(struct dm_bufio_client *c);
/*
* Send an empty write barrier to the device to flush hardware disk cache.
*/
int dm_bufio_issue_flush(struct dm_bufio_client *c);
/*
* Like dm_bufio_release but also move the buffer to the new
* block. dm_bufio_write_dirty_buffers is needed to commit the new block.
*/
void dm_bufio_release_move(struct dm_buffer *b, sector_t new_block);
unsigned dm_bufio_get_block_size(struct dm_bufio_client *c);
sector_t dm_bufio_get_device_size(struct dm_bufio_client *c);
sector_t dm_bufio_get_block_number(struct dm_buffer *b);
void *dm_bufio_get_block_data(struct dm_buffer *b);
void *dm_bufio_get_aux_data(struct dm_buffer *b);
struct dm_bufio_client *dm_bufio_get_client(struct dm_buffer *b);
/*----------------------------------------------------------------*/
#endif
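
A minimal usage sketch of the interface above, doing a read-modify-write
of one block. bdev and BLOCK_SIZE are placeholders, and errors are assumed
to be returned using the ERR_PTR convention:

	struct dm_bufio_client *c;
	struct dm_buffer *buf;
	void *data;

	c = dm_bufio_client_create(bdev, BLOCK_SIZE, 1, 0, NULL, NULL);
	if (IS_ERR(c))
		return PTR_ERR(c);

	data = dm_bufio_read(c, 5, &buf);	/* block 5 */
	if (!IS_ERR(data)) {
		memset(data, 0, BLOCK_SIZE);		/* modify in place */
		dm_bufio_mark_buffer_dirty(buf);	/* schedule the write */
		dm_bufio_release(buf);			/* drop the reference */
		dm_bufio_write_dirty_buffers(c);	/* wait until on disk */
	}

	dm_bufio_client_destroy(c);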
@@ -1215,6 +1215,7 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 	struct hash_cell *hc;
 	struct dm_table *t;
 	struct mapped_device *md;
+	struct target_type *immutable_target_type;
 
 	md = find_device(param);
 	if (!md)

@@ -1230,6 +1231,16 @@ static int table_load(struct dm_ioctl *param, size_t param_size)
 		goto out;
 	}
 
+	immutable_target_type = dm_get_immutable_target_type(md);
+	if (immutable_target_type &&
+	    (immutable_target_type != dm_table_get_immutable_target_type(t))) {
+		DMWARN("can't replace immutable target type %s",
+		       immutable_target_type->name);
+		dm_table_destroy(t);
+		r = -EINVAL;
+		goto out;
+	}
+
 	/* Protect md->type and md->queue against concurrent table loads. */
 	dm_lock_md_type(md);
 	if (dm_get_md_type(md) == DM_TYPE_NONE)
......
@@ -66,6 +66,8 @@ struct dm_kcopyd_client {
 	struct list_head pages_jobs;
 };
 
+static struct page_list zero_page_list;
+
 static void wake(struct dm_kcopyd_client *kc)
 {
 	queue_work(kc->kcopyd_wq, &kc->kcopyd_work);

@@ -254,6 +256,9 @@ int __init dm_kcopyd_init(void)
 	if (!_job_cache)
 		return -ENOMEM;
 
+	zero_page_list.next = &zero_page_list;
+	zero_page_list.page = ZERO_PAGE(0);
+
 	return 0;
 }

@@ -322,7 +327,7 @@ static int run_complete_job(struct kcopyd_job *job)
 	dm_kcopyd_notify_fn fn = job->fn;
 	struct dm_kcopyd_client *kc = job->kc;
 
-	if (job->pages)
+	if (job->pages && job->pages != &zero_page_list)
 		kcopyd_put_pages(kc, job->pages);
 
 	/*
	 * If this is the master job, the sub jobs have already

@@ -484,6 +489,8 @@ static void dispatch_job(struct kcopyd_job *job)
 	atomic_inc(&kc->nr_jobs);
 	if (unlikely(!job->source.count))
 		push(&kc->complete_jobs, job);
+	else if (job->pages == &zero_page_list)
+		push(&kc->io_jobs, job);
 	else
 		push(&kc->pages_jobs, job);
 	wake(kc);

@@ -592,14 +599,20 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 	job->flags = flags;
 	job->read_err = 0;
 	job->write_err = 0;
-	job->rw = READ;
-
-	job->source = *from;
 
 	job->num_dests = num_dests;
 	memcpy(&job->dests, dests, sizeof(*dests) * num_dests);
 
-	job->pages = NULL;
+	if (from) {
+		job->source = *from;
+		job->pages = NULL;
+		job->rw = READ;
+	} else {
+		memset(&job->source, 0, sizeof job->source);
+		job->source.count = job->dests[0].count;
+		job->pages = &zero_page_list;
+		job->rw = WRITE;
+	}
 
 	job->fn = fn;
 	job->context = context;

@@ -617,6 +630,14 @@ int dm_kcopyd_copy(struct dm_kcopyd_client *kc, struct dm_io_region *from,
 }
 EXPORT_SYMBOL(dm_kcopyd_copy);
 
+int dm_kcopyd_zero(struct dm_kcopyd_client *kc,
+		   unsigned num_dests, struct dm_io_region *dests,
+		   unsigned flags, dm_kcopyd_notify_fn fn, void *context)
+{
+	return dm_kcopyd_copy(kc, NULL, num_dests, dests, flags, fn, context);
+}
+EXPORT_SYMBOL(dm_kcopyd_zero);
+
 void *dm_kcopyd_prepare_callback(struct dm_kcopyd_client *kc,
				 dm_kcopyd_notify_fn fn, void *context)
 {
......
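
A short sketch of how a caller might use the new entry point to zero a
region (kc, the notify callback and its context are placeholders):

	struct dm_io_region dest = {
		.bdev   = dest_bdev,
		.sector = 0,
		.count  = 1024,		/* length in sectors */
	};
	int r;

	r = dm_kcopyd_zero(kc, 1, &dest, 0, my_notify_fn, my_context);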
@@ -30,6 +30,7 @@ struct flush_entry {
 
 struct log_c {
 	struct dm_target *ti;
+	struct dm_dev *log_dev;
 	uint32_t region_size;
 	region_t region_count;
 	uint64_t luid;

@@ -146,7 +147,7 @@ static int build_constructor_string(struct dm_target *ti,
  *	<UUID> <other args>
  * Where 'other args' is the userspace implementation specific log
  * arguments. An example might be:
- *	<UUID> clustered_disk <arg count> <log dev> <region_size> [[no]sync]
+ *	<UUID> clustered-disk <arg count> <log dev> <region_size> [[no]sync]
 *
 * So, this module will strip off the <UUID> for identification purposes
 * when communicating with userspace about a log; but will pass on everything

@@ -161,13 +162,15 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 	struct log_c *lc = NULL;
 	uint64_t rdata;
 	size_t rdata_size = sizeof(rdata);
+	char *devices_rdata = NULL;
+	size_t devices_rdata_size = DM_NAME_LEN;
 
 	if (argc < 3) {
 		DMWARN("Too few arguments to userspace dirty log");
 		return -EINVAL;
 	}
 
-	lc = kmalloc(sizeof(*lc), GFP_KERNEL);
+	lc = kzalloc(sizeof(*lc), GFP_KERNEL);
 	if (!lc) {
 		DMWARN("Unable to allocate userspace log context.");
 		return -ENOMEM;

@@ -195,9 +198,19 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 		return str_size;
 	}
 
-	/* Send table string */
+	devices_rdata = kzalloc(devices_rdata_size, GFP_KERNEL);
+	if (!devices_rdata) {
+		DMERR("Failed to allocate memory for device information");
+		r = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * Send table string and get back any opened device.
+	 */
 	r = dm_consult_userspace(lc->uuid, lc->luid, DM_ULOG_CTR,
-				 ctr_str, str_size, NULL, NULL);
+				 ctr_str, str_size,
+				 devices_rdata, &devices_rdata_size);
 
 	if (r < 0) {
 		if (r == -ESRCH)

@@ -220,7 +233,20 @@ static int userspace_ctr(struct dm_dirty_log *log, struct dm_target *ti,
 	lc->region_size = (uint32_t)rdata;
 	lc->region_count = dm_sector_div_up(ti->len, lc->region_size);
 
+	if (devices_rdata_size) {
+		if (devices_rdata[devices_rdata_size - 1] != '\0') {
+			DMERR("DM_ULOG_CTR device return string not properly terminated");
+			r = -EINVAL;
+			goto out;
+		}
+		r = dm_get_device(ti, devices_rdata,
+				  dm_table_get_mode(ti->table), &lc->log_dev);
+		if (r)
+			DMERR("Failed to register %s with device-mapper",
+			      devices_rdata);
+	}
 out:
+	kfree(devices_rdata);
 	if (r) {
 		kfree(lc);
 		kfree(ctr_str);

@@ -241,6 +267,9 @@ static void userspace_dtr(struct dm_dirty_log *log)
				 NULL, 0,
				 NULL, NULL);
 
+	if (lc->log_dev)
+		dm_put_device(lc->ti, lc->log_dev);
+
 	kfree(lc->usr_argv_str);
 	kfree(lc);
......
@@ -1017,30 +1017,56 @@ static int raid_status(struct dm_target *ti, status_type_t type,
 	struct raid_set *rs = ti->private;
 	unsigned raid_param_cnt = 1; /* at least 1 for chunksize */
 	unsigned sz = 0;
-	int i;
+	int i, array_in_sync = 0;
 	sector_t sync;
 
 	switch (type) {
 	case STATUSTYPE_INFO:
 		DMEMIT("%s %d ", rs->raid_type->name, rs->md.raid_disks);
 
-		for (i = 0; i < rs->md.raid_disks; i++) {
-			if (test_bit(Faulty, &rs->dev[i].rdev.flags))
-				DMEMIT("D");
-			else if (test_bit(In_sync, &rs->dev[i].rdev.flags))
-				DMEMIT("A");
-			else
-				DMEMIT("a");
-		}
-
 		if (test_bit(MD_RECOVERY_RUNNING, &rs->md.recovery))
 			sync = rs->md.curr_resync_completed;
 		else
 			sync = rs->md.recovery_cp;
-		if (sync > rs->md.resync_max_sectors)
+
+		if (sync >= rs->md.resync_max_sectors) {
+			array_in_sync = 1;
 			sync = rs->md.resync_max_sectors;
+		} else {
+			/*
+			 * The array may be doing an initial sync, or it may
+			 * be rebuilding individual components. If all the
+			 * devices are In_sync, then it is the array that is
+			 * being initialized.
+			 */
+			for (i = 0; i < rs->md.raid_disks; i++)
+				if (!test_bit(In_sync, &rs->dev[i].rdev.flags))
+					array_in_sync = 1;
+		}
+
+		/*
+		 * Status characters:
+		 *  'D' = Dead/Failed device
+		 *  'a' = Alive but not in-sync
+		 *  'A' = Alive and in-sync
+		 */
+		for (i = 0; i < rs->md.raid_disks; i++) {
+			if (test_bit(Faulty, &rs->dev[i].rdev.flags))
+				DMEMIT("D");
+			else if (!array_in_sync ||
+				 !test_bit(In_sync, &rs->dev[i].rdev.flags))
+				DMEMIT("a");
+			else
+				DMEMIT("A");
+		}
+
+		/*
+		 * In-sync ratio:
+		 *  The in-sync ratio shows the progress of:
+		 *   - Initializing the array
+		 *   - Rebuilding a subset of devices of the array
+		 *  The user can distinguish between the two by referring
+		 *  to the status characters.
+		 */
 		DMEMIT(" %llu/%llu",
 		       (unsigned long long) sync,
 		       (unsigned long long) rs->md.resync_max_sectors);
......
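
Reading the DMEMIT calls together, a two-device raid1 set that is still
running its initial array sync would report every device as 'a'
(illustrative values):

	raid1 2 aa 612352/1953125

while the same set rebuilding its second device after a replacement would
report:

	raid1 2 Aa 612352/1953125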
@@ -54,7 +54,9 @@ struct dm_table {
 	sector_t *highs;
 	struct dm_target *targets;
 
+	struct target_type *immutable_target_type;
 	unsigned integrity_supported:1;
+	unsigned singleton:1;
 
 	/*
	 * Indicates the rw permissions for the new logical

@@ -740,6 +742,12 @@ int dm_table_add_target(struct dm_table *t, const char *type,
 	char **argv;
 	struct dm_target *tgt;
 
+	if (t->singleton) {
+		DMERR("%s: target type %s must appear alone in table",
+		      dm_device_name(t->md), t->targets->type->name);
+		return -EINVAL;
+	}
+
 	if ((r = check_space(t)))
 		return r;
 
@@ -758,6 +766,36 @@ int dm_table_add_target(struct dm_table *t, const char *type,
 		return -EINVAL;
 	}
 
+	if (dm_target_needs_singleton(tgt->type)) {
+		if (t->num_targets) {
+			DMERR("%s: target type %s must appear alone in table",
+			      dm_device_name(t->md), type);
+			return -EINVAL;
+		}
+		t->singleton = 1;
+	}
+
+	if (dm_target_always_writeable(tgt->type) && !(t->mode & FMODE_WRITE)) {
+		DMERR("%s: target type %s may not be included in read-only tables",
+		      dm_device_name(t->md), type);
+		return -EINVAL;
+	}
+
+	if (t->immutable_target_type) {
+		if (t->immutable_target_type != tgt->type) {
+			DMERR("%s: immutable target type %s cannot be mixed with other target types",
+			      dm_device_name(t->md), t->immutable_target_type->name);
+			return -EINVAL;
+		}
+	} else if (dm_target_is_immutable(tgt->type)) {
+		if (t->num_targets) {
+			DMERR("%s: immutable target type %s cannot be mixed with other target types",
+			      dm_device_name(t->md), tgt->type->name);
+			return -EINVAL;
+		}
+		t->immutable_target_type = tgt->type;
+	}
+
 	tgt->table = t;
 	tgt->begin = start;
 	tgt->len = len;

@@ -915,6 +953,11 @@ unsigned dm_table_get_type(struct dm_table *t)
 	return t->type;
 }
 
+struct target_type *dm_table_get_immutable_target_type(struct dm_table *t)
+{
+	return t->immutable_target_type;
+}
+
 bool dm_table_request_based(struct dm_table *t)
 {
 	return dm_table_get_type(t) == DM_TYPE_REQUEST_BASED;

@@ -1299,6 +1342,31 @@ static bool dm_table_discard_zeroes_data(struct dm_table *t)
 	return 1;
 }
 
+static int device_is_nonrot(struct dm_target *ti, struct dm_dev *dev,
+			    sector_t start, sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return q && blk_queue_nonrot(q);
+}
+
+static bool dm_table_is_nonrot(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned i = 0;
+
+	/* Ensure that all underlying devices are non-rotational. */
+	while (i < dm_table_get_num_targets(t)) {
+		ti = dm_table_get_target(t, i++);
+
+		if (!ti->type->iterate_devices ||
+		    !ti->type->iterate_devices(ti, device_is_nonrot, NULL))
+			return 0;
+	}
+
+	return 1;
+}
+
 void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
			       struct queue_limits *limits)
 {

@@ -1324,6 +1392,11 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (!dm_table_discard_zeroes_data(t))
 		q->limits.discard_zeroes_data = 0;
 
+	if (dm_table_is_nonrot(t))
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_NONROT, q);
+
 	dm_table_set_integrity(t);
 
 	/*
......
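
For illustration, a target opts in to these checks through the features
field of its target_type. The sketch below assumes the
DM_TARGET_SINGLETON, DM_TARGET_ALWAYS_WRITEABLE and DM_TARGET_IMMUTABLE
flags that the dm_target_needs_singleton()/dm_target_always_writeable()/
dm_target_is_immutable() helpers test, and hypothetical
pool_ctr/pool_dtr/pool_map callbacks:

	static struct target_type pool_target = {
		.name     = "thin-pool",
		.features = DM_TARGET_SINGLETON | DM_TARGET_ALWAYS_WRITEABLE |
			    DM_TARGET_IMMUTABLE,
		.version  = {1, 0, 0},
		.module   = THIS_MODULE,
		.ctr      = pool_ctr,
		.dtr      = pool_dtr,
		.map      = pool_map,
	};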
...
/*
* Copyright (C) 2010-2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef DM_THIN_METADATA_H
#define DM_THIN_METADATA_H
#include "persistent-data/dm-block-manager.h"
#define THIN_METADATA_BLOCK_SIZE 4096
/*----------------------------------------------------------------*/
struct dm_pool_metadata;
struct dm_thin_device;
/*
* Device identifier
*/
typedef uint64_t dm_thin_id;
/*
* Reopens or creates a new, empty metadata volume.
*/
struct dm_pool_metadata *dm_pool_metadata_open(struct block_device *bdev,
sector_t data_block_size);
int dm_pool_metadata_close(struct dm_pool_metadata *pmd);
/*
* Compat feature flags. Any incompat flags beyond the ones
* specified below will prevent use of the thin metadata.
*/
#define THIN_FEATURE_COMPAT_SUPP 0UL
#define THIN_FEATURE_COMPAT_RO_SUPP 0UL
#define THIN_FEATURE_INCOMPAT_SUPP 0UL
/*
* Device creation/deletion.
*/
int dm_pool_create_thin(struct dm_pool_metadata *pmd, dm_thin_id dev);
/*
* An internal snapshot.
*
* You can only snapshot a quiesced origin i.e. one that is either
* suspended or not instanced at all.
*/
int dm_pool_create_snap(struct dm_pool_metadata *pmd, dm_thin_id dev,
dm_thin_id origin);
/*
* Deletes a virtual device from the metadata. It _is_ safe to call this
* when that device is open. Operations on that device will just start
* failing. You still need to call close() on the device.
*/
int dm_pool_delete_thin_device(struct dm_pool_metadata *pmd,
dm_thin_id dev);
/*
* Commits _all_ metadata changes: device creation, deletion, mapping
* updates.
*/
int dm_pool_commit_metadata(struct dm_pool_metadata *pmd);
/*
* Set/get userspace transaction id.
*/
int dm_pool_set_metadata_transaction_id(struct dm_pool_metadata *pmd,
uint64_t current_id,
uint64_t new_id);
int dm_pool_get_metadata_transaction_id(struct dm_pool_metadata *pmd,
uint64_t *result);
/*
* Hold/get root for userspace transaction.
*/
int dm_pool_hold_metadata_root(struct dm_pool_metadata *pmd);
int dm_pool_get_held_metadata_root(struct dm_pool_metadata *pmd,
dm_block_t *result);
/*
* Actions on a single virtual device.
*/
/*
* Opening the same device more than once will fail with -EBUSY.
*/
int dm_pool_open_thin_device(struct dm_pool_metadata *pmd, dm_thin_id dev,
struct dm_thin_device **td);
int dm_pool_close_thin_device(struct dm_thin_device *td);
dm_thin_id dm_thin_dev_id(struct dm_thin_device *td);
struct dm_thin_lookup_result {
dm_block_t block;
int shared;
};
/*
 * Returns:
 *   -EWOULDBLOCK iff @can_block is set and the operation would block.
 *   -ENODATA iff that mapping is not present.
 *   0 on success.
 */
int dm_thin_find_block(struct dm_thin_device *td, dm_block_t block,
int can_block, struct dm_thin_lookup_result *result);
/*
* Obtain an unused block.
*/
int dm_pool_alloc_data_block(struct dm_pool_metadata *pmd, dm_block_t *result);
/*
* Insert or remove block.
*/
int dm_thin_insert_block(struct dm_thin_device *td, dm_block_t block,
dm_block_t data_block);
int dm_thin_remove_block(struct dm_thin_device *td, dm_block_t block);
/*
* Queries.
*/
int dm_thin_get_highest_mapped_block(struct dm_thin_device *td,
dm_block_t *highest_mapped);
int dm_thin_get_mapped_count(struct dm_thin_device *td, dm_block_t *result);
int dm_pool_get_free_block_count(struct dm_pool_metadata *pmd,
dm_block_t *result);
int dm_pool_get_free_metadata_block_count(struct dm_pool_metadata *pmd,
dm_block_t *result);
int dm_pool_get_metadata_dev_size(struct dm_pool_metadata *pmd,
dm_block_t *result);
int dm_pool_get_data_block_size(struct dm_pool_metadata *pmd, sector_t *result);
int dm_pool_get_data_dev_size(struct dm_pool_metadata *pmd, dm_block_t *result);
/*
* Returns -ENOSPC if the new size is too small and already allocated
* blocks would be lost.
*/
int dm_pool_resize_data_dev(struct dm_pool_metadata *pmd, dm_block_t new_size);
/*----------------------------------------------------------------*/
#endif
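
Putting the calls above together, creating a thin device and mapping a
single block might look like this (a sketch; error handling is
abbreviated, and metadata_bdev, data_block_size and virt_block are
placeholders):

	struct dm_pool_metadata *pmd;
	struct dm_thin_device *td;
	dm_block_t data_block;
	int r;

	pmd = dm_pool_metadata_open(metadata_bdev, data_block_size);

	r = dm_pool_create_thin(pmd, 0);		/* device id 0 */
	r = dm_pool_open_thin_device(pmd, 0, &td);
	r = dm_pool_alloc_data_block(pmd, &data_block);
	r = dm_thin_insert_block(td, virt_block, data_block);
	r = dm_pool_commit_metadata(pmd);		/* make it durable */

	dm_pool_close_thin_device(td);
	dm_pool_metadata_close(pmd);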
...
@@ -25,6 +25,16 @@
 
 #define DM_MSG_PREFIX "core"
 
+#ifdef CONFIG_PRINTK
+/*
+ * ratelimit state to be used in DMXXX_LIMIT().
+ */
+DEFINE_RATELIMIT_STATE(dm_ratelimit_state,
+		       DEFAULT_RATELIMIT_INTERVAL,
+		       DEFAULT_RATELIMIT_BURST);
+EXPORT_SYMBOL(dm_ratelimit_state);
+#endif
+
 /*
  * Cookies are numeric values sent with CHANGE and REMOVE
  * uevents while resuming, removing or renaming the device.

@@ -130,6 +140,8 @@ struct mapped_device {
 	/* Protect queue and type against concurrent access. */
 	struct mutex type_lock;
 
+	struct target_type *immutable_target_type;
+
 	struct gendisk *disk;
 	char name[16];
 
@@ -2086,6 +2098,8 @@ static struct dm_table *__bind(struct mapped_device *md, struct dm_table *t,
 	write_lock_irqsave(&md->map_lock, flags);
 	old_map = md->map;
 	md->map = t;
+	md->immutable_target_type = dm_table_get_immutable_target_type(t);
+
 	dm_table_set_restrictions(t, q, limits);
 	if (merge_is_optional)
 		set_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);

@@ -2156,6 +2170,11 @@ unsigned dm_get_md_type(struct mapped_device *md)
 	return md->type;
 }
 
+struct target_type *dm_get_immutable_target_type(struct mapped_device *md)
+{
+	return md->immutable_target_type;
+}
+
 /*
  * Fully initialize a request-based queue (->elevator, ->request_fn, etc).
  */

@@ -2231,6 +2250,7 @@ struct mapped_device *dm_get_md(dev_t dev)
 	return md;
 }
+EXPORT_SYMBOL_GPL(dm_get_md);
 
 void *dm_get_mdptr(struct mapped_device *md)
 {

@@ -2316,7 +2336,6 @@ static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
 
 	while (1) {
 		set_current_state(interruptible);
-		smp_mb();
 
 		if (!md_in_flight(md))
 			break;
......
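
The state exported here is consumed by the DMXXX_LIMIT() wrappers in
device-mapper.h, which look roughly like the following (paraphrased, not
a verbatim copy of the header):

	#ifdef CONFIG_PRINTK
	extern struct ratelimit_state dm_ratelimit_state;
	#define dm_ratelimit()	__ratelimit(&dm_ratelimit_state)
	#else
	#define dm_ratelimit()	0
	#endif

	#define DMERR_LIMIT(f, arg...) \
		do { \
			if (dm_ratelimit()) \
				DMERR(f, ##arg); \
		} while (0)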
@@ -60,6 +60,7 @@ int dm_table_resume_targets(struct dm_table *t);
 int dm_table_any_congested(struct dm_table *t, int bdi_bits);
 int dm_table_any_busy_target(struct dm_table *t);
 unsigned dm_table_get_type(struct dm_table *t);
+struct target_type *dm_table_get_immutable_target_type(struct dm_table *t);
 bool dm_table_request_based(struct dm_table *t);
 bool dm_table_supports_discards(struct dm_table *t);
 int dm_table_alloc_md_mempools(struct dm_table *t);

@@ -72,6 +73,7 @@ void dm_lock_md_type(struct mapped_device *md);
 void dm_unlock_md_type(struct mapped_device *md);
 void dm_set_md_type(struct mapped_device *md, unsigned type);
 unsigned dm_get_md_type(struct mapped_device *md);
+struct target_type *dm_get_immutable_target_type(struct mapped_device *md);
 int dm_setup_md_queue(struct mapped_device *md);
......
config DM_PERSISTENT_DATA
tristate
depends on BLK_DEV_DM && EXPERIMENTAL
select LIBCRC32C
select DM_BUFIO
---help---
Library providing immutable on-disk data structure support for
device-mapper targets such as the thin provisioning target.
obj-$(CONFIG_DM_PERSISTENT_DATA) += dm-persistent-data.o
dm-persistent-data-objs := \
dm-block-manager.o \
dm-space-map-checker.o \
dm-space-map-common.o \
dm-space-map-disk.o \
dm-space-map-metadata.o \
dm-transaction-manager.o \
dm-btree.o \
dm-btree-remove.o \
dm-btree-spine.o
...
/*
* Copyright (C) 2011 Red Hat, Inc.
*
* This file is released under the GPL.
*/
#ifndef _LINUX_DM_BLOCK_MANAGER_H
#define _LINUX_DM_BLOCK_MANAGER_H
#include <linux/types.h>
#include <linux/blkdev.h>
/*----------------------------------------------------------------*/
/*
* Block number.
*/
typedef uint64_t dm_block_t;
struct dm_block;
dm_block_t dm_block_location(struct dm_block *b);
void *dm_block_data(struct dm_block *b);
/*----------------------------------------------------------------*/
/*
* @name should be a unique identifier for the block manager, no longer
* than 32 chars.
*
* @max_held_per_thread should be the maximum number of locks, read or
* write, that an individual thread holds at any one time.
*/
struct dm_block_manager;
struct dm_block_manager *dm_block_manager_create(
struct block_device *bdev, unsigned block_size,
unsigned cache_size, unsigned max_held_per_thread);
void dm_block_manager_destroy(struct dm_block_manager *bm);
unsigned dm_bm_block_size(struct dm_block_manager *bm);
dm_block_t dm_bm_nr_blocks(struct dm_block_manager *bm);
/*----------------------------------------------------------------*/
/*
* The validator allows the caller to verify newly-read data and modify
* the data just before writing, e.g. to calculate checksums. It's
* important to be consistent with your use of validators. The only time
* you can change validators is if you call dm_bm_write_lock_zero.
*/
struct dm_block_validator {
const char *name;
void (*prepare_for_write)(struct dm_block_validator *v, struct dm_block *b, size_t block_size);
/*
* Return 0 if the checksum is valid or < 0 on error.
*/
int (*check)(struct dm_block_validator *v, struct dm_block *b, size_t block_size);
};
/*----------------------------------------------------------------*/
/*
* You can have multiple concurrent readers or a single writer holding a
* block lock.
*/
/*
* dm_bm_lock() locks a block and returns through @result a pointer to
* memory that holds a copy of that block. If you have write-locked the
* block then any changes you make to memory pointed to by @result will be
* written back to the disk sometime after dm_bm_unlock is called.
*/
int dm_bm_read_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
int dm_bm_write_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
/*
* The *_try_lock variants return -EWOULDBLOCK if the block isn't
* available immediately.
*/
int dm_bm_read_try_lock(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
/*
* Use dm_bm_write_lock_zero() when you know you're going to
* overwrite the block completely. It saves a disk read.
*/
int dm_bm_write_lock_zero(struct dm_block_manager *bm, dm_block_t b,
struct dm_block_validator *v,
struct dm_block **result);
int dm_bm_unlock(struct dm_block *b);
/*
* An optimisation; we often want to copy a block's contents to a new
block, e.g. as part of the shadowing operation. It's far better for
* bufio to do this move behind the scenes than hold 2 locks and memcpy the
* data.
*/
int dm_bm_unlock_move(struct dm_block *b, dm_block_t n);
/*
* It's a common idiom to have a superblock that should be committed last.
*
* @superblock should be write-locked on entry. It will be unlocked during
* this function. All dirty blocks are guaranteed to be written and flushed
* before the superblock.
*
* This method always blocks.
*/
int dm_bm_flush_and_unlock(struct dm_block_manager *bm,
struct dm_block *superblock);
u32 dm_bm_checksum(const void *data, size_t len, u32 init_xor);
/*----------------------------------------------------------------*/
#endif /* _LINUX_DM_BLOCK_MANAGER_H */
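
A minimal read-side sketch of the locking interface above (bm, block_nr
and my_validator are placeholders):

	struct dm_block *b;
	int r;

	r = dm_bm_read_lock(bm, block_nr, &my_validator, &b);
	if (r)
		return r;

	/* ... inspect dm_block_data(b); the lock allows concurrent readers ... */

	dm_bm_unlock(b);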
...