Commit 63c32ed4 authored by Heinz Mauelshagen's avatar Heinz Mauelshagen Committed by Mike Snitzer

dm raid: add raid4/5/6 journaling support

Add md raid4/5/6 journaling support (upstream commit bac624f3 started
the implementation) which closes the write hole (i.e. non-atomic updates
to stripes) using a dedicated journal device.

Background:
raid4/5/6 stripes hold N data payloads per stripe plus one parity raid4/5
or two raid6 P/Q syndrome payloads in an in-memory stripe cache.
Parity or P/Q syndromes used to recover any data payloads in case of a disk
failure are calculated from the N data payloads and need to be updated on the
different component devices of the raid device.  Those are non-atomic,
persistent updates.  Hence a crash can cause failure to update all stripe
payloads persistently and thus cause data loss during stripe recovery.
This problem gets addressed by writing whole stripe cache entries (together with
journal metadata) to a persistent journal entry on a dedicated journal device.
Only if that journal entry is written successfully, the stripe cache entry is
updated on the component devices of the raid device (i.e. writethrough type).
In case of a crash, the entry can be recovered from the journal and be written
again thus ensuring consistent stripe payload suitable to data recovery.

Future dependencies:
once writeback caching being worked on to compensate for the throughput
implictions involved with writethrough overhead is supported with journaling
in upstream, an additional patch based on this one will support it in dm-raid.

Journal resilience related remarks:
because stripes are recovered from the journal in case of a crash, the
journal device better be resilient.  Resilience becomes mandatory with
future writeback support, because loosing the working set in the log
means data loss as oposed to writethrough, were the loss of the
journal device 'only' reintroduces the write hole.

Fix comment on data offsets in parse_dev_params() and initialize
new_data_offset as well.
Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
parent 50c4feb9
...@@ -161,6 +161,15 @@ The target is named "raid" and it accepts the following parameters: ...@@ -161,6 +161,15 @@ The target is named "raid" and it accepts the following parameters:
the RAID type (i.e. the allocation algorithm) as well, e.g. the RAID type (i.e. the allocation algorithm) as well, e.g.
changing from raid5_ls to raid5_n. changing from raid5_ls to raid5_n.
[journal_dev <dev>]
This option adds a journal device to raid4/5/6 raid sets and
uses it to close the 'write hole' caused by the non-atomic updates
to the component devices which can cause data loss during recovery.
The journal device is used as writethrough thus causing writes to
be throttled versus non-journaled raid4/5/6 sets.
Takeover/reshape is not possible with a raid4/5/6 journal device;
it has to be deconfigured before requesting these.
<#raid_devs>: The number of devices composing the array. <#raid_devs>: The number of devices composing the array.
Each device consists of two entries. The first is the device Each device consists of two entries. The first is the device
containing the metadata (if any); the second is the one containing the containing the metadata (if any); the second is the one containing the
...@@ -245,6 +254,9 @@ recovery. Here is a fuller description of the individual fields: ...@@ -245,6 +254,9 @@ recovery. Here is a fuller description of the individual fields:
<data_offset> The current data offset to the start of the user data on <data_offset> The current data offset to the start of the user data on
each component device of a raid set (see the respective each component device of a raid set (see the respective
raid parameter to support out-of-place reshaping). raid parameter to support out-of-place reshaping).
<journal_char> 'A' - active raid4/5/6 journal device.
'D' - dead journal device.
'-' - no journal device.
Message Interface Message Interface
...@@ -318,3 +330,4 @@ Version History ...@@ -318,3 +330,4 @@ Version History
fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and
'D' on the status line. If '- -' is passed into the constructor, emit 'D' on the status line. If '- -' is passed into the constructor, emit
'- -' on the table line and '-' as the status line health character. '- -' on the table line and '-' as the status line health character.
1.10.0 Add support for raid4/5/6 journal device
This diff is collapsed.
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment