md/r5cache: after recovery, increase journal seq by 10000

Currently, we increase journal entry seq by 10 after recovery.
However, this is not sufficient in the following case.

After a crash, the journal looks like

| seq+0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

If +1 is not valid, we drop all entries from +1 to +12 and
write seq+10:

| seq+0 | +10 | +2 | +3 | +4 | +5 | +6 | +7 | ... | +11 | +12 |

However, if we then write a big journal entry with seq+11, it will
connect with a stale journal entry:

| seq+0 | +10 |                     +11                 | +12 |

To reduce the risk of this issue, we increase seq by 10000 instead.

Shaohua: use 10000 instead of 1000. The risk should be very low.
The total stripe cache size is typically less than 2k, and several
stripes can fit into one metadata block. So the total number of inflight
metadata blocks would be quite small, which means the total number of
sequence numbers used should be quite small. An increase of 10000 should
be far more than safe.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
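The scenario above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions, not the kernel's recovery code: the log is a flat array of 13 slots, a meta block is accepted purely on "valid and seq equals the expected value", and the names `struct meta`, `scan()` and `replay_end_after_second_crash()` are hypothetical. The real code in drivers/md/raid5-cache.c performs additional checks on each block.

```c
#include <assert.h>
#include <stdint.h>

#define NSLOTS 13

/* Hypothetical, simplified meta block header. */
struct meta {
	uint64_t seq;	/* sequence number stored in the header */
	int len;	/* slots covered by this entry (header + payload) */
	int valid;	/* 0 models a block that fails validation */
};

/*
 * Walk the log from 'start', accepting entries while each header is
 * valid and carries the expected seq. Returns the first position that
 * is not accepted.
 */
static int scan(const struct meta *log, int start)
{
	uint64_t expect = log[start].seq;
	int pos = start;

	while (pos < NSLOTS && log[pos].valid && log[pos].seq == expect) {
		pos += log[pos].len;
		expect++;
	}
	return pos;
}

/*
 * Replays the scenario from the commit message with a given seq bump.
 * Returns where a second recovery stops: 12 is the true end of the
 * log, 13 means the stale entry in slot 12 was wrongly accepted.
 */
static int replay_end_after_second_crash(uint64_t bump)
{
	const uint64_t base = 100;	/* arbitrary starting seq */
	struct meta log[NSLOTS];
	int i, pos;

	/* Journal after the first crash: seq+0 .. seq+12, with +1 invalid. */
	for (i = 0; i < NSLOTS; i++)
		log[i] = (struct meta){ .seq = base + i, .len = 1, .valid = 1 };
	log[1].valid = 0;

	/* First recovery stops at the invalid entry in slot 1. */
	pos = scan(log, 0);
	assert(pos == 1);

	/* Recovery writes a new meta block there with the bumped seq. */
	log[pos] = (struct meta){ .seq = base + bump, .len = 1, .valid = 1 };

	/* A big entry follows, covering slots 2..11; slot 12 stays stale. */
	log[2] = (struct meta){ .seq = base + bump + 1, .len = 10, .valid = 1 };

	/* Second crash: recovery starts from the new meta block. */
	return scan(log, 1);
}
```

In this model, `replay_end_after_second_crash(10)` returns 13 because the stale seq+12 header lines up with the expected value, while `replay_end_after_second_crash(10000)` returns 12 and the stale entry is rejected.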

Commit 3c6edc66 authored Dec 07, 2016 by Song Liu Committed by Shaohua Li Dec 08, 2016

Showing with 7 additions and 7 deletions

drivers/md/raid5-cache.c +7 -7

--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -2004,8 +2004,8 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 * happens again, new recovery will start from meta 1. Since meta 2n is
 * valid now, recovery will think meta 3 is valid, which is wrong.
 * The solution is we create a new meta in meta2 with its seq == meta
- * 1's seq + 10 and let superblock points to meta2. The same recovery will
- * not think meta 3 is a valid meta, because its seq doesn't match
+ * 1's seq + 10000 and let superblock points to meta2. The same recovery
+ * will not think meta 3 is a valid meta, because its seq doesn't match
 */

 /*
@@ -2035,7 +2035,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 *   ---------------------------------------------
 *   ^                              ^
 *   |- log->last_checkpoint        |- ctx->pos+1
- *   |- log->last_cp_seq            |- ctx->seq+11
+ *   |- log->last_cp_seq            |- ctx->seq+10001
 *
 * However, it is not safe to start the state machine yet, because data only
 * parities are not yet secured in RAID. To save these data only parities, we
@@ -2046,7 +2046,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 *   -----------------------------------------------------------------
 *   ^                                                ^
 *   |- log->last_checkpoint                          |- ctx->pos+n
- *   |- log->last_cp_seq                              |- ctx->seq+10+n
+ *   |- log->last_cp_seq                              |- ctx->seq+10000+n
 *
 * If failure happens again during this process, the recovery can safe start
 * again from log->last_checkpoint.
@@ -2058,7 +2058,7 @@ static int r5c_recovery_flush_log(struct r5l_log *log,
 *   -----------------------------------------------------------------
 *                        ^                         ^
 *                        |- log->last_checkpoint   |- ctx->pos+n
- *                        |- log->last_cp_seq       |- ctx->seq+10+n
+ *                        |- log->last_cp_seq       |- ctx->seq+10000+n
 *
 * Then we can safely start the state machine. If failure happens from this
 * point on, the recovery will start from new log->last_checkpoint.
@@ -2157,8 +2157,8 @@ static int r5l_recovery_log(struct r5l_log *log)
 	if (ret)
 		return ret;

-       pos = ctx.pos;
-       ctx.seq += 10;
+	pos = ctx.pos;
+	ctx.seq += 10000;

 	if (ctx.data_only_stripes == 0) {
 		log->next_checkpoint = ctx.pos;