Commit c0a2ad9b authored by OGAWA Hirofumi's avatar OGAWA Hirofumi Committed by Theodore Ts'o

jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path

On umount path, jbd2_journal_destroy() writes latest transaction ID
(->j_tail_sequence) to be used at next mount.

The bug is that ->j_tail_sequence is not holding latest transaction ID
in some cases. So, at next mount, there is chance to conflict with
remaining (not overwritten yet) transactions.

	mount (id=10)
	write transaction (id=11)
	write transaction (id=12)
	umount (id=10) <= the bug doesn't write latest ID

	mount (id=10)
	write transaction (id=11)
	crash

	mount
	[recovery process]
		transaction (id=11)
		transaction (id=12) <= valid transaction ID, but old commit
                                       must not replay

Like above, this bug become the cause of recovery failure, or FS
corruption.

So why ->j_tail_sequence doesn't point latest ID?

Because if checkpoint transactions was reclaimed by memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(And another case is, __jbd2_journal_clean_checkpoint_list() is called
with empty transaction.)

So in above cases, ->j_tail_sequence is not pointing latest
transaction ID at umount path. Plus, REQ_FLUSH for checkpoint is not
done too.

So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence, and issue REQ_FLUSH.  (With more complex changes,
some optimizations would be possible to avoid unnecessary REQ_FLUSH
for example though.)

BTW,

	journal->j_tail_sequence =
		++journal->j_transaction_sequence;

Increment of ->j_transaction_sequence seems to be unnecessary, but
ext3 does this.
Signed-off-by: default avatarOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
parent 2d90c160
...@@ -1430,11 +1430,12 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid, ...@@ -1430,11 +1430,12 @@ int jbd2_journal_update_sb_log_tail(journal_t *journal, tid_t tail_tid,
/** /**
* jbd2_mark_journal_empty() - Mark on disk journal as empty. * jbd2_mark_journal_empty() - Mark on disk journal as empty.
* @journal: The journal to update. * @journal: The journal to update.
* @write_op: With which operation should we write the journal sb
* *
* Update a journal's dynamic superblock fields to show that journal is empty. * Update a journal's dynamic superblock fields to show that journal is empty.
* Write updated superblock to disk waiting for IO to complete. * Write updated superblock to disk waiting for IO to complete.
*/ */
static void jbd2_mark_journal_empty(journal_t *journal) static void jbd2_mark_journal_empty(journal_t *journal, int write_op)
{ {
journal_superblock_t *sb = journal->j_superblock; journal_superblock_t *sb = journal->j_superblock;
...@@ -1452,7 +1453,7 @@ static void jbd2_mark_journal_empty(journal_t *journal) ...@@ -1452,7 +1453,7 @@ static void jbd2_mark_journal_empty(journal_t *journal)
sb->s_start = cpu_to_be32(0); sb->s_start = cpu_to_be32(0);
read_unlock(&journal->j_state_lock); read_unlock(&journal->j_state_lock);
jbd2_write_superblock(journal, WRITE_FUA); jbd2_write_superblock(journal, write_op);
/* Log is no longer empty */ /* Log is no longer empty */
write_lock(&journal->j_state_lock); write_lock(&journal->j_state_lock);
...@@ -1738,7 +1739,13 @@ int jbd2_journal_destroy(journal_t *journal) ...@@ -1738,7 +1739,13 @@ int jbd2_journal_destroy(journal_t *journal)
if (journal->j_sb_buffer) { if (journal->j_sb_buffer) {
if (!is_journal_aborted(journal)) { if (!is_journal_aborted(journal)) {
mutex_lock(&journal->j_checkpoint_mutex); mutex_lock(&journal->j_checkpoint_mutex);
jbd2_mark_journal_empty(journal);
write_lock(&journal->j_state_lock);
journal->j_tail_sequence =
++journal->j_transaction_sequence;
write_unlock(&journal->j_state_lock);
jbd2_mark_journal_empty(journal, WRITE_FLUSH_FUA);
mutex_unlock(&journal->j_checkpoint_mutex); mutex_unlock(&journal->j_checkpoint_mutex);
} else } else
err = -EIO; err = -EIO;
...@@ -1997,7 +2004,7 @@ int jbd2_journal_flush(journal_t *journal) ...@@ -1997,7 +2004,7 @@ int jbd2_journal_flush(journal_t *journal)
* the magic code for a fully-recovered superblock. Any future * the magic code for a fully-recovered superblock. Any future
* commits of data to the journal will restore the current * commits of data to the journal will restore the current
* s_start value. */ * s_start value. */
jbd2_mark_journal_empty(journal); jbd2_mark_journal_empty(journal, WRITE_FUA);
mutex_unlock(&journal->j_checkpoint_mutex); mutex_unlock(&journal->j_checkpoint_mutex);
write_lock(&journal->j_state_lock); write_lock(&journal->j_state_lock);
J_ASSERT(!journal->j_running_transaction); J_ASSERT(!journal->j_running_transaction);
...@@ -2043,7 +2050,7 @@ int jbd2_journal_wipe(journal_t *journal, int write) ...@@ -2043,7 +2050,7 @@ int jbd2_journal_wipe(journal_t *journal, int write)
if (write) { if (write) {
/* Lock to make assertions happy... */ /* Lock to make assertions happy... */
mutex_lock(&journal->j_checkpoint_mutex); mutex_lock(&journal->j_checkpoint_mutex);
jbd2_mark_journal_empty(journal); jbd2_mark_journal_empty(journal, WRITE_FUA);
mutex_unlock(&journal->j_checkpoint_mutex); mutex_unlock(&journal->j_checkpoint_mutex);
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment