1. 21 Apr, 2022 1 commit
    • jbd2: fix a potential race while discarding reserved buffers after an abort · 23e3d7f7
      Ye Bin authored
      We got the following issue:
      [   72.796117] EXT4-fs error (device sda): ext4_journal_check_start:83: comm fallocate: Detected aborted journal
      [   72.826847] EXT4-fs (sda): Remounting filesystem read-only
      fallocate: fallocate failed: Read-only file system
      [   74.791830] jbd2_journal_commit_transaction: jh=0xffff9cfefe725d90 bh=0x0000000000000000 end delay
      [   74.793597] ------------[ cut here ]------------
      [   74.794203] kernel BUG at fs/jbd2/transaction.c:2063!
      [   74.794886] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [   74.795533] CPU: 4 PID: 2260 Comm: jbd2/sda-8 Not tainted 5.17.0-rc8-next-20220315-dirty #150
      [   74.798327] RIP: 0010:__jbd2_journal_unfile_buffer+0x3e/0x60
      [   74.801971] RSP: 0018:ffffa828c24a3cb8 EFLAGS: 00010202
      [   74.802694] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   74.803601] RDX: 0000000000000001 RSI: ffff9cfefe725d90 RDI: ffff9cfefe725d90
      [   74.804554] RBP: ffff9cfefe725d90 R08: 0000000000000000 R09: ffffa828c24a3b20
      [   74.805471] R10: 0000000000000001 R11: 0000000000000001 R12: ffff9cfefe725d90
      [   74.806385] R13: ffff9cfefe725d98 R14: 0000000000000000 R15: ffff9cfe833a4d00
      [   74.807301] FS:  0000000000000000(0000) GS:ffff9d01afb00000(0000) knlGS:0000000000000000
      [   74.808338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   74.809084] CR2: 00007f2b81bf4000 CR3: 0000000100056000 CR4: 00000000000006e0
      [   74.810047] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   74.810981] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   74.811897] Call Trace:
      [   74.812241]  <TASK>
      [   74.812566]  __jbd2_journal_refile_buffer+0x12f/0x180
      [   74.813246]  jbd2_journal_refile_buffer+0x4c/0xa0
      [   74.813869]  jbd2_journal_commit_transaction.cold+0xa1/0x148
      [   74.817550]  kjournald2+0xf8/0x3e0
      [   74.819056]  kthread+0x153/0x1c0
      [   74.819963]  ret_from_fork+0x22/0x30
      
      The above issue may happen as follows:
              write                   truncate                   kjournald2
      generic_perform_write
       ext4_write_begin
        ext4_walk_page_buffers
         do_journal_get_write_access ->add BJ_Reserved list
       ext4_journalled_write_end
        ext4_walk_page_buffers
         write_end_fn
          ext4_handle_dirty_metadata
                      ***************JBD2 ABORT**************
           jbd2_journal_dirty_metadata
       -> return -EROFS, jh in reserved_list
                                                         jbd2_journal_commit_transaction
                                                          while (commit_transaction->t_reserved_list)
                                                            jh = commit_transaction->t_reserved_list;
                              truncate_pagecache_range
                               do_invalidatepage
      			  ext4_journalled_invalidatepage
      			   jbd2_journal_invalidatepage
      			    journal_unmap_buffer
      			     __dispose_buffer
      			      __jbd2_journal_unfile_buffer
      			       jbd2_journal_put_journal_head ->put last ref_count
      			        __journal_remove_journal_head
      				 bh->b_private = NULL;
      				 jh->b_bh = NULL;
      				                      jbd2_journal_refile_buffer(journal, jh);
      							bh = jh2bh(jh);
      							->bh is NULL, later will trigger null-ptr-deref
      				 journal_free_journal_head(jh);
      
      After commit 96f1e097, we no longer hold the j_state_lock while
      iterating over the list of reserved handles in
      jbd2_journal_commit_transaction().  This potentially allows the
      journal_head to be freed by journal_unmap_buffer while the commit
      codepath is also trying to free the BJ_Reserved buffers.  Keeping
      j_state_lock held while discarding the reserved buffers extends the hold
      time of the lock minimally, and solves this issue.
      
      Fixes: 96f1e097 ("jbd2: avoid long hold times of j_state_lock while committing a transaction")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220317142137.1821590-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
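The locking idea behind this fix can be illustrated outside the kernel. The following is a minimal userspace sketch, not the jbd2 code: `struct reserved_list`, its functions, and the `atomic_flag` spinlock are all hypothetical stand-ins (the spinlock stands in for j_state_lock). It only shows the pattern in which the path that drains the list and the path that removes a single entry take the same lock, so neither can free a node the other is still looking at.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a journal_head on the reserved list. */
struct node {
	struct node *next;
	int id;
};

/* Hypothetical stand-in for the transaction's reserved list plus its lock. */
struct reserved_list {
	atomic_flag lock;	/* plays the role of j_state_lock in this sketch */
	struct node *head;
};

static void list_lock(struct reserved_list *l)
{
	while (atomic_flag_test_and_set_explicit(&l->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void list_unlock(struct reserved_list *l)
{
	atomic_flag_clear_explicit(&l->lock, memory_order_release);
}

void reserved_list_init(struct reserved_list *l)
{
	assert(l != NULL);
	atomic_flag_clear(&l->lock);
	l->head = NULL;
}

/* Add an entry at the head of the list; returns 0 on success, -1 on OOM. */
int reserved_list_add(struct reserved_list *l, int id)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->id = id;
	list_lock(l);
	n->next = l->head;
	l->head = n;
	list_unlock(l);
	return 0;
}

/* "Unmap" path: unlink and free one entry while holding the lock. */
int reserved_list_remove(struct reserved_list *l, int id)
{
	struct node **pp, *n;
	int found = 0;

	list_lock(l);
	for (pp = &l->head; (n = *pp) != NULL; pp = &n->next) {
		if (n->id == id) {
			*pp = n->next;
			free(n);
			found = 1;
			break;
		}
	}
	list_unlock(l);
	return found;
}

/* "Commit" path: the lock is held across the whole drain loop, so no
 * entry can be freed by reserved_list_remove() between reading the head
 * pointer and using it. Returns the number of entries discarded. */
int reserved_list_drain(struct reserved_list *l)
{
	int count = 0;

	list_lock(l);
	while (l->head) {
		struct node *n = l->head;

		l->head = n->next;
		free(n);
		count++;
	}
	list_unlock(l);
	return count;
}
```

Dropping the lock before the `while` loop in `reserved_list_drain()` would recreate the window described in the commit message: another thread could free the head node after it has been read but before it is used.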
  2. 15 Apr, 2022 3 commits
  3. 13 Apr, 2022 6 commits
    • ext4, doc: fix incorrect h_reserved size · 7102ffe4
      wangjianjian (C) authored
      According to the documentation and the code, ext4_xattr_header's size is
      32 bytes, so the size of h_reserved should be 3.
      Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com>
      Link: https://lore.kernel.org/r/92fcc3a6-7d77-8c09-4126-377fcb4c46a5@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
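The size arithmetic can be checked with a small userspace mirror of the header. This is a sketch under assumptions: the struct name is hypothetical, the field names follow the header as I recall it from fs/ext4/xattr.h, and `uint32_t` replaces the kernel's `__le32`/`__u32` types so that it builds standalone. Five 32-bit fields take 20 bytes, so three reserved 32-bit words bring the total to 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace mirror of the xattr block header (illustrative only). */
struct xattr_header_sketch {
	uint32_t h_magic;	/* magic number for identification */
	uint32_t h_refcount;	/* reference count */
	uint32_t h_blocks;	/* number of disk blocks used */
	uint32_t h_hash;	/* hash value of all attributes */
	uint32_t h_checksum;	/* checksum of the xattr block */
	uint32_t h_reserved[3];	/* zero right now */
};

/* Compile-time check: 5 * 4 + 3 * 4 = 32 bytes. */
static_assert(sizeof(struct xattr_header_sketch) == 32,
	      "xattr header must be 32 bytes");

/* Number of 32-bit words in the reserved area. */
size_t xattr_header_reserved_words(void)
{
	return sizeof(((struct xattr_header_sketch *)0)->h_reserved) /
	       sizeof(uint32_t);
}
```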
    • ext4: limit length to bitmap_maxbytes - blocksize in punch_hole · 2da37622
      Tadeusz Struk authored
      Syzbot found an issue [1] in ext4_fallocate().
      The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul
      and offset 0x1000000ul, which, when added together, exceed the
      bitmap_maxbytes for the inode. This triggers a BUG in
      ext4_ind_remove_space(). According to the comments in this function,
      the 'end' parameter needs to be one block after the last block to be
      removed. In the case when the BUG is triggered, it points to the last
      block. Modify the ext4_punch_hole() function and add a constraint that
      caps the length so that the punched range ends no later than
      bitmap_maxbytes - blocksize.
      
      LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
      LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000
      
      Fixes: a4bb6b64 ("ext4: enable "punch hole" functionality")
      Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
      Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
      Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
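The length cap described in this commit amounts to a small piece of arithmetic. The sketch below is a standalone illustration, not the kernel function: `cap_punch_length` and its parameter names are hypothetical, and it assumes the caller has already rejected offsets at or beyond `max_bytes - blocksize`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Cap a punch-hole length so that offset + length never exceeds
 * max_bytes - blocksize. All names here are illustrative.
 */
int64_t cap_punch_length(int64_t offset, int64_t length,
			 int64_t max_bytes, int64_t blocksize)
{
	int64_t max_length;

	/* Assumptions of this sketch, not checks taken from the kernel. */
	assert(blocksize > 0 && max_bytes > blocksize);
	assert(offset >= 0 && length >= 0);

	max_length = max_bytes - blocksize;
	assert(offset < max_length);

	if (offset + length > max_length)
		length = max_length - offset;
	return length;
}
```

With a hypothetical limit of 0x8000 bytes and 0x1000-byte blocks, a request at offset 0x1000 for 0x10000 bytes is trimmed to 0x6000, so the range ends at 0x7000, one block short of the limit. Requests that already fit are returned unchanged.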
    • ext4: fix use-after-free in ext4_search_dir · c186f088
      Ye Bin authored
      We got the following issue:
      EXT4-fs (loop0): mounted filesystem without journal. Opts: ,errors=continue
      ==================================================================
      BUG: KASAN: use-after-free in ext4_search_dir fs/ext4/namei.c:1394 [inline]
      BUG: KASAN: use-after-free in search_dirblock fs/ext4/namei.c:1199 [inline]
      BUG: KASAN: use-after-free in __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
      Read of size 1 at addr ffff8881317c3005 by task syz-executor117/2331
      
      CPU: 1 PID: 2331 Comm: syz-executor117 Not tainted 5.10.0+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:83 [inline]
       dump_stack+0x144/0x187 lib/dump_stack.c:124
       print_address_description+0x7d/0x630 mm/kasan/report.c:387
       __kasan_report+0x132/0x190 mm/kasan/report.c:547
       kasan_report+0x47/0x60 mm/kasan/report.c:564
       ext4_search_dir fs/ext4/namei.c:1394 [inline]
       search_dirblock fs/ext4/namei.c:1199 [inline]
       __ext4_find_entry+0xdca/0x1210 fs/ext4/namei.c:1553
       ext4_lookup_entry fs/ext4/namei.c:1622 [inline]
       ext4_lookup+0xb8/0x3a0 fs/ext4/namei.c:1690
       __lookup_hash+0xc5/0x190 fs/namei.c:1451
       do_rmdir+0x19e/0x310 fs/namei.c:3760
       do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x445e59
      Code: 4d c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b c7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff2277fac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
      RAX: ffffffffffffffda RBX: 0000000000400280 RCX: 0000000000445e59
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200000c0
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000002
      R10: 00007fff2277f990 R11: 0000000000000246 R12: 0000000000000000
      R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
      
      The buggy address belongs to the page:
      page:0000000048cd3304 refcount:0 mapcount:0 mapping:0000000000000000 index:0x1 pfn:0x1317c3
      flags: 0x200000000000000()
      raw: 0200000000000000 ffffea0004526588 ffffea0004528088 0000000000000000
      raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881317c2f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8881317c2f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8881317c3000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                         ^
       ffff8881317c3080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8881317c3100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      
      ext4_search_dir:
        ...
        de = (struct ext4_dir_entry_2 *)search_buf;
        dlimit = search_buf + buf_size;
        while ((char *) de < dlimit) {
        ...
          if ((char *) de + de->name_len <= dlimit &&
      	 ext4_match(dir, fname, de)) {
      	    ...
          }
        ...
          de_len = ext4_rec_len_from_disk(de->rec_len, dir->i_sb->s_blocksize);
          if (de_len <= 0)
            return -1;
          offset += de_len;
          de = (struct ext4_dir_entry_2 *) ((char *) de + de_len);
        }
      
      Assume:
      de=0xffff8881317c2fff
      dlimit=0xffff8881317c3000
      
      Reading 'de->name_len', whose address is 0xffff8881317c3005, is obviously
      out of range and triggers the use-after-free.
      To solve this issue, 'dlimit' must reserve 8 bytes, as we read
      'de->name_len' to judge whether '(char *) de + de->name_len' is out of range.
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220324064816.1209985-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
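The bounds rule from this commit (make sure the 8-byte fixed part of an entry fits before reading any of its fields) can be sketched in userspace. This is an illustration under assumptions, not the kernel's ext4_search_dir(): `find_entry` and `DIRENT_HDR_LEN` are hypothetical names, and the entry layout assumed is a 4-byte inode, a 2-byte little-endian rec_len, a 1-byte name_len and a 1-byte file type, followed by the name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of an entry: inode(4) + rec_len(2) + name_len(1) + type(1). */
#define DIRENT_HDR_LEN 8

/*
 * Return the byte offset of the entry called `name`, or -1 if it is not
 * found or the block ends before a complete entry header.
 */
long find_entry(const unsigned char *buf, size_t buf_size,
		const char *name, size_t name_len)
{
	size_t off = 0;

	assert(buf != NULL && name != NULL);

	/* Only look at an entry when its whole 8-byte header is in range. */
	while (buf_size >= DIRENT_HDR_LEN && off <= buf_size - DIRENT_HDR_LEN) {
		/* rec_len is stored little-endian at bytes 4..5 of the entry. */
		uint16_t rec_len = (uint16_t)(buf[off + 4] | (buf[off + 5] << 8));
		uint8_t nlen = buf[off + 6];

		/* The name itself must also lie inside the buffer. */
		if (nlen == name_len &&
		    off + DIRENT_HDR_LEN + nlen <= buf_size &&
		    memcmp(buf + off + DIRENT_HDR_LEN, name, nlen) == 0)
			return (long)off;

		if (rec_len == 0)
			return -1;	/* malformed entry: avoid looping forever */
		off += rec_len;
	}
	return -1;
}
```

In the scenario from the commit message, the entry pointer sits one byte before the end of the buffer. The loop condition above rejects that position because fewer than 8 bytes remain, so name_len is never read from beyond the buffer.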
    • ext4: fix bug_on in start_this_handle during umount filesystem · b98535d0
      Ye Bin authored
      We got the following issue:
      ------------[ cut here ]------------
      kernel BUG at fs/jbd2/transaction.c:389!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 9 PID: 131 Comm: kworker/9:1 Not tainted 5.17.0-862.14.0.6.x86_64-00001-g23f87daf7d74-dirty #197
      Workqueue: events flush_stashed_error_work
      RIP: 0010:start_this_handle+0x41c/0x1160
      RSP: 0018:ffff888106b47c20 EFLAGS: 00010202
      RAX: ffffed10251b8400 RBX: ffff888128dc204c RCX: ffffffffb52972ac
      RDX: 0000000000000200 RSI: 0000000000000004 RDI: ffff888128dc2050
      RBP: 0000000000000039 R08: 0000000000000001 R09: ffffed10251b840a
      R10: ffff888128dc204f R11: ffffed10251b8409 R12: ffff888116d78000
      R13: 0000000000000000 R14: dffffc0000000000 R15: ffff888128dc2000
      FS:  0000000000000000(0000) GS:ffff88839d680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000001620068 CR3: 0000000376c0e000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       jbd2__journal_start+0x38a/0x790
       jbd2_journal_start+0x19/0x20
       flush_stashed_error_work+0x110/0x2b3
       process_one_work+0x688/0x1080
       worker_thread+0x8b/0xc50
       kthread+0x26f/0x310
       ret_from_fork+0x22/0x30
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      
      The above issue may happen as follows:
            umount            read procfs            error_work
      ext4_put_super
        flush_work(&sbi->s_error_work);
      
                            ext4_mb_seq_groups_show
      	                ext4_mb_load_buddy_gfp
      			  ext4_mb_init_group
      			    ext4_mb_init_cache
      	                      ext4_read_block_bitmap_nowait
      			        ext4_validate_block_bitmap
      				  ext4_error
      			            ext4_handle_error
      			              schedule_work(&EXT4_SB(sb)->s_error_work);
      
        ext4_unregister_sysfs(sb);
        jbd2_journal_destroy(sbi->s_journal);
          journal_kill_thread
            journal->j_flags |= JBD2_UNMOUNT;
      
                                                flush_stashed_error_work
      				            jbd2_journal_start
      					      start_this_handle
      					        BUG_ON(journal->j_flags & JBD2_UNMOUNT);
      
      To solve this issue, we call ext4_unregister_sysfs() before flushing
      s_error_work in ext4_put_super().
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220322012419.725457-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix symlink file size not match to file content · a2b0b205
      Ye Bin authored
      We got the following issue:
      [home]# fsck.ext4  -fn  ram0yb
      e2fsck 1.45.6 (20-Mar-2020)
      Pass 1: Checking inodes, blocks, and sizes
      Pass 2: Checking directory structure
      Symlink /p3/d14/d1a/l3d (inode #3494) is invalid.
      Clear? no
      Entry 'l3d' in /p3/d14/d1a (3383) has an incorrect filetype (was 7, should be 0).
      Fix? no
      
      The symlink file size does not match the file content. If the writeback
      of the symlink data block fails, ext4_finish_bio() handles the end of IO.
      However, this function fails to mark the buffer with BH_write_io_error, and
      so when unmount does a journal checkpoint it cannot detect the writeback
      error and will clean up the journal. Thus we've lost the correct data in the
      journal area. To solve this issue, mark the buffer as BH_write_io_error in
      ext4_finish_bio().
      
      Cc: stable@kernel.org
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220321144438.201685-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix fallocate to use file_modified to update permissions consistently · ad5cd4f4
      Darrick J. Wong authored
      Since the initial introduction of (posix) fallocate back at the turn of
      the century, it has been possible to use this syscall to change the
      user-visible contents of files.  This can happen by extending the file
      size during a preallocation, or through any of the newer modes (punch,
      zero, collapse, insert range).  Because the call can be used to change
      file contents, we should treat it like we do any other modification to a
      file -- update the mtime, and drop set[ug]id privileges/capabilities.
      
      The VFS function file_modified() does all this for us if we pass it a
      locked inode, so let's make fallocate drop permissions correctly.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
  4. 15 Mar, 2022 7 commits
  5. 13 Mar, 2022 8 commits
  6. 03 Mar, 2022 7 commits
  7. 26 Feb, 2022 8 commits