1. 13 Apr, 2023 2 commits
    • Merge tag 'intents-perag-refs-6.4_2023-04-11' of... · 826053db
      Dave Chinner authored
      Merge tag 'intents-perag-refs-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next
      
      xfs: make intent items take a perag reference [v24.5]
      
      Now that we've cleaned up some code warts in the deferred work item
      processing code, let's make intent items take an active perag reference
      from their creation until they are finally freed by the defer ops
      machinery.  This change facilitates the scrub drain in the next patchset
      and will make it easier for the future AG removal code to detect a busy
      AG in need of quiescing.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • Merge tag 'online-fsck-design-6.4_2023-04-11' of... · bed25d80
      Dave Chinner authored
      Merge tag 'online-fsck-design-6.4_2023-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into guilt/xfs-for-next
      
      xfs: design documentation for online fsck [v24.5]
      
      After six years of development and a nearly two year hiatus from
      patchbombing, I think it is time to resume the process of merging the
      online fsck feature into XFS.  The full patchset comprises 105 separate
      patchsets that capture 470 patches across the kernel, xfsprogs, and
      fstests projects.
      
      I would like to merge this feature into upstream in time for the 2023
      LTS kernel.  As of 5.15 (aka last year's LTS), we have merged all
      generally useful infrastructure improvements into the regular
      filesystem.  The only changes to the core filesystem that remain are the
      ones that are only useful to online fsck itself.  In other words, the
      vast majority of the new code in the patchsets comprising the online
      fsck feature is self contained and can be turned off via Kconfig.
      
      Many of you readers might be wondering -- why have I chosen to make one
      large submission with 100+ patchsets comprising ~500 patches?  Why
      didn't I merge small pieces of functionality bit by bit and revise
      common code as necessary?  Well, the simple answer is that in the past
      six years, the fundamental algorithms have been revised repeatedly as
      I've built out the functionality.  In other words, the codebase as it is
      now has the benefit that I now know every piece that's necessary to get
      the job done in a reasonable manner and within the constraints laid out
      by community reviews.  I believe this has reduced code churn in mainline
      and freed up my time so that I can iterate faster.
      
      As a concession to the mail servers, I'm breaking up the submission into
      smaller pieces; I'm only pushing the design document and the revisions
      to the existing scrub code, which is the first 20% of the patches.
      Also, I'm arbitrarily restarting the version numbering by reversioning
      all patchsets from version 22 to epoch 23, version 1.
      
      The big question to everyone reading this is: How might I convince you
      that there is more merit in merging the whole feature and dealing with
      the consequences than continuing to maintain it out of tree?
      
      ---------
      
      To prepare the XFS community and potential patch reviewers for the
      upstream submission of the online fsck feature, I decided to write a
      document capturing the broader picture behind the online repair
      development effort.  The document begins by defining the problems that
      online fsck aims to solve and outlining specific use cases for the
      functionality.
      
      Using that as a base, the rest of the design document presents the high
      level algorithms that fulfill the goals set out at the start and the
      interactions between the large pieces of the system.  Case studies round
      out the design documentation by adding the details of exactly how
      specific parts of the online fsck code integrate the algorithms with the
      filesystem.
      
      The goal of this effort is to help the XFS community understand how the
      gigantic online repair patchset works.  The questions I submit to the
      community reviewers are:
      
      1. As you read the design doc (and later the code), do you feel that you
         understand what's going on well enough to try to fix a bug if you
         found one?
      
      2. What sorts of interactions between systems (or between scrub and the
         rest of the kernel) am I missing?
      
      3. Do you feel confident enough in the implementation as it is now that
         the benefits of merging the feature (as EXPERIMENTAL) outweigh any
         potential disruptions to XFS at large?
      
      4. Are there problematic interactions between subsystems that ought to
         be cleared up before merging?
      
      5. Can I just merge all of this?
      
      I intend to commit this document to the kernel's documentation directory
      when we start merging the patchset, albeit without the links to
      git.kernel.org.  A much more readable version of this is posted at:
      https://djwong.org/docs/xfs-online-fsck-design/
      
      v2: add missing sections about: all the in-kernel data structures and
          new apis that the scrub and repair functions use; how xattrs and
          directories are checked; how space btree records are checked; and
          add more details to the parts where all these bits tie together.
          Proofread for verb tense inconsistencies and eliminate vague 'we'
          usage.  Move all the discussion of what we can do with pageable
          kernel memory into a single source file and section.  Document where
          log incompat feature locks fit into the locking model.
      
      v3: resync with 6.0, fix a few typos, begin discussion of the merging
          plan for this megapatchset.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  2. 12 Apr, 2023 24 commits
    • xfs: fix BUG_ON in xfs_getbmap() · 8ee81ed5
      Ye Bin authored
      The following issue was observed:
      XFS: Assertion failed: (bmv->bmv_iflags & BMV_IF_DELALLOC) != 0, file: fs/xfs/xfs_bmap_util.c, line: 329
      ------------[ cut here ]------------
      kernel BUG at fs/xfs/xfs_message.c:102!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 14612 Comm: xfs_io Not tainted 6.3.0-rc2-next-20230315-00006-g2729d23ddb3b-dirty #422
      RIP: 0010:assfail+0x96/0xa0
      RSP: 0018:ffffc9000fa178c0 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff888179a18000
      RDX: 0000000000000000 RSI: ffff888179a18000 RDI: 0000000000000002
      RBP: 0000000000000000 R08: ffffffff8321aab6 R09: 0000000000000000
      R10: 0000000000000001 R11: ffffed1105f85139 R12: ffffffff8aacc4c0
      R13: 0000000000000149 R14: ffff888269f58000 R15: 000000000000000c
      FS:  00007f42f27a4740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000b92388 CR3: 000000024f006000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       xfs_getbmap+0x1a5b/0x1e40
       xfs_ioc_getbmap+0x1fd/0x5b0
       xfs_file_ioctl+0x2cb/0x1d50
       __x64_sys_ioctl+0x197/0x210
       do_syscall_64+0x39/0xb0
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The above issue may happen as follows:
      ThreadA                       ThreadB
      do_shared_fault
       __do_fault
        xfs_filemap_fault
         __xfs_filemap_fault
          filemap_fault
                                    xfs_ioc_getbmap -> Without BMV_IF_DELALLOC flag
                                     xfs_getbmap
                                      xfs_ilock(ip, XFS_IOLOCK_SHARED);
                                      filemap_write_and_wait
       do_page_mkwrite
        xfs_filemap_page_mkwrite
         __xfs_filemap_fault
          xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
          iomap_page_mkwrite
           ...
           xfs_buffered_write_iomap_begin
            xfs_bmapi_reserve_delalloc -> Allocate delay extent
                                      xfs_ilock_data_map_shared(ip)
                                      xfs_getbmap_report_one
                                       ASSERT((bmv->bmv_iflags & BMV_IF_DELALLOC) != 0)
                                        -> trigger BUG_ON
      
      As xfs_filemap_page_mkwrite() only holds the XFS_MMAPLOCK_SHARED lock, there
      is a small window in which mkwrite can produce a delalloc extent after the
      file writeback in xfs_getbmap().  To solve the above issue, just skip
      delalloc extents.
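
      The idea of skipping rather than asserting can be sketched in plain
      userspace C.  This is only an illustration under stated assumptions:
      struct ext_rec, EXT_DELALLOC_START, and report_extents() are
      hypothetical stand-ins and are not the kernel's xfs_getbmap() code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for an extent that has no disk block assigned yet. */
#define EXT_DELALLOC_START UINT64_MAX

struct ext_rec {
	uint64_t startblock;	/* EXT_DELALLOC_START for delayed allocation */
	uint64_t blockcount;
};

static bool ext_is_delalloc(const struct ext_rec *rec)
{
	return rec->startblock == EXT_DELALLOC_START;
}

/*
 * Copy extents from in[] to out[] and return how many were reported.
 * If the caller did not ask for delalloc extents, skip them: a racing
 * page fault may have created one after writeback finished, so finding
 * one here is not a bug.
 */
size_t report_extents(const struct ext_rec *in, size_t nr_in,
		      struct ext_rec *out, size_t nr_out,
		      bool want_delalloc)
{
	size_t reported = 0;

	for (size_t i = 0; i < nr_in && reported < nr_out; i++) {
		if (ext_is_delalloc(&in[i]) && !want_delalloc)
			continue;	/* skip instead of asserting */
		out[reported++] = in[i];
	}
	return reported;
}
```

      With want_delalloc set to false, a delalloc record in the input is
      simply left out of the output instead of tripping an assertion.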
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: verify buffer contents when we skip log replay · 22ed903e
      Darrick J. Wong authored
      syzbot detected a crash during log recovery:
      
      XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
      XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
      XFS (loop0): Starting recovery (logdev: internal)
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
      Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074
      
      CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
       print_address_description+0x74/0x340 mm/kasan/report.c:306
       print_report+0x107/0x1f0 mm/kasan/report.c:417
       kasan_report+0xcd/0x100 mm/kasan/report.c:517
       xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
       xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
       xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
       xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
       xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
       xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
       xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
       xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
       xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
       xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
       get_tree_bdev+0x400/0x620 fs/super.c:1282
       vfs_get_tree+0x88/0x270 fs/super.c:1489
       do_new_mount+0x289/0xad0 fs/namespace.c:3145
       do_mount fs/namespace.c:3488 [inline]
       __do_sys_mount fs/namespace.c:3697 [inline]
       __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f89fa3f4aca
      Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
      RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
      RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
      R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
      R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
       </TASK>
      
      The fuzzed image contains an AGF with an obviously garbage
      agf_refcount_level value of 32, and a dirty log with a buffer log item
      for that AGF.  The ondisk AGF has a higher LSN than the recovered log
      item.  xlog_recover_buf_commit_pass2 reads the buffer, compares the
      LSNs, and decides to skip replay because the ondisk buffer appears to be
      newer.
      
      Unfortunately, the ondisk buffer is corrupt, but recovery just read the
      buffer with no buffer ops specified:
      
      	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
      			buf_f->blf_len, buf_flags, &bp, NULL);
      
      Skipping the buffer leaves its contents in memory unverified.  This sets
      us up for a kernel crash because xfs_refcount_recover_cow_leftovers
      reads the buffer (which is still around in XBF_DONE state, so no read
      verification) and creates a refcountbt cursor of height 32.  This is
      impossible so we run off the end of the cursor object and crash.
      
      Fix this by invoking the verifier on all skipped buffers and aborting
      log recovery if the ondisk buffer is corrupt.  It might be smarter to
      force replay the log item atop the buffer and then see if it'll pass the
      write verifier (like ext4 does) but for now let's go with the
      conservative option where we stop immediately.
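
      A minimal sketch of the conservative option, assuming a simplified
      buffer model.  struct disk_buf, MAX_BTREE_LEVEL, buf_verify(), and
      recover_buf() are hypothetical names for illustration and do not
      reproduce xlog_recover_buf_commit_pass2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum replay_result {
	REPLAY_APPLIED,		/* log item was newer and was replayed */
	REPLAY_SKIPPED,		/* ondisk buffer is newer and verified clean */
	REPLAY_CORRUPT,		/* ondisk buffer is newer but fails verification */
};

struct disk_buf {
	uint64_t lsn;		/* LSN stamped into the ondisk buffer */
	uint32_t btree_level;	/* stand-in for a field like agf_refcount_level */
};

#define MAX_BTREE_LEVEL 9	/* hypothetical upper bound for this sketch */

/* Stand-in for a read verifier: reject obviously impossible heights. */
static bool buf_verify(const struct disk_buf *bp)
{
	return bp->btree_level >= 1 && bp->btree_level <= MAX_BTREE_LEVEL;
}

enum replay_result recover_buf(struct disk_buf *bp, uint64_t item_lsn,
			       uint32_t item_level)
{
	if (bp->lsn >= item_lsn) {
		/* Replay is skipped, but the contents must still be checked. */
		return buf_verify(bp) ? REPLAY_SKIPPED : REPLAY_CORRUPT;
	}

	/* The log item is newer: replay it over the buffer. */
	bp->btree_level = item_level;
	bp->lsn = item_lsn;
	return REPLAY_APPLIED;
}
```

      In this sketch a caller would abort recovery on REPLAY_CORRUPT
      rather than continue with an unverified buffer in memory.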
      
      Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: _{attr,data}_map_shared should take ILOCK_EXCL until iread_extents is completely done · c95356ca
      Darrick J. Wong authored
      While fuzzing the data fork extent count on a btree-format directory
      with xfs/375, I observed the following (excerpted) splat:
      
      XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/libxfs/xfs_bmap.c, line: 1208
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 43192 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
      Call Trace:
       <TASK>
       xfs_iread_extents+0x1af/0x210 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xchk_dir_walk+0xb8/0x190 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xchk_parent_count_parent_dentries+0x41/0x80 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xchk_parent_validate+0x199/0x2e0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xchk_parent+0xdf/0x130 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xfs_scrub_metadata+0x2b8/0x730 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xfs_scrubv_metadata+0x38b/0x4d0 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xfs_ioc_scrubv_metadata+0x111/0x160 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       xfs_file_ioctl+0x367/0xf50 [xfs 09f66509ece4938760fac7de64732a0cbd3e39cd]
       __x64_sys_ioctl+0x82/0xa0
       do_syscall_64+0x2b/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      The cause of this is a race condition in xfs_ilock_data_map_shared,
      which performs an unlocked access to the data fork to guess which lock
      mode it needs:
      
      Thread 0                          Thread 1
      
      xfs_need_iread_extents
      <observe no iext tree>
      xfs_ilock(..., ILOCK_EXCL)
      xfs_iread_extents
      <observe no iext tree>
      <check ILOCK_EXCL>
      <load bmbt extents into iext>
      <notice iext size doesn't
       match nextents>
                                        xfs_need_iread_extents
                                        <observe iext tree>
                                        xfs_ilock(..., ILOCK_SHARED)
      <tear down iext tree>
      xfs_iunlock(..., ILOCK_EXCL)
                                        xfs_iread_extents
                                        <observe no iext tree>
                                        <check ILOCK_EXCL>
                                        *BOOM*
      
      Fix this race by adding a flag to the xfs_ifork structure to indicate
      that we have not yet read in the extent records and changing the
      predicate to look at the flag state, not if_height.  The memory barrier
      ensures that the flag will not be cleared until the very end of the
      function.
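
      The flag-plus-barrier pattern can be sketched with C11 atomics.
      This is an assumption-laden userspace illustration: struct ifork,
      its needextents field, and the helper names are hypothetical, and
      the acquire/release pair stands in for whatever barrier primitives
      the kernel uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ifork {
	atomic_bool needextents;	/* true until extents are fully loaded */
	size_t nr_loaded;		/* stand-in for the in-memory extent tree */
};

void ifork_init(struct ifork *f)
{
	atomic_init(&f->needextents, true);
	f->nr_loaded = 0;
}

/* Unlocked predicate that callers use to pick a lock mode. */
bool ifork_need_read_extents(struct ifork *f)
{
	return atomic_load_explicit(&f->needextents, memory_order_acquire);
}

/*
 * Load the extents; the caller is assumed to hold the exclusive lock.
 * Returns 0 on success or -1 if the loaded count does not match.  On
 * failure the partial state is torn down and the flag stays set, so the
 * next caller still chooses the exclusive lock.
 */
int ifork_read_extents(struct ifork *f, size_t nr_on_disk, size_t nr_expected)
{
	f->nr_loaded = nr_on_disk;
	if (nr_on_disk != nr_expected) {
		f->nr_loaded = 0;	/* tear down the partial tree */
		return -1;
	}

	/* Publish only at the very end, after all loads are complete. */
	atomic_store_explicit(&f->needextents, false, memory_order_release);
	return 0;
}
```

      Because the predicate no longer depends on whether a partially
      built tree happens to exist, a concurrent reader cannot observe a
      transient tree and wrongly pick the shared lock.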
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: remove WARN when dquot cache insertion fails · 4b827b3f
      Dave Chinner authored
      It just creates unnecessary bot noise these days.
      
      Reported-by: syzbot+6ae213503fb12e87934f@syzkaller.appspotmail.com
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: don't consider future format versions valid · aa880198
      Dave Chinner authored
      In commit fe08cc50 we reworked the valid superblock version
      checks. If it is a V5 filesystem, it is always valid; then we
      checked if the version was less than V4 (reject) and then checked
      feature fields in the V4 flags to determine if it was valid.

      What we missed was that if the version is not V4 at this point,
      we should reject the fs, i.e. the check currently treats V6+
      filesystems as if they were V4 filesystems. Fix this.
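
      The corrected ordering of checks can be sketched as follows.  The
      names SB_VERSION_NUMBITS, SB_VERSION_4, SB_VERSION_5, and
      sb_good_version() are hypothetical, and the V4 feature checks are
      reduced to a single boolean supplied by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: the version number lives in the low bits. */
#define SB_VERSION_NUMBITS	0x000f
#define SB_VERSION_4		4
#define SB_VERSION_5		5

/*
 * Return true if the superblock version is one we understand.
 * v4_features_ok stands in for the V4 feature-flag checks.
 */
bool sb_good_version(uint16_t versionnum, bool v4_features_ok)
{
	unsigned int ver = versionnum & SB_VERSION_NUMBITS;

	if (ver == SB_VERSION_5)
		return true;

	/* Anything that is not exactly V4 is too old or from the future. */
	if (ver != SB_VERSION_4)
		return false;

	return v4_features_ok;
}
```

      Testing for "not V4" instead of "less than V4" is what keeps V6 and
      later from falling through to the V4 feature checks.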
      
      cc: stable@vger.kernel.org
      Fixes: fe08cc50 ("xfs: open code sb verifier feature checks")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: give xfs_refcount_intent its own perag reference · 00e7b3ba
      Darrick J. Wong authored
      Give the xfs_refcount_intent a passive reference to the perag structure
      data.  This reference will be used to enable scrub intent draining
      functionality in subsequent patches.  Any space being modified by a
      refcount intent is already allocated, so we need to be able to operate
      even if the AG is being shrunk or offlined.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: give xfs_rmap_intent its own perag reference · c13418e8
      Darrick J. Wong authored
      Give the xfs_rmap_intent a passive reference to the perag structure
      data.  This reference will be used to enable scrub intent draining
      functionality in subsequent patches.  The space we're (reverse) mapping
      is already allocated, so we need to be able to operate even if the AG is
      being shrunk or offlined.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: give xfs_extfree_intent its own perag reference · f6b38463
      Darrick J. Wong authored
      Give the xfs_extfree_intent a passive reference to the perag structure
      data.  This reference will be used to enable scrub intent draining
      functionality in subsequent patches.  The space being freed must already
      be allocated, so we need to be able to run even if the AG is being
      offlined or shrunk.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: pass per-ag references to xfs_free_extent · b2ccab31
      Darrick J. Wong authored
      Pass a reference to the per-AG structure to xfs_free_extent.  Most
      callers already have one, so we can eliminate unnecessary lookups.  The
      one exception to this is the EFI code, which the next patch will fix.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: give xfs_bmap_intent its own perag reference · 774a99b4
      Darrick J. Wong authored
      Give the xfs_bmap_intent a passive reference to the perag structure
      data.  This reference will be used to enable scrub intent draining
      functionality in subsequent patches.  Later, shrink will use these
      passive references to know if an AG is quiesced or not.
      
      The reason why we take a passive ref for a file mapping operation is
      simple: we're committing to some sort of action involving space in an
      AG, so we want to indicate our interest in that AG.  The space is
      already allocated, so we need to be able to operate on AGs that are
      offline or being shrunk.
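
      The lifetime rule described here (take the reference when the
      intent is created, drop it when the intent is freed) can be
      sketched as below.  The struct and function names are hypothetical,
      and the plain integer counter is not thread-safe; it only
      illustrates the ownership pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct perag {
	unsigned int agno;
	int ref;			/* passive reference count */
};

struct intent {
	struct perag *pag;		/* reference held for the intent's lifetime */
	unsigned long long startblock;
};

/* Create an intent and record its interest in the AG. */
struct intent *intent_create(struct perag *pag, unsigned long long startblock)
{
	struct intent *it = malloc(sizeof(*it));

	if (!it)
		return NULL;
	pag->ref++;
	it->pag = pag;
	it->startblock = startblock;
	return it;
}

/* Drop the AG reference only when the intent is finally freed. */
void intent_free(struct intent *it)
{
	assert(it->pag->ref > 0);
	it->pag->ref--;
	free(it);
}

/* A shrink or scrub path could poll this to see if the AG is still busy. */
bool perag_has_intents(const struct perag *pag)
{
	return pag->ref > 0;
}
```

      Once every intent has been freed, perag_has_intents() returns false
      and the AG can be treated as quiesced in this model.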
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document future directions of online fsck · 03786f0a
      Darrick J. Wong authored
      Add the seventh and final chapter of the online fsck documentation,
      where we talk about future functionality that can tie in with the
      functionality provided by the online fsck patchset.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the userspace fsck driver program · af051dfb
      Darrick J. Wong authored
      Add the sixth chapter of the online fsck design documentation, where
      we discuss the details of the data structures and algorithms used by the
      driver program xfs_scrub.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document directory tree repairs · a26aa252
      Darrick J. Wong authored
      Directory tree repairs are the least complete part of online fsck, due
      to the lack of directory parent pointers.  However, even without that
      feature, we can still make some corrections to the directory tree -- we
      can salvage as many directory entries as we can from a damaged
      directory, and we can reattach orphaned inodes to the lost+found, just
      as xfs_repair does now.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document metadata file repair · 2f754f7f
      Darrick J. Wong authored
      File-based metadata (such as xattrs and directories) can be extremely
      large.  To reduce the memory requirements and maximize code reuse, it is
      very convenient to create a temporary file, use the regular dir/attr
      code to store salvaged information, and then atomically swap the extents
      between the file being repaired and the temporary file.  Record the high
      level concepts behind how temporary files and atomic content swapping
      should work, and then present some case studies of what the actual
      repair functions do.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document full filesystem scans for online fsck · a0d856ee
      Darrick J. Wong authored
      Certain parts of the online fsck code need to scan every file in the
      entire filesystem.  It is not acceptable to block the entire filesystem
      while this happens, which means that we need to be clever in allowing
      scans to coordinate with ongoing filesystem updates.  We also need to
      hook the filesystem so that regular updates propagate to the staging
      records.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document online file metadata repair code · d6978871
      Darrick J. Wong authored
      Add to the fifth chapter of the online fsck design documentation, where
      we discuss the details of the data structures and algorithms used by the
      kernel to repair file metadata.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document btree bulk loading · 7fb8ccff
      Darrick J. Wong authored
      Add a discussion of the btree bulk loading code, which makes it easy to
      take an in-memory recordset and write it out to disk in an efficient
      manner.  This also enables atomic switchover from the old to the new
      structure with minimal potential for leaking the old blocks.
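
      One step a bulk loader plausibly needs before writing anything is
      to work out how many blocks the new tree will occupy.  The sketch
      below assumes every block is packed completely full, which a real
      loader may not do; struct bload_geom and bload_compute() are
      hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct bload_geom {
	unsigned int height;	/* number of levels; a lone leaf is height 1 */
	uint64_t nr_blocks;	/* total blocks across all levels */
};

/*
 * Compute the shape of a btree holding nr_records records when each
 * leaf holds recs_per_leaf records and each interior node holds
 * ptrs_per_node child pointers, with every block filled completely.
 */
struct bload_geom bload_compute(uint64_t nr_records,
				unsigned int recs_per_leaf,
				unsigned int ptrs_per_node)
{
	struct bload_geom g = { .height = 1, .nr_blocks = 0 };
	uint64_t level_blocks;

	assert(recs_per_leaf > 0 && ptrs_per_node > 1);

	/* An empty tree still has one (empty) root leaf. */
	level_blocks = nr_records ?
		(nr_records + recs_per_leaf - 1) / recs_per_leaf : 1;
	g.nr_blocks = level_blocks;

	/* Add interior levels until a single root block remains. */
	while (level_blocks > 1) {
		level_blocks = (level_blocks + ptrs_per_node - 1) /
				ptrs_per_node;
		g.nr_blocks += level_blocks;
		g.height++;
	}
	return g;
}
```

      Knowing the block count up front lets all of the new blocks be
      reserved before the old structure is replaced.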
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document pageable kernel memory · 5f658dad
      Darrick J. Wong authored
      Add a discussion of pageable kernel memory, since online fsck needs
      quite a bit more memory than most other parts of the filesystem to stage
      records and other information.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document how online fsck deals with eventual consistency · bae43864
      Darrick J. Wong authored
      Writes to an XFS filesystem employ an eventual consistency update model
      to break up complex multistep metadata updates into small chained
      transactions.  This is generally good for performance and scalability
      because XFS doesn't need to prepare for enormous transactions, but it
      also means that online fsck must be careful not to attempt a fsck action
      unless it can be shown that there are no other threads processing a
      transaction chain.  This part of the design documentation covers the
      thinking behind the consistency model and how scrub deals with it.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the filesystem metadata checking strategy · e5edad52
      Darrick J. Wong authored
      Begin the fifth chapter of the online fsck design documentation, where
      we discuss the details of the data structures and algorithms used by the
      kernel to examine filesystem metadata and cross-reference it around the
      filesystem.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the user interface for online fsck · 4f7f6469
      Darrick J. Wong authored
      Start the fourth chapter of the online fsck design documentation, which
      discusses the user interface and the background scrubbing service.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the testing plan for online fsck · 9a30b5b5
      Darrick J. Wong authored
      Start the third chapter of the online fsck design documentation.  This
      covers the testing plan to make sure that both online and offline fsck
      can detect arbitrary problems and correct them without making things
      worse.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the general theory underlying online fsck design · 88757e04
      Darrick J. Wong authored
      Start the second chapter of the online fsck design documentation.
      This covers the general theory underlying how online fsck works.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: document the motivation for online fsck design · a8f6c2e5
      Darrick J. Wong authored
      Start the first chapter of the online fsck design documentation.
      This covers the motivations for creating this in the first place.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
  3. 09 Apr, 2023 5 commits
    • Linux 6.3-rc6 · 09a9639e
      Linus Torvalds authored
    • Merge tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · faf8f418
      Linus Torvalds authored
      Pull perf fixes from Borislav Petkov:
      
       - Fix "same task" check when redirecting event output
      
       - Do not wait unconditionally for RCU on the event migration path if
         there are no events to migrate
      
      * tag 'perf_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix the same task check in perf_event_set_output
        perf: Optimize perf_pmu_migrate_context()
    • Merge tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4ba115e2
      Linus Torvalds authored
      Pull x86 fixes from Borislav Petkov:
      
       - Add a new Intel Arrow Lake CPU model number
      
       - Fix a confusion about how to check the version of the ACPI spec which
         supports an "online capable" bit in the MADT table, which led to a
         bunch of boot breakages with Zen1 systems and VMs
      
      * tag 'x86_urgent_for_v6.3_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/cpu: Add model number for Intel Arrow Lake processor
        x86/acpi/boot: Correct acpi_is_processor_usable() check
        x86/ACPI/boot: Use FADT version to check support for online capable
    • Merge tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl · c08cfd67
      Linus Torvalds authored
      Pull compute express link (cxl) fixes from Dan Williams:
       "Several fixes for driver startup regressions that landed during the
        merge window as well as some older bugs.
      
        The regressions were due to a lack of testing with what the CXL
        specification calls Restricted CXL Host (RCH) topologies compared to
        the testing with Virtual Host (VH) CXL topologies. A VH topology is
        typical PCIe while RCH topologies map CXL endpoints as Root Complex
        Integrated endpoints. The impact is some driver crashes on startup.
      
        This merge window also added compatibility for range registers (the
        mechanism that CXL 1.1 defined for mapping memory) to treat them like
        HDM decoders (the mechanism that CXL 2.0 defined for mapping
        Host-managed Device Memory). That work collided with the new region
        enumeration code that was tested with CXL 2.0 setups, and fails with
        crashes at startup.
      
        Lastly, the DOE (Data Object Exchange) implementation for retrieving
        an ACPI-like data table from CXL devices is being reworked for v6.4.
        Several fixes fell out of that work that are suitable for v6.3.
      
        All of this has been in linux-next for a while, and all reported
        issues [1] have been addressed.
      
        Summary:
      
         - Fix several issues with region enumeration in RCH topologies that
           can trigger crashes on driver startup or shutdown.
      
         - Fix CXL DVSEC range register compatibility versus region
           enumeration that leads to startup crashes
      
          - Fix CDAT endianness handling
      
         - Fix multiple buffer handling boundary conditions
      
         - Fix Data Object Exchange (DOE) workqueue usage vs
           CONFIG_DEBUG_OBJECTS warn splats"
      
      Link: http://lore.kernel.org/r/20230405075704.33de8121@canb.auug.org.au [1]
      
      * tag 'cxl-fixes-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
        cxl/hdm: Extend DVSEC range register emulation for region enumeration
        cxl/hdm: Limit emulation to the number of range registers
        cxl/region: Move coherence tracking into cxl_region_attach()
        cxl/region: Fix region setup/teardown for RCDs
        cxl/port: Fix find_cxl_root() for RCDs and simplify it
        cxl/hdm: Skip emulation when driver manages mem_enable
        cxl/hdm: Fix double allocation of @cxlhdm
        PCI/DOE: Fix memory leak with CONFIG_DEBUG_OBJECTS=y
        PCI/DOE: Silence WARN splat with CONFIG_DEBUG_OBJECTS=y
        cxl/pci: Handle excessive CDAT length
        cxl/pci: Handle truncated CDAT entries
        cxl/pci: Handle truncated CDAT header
        cxl/pci: Fix CDAT retrieval on big endian
      c08cfd67
    • Merge tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · cdc9718d
      Linus Torvalds authored
      Pull cifs client fixes from Steve French:
       "Two cifs/smb3 client fixes, one for stable:
      
         - double lock fix for a cifs/smb1 reconnect path
      
         - DFS prefixpath fix for reconnect when server moved"
      
      * tag '6.3-rc5-smb3-cifs-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: double lock in cifs_reconnect_tcon()
        cifs: sanitize paths in cifs_update_super_prepath.
      cdc9718d
  4. 08 Apr, 2023 9 commits
    • Merge tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · 68047c48
      Linus Torvalds authored
      Pull char/misc driver fixes from Greg KH:
       "Here is a small set of various small driver changes for 6.3-rc6.
        Included in here are:
      
         - iio driver fixes for reported problems
      
         - coresight hwtracing bugfix for reported problem
      
         - small counter driver bugfixes
      
        All have been in linux-next for a while with no reported problems"
      
      * tag 'char-misc-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
        coresight: etm4x: Do not access TRCIDR1 for identification
        coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
        iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
        iio: adc: palmas_gpadc: fix NULL dereference on rmmod
        counter: 104-quad-8: Fix Synapse action reported for Index signals
        counter: 104-quad-8: Fix race condition between FLAG and CNTR reads
        iio: adc: max11410: fix read_poll_timeout() usage
        iio: dac: cio-dac: Fix max DAC write value check for 12-bit
        iio: light: cm32181: Unregister second I2C client if present
        iio: accel: kionix-kx022a: Get the timestamp from the driver's private data in the trigger_handler
        iio: adc: ad7791: fix IRQ flags
        iio: buffer: make sure O_NONBLOCK is respected
        iio: buffer: correctly return bytes written in output buffers
        iio: light: vcnl4000: Fix WARN_ON on uninitialized lock
        iio: adis16480: select CONFIG_CRC32
        drivers: iio: adc: ltc2497: fix LSB shift
        iio: adc: qcom-spmi-adc5: Fix the channel name
      68047c48
    • Merge tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · aa46fe36
      Linus Torvalds authored
      Pull tty/serial driver fixes from Greg KH:
       "Here are some small tty and serial driver fixes for some reported
        problems:
      
         - fsl_uart driver bugfixes
      
         - sh-sci serial driver bugfixes
      
         - renesas serial driver DT binding bugfixes
      
         - 8250 DMA bugfix
      
        All of these have been in linux-next for a while with no reported
        problems"
      
      * tag 'tty-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
        tty: serial: fsl_lpuart: fix crash in lpuart_uport_is_active
        tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
        serial: 8250: Prevent starting up DMA Rx on THRI interrupt
        dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
        tty: serial: sh-sci: Fix transmit end interrupt handler
      aa46fe36
    • Merge tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · a211b1c0
      Linus Torvalds authored
      Pull USB bugfixes from Greg KH:
       "Here are some small USB bugfixes for 6.3-rc6 that have been in my
        tree, and in linux-next, for a while. Included in here are:
      
         - new usb-serial driver device ids
      
         - xhci bugfixes for reported problems
      
         - gadget driver bugfixes for reported problems
      
         - dwc3 new device id
      
        All have been in linux-next with no reported problems"
      
      * tag 'usb-6.3-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: cdnsp: Fixes error: uninitialized symbol 'len'
        usb: gadgetfs: Fix ep_read_iter to handle ITER_UBUF
        usb: gadget: f_fs: Fix ffs_epfile_read_iter to handle ITER_UBUF
        usb: typec: altmodes/displayport: Fix configure initial pin assignment
        usb: dwc3: pci: add support for the Intel Meteor Lake-S
        xhci: Free the command allocated for setting LPM if we return early
        Revert "usb: xhci-pci: Set PROBE_PREFER_ASYNCHRONOUS"
        xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
        USB: serial: option: add Quectel RM500U-CN modem
        usb: xhci: tegra: fix sleep in atomic call
        USB: serial: option: add Telit FE990 compositions
        USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
      a211b1c0
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · a79d5c76
      Linus Torvalds authored
      Pull SCSI fixes from James Bottomley:
       "Four small fixes, all in drivers. They're all one or two lines except
        for the ufs one, but that's a simple revert of a previous feature"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
        scsi: qla2xxx: Fix memory leak in qla2x00_probe_one()
        scsi: mpi3mr: Handle soft reset in progress fault code (0xF002)
        scsi: Revert "scsi: ufs: core: Initialize devfreq synchronously"
      a79d5c76
    • Merge tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux · da0af3c5
      Linus Torvalds authored
      Pull block fixes from Jens Axboe:
      
       - Ensure that ublk always reads the whole sqe upfront (me)
      
       - Fix for a block size probing issue with ublk (Ming)
      
       - Fix for the bio based polling (Keith)
      
       - NVMe pull request via Christoph:
            - fix discard support without oncs (Keith Busch)
      
       - Partition scan error handling regression fix (Yu)
      
      * tag 'block-6.3-2023-04-06' of git://git.kernel.dk/linux:
        block: don't set GD_NEED_PART_SCAN if scan partition failed
        block: ublk: make sure that block size is set correctly
        ublk: read any SQE values upfront
        nvme: fix discard support without oncs
        blk-mq: directly poll requests
      da0af3c5
    • Merge tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux · d3f05a4c
      Linus Torvalds authored
      Pull io_uring fixes from Jens Axboe:
       "Just two minor fixes for provided buffers - one where we could
        potentially leak a buffer, and one where the returned value was
        off by one in some cases"
      
      * tag 'io_uring-6.3-2023-04-06' of git://git.kernel.dk/linux:
        io_uring: fix memory leak when removing provided buffers
        io_uring: fix return value when removing provided buffers
      d3f05a4c
    • Merge tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping · 973ad544
      Linus Torvalds authored
      Pull dma-mapping fix from Christoph Hellwig:
      
       - fix a braino in the swiotlb alignment check fix (Petr Tesarik)
      
      * tag 'dma-mapping-6.3-2023-04-08' of git://git.infradead.org/users/hch/dma-mapping:
        swiotlb: fix a braino in the alignment check fix
      973ad544
    • Merge tag 'trace-v6.3-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 1a8a804a
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "A couple more minor fixes:
      
         - Reset direct->addr back to its original value on error in updating
           the direct trampoline code
      
         - Make lastcmd_mutex static"
      
      * tag 'trace-v6.3-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing/synthetic: Make lastcmd_mutex static
        ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
      1a8a804a
    • Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of... · 6fda0bb8
      Linus Torvalds authored
      Merge tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
      
      Pull MM fixes from Andrew Morton:
       "28 hotfixes.
      
        23 are cc:stable and the other five address issues which were
        introduced during this merge cycle.
      
        20 are for MM and the remainder are for other subsystems"
      
      * tag 'mm-hotfixes-stable-2023-04-07-16-23' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
        maple_tree: fix a potential concurrency bug in RCU mode
        maple_tree: fix get wrong data_end in mtree_lookup_walk()
        mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
        nilfs2: fix sysfs interface lifetime
        mm: take a page reference when removing device exclusive entries
        mm: vmalloc: avoid warn_alloc noise caused by fatal signal
        nilfs2: initialize "struct nilfs_binfo_dat"->bi_pad field
        nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
        zsmalloc: document freeable stats
        zsmalloc: document new fullness grouping
        fsdax: force clear dirty mark if CoW
        mm/hugetlb: fix uffd wr-protection for CoW optimization path
        mm: enable maple tree RCU mode by default
        maple_tree: add RCU lock checking to rcu callback functions
        maple_tree: add smp_rmb() to dead node detection
        maple_tree: fix write memory barrier of nodes once dead for RCU mode
        maple_tree: remove extra smp_wmb() from mas_dead_leaves()
        maple_tree: fix freeing of nodes in rcu mode
        maple_tree: detect dead nodes in mas_start()
        maple_tree: be more cautious about dead nodes
        ...
      6fda0bb8