1. 23 Oct, 2023 3 commits
    • Merge tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · e017769f
      Linus Torvalds authored
      Pull btrfs fix from David Sterba:
       "One more fix for a problem with snapshot of a newly created subvolume
        that can lead to inconsistent data under some circumstances. Kernel
        6.5 added a performance optimization to skip transaction commit for
        subvolume creation but this could end up with newer data on disk but
        not linked to other structures.
      
        The fix itself is an added condition, the rest of the patch is a
        parameter added to several functions"
      
      * tag 'for-6.6-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: fix unwritten extent buffer after snapshotting a new subvolume
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 7c145640
      Linus Torvalds authored
      Pull virtio fixes from Michael Tsirkin:
       "A collection of small fixes that look like worth having in this
        release"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_pci: fix the common cfg map size
        virtio-crypto: handle config changed by work queue
        vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE
        vdpa/mlx5: Fix firmware error on creation of 1k VQs
        virtio_balloon: Fix endless deflation and inflation on arm64
        vdpa/mlx5: Fix double release of debugfs entry
        virtio-mmio: fix memory leak of vm_dev
        vdpa_sim_blk: Fix the potential leak of mgmt_dev
        tools/virtio: Add dma sync api for virtio test
    • btrfs: fix unwritten extent buffer after snapshotting a new subvolume · eb96e221
      Filipe Manana authored
      When creating a snapshot of a subvolume that was created in the current
      transaction, we can end up not persisting a dirty extent buffer that is
      referenced by the snapshot, resulting in IO errors due to checksum failures
      when trying to read the extent buffer later from disk. A sequence of steps
      that leads to this is the following:
      
      1) At ioctl.c:create_subvol() we allocate an extent buffer, with logical
         address 36007936, for the leaf/root of a new subvolume that has an ID
         of 291. We mark the extent buffer as dirty, and at this point the
         subvolume tree has a single node/leaf which is also its root (level 0);
      
      2) We no longer commit the transaction used to create the subvolume at
         create_subvol(). We used to, but that was recently removed in
         commit 1b53e51a ("btrfs: don't commit transaction for every subvol
         create");
      
      3) The transaction used to create the subvolume has an ID of 33, so the
         extent buffer 36007936 has a generation of 33;
      
      4) Several updates happen to subvolume 291 during transaction 33: several
         files are created and its tree height changes from 0 to 1, so we end up
         with a new root at level 1 and the extent buffer 36007936 is now a leaf
         of that new root node, which is extent buffer 36048896.
      
         The commit root remains as 36007936, since we are still at transaction
         33;
      
      5) Creation of a snapshot of subvolume 291, with an ID of 292, starts at
         ioctl.c:create_snapshot(). This triggers a commit of transaction 33 and
         we end up at transaction.c:create_pending_snapshot(), in the critical
         section of a transaction commit.
      
         There we COW the root of subvolume 291, which is extent buffer 36048896.
         The COW operation returns extent buffer 36048896, since there's no need
         to COW because the extent buffer was created in this transaction and it
         was not written yet.
      
         Then we call btrfs_copy_root() against the root node 36048896. During
         this operation we allocate a new extent buffer to turn into the root
         node of the snapshot, copy the contents of the root node 36048896 into
         this snapshot root extent buffer, set the owner to 292 (the ID of the
         snapshot), etc, and then we call btrfs_inc_ref(). This will create a
         delayed reference for each leaf pointed to by the root node, with a
         reference root of 292 - this includes a reference for the leaf
         36007936.
      
         After that we set the bit BTRFS_ROOT_FORCE_COW in the root's state.
      
         Then we call btrfs_insert_dir_item(), to create the directory entry in
         the tree of subvolume 291 that points to the snapshot. This ends up
         needing to modify leaf 36007936 to insert the respective directory
         items. Because the bit BTRFS_ROOT_FORCE_COW is set for the root's state,
         we need to COW the leaf. We end up at btrfs_force_cow_block() and then
         at update_ref_for_cow().
      
         At update_ref_for_cow() we call btrfs_block_can_be_shared() which
         returns false, despite the fact the leaf 36007936 is shared - the
         subvolume's root and the snapshot's root point to that leaf. The
         reason that it incorrectly returns false is because the commit root
         of the subvolume is extent buffer 36007936 - it was the initial root
         of the subvolume when we created it. btrfs_block_can_be_shared() has
         the following logic:
      
         int btrfs_block_can_be_shared(struct btrfs_root *root,
                                       struct extent_buffer *buf)
         {
             if (test_bit(BTRFS_ROOT_SHAREABLE, &root->state) &&
                 buf != root->node && buf != root->commit_root &&
                 (btrfs_header_generation(buf) <=
                  btrfs_root_last_snapshot(&root->root_item) ||
                  btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
                     return 1;
      
             return 0;
         }
      
         It returns false (0) since 'buf' (extent buffer 36007936) matches the
         root's commit root.
      
         As a result, at update_ref_for_cow(), we don't check for the number
         of references for extent buffer 36007936, we just assume it's not
         shared and therefore that it has only 1 reference, so we set the local
         variable 'refs' to 1.
      
         Later on, in the final if-else statement at update_ref_for_cow():
      
         static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans,
                                                struct btrfs_root *root,
                                                struct extent_buffer *buf,
                                                struct extent_buffer *cow,
                                                int *last_ref)
         {
            (...)
            if (refs > 1) {
                (...)
            } else {
                (...)
                btrfs_clear_buffer_dirty(trans, buf);
                *last_ref = 1;
            }
         }
      
         So we mark the extent buffer 36007936 as not dirty, and as a result
         we don't write it to disk later in the transaction commit, despite the
         fact that the snapshot's root points to it.
      
         Attempting to access the leaf, or dumping the tree for example, shows
         that the extent buffer was not written:
      
         $ btrfs inspect-internal dump-tree -t 292 /dev/sdb
         btrfs-progs v6.2.2
         file tree key (292 ROOT_ITEM 33)
         node 36110336 level 1 items 2 free space 119 generation 33 owner 292
         node 36110336 flags 0x1(WRITTEN) backref revision 1
         checksum stored a8103e3e
         checksum calced a8103e3e
         fs uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79
         chunk uuid e8c9c885-78f4-4d31-85fe-89e5f5fd4a07
                 key (256 INODE_ITEM 0) block 36007936 gen 33
                 key (257 EXTENT_DATA 0) block 36052992 gen 33
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         total bytes 107374182400
         bytes used 38572032
         uuid 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79
      
         The respective on disk region is full of zeroes as the device was
         trimmed at mkfs time.
      
         Obviously 'btrfs check' also detects and complains about this:
      
         $ btrfs check /dev/sdb
         Opening filesystem to check...
         Checking filesystem on /dev/sdb
         UUID: 90c9a46f-ae9f-4626-9aff-0cbf3e2e3a79
         generation: 33 (33)
         [1/7] checking root items
         [2/7] checking extents
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         bad tree block 36007936, bytenr mismatch, want=36007936, have=0
         owner ref check failed [36007936 4096]
         ERROR: errors found in extent allocation tree or chunk allocation
         [3/7] checking free space tree
         [4/7] checking fs roots
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         checksum verify failed on 36007936 wanted 0x00000000 found 0x86005f29
         bad tree block 36007936, bytenr mismatch, want=36007936, have=0
         The following tree block(s) is corrupted in tree 292:
              tree block bytenr: 36110336, level: 1, node key: (256, 1, 0)
         root 292 root dir 256 not found
         ERROR: errors found in fs roots
         found 38572032 bytes used, error(s) found
         total csum bytes: 16048
         total tree bytes: 1265664
         total fs tree bytes: 1118208
         total extent tree bytes: 65536
         btree space waste bytes: 562598
         file data blocks allocated: 65978368
          referenced 36569088
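
      The chain of events in step 5 can be modelled in a few lines of
      userspace C. This is only a sketch under stated assumptions: struct
      model_eb, model_update_ref_for_cow() and model_write_dirty() are
      hypothetical stand-ins, not kernel code, and they keep only the
      dirty-flag logic quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct extent_buffer. */
struct model_eb {
    uint64_t bytenr;  /* logical address of the tree block */
    int refs;         /* real number of trees referencing the block */
    bool dirty;       /* block still has to be written at commit time */
    bool on_disk;     /* set once the block has been written */
};

/*
 * Model of the final if-else in update_ref_for_cow(). When the block is
 * reported as not shareable, the real reference count is not looked up
 * and 'refs' is assumed to be 1.
 */
static void model_update_ref_for_cow(struct model_eb *buf, bool can_be_shared,
                                     int *last_ref)
{
    int refs;

    assert(buf && last_ref);
    refs = can_be_shared ? buf->refs : 1;

    if (refs > 1)
        return;            /* still used elsewhere, buffer stays dirty */

    buf->dirty = false;    /* mirrors btrfs_clear_buffer_dirty() */
    *last_ref = 1;
}

/* Model of commit writeback: only dirty buffers reach the disk. */
static size_t model_write_dirty(struct model_eb **bufs, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!bufs[i]->dirty)
            continue;
        bufs[i]->on_disk = true;
        bufs[i]->dirty = false;
        written++;
    }
    return written;
}
```

      With can_be_shared passed as false, a leaf that really has two
      references loses its dirty flag and model_write_dirty() skips it, which
      corresponds to the unwritten extent buffer reported by 'btrfs check'.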
      
      Fix this by updating btrfs_block_can_be_shared() to consider that an
      extent buffer may be shared if it matches the commit root and if its
      generation matches the current transaction's generation.
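
      The difference between the old check and the condition described above
      can be sketched as a small userspace model. This is not the patch
      itself: struct model_eb, struct model_root, the 'transid' parameter and
      both function names are hypothetical, and the scenario assumes the
      root's last snapshot generation is already 33 when the leaf is COWed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_eb {
    uint64_t bytenr;      /* logical address */
    uint64_t generation;  /* transaction that created the block */
    bool reloc;           /* models BTRFS_HEADER_FLAG_RELOC */
};

struct model_root {
    bool shareable;                      /* models BTRFS_ROOT_SHAREABLE */
    const struct model_eb *node;         /* current root node */
    const struct model_eb *commit_root;  /* root as of the last commit */
    uint64_t last_snapshot;              /* generation of the last snapshot */
};

/* Same logic as the function quoted in the commit message. */
static bool model_can_be_shared_old(const struct model_root *root,
                                    const struct model_eb *buf)
{
    assert(root && buf);
    return root->shareable &&
           buf != root->node && buf != root->commit_root &&
           (buf->generation <= root->last_snapshot || buf->reloc);
}

/*
 * Sketch of the described fix: a block that matches the commit root may
 * still be shared if it was created in the current transaction.
 */
static bool model_can_be_shared_fixed(uint64_t transid,
                                      const struct model_root *root,
                                      const struct model_eb *buf)
{
    assert(root && buf);
    if (!root->shareable || buf == root->node)
        return false;
    if (buf == root->commit_root && buf->generation != transid)
        return false;
    return buf->generation <= root->last_snapshot || buf->reloc;
}
```

      For leaf 36007936 (generation 33, commit root of subvolume 291, current
      root node 36048896) the old model returns false, while the fixed model
      returns true during transaction 33.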
      
      This can be reproduced with the following script:
      
         $ cat test.sh
         #!/bin/bash
      
         MNT=/mnt/sdi
         DEV=/dev/sdi
      
         # Use a filesystem with a 64K node size so that we have the same node
         # size on every machine regardless of its page size (on x86_64 default
         # node size is 16K due to the 4K page size, while on PPC it's 64K by
         # default). This way we can make sure we are able to create a btree for
         # the subvolume with a height of 2.
         mkfs.btrfs -f -n 64K $DEV
         mount $DEV $MNT
      
         btrfs subvolume create $MNT/subvol
      
         # Create a few empty files on the subvolume, this bumps its btree
         # height to 2 (root node at level 1 and 2 leaves).
         for ((i = 1; i <= 300; i++)); do
             echo -n > $MNT/subvol/file_$i
         done
      
         btrfs subvolume snapshot -r $MNT/subvol $MNT/subvol/snap
      
         umount $DEV
      
         btrfs check $DEV
      
      Running it on a 6.5 kernel (or any 6.6-rc kernel at the moment):
      
         $ ./test.sh
         Create subvolume '/mnt/sdi/subvol'
         Create a readonly snapshot of '/mnt/sdi/subvol' in '/mnt/sdi/subvol/snap'
         Opening filesystem to check...
         Checking filesystem on /dev/sdi
         UUID: bbdde2ff-7d02-45ca-8a73-3c36f23755a1
         [1/7] checking root items
         [2/7] checking extents
         parent transid verify failed on 30539776 wanted 7 found 5
         parent transid verify failed on 30539776 wanted 7 found 5
         parent transid verify failed on 30539776 wanted 7 found 5
         Ignoring transid failure
         owner ref check failed [30539776 65536]
         ERROR: errors found in extent allocation tree or chunk allocation
         [3/7] checking free space tree
         [4/7] checking fs roots
         parent transid verify failed on 30539776 wanted 7 found 5
         Ignoring transid failure
         Wrong key of child node/leaf, wanted: (256, 1, 0), have: (2, 132, 0)
         Wrong generation of child node/leaf, wanted: 5, have: 7
         root 257 root dir 256 not found
         ERROR: errors found in fs roots
         found 917504 bytes used, error(s) found
         total csum bytes: 0
         total tree bytes: 851968
         total fs tree bytes: 393216
         total extent tree bytes: 65536
         btree space waste bytes: 736550
         file data blocks allocated: 0
          referenced 0
      
      A test case for fstests will follow soon.
      
      Fixes: 1b53e51a ("btrfs: don't commit transaction for every subvol create")
      CC: stable@vger.kernel.org # 6.5+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  2. 22 Oct, 2023 4 commits
    • Linux 6.6-rc7 · 05d3ef8b
      Linus Torvalds authored
    • Merge tag 'phy-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy · fe3cfe86
      Linus Torvalds authored
      Pull phy fixes from Vinod Koul:
      
       - mapphone-mdm6600 runtime pm & pinctrl handling fixes
      
       - Qualcomm qmp usb pcs register fixes, qmp pcie register size warning
         fix, m31 fixes for wrong pointer in PTR_ERR and dropping wrong vreg
         check, qmp combo fix for 8550 power config register
      
       - realtek usb fix for debugfs_create_dir() and kconfig dependency
      
      * tag 'phy-fixes-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
        phy: realtek: Realtek PHYs should depend on ARCH_REALTEK
        phy: qualcomm: Fix typos in comments
        phy: qcom-qmp-combo: initialize PCS_USB registers
        phy: qcom-qmp-combo: Square out 8550 POWER_STATE_CONFIG1
        phy: qcom: m31: Remove unwanted qphy->vreg is NULL check
        phy: realtek: usb: Drop unnecessary error check for debugfs_create_dir()
        phy: qcom: phy-qcom-m31: change m31_ipq5332_regs to static
        phy: qcom: phy-qcom-m31: fix wrong pointer pass to PTR_ERR()
        dt-bindings: phy: qcom,ipq8074-qmp-pcie: fix warning regarding reg size
        phy: qcom-qmp-usb: split PCS_USB init table for sc8280xp and sa8775p
        phy: qcom-qmp-usb: initialize PCS_USB registers
        phy: mapphone-mdm6600: Fix pinctrl_pm handling for sleep pins
        phy: mapphone-mdm6600: Fix runtime PM for remove
        phy: mapphone-mdm6600: Fix runtime disable on probe
    • Merge tag 'efi-fixes-for-v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi · 70e65afc
      Linus Torvalds authored
      Pull EFI fixes from Ard Biesheuvel:
       "The boot_params pointer fix uses a somewhat ugly extern struct
        declaration but this will be cleaned up the next cycle.
      
         - don't try to print warnings to the console when it is no longer
           available
      
         - fix theoretical memory leak in SSDT override handling
      
         - make sure that the boot_params global variable is set before the
           KASLR code attempts to hash it for 'randomness'
      
         - avoid soft lockups in the memory acceptance code"
      
      * tag 'efi-fixes-for-v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
        efi/unaccepted: Fix soft lockups caused by parallel memory acceptance
        x86/boot: efistub: Assign global boot_params variable
        efi: fix memory leak in krealloc failure handling
        x86/efistub: Don't try to print after ExitBootService()
    • Merge tag 'powerpc-6.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 1acfd2bd
      Linus Torvalds authored
      Pull powerpc fixes from Michael Ellerman:
      
       - Fix stale propagated yield_cpu in qspinlocks leading to lockups
      
       - Fix broken hugepages on some configs due to ARCH_FORCE_MAX_ORDER
      
       - Fix a spurious warning when copros are in use at exit time
      
      Thanks to Nicholas Piggin, Christophe Leroy, Nysal Jan K.A, Sachin Sant,
      and Shrikanth Hegde.
      
      * tag 'powerpc-6.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/qspinlock: Fix stale propagated yield_cpu
        powerpc/64s/radix: Don't warn on copros in radix__tlb_flush()
        powerpc/mm: Allow ARCH_FORCE_MAX_ORDER up to 12
  3. 21 Oct, 2023 10 commits
  4. 20 Oct, 2023 23 commits