1. 12 Oct, 2024 3 commits
  2. 09 Oct, 2024 11 commits
    • bcachefs: __wait_for_freeing_inode: Switch to wait_bit_queue_entry · 3b80552e
      Kent Overstreet authored
      inode_bit_waitqueue() is changing - this update clears the way for
      sched changes.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Check if stuck in journal_res_get() · a7e2dd58
      Kent Overstreet authored
      Like how we already do when the allocator seems to be stuck, check if
      we're waiting too long for a journal reservation and print some debug
      info.
      
      This is specifically to track down
      https://github.com/koverstreet/bcachefs/issues/656
      
      which is showing up in userspace where we don't have sysfs/debugfs to
      get the journal debug info.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • closures: Add closure_wait_event_timeout() · 04b670de
      Kent Overstreet authored
      Add a closure version of wait_event_timeout(), with the same semantics.
      
      The closure version is useful because unlike wait_event(), it allows
      blocking code to run in the conditional expression.
      
      Cc: Coly Li <colyli@suse.de>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
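The commit above says the closure variant has the same semantics as wait_event_timeout(). As a rough illustration of that return convention only, here is a userspace model that counts simulated ticks instead of sleeping. It is not the closure implementation; model_wait_event_timeout(), cond_fn, struct countdown and countdown_done() are all hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical condition callback; in the kernel macro this is an expression. */
typedef bool (*cond_fn)(void *ctx);

/*
 * Model of the wait_event_timeout() return convention, using a simulated
 * tick counter instead of real sleeping:
 *   0     -> the timeout elapsed and the condition was still false
 *   >= 1  -> the condition became true; the value is the ticks remaining,
 *            reported as 1 if it only became true once the timeout elapsed
 */
static long model_wait_event_timeout(cond_fn cond, void *ctx, long timeout)
{
	long remaining = timeout;

	for (;;) {
		if (cond(ctx))
			return remaining > 0 ? remaining : 1;
		if (remaining <= 0)
			return 0;
		remaining--;	/* one simulated tick of "sleep" */
	}
}

/* Example condition: becomes true after a fixed number of failed checks. */
struct countdown {
	int checks_left;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	if (c->checks_left == 0)
		return true;
	c->checks_left--;
	return false;
}
```

A caller distinguishes "timed out" from "condition met" purely by testing the result against zero, which is why a true condition never yields 0 even when no time is left.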
    • bcachefs: Fix state lock involved deadlock · 9205d24c
      Alan Huang authored
      We increased the write ref; if the fs then went RO, that would lead to
      a deadlock. It actually happens:
      
      00171 ========= TEST   generic/279
      00171
      00172 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=nocow
      00172 bcachefs (vdb): recovering from clean shutdown, journal seq 35
      00172 bcachefs (vdb): accounting_read... done
      00172 bcachefs (vdb): alloc_read... done
      00172 bcachefs (vdb): stripes_read... done
      00172 bcachefs (vdb): snapshots_read... done
      00172 bcachefs (vdb): journal_replay... done
      00172 bcachefs (vdb): resume_logged_ops... done
      00172 bcachefs (vdb): going read-write
      00172 bcachefs (vdb): done starting filesystem
      00172 FSTYP         -- bcachefs
      00172 PLATFORM      -- Linux/aarch64 farm3-kvm 6.11.0-rc1-ktest-g3e290a0b8e34 #7030 SMP Tue Oct  8 14:15:12 UTC 2024
      00172 MKFS_OPTIONS  -- --nocow /dev/vdc
      00172 MOUNT_OPTIONS -- /dev/vdc /mnt/scratch
      00172
      00172 bcachefs (vdc): starting version 1.12: rebalance_work_acct_fix opts=nocow
      00172 bcachefs (vdc): initializing new filesystem
      00172 bcachefs (vdc): going read-write
      00172 bcachefs (vdc): marking superblocks
      00172 bcachefs (vdc): initializing freespace
      00172 bcachefs (vdc): done initializing freespace
      00172 bcachefs (vdc): reading snapshots table
      00172 bcachefs (vdc): reading snapshots done
      00172 bcachefs (vdc): done starting filesystem
      00173 bcachefs (vdc): shutting down
      00173 bcachefs (vdc): going read-only
      00173 bcachefs (vdc): finished waiting for writes to stop
      00173 bcachefs (vdc): flushing journal and stopping allocators, journal seq 4
      00173 bcachefs (vdc): flushing journal and stopping allocators complete, journal seq 6
      00173 bcachefs (vdc): shutdown complete, journal seq 7
      00173 bcachefs (vdc): marking filesystem clean
      00173 bcachefs (vdc): shutdown complete
      00173 bcachefs (vdb): shutting down
      00173 bcachefs (vdb): going read-only
      00361 INFO: task umount:6180 blocked for more than 122 seconds.
      00361 Not tainted 6.11.0-rc1-ktest-g3e290a0b8e34 #7030
      00361 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      00361 task:umount          state:D stack:0     pid:6180  tgid:6180  ppid:6176   flags:0x00000004
      00361 Call trace:
      00362 __switch_to (arch/arm64/kernel/process.c:556)
      00362 __schedule (kernel/sched/core.c:5191 kernel/sched/core.c:6529)
      00363 schedule (include/asm-generic/bitops/generic-non-atomic.h:128 include/linux/thread_info.h:192 include/linux/sched.h:2084 kernel/sched/core.c:6608 kernel/sched/core.c:6621)
      00365 bch2_fs_read_only (fs/bcachefs/super.c:346 (discriminator 41))
      00367 __bch2_fs_stop (fs/bcachefs/super.c:620)
      00368 bch2_put_super (fs/bcachefs/fs.c:1942)
      00369 generic_shutdown_super (include/linux/list.h:373 (discriminator 2) fs/super.c:650 (discriminator 2))
      00371 bch2_kill_sb (fs/bcachefs/fs.c:2170)
      00372 deactivate_locked_super (fs/super.c:434 fs/super.c:475)
      00373 deactivate_super (fs/super.c:508)
      00374 cleanup_mnt (fs/namespace.c:250 fs/namespace.c:1374)
      00376 __cleanup_mnt (fs/namespace.c:1381)
      00376 task_work_run (include/linux/sched.h:2024 kernel/task_work.c:224)
      00377 do_notify_resume (include/linux/resume_user_mode.h:50 arch/arm64/kernel/entry-common.c:151)
      00377 el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:171 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
      00377 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
      00378 el0t_64_sync (arch/arm64/kernel/entry.S:598)
      00378 INFO: task tee:6182 blocked for more than 122 seconds.
      00378 Not tainted 6.11.0-rc1-ktest-g3e290a0b8e34 #7030
      00378 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      00378 task:tee             state:D stack:0     pid:6182  tgid:6182  ppid:533    flags:0x00000004
      00378 Call trace:
      00378 __switch_to (arch/arm64/kernel/process.c:556)
      00378 __schedule (kernel/sched/core.c:5191 kernel/sched/core.c:6529)
      00378 schedule (include/asm-generic/bitops/generic-non-atomic.h:128 include/linux/thread_info.h:192 include/linux/sched.h:2084 kernel/sched/core.c:6608 kernel/sched/core.c:6621)
      00378 schedule_preempt_disabled (kernel/sched/core.c:6680)
      00379 rwsem_down_read_slowpath (kernel/locking/rwsem.c:1073 (discriminator 1))
      00379 down_read (kernel/locking/rwsem.c:1529)
      00381 bch2_gc_gens (fs/bcachefs/sb-members.h:77 fs/bcachefs/sb-members.h:88 fs/bcachefs/sb-members.h:128 fs/bcachefs/btree_gc.c:1240)
      00383 bch2_fs_store_inner (fs/bcachefs/sysfs.c:473)
      00385 bch2_fs_internal_store (fs/bcachefs/sysfs.c:417 fs/bcachefs/sysfs.c:580 fs/bcachefs/sysfs.c:576)
      00386 sysfs_kf_write (fs/sysfs/file.c:137)
      00387 kernfs_fop_write_iter (fs/kernfs/file.c:334)
      00389 vfs_write (fs/read_write.c:497 fs/read_write.c:590)
      00390 ksys_write (fs/read_write.c:643)
      00391 __arm64_sys_write (fs/read_write.c:652)
      00391 invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
      00392 do_el0_svc (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
      00392 el0_svc (arch/arm64/include/asm/irqflags.h:55 arch/arm64/include/asm/irqflags.h:76 arch/arm64/kernel/entry-common.c:165 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
      00392 el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731)
      00392 el0t_64_sync (arch/arm64/kernel/entry.S:598)
      Signed-off-by: Alan Huang <mmpgouride@gmail.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Fix NULL pointer dereference in bch2_opt_to_text · a30f3222
      Mohammed Anees authored
      This patch adds a bounds check to the bch2_opt_to_text function to prevent
      NULL pointer dereferences when accessing the opt->choices array. This
      ensures that the index used is within valid bounds before dereferencing.
      The new version also improves readability.
      
      Reported-and-tested-by: syzbot+37186860aa7812b331d5@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=37186860aa7812b331d5
      Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
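The kind of bounds check described in the commit above can be sketched in isolation. This is a minimal illustration, assuming a NULL-terminated table of choice names; struct demo_option, demo_nr_choices() and demo_choice_name() are hypothetical and are not the bcachefs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an option with a NULL-terminated choices table. */
struct demo_option {
	const char * const *choices;
};

/* Count the entries before the terminating NULL. */
static size_t demo_nr_choices(const struct demo_option *opt)
{
	size_t n = 0;

	while (opt->choices[n])
		n++;
	return n;
}

/*
 * Return the name for value v, or NULL if the option has no table or v
 * is out of range. The caller can then print a fallback instead of
 * dereferencing an invalid slot.
 */
static const char *demo_choice_name(const struct demo_option *opt, uint64_t v)
{
	if (!opt->choices || v >= demo_nr_choices(opt))
		return NULL;
	return opt->choices[v];
}
```

Doing the range check before indexing means an out-of-range value read from disk or from a fuzzer never reaches the array access.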
    • bcachefs: Release transaction before wake up · a1541541
      Alan Huang authored
      We will get this if we wake up first:
      
      Kernel panic - not syncing: btree_node_write_done leaked btree_trans
      
      since there are still transactions waiting for cycle detectors after
      BTREE_NODE_write_in_flight is cleared.
      Signed-off-by: Alan Huang <mmpgouride@gmail.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: add check for btree id against max in try read node · 0151d10a
      Piotr Zalewski authored
      Add check for read node's btree_id against BTREE_ID_NR_MAX in
      try_read_btree_node to prevent triggering EBUG_ON condition in
      bch2_btree_id_root[1].
      
      [1] https://syzkaller.appspot.com/bug?extid=cf7b2215b5d70600ec00
      
      Reported-by: syzbot+cf7b2215b5d70600ec00@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=cf7b2215b5d70600ec00
      Fixes: 4409b808 ("bcachefs: Repair pass for scanning for btree nodes")
      Signed-off-by: Piotr Zalewski <pZ010001011111@proton.me>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Disk accounting device validation fixes · 19773ec9
      Kent Overstreet authored
      - Fix failure to validate that accounting replicas entries point to
        valid devices: this wasn't a real bug since they'd be cleaned up by
        GC, but is still something we should know about
      
      - Fix failure to validate that dev_data_type entries point to valid
        devices: this does fix a real bug, since bch2_accounting_read() would
        then try to copy the counters to that device and pop an inconsistent
        error when the device didn't exist
      
      - Remove accounting entries that are zeroed or invalid: if we're not
        validating them we need to get rid of them. They might not exist in
        the superblock, so we need to trigger the superblock mark path
        when they're re-added.
      
        This fixes the replication.ktest rereplicate test, which was failing
        with "superblock not marked for replicas..."
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: bch2_inode_or_descendents_is_open() · 9d861787
      Kent Overstreet authored
      fsck can now correctly check if inodes in interior snapshot nodes are
      open/in use.
      
      - Tweak the vfs inode rhashtable so that the subvolume ID isn't hashed,
        meaning inums in different subvolumes will hash to the same slot. Note
        that this is a hack, and will cause problems if anyone ever has the
        same file in many different snapshots open all at the same time.
      
      - Then check if any of those subvolumes is a descendant of the snapshot
        ID being checked
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
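The two steps in the commit above (find every open instance of an inode number, then test whether any of them sits at or below the snapshot being checked) can be modelled roughly as follows. This is a simplified sketch, not the bcachefs code: it stores a snapshot ID directly on each open inode rather than going through subvolumes, a linear scan stands in for walking one hash slot, and struct demo_open_inode, demo_is_ancestor() and demo_inode_or_descendants_open() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of an inode that is currently open. */
struct demo_open_inode {
	uint32_t snapshot;	/* snapshot node the open file belongs to */
	uint64_t inum;		/* inode number, shared across snapshots */
};

/*
 * parent[i] is the parent of snapshot node i; 0 means "no parent".
 * Returns true if ancestor is id itself or lies above it in the tree.
 * Assumes the parent table is acyclic.
 */
static bool demo_is_ancestor(const uint32_t *parent, uint32_t id,
			     uint32_t ancestor)
{
	while (id) {
		if (id == ancestor)
			return true;
		id = parent[id];
	}
	return false;
}

/*
 * Is inum open in the given snapshot or in any snapshot descended from
 * it? Every entry with a matching inum is a candidate, whatever snapshot
 * it was opened in, mirroring a lookup keyed on the inode number alone.
 */
static bool demo_inode_or_descendants_open(const struct demo_open_inode *open,
					   size_t nr, const uint32_t *parent,
					   uint64_t inum, uint32_t snapshot)
{
	for (size_t i = 0; i < nr; i++)
		if (open[i].inum == inum &&
		    demo_is_ancestor(parent, open[i].snapshot, snapshot))
			return true;
	return false;
}
```

Keying the lookup on the inode number alone is what makes the second step possible: all candidates are found together, at the cost of longer scans when one file is open in many snapshots at once.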
    • 84878e82
    • bcachefs: bcachefs_metadata_version_inode_has_child_snapshots · 9b23fdbd
      Kent Overstreet authored
      There's an inherent race in taking a snapshot while an unlinked file is
      open, and then reattaching it in the child snapshot.
      
      In the interior snapshot node the file will appear unlinked, as though
      it should be deleted - it's not referenced by anything in that snapshot
      - but we can't delete it, because the file data is referenced by the
      child snapshot.
      
      This was being handled incorrectly with
      propagate_key_to_snapshot_leaves() - but that doesn't resolve the
      fundamental inconsistency of "this file looks like it should be deleted
      according to normal rules, but - ".
      
      To fix this, we need to fix the rule for when an inode is deleted. The
      previous rule, ignoring snapshots (there was no well-defined rule
      with snapshots), was:
        Unlinked, non-open files are deleted, either at recovery time or
        during online fsck

      The new rule is:
        Unlinked, non-open files that do not exist in child snapshots are
        deleted.
      
      To make this work transactionally, we add a new inode flag,
      BCH_INODE_has_child_snapshot; it overrides BCH_INODE_unlinked when
      considering whether to delete an inode, or put it on the deleted list.
      
      For transactional consistency, clearing it is handled by the inode trigger:
      when deleting an inode we check if there are parent inodes which can now
      have the BCH_INODE_has_child_snapshot flag cleared.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
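The old and new deletion rules quoted in the commit above reduce to two small predicates. The sketch below only restates those rules; the flag values and the names DEMO_INODE_UNLINKED, DEMO_INODE_HAS_CHILD_SNAPSHOT, demo_old_should_delete() and demo_new_should_delete() are hypothetical and do not match the on-disk bcachefs definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, chosen only for this illustration. */
#define DEMO_INODE_UNLINKED		(1u << 0)
#define DEMO_INODE_HAS_CHILD_SNAPSHOT	(1u << 1)

/* Previous rule: unlinked, non-open files are deleted. */
static bool demo_old_should_delete(uint32_t flags, bool is_open)
{
	return (flags & DEMO_INODE_UNLINKED) && !is_open;
}

/*
 * New rule: unlinked, non-open files that do not exist in child
 * snapshots are deleted. The has-child-snapshot flag overrides the
 * unlinked flag.
 */
static bool demo_new_should_delete(uint32_t flags, bool is_open)
{
	if (flags & DEMO_INODE_HAS_CHILD_SNAPSHOT)
		return false;
	return (flags & DEMO_INODE_UNLINKED) && !is_open;
}
```

The two predicates disagree only for an unlinked, closed inode that still has a version in a child snapshot, which is exactly the case the commit message describes.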
  3. 06 Oct, 2024 5 commits
    • bcachefs: Delete vestigal check_inode() checks · cba31b7e
      Kent Overstreet authored
      BCH_INODE_i_size_dirty dates from before we had logged operations for
      truncate (as well as finsert) - it hasn't been needed since before
      bcachefs was mainlined.
      
      BCH_INODE_i_sectors_dirty hasn't been needed since we started always
      updating i_sectors transactionally - it's been unused for even longer.
      
      BCH_INODE_backptr_untrusted also hasn't been used since prior to
      mainlining; when unlinking a hardlink, we zero out the backpointer
      fields if they're for the dirent being removed.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: btree_iter_peek_upto() now handles BTREE_ITER_all_snapshots · 12f28608
      Kent Overstreet authored
      end_pos now compares against snapshot ID when required
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: reattach_inode() now correctly handles interior snapshot nodes · 38864ecc
      Kent Overstreet authored
      When we find an unreachable inode, we now reattach it in the oldest
      version that needs to be reattached (thus avoiding redundant work
      reattaching every single version), and we now fix up inode -> dirent
      backpointers in newer versions as needed - or white out the reattaching
      dirent in newer versions, if the newer version isn't supposed to be
      reattached.
      
      This results in the second verify fsck now passing cleanly after
      repairing on a user-provided filesystem image with thousands of
      different snapshots.
      Reported-by: Christopher Snowhill <chris@kode54.net>
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Split out check_unreachable_inodes() pass · bade9711
      Kent Overstreet authored
      With inode backpointers, we can write a very simple
      check_unreachable_inodes() pass that only looks for non-unlinked inodes
      that are missing backpointers, and reattaches them.
      
      This simplifies check_directory_structure() so that it's now only
      checking for directory structure loops.
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
    • bcachefs: Fix lockdep splat in bch2_accounting_read · bf4baaa0
      Kent Overstreet authored
      We can't take sb_lock while holding mark_lock, so split out
      replicas_entry_validate() and replicas_entry_sb_validate() -
      replicas_entry_validate() now uses the normal online device interface.
      
      00039 ========= TEST   set_option
      00039
      00039 WATCHDOG 30
      00040 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=errors=panic
      00040 bcachefs (vdb): initializing new filesystem
      00040 bcachefs (vdb): going read-write
      00040 bcachefs (vdb): marking superblocks
      00040 bcachefs (vdb): initializing freespace
      00040 bcachefs (vdb): done initializing freespace
      00040 bcachefs (vdb): reading snapshots table
      00040 bcachefs (vdb): reading snapshots done
      00040 bcachefs (vdb): done starting filesystem
      00040 zstd
      00041 bcachefs (vdb): shutting down
      00041 bcachefs (vdb): going read-only
      00041 bcachefs (vdb): finished waiting for writes to stop
      00041 bcachefs (vdb): flushing journal and stopping allocators, journal seq 3
      00041 bcachefs (vdb): flushing journal and stopping allocators complete, journal seq 11
      00041 bcachefs (vdb): shutdown complete, journal seq 12
      00041 bcachefs (vdb): marking filesystem clean
      00041 bcachefs (vdb): shutdown complete
      00041 Setting option on offline fs
      00041 bch2_write_super(): fatal error : attempting to write superblock that wasn't version downgraded (1.12: (unknown version) > 1.10: disk_accounting_v3)
      00041 fatal error - emergency read only
      00041 bch2_write_super(): fatal error : attempting to write superblock that wasn't version downgraded (1.12: (unknown version) > 1.10: disk_accounting_v3)
      00042 bcachefs (vdb): starting version 1.12: rebalance_work_acct_fix opts=errors=panic,compression=zstd
      00042 bcachefs (vdb): recovering from clean shutdown, journal seq 12
      00042 bcachefs (vdb): accounting_read...
      00042
      00042 ======================================================
      00042 WARNING: possible circular locking dependency detected
      00042 6.12.0-rc1-ktest-g805e938a8502 #6807 Not tainted
      00042 ------------------------------------------------------
      00042 mount.bcachefs/665 is trying to acquire lock:
      00045 ffffff80cc280908 (&c->sb_lock){+.+.}-{3:3}, at: bch2_replicas_entry_validate (fs/bcachefs/replicas.c:102)
      00045
      00045 but task is already holding lock:
      00048 ffffff80cc284870 (&c->mark_lock){++++}-{0:0}, at: bch2_accounting_read (fs/bcachefs/disk_accounting.c:670 (discriminator 1))
      00048
      00048 which lock already depends on the new lock.
      00048
      00048
      00048 the existing dependency chain (in reverse order) is:
      00048
      00048 -> #1 (&c->mark_lock){++++}-{0:0}:
      00049 percpu_down_write (kernel/locking/percpu-rwsem.c:232)
      00052 bch2_sb_replicas_to_cpu_replicas (fs/bcachefs/replicas.c:583)
      00055 bch2_sb_to_fs (fs/bcachefs/super-io.c:614)
      00057 bch2_fs_open (fs/bcachefs/super.c:828 fs/bcachefs/super.c:2050)
      00060 bch2_fs_get_tree (fs/bcachefs/fs.c:2067)
      00062 vfs_get_tree (fs/super.c:1801)
      00064 path_mount (fs/namespace.c:3507 fs/namespace.c:3834)
      00066 __arm64_sys_mount (fs/namespace.c:3847 fs/namespace.c:4055 fs/namespace.c:4032 fs/namespace.c:4032)
      00067 invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
      00068 do_el0_svc (include/linux/thread_info.h:127 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
      00069 el0_svc (arch/arm64/include/asm/irqflags.h:82 arch/arm64/include/asm/irqflags.h:123 arch/arm64/include/asm/irqflags.h:136 arch/arm64/kernel/entry-common.c:165 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713)
      00069 ========= FAILED TIMEOUT set_option in 30s
      Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
  4. 05 Oct, 2024 14 commits
  5. 03 Oct, 2024 3 commits
  6. 01 Oct, 2024 1 commit
  7. 30 Sep, 2024 1 commit
  8. 29 Sep, 2024 2 commits
    • Linux 6.12-rc1 · 9852d85e
      Linus Torvalds authored
    • x86: kvm: fix build error · 3f749bef
      Linus Torvalds authored
      The cpu_emergency_register_virt_callback() function is used
      unconditionally by the x86 kvm code, but it is declared (and defined)
      conditionally:
      
        #if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
        void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
        ...
      
      leading to a build error when neither KVM_INTEL nor KVM_AMD support is
      enabled:
      
        arch/x86/kvm/x86.c: In function ‘kvm_arch_enable_virtualization’:
        arch/x86/kvm/x86.c:12517:9: error: implicit declaration of function ‘cpu_emergency_register_virt_callback’ [-Wimplicit-function-declaration]
        12517 |         cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        arch/x86/kvm/x86.c: In function ‘kvm_arch_disable_virtualization’:
        arch/x86/kvm/x86.c:12522:9: error: implicit declaration of function ‘cpu_emergency_unregister_virt_callback’ [-Wimplicit-function-declaration]
        12522 |         cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable_virtualization_cpu);
              |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix the build by defining empty helper functions the same way the old
      cpu_emergency_disable_virtualization() function was dealt with for the
      same situation.
      
      Maybe we could instead have made the call sites conditional, since the
      callers (kvm_arch_{en,dis}able_virtualization()) have an empty weak
      fallback.  I'll leave that to the kvm people to argue about, this at
      least gets the build going for that particular config.
      
      Fixes: 590b09b1 ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Kai Huang <kai.huang@intel.com>
      Cc: Chao Gao <chao.gao@intel.com>
      Cc: Farrah Chen <farrah.chen@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
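The "empty helper functions" approach described in the commit above follows a common header pattern: declare the real functions when the feature is configured in, and provide no-op static inline stubs otherwise so unconditional callers still compile. The sketch below shows the pattern in a self-contained form; DEMO_HAVE_VIRT and every demo_* name are hypothetical stand-ins, not the kernel's config symbols or functions.

```c
/* Hypothetical callback type, standing in for the real one. */
typedef void (demo_virt_cb)(void);

#if defined(DEMO_HAVE_VIRT)
/* Feature built in: real implementations live elsewhere. */
void demo_register_virt_callback(demo_virt_cb *callback);
void demo_unregister_virt_callback(demo_virt_cb *callback);
#else
/* Feature not built in: empty stubs keep unconditional callers compiling. */
static inline void demo_register_virt_callback(demo_virt_cb *callback)
{
	(void)callback;
}

static inline void demo_unregister_virt_callback(demo_virt_cb *callback)
{
	(void)callback;
}
#endif

static int demo_callback_runs;

static void demo_emergency_disable(void)
{
	demo_callback_runs++;
}

/* Unconditional caller: compiles whether or not DEMO_HAVE_VIRT is set. */
static int demo_enable_virtualization(void)
{
	demo_register_virt_callback(demo_emergency_disable);
	return 0;
}

static void demo_disable_virtualization(void)
{
	demo_unregister_virt_callback(demo_emergency_disable);
}
```

With the stubs in place the call sites need no #ifdef of their own; the alternative mentioned in the commit message, making the call sites conditional, would move that conditional into each caller instead.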