1. 05 Jan, 2024 5 commits
  2. 14 Dec, 2023 4 commits
  3. 12 Dec, 2023 1 commit
    • jbd2: fix soft lockup in journal_finish_inode_data_buffers() · 6c02757c
      Ye Bin authored
      The following issue occurs when running an I/O test:
      WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
      CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G  OE
      Call trace:
       dump_backtrace+0x0/0x1a0
       show_stack+0x24/0x30
       dump_stack+0xb0/0x100
       watchdog_timer_fn+0x254/0x3f8
       __hrtimer_run_queues+0x11c/0x380
       hrtimer_interrupt+0xfc/0x2f8
       arch_timer_handler_phys+0x38/0x58
       handle_percpu_devid_irq+0x90/0x248
       generic_handle_irq+0x3c/0x58
       __handle_domain_irq+0x68/0xc0
       gic_handle_irq+0x90/0x320
       el1_irq+0xcc/0x180
       queued_spin_lock_slowpath+0x1d8/0x320
       jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
       kjournald2+0xec/0x2f0 [jbd2]
       kthread+0x134/0x138
       ret_from_fork+0x10/0x18
      
      Analysis of the vmcore shows the following:
      (1) There are 5k+ jbd2_inode entries in 'commit_transaction->t_inode_list';
      (2) The 855th jbd2_inode is currently being processed;
      (3) The JBD2 task has the TIF_NEED_RESCHED flag set;
      (4) There are no pages in the address_space around the 855th jbd2_inode;
      (5) Some processes are dropping caches;
      (6) The filesystem is mounted with the 'nodioread_nolock' option;
      (7) The machine has 128 CPUs;
      
      The vmcore shows that contention on the 'journal->j_list_lock' spin lock
      is fierce, so journal_finish_inode_data_buffers() may make slow progress.
      In theory there is a scheduling point in filemap_fdatawait_range_keep_errors().
      However, if the inode's address_space has no pages tagged with
      PAGECACHE_TAG_WRITEBACK, cond_resched() is never called, which may lead
      to a soft lockup.
      journal_finish_inode_data_buffers
        filemap_fdatawait_range_keep_errors
          __filemap_fdatawait_range
            while (index <= end)
              nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
              if (!nr_pages)
                 break;    --> if 'nr_pages' is zero the loop breaks, so cond_resched() is never called
              for (i = 0; i < nr_pages; i++)
                wait_on_page_writeback(page);
              cond_resched();
      
      To solve the above issue, add a scheduling point in journal_finish_inode_data_buffers().
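      The effect of the extra scheduling point can be sketched in userspace C.
      This is only an illustration under stated assumptions, not the kernel
      patch: every demo_* name is hypothetical, and sched_yield() stands in
      for cond_resched(). The inner wait yields only when it finds pages
      under writeback, so a list of inodes with no such pages never yields
      unless the outer loop does so itself.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for a jbd2_inode: only tracks pages under writeback. */
struct demo_inode {
    int pages_under_writeback;
};

/*
 * Models the inner wait. It reaches its own scheduling point only when it
 * finds pages to wait on, mirroring the early break on nr_pages == 0.
 * Returns the number of scheduling points it hit (0 or 1).
 */
static int demo_wait_on_inode(const struct demo_inode *inode)
{
    if (inode->pages_under_writeback == 0)
        return 0;           /* early exit: no yield happens here */
    sched_yield();          /* userspace stand-in for cond_resched() */
    return 1;
}

/*
 * Walks the list. When yield_in_outer_loop is nonzero, it yields once per
 * inode regardless of what the inner wait did. Returns the total number
 * of scheduling points reached.
 */
static int demo_finish_inodes(const struct demo_inode *inodes, size_t n,
                              int yield_in_outer_loop)
{
    int yields = 0;

    assert(inodes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        yields += demo_wait_on_inode(&inodes[i]);
        if (yield_in_outer_loop) {
            sched_yield();  /* the added scheduling point */
            yields++;
        }
    }
    return yields;
}
```

      With four inodes that have no pages under writeback, the loop reaches
      no scheduling point at all unless the outer yield is enabled.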
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 01 Dec, 2023 4 commits
  5. 27 Nov, 2023 2 commits
    • Linux 6.7-rc3 · 2cc14f52
      Linus Torvalds authored
    • Merge tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · 5b2b1173
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Eventfs fixes:
      
         - With the usage of simple_recursive_remove() recommended by Al Viro,
           the code should not be calling "d_invalidate()" itself. Doing so is
           causing crashes. The code was calling d_invalidate() on the race of
           trying to look up a file while the parent was being deleted. This
           was detected, and the added dentry was having d_invalidate() called
           on it, but the deletion of the directory was also calling
           d_invalidate() on that same dentry.
      
         - A fix to not free the eventfs_inode (ei) until the last dput() was
           called on its ei->dentry made the ei->dentry exist even after it
           was marked for free by setting the ei->is_freed. But code elsewhere
           still was checking if ei->dentry was NULL if ei->is_freed is set
           and would trigger WARN_ON if that was the case. That's no longer
           true and there should not be any warnings when it is true.
      
         - Use GFP_NOFS for allocations done under eventfs_mutex. The
           eventfs_mutex can be taken on file system reclaim, make sure that
           allocations done under that mutex do not trigger file system
           reclaim.
      
         - Clean up code by moving the taking of inode_lock out of the helper
           functions and into where they are needed, and not use the parameter
           to know to take it or not. It must always be held but some callers
           of the helper function have it taken when they were called.
      
         - Warn if the inode_lock is not held in the helper functions.
      
         - Warn if eventfs_start_creating() is called without a parent. As
           eventfs is underneath tracefs, all files created will have a parent
           (the top one will have a tracefs parent).
      
        Tracing update:
      
         - Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
      
      * tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
        eventfs: Make sure that parent->d_inode is locked in creating files/dirs
        eventfs: Do not allow NULL parent to eventfs_start_creating()
        eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
        eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
        eventfs: Do not invalidate dentry in create_file/dir_dentry()
        eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
  6. 26 Nov, 2023 6 commits
    • Merge tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux · d2da77f4
      Linus Torvalds authored
      Pull parisc architecture fixes from Helge Deller:
       "This patchset fixes and enforces correct section alignments for the
        ex_table, altinstructions, parisc_unwind, jump_table and bug_table
        which are created by inline assembly.
      
        Due to not being correctly aligned at link & load time they can
        trigger unnecessarily the kernel unaligned exception handler at
        runtime. While at it, I switched the bug table to use relative
        addresses which reduces the size of the table by half on 64-bit.
      
        We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs
        from HP-UX, which now trigger build-issues with glibc. We can simply
        remove them.
      
        Most of the patches are tagged for stable kernel series.
      
        Summary:
      
         - Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc
           build issues
      
         - Fix section alignments for ex_table, altinstructions, parisc unwind
           table, jump_table and bug_table
      
         - Reduce size of bug_table on 64-bit kernel by using relative
           pointers"
      
      * tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
        parisc: Reduce size of the bug_table on 64-bit kernel by half
        parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
        parisc: Use natural CPU alignment for bug_table
        parisc: Ensure 32-bit alignment on parisc unwind section
        parisc: Mark lock_aligned variables 16-byte aligned on SMP
        parisc: Mark jump_table naturally aligned
        parisc: Mark altinstructions read-only and 32-bit aligned
        parisc: Mark ex_table entries 32-bit aligned in uaccess.h
        parisc: Mark ex_table entries 32-bit aligned in assembly.h
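      Why relative addresses shrink the bug table can be sketched in plain C.
      The two struct layouts below are hypothetical illustrations, not the
      parisc definitions: one stores 64-bit absolute addresses, the other
      stores signed 32-bit displacements measured from the field itself. On a
      typical 64-bit ABI the first occupies 24 bytes and the second 12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry holding absolute 64-bit addresses. */
struct demo_bug_abs {
    uint64_t bug_addr;
    uint64_t file;
    uint16_t line;
    uint16_t flags;
};

/* Hypothetical entry holding 32-bit displacements relative to each field. */
struct demo_bug_rel {
    int32_t bug_addr_disp;
    int32_t file_disp;
    uint16_t line;
    uint16_t flags;
};

/* Store the target as a displacement from the bug_addr_disp field. */
static void demo_set_bug_addr(struct demo_bug_rel *e, const void *target)
{
    intptr_t disp = (intptr_t)((uintptr_t)target -
                               (uintptr_t)&e->bug_addr_disp);

    /* The target must be within +/-2 GiB of the entry. */
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    e->bug_addr_disp = (int32_t)disp;
}

/* Recover the absolute address: field address plus stored displacement. */
static uintptr_t demo_bug_addr(const struct demo_bug_rel *e)
{
    return (uintptr_t)&e->bug_addr_disp + (uintptr_t)(intptr_t)e->bug_addr_disp;
}
```

      The trade-off is that every referenced address must lie within the
      range of a signed 32-bit offset from the table entry.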
    • Merge tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4892711a
      Linus Torvalds authored
      Pull x86 microcode fixes from Ingo Molnar:
       "Fix/enhance x86 microcode version reporting: fix the bootup log spam,
        and remove the driver version announcement to avoid version confusion
        when distros backport fixes"
      
      * tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/microcode: Rework early revisions reporting
        x86/microcode: Remove the driver announcement and version
    • Merge tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e81fe505
      Linus Torvalds authored
      Pull x86 perf event fix from Ingo Molnar:
       "Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
        code resulting in non-working events on those platforms"
      
      * tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
    • Merge tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1d0dbc3d
      Linus Torvalds authored
      Pull locking fix from Ingo Molnar:
       "Fix lockdep block chain corruption resulting in KASAN warnings"
      
      * tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        lockdep: Fix block chain corruption
    • Merge tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 · 4515866d
      Linus Torvalds authored
      Pull smb client fixes from Steve French:
      
       - use after free fix in releasing multichannel interfaces
      
       - fixes for special file types (report char, block, FIFOs properly when
         created e.g. by NFS to Windows)
      
       - fixes for reporting various special file types and symlinks properly
         when using SMB1
      
      * tag '6.7-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
        smb: client: introduce cifs_sfu_make_node()
        smb: client: set correct file type from NFS reparse points
        smb: client: introduce ->parse_reparse_point()
        smb: client: implement ->query_reparse_point() for SMB1
        cifs: fix use after free for iface while disabling secondary channels
    • Merge tag 'usb-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 090472ed
      Linus Torvalds authored
      Pull USB / PHY / Thunderbolt fixes from Greg KH:
       "Here are a number of reverts, fixes, and new device ids for 6.7-rc3
        for the USB, PHY, and Thunderbolt driver subsystems. Included in here
        are:
      
         - reverts of some PHY drivers that went into 6.7-rc1 that shouldn't
           have been merged yet, the author is reworking them based on review
           comments as they were using older apis that shouldn't be used
           anymore for newer drivers
      
         - small thunderbolt driver fixes for reported issues
      
         - USB driver fixes for a variety of small issues in dwc3, typec,
           xhci, and other smaller drivers.
      
         - new device ids for usb-serial and onboard_usb_hub drivers.
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
        USB: serial: option: add Luat Air72*U series products
        USB: dwc3: qcom: fix ACPI platform device leak
        USB: dwc3: qcom: fix software node leak on probe errors
        USB: dwc3: qcom: fix resource leaks on probe deferral
        USB: dwc3: qcom: simplify wakeup interrupt setup
        USB: dwc3: qcom: fix wakeup after probe deferral
        dt-bindings: usb: qcom,dwc3: fix example wakeup interrupt types
        usb: misc: onboard-hub: add support for Microchip USB5744
        dt-bindings: usb: microchip,usb5744: Add second supply
        usb: misc: ljca: Fix enumeration error on Dell Latitude 9420
        USB: serial: option: add Fibocom L7xx modules
        USB: xhci-plat: fix legacy PHY double init
        usb: typec: tipd: Supply also I2C driver data
        usb: xhci-mtk: fix in-ep's start-split check failure
        usb: dwc3: set the dma max_seg_size
        usb: config: fix iteration issue in 'usb_get_bos_descriptor()'
        usb: dwc3: add missing of_node_put and platform_device_put
        USB: dwc2: write HCINT with INTMASK applied
        usb: misc: ljca: Drop _ADR support to get ljca children devices
        usb: cdnsp: Fix deadlock issue during using NCM gadget
        ...
  7. 25 Nov, 2023 12 commits
  8. 24 Nov, 2023 6 commits
    • Merge tag 's390-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · 0f5cc96c
      Linus Torvalds authored
      Pull s390 updates from Alexander Gordeev:
      
       - Remove unnecessary assignment of the performance event last_tag.
      
       - Create missing /sys/firmware/ipl/* attributes when kernel is booted
         in dump mode using List-directed ECKD IPL.
      
       - Remove odd comment.
      
       - Fix s390-specific part of scripts/checkstack.pl script that only
         matches three-digit numbers starting with 3 or any higher number and
         skips any stack sizes smaller than 304 bytes.
      
      * tag 's390-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
        scripts/checkstack.pl: match all stack sizes for s390
        s390: remove odd comment
        s390/ipl: add missing IPL_TYPE_ECKD_DUMP case to ipl_init()
        s390/pai: cleanup event initialization
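      The checkstack.pl problem described above can be illustrated with POSIX
      regular expressions in C. Both patterns below are hypothetical stand-ins
      written to match the description (three-digit numbers starting with 3
      or above, or anything longer); they are not the expressions used in the
      script, and demo_matches() is an illustrative helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical narrow pattern: 300..999, or any number with 4+ digits. */
static const char demo_narrow_pattern[] = "^([3-9][0-9]{2}|[0-9]{4,})$";

/* Hypothetical broad pattern: any decimal stack size. */
static const char demo_broad_pattern[] = "^[0-9]+$";

/* Returns true when text matches the POSIX extended regex in pattern. */
static bool demo_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool ok;
    int rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);

    assert(rc == 0);    /* the patterns above are expected to compile */
    ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

      Under these assumed patterns, a 96-byte or 296-byte frame is skipped by
      the narrow expression but reported by the broad one.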
    • Merge tag 'acpi-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 1bcc6897
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These add an ACPI IRQ override quirk for ASUS ExpertBook B1402CVA and
        fix an ACPI processor idle issue leading to triple-faults in Xen HVM
        guests and an ACPI backlight driver issue that causes GPUs to
        misbehave while their children power is being fixed up.
      
        Specifics:
      
         - Avoid powering up GPUs while attempting to fix up power for their
           children (Hans de Goede)
      
         - Use raw_safe_halt() instead of safe_halt() in acpi_idle_play_dead()
           so as to avoid triple-faults during CPU online in Xen HVM guests due
           to the setting of the hardirqs_enabled flag in safe_halt() (David
           Woodhouse)
      
         - Add an ACPI IRQ override quirk for ASUS ExpertBook B1402CVA (Hans
           de Goede)"
      
      * tag 'acpi-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA
        ACPI: video: Use acpi_device_fix_up_power_children()
        ACPI: PM: Add acpi_device_fix_up_power_children() function
        ACPI: processor_idle: use raw_safe_halt() in acpi_idle_play_dead()
    • Merge tag 'pm-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b345fd55
      Linus Torvalds authored
      Pull power management fix from Rafael Wysocki:
       "Fix a syntax error in the sleepgraph utility which causes it to exit
        early on every invocation (David Woodhouse)"
      
      * tag 'pm-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: tools: Fix sleepgraph syntax error
    • Merge tag 'afs-fixes-20231124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 5b7ad877
      Linus Torvalds authored
      Pull AFS fixes from David Howells:
      
       - Fix the afs_server_list struct to be cleaned up with RCU
      
       - Fix afs to translate a no-data result from a DNS lookup into ENOENT,
         not EDESTADDRREQ for consistency with OpenAFS
      
       - Fix afs to translate a negative DNS lookup result into ENOENT rather
         than EDESTADDRREQ
      
       - Fix file locking on R/O volumes to operate in local mode as the
         server doesn't handle exclusive locks on such files
      
       - Set SB_RDONLY on superblocks for RO and Backup volumes so that the
         VFS can see that they're read only
      
      * tag 'afs-fixes-20231124' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        afs: Mark a superblock for an R/O or Backup volume as SB_RDONLY
        afs: Fix file locking on R/O volumes to operate in local mode
        afs: Return ENOENT if no cell DNS record can be found
        afs: Make error on cell lookup failure consistent with OpenAFS
        afs: Fix afs_server_list to be cleaned up with RCU
    • Merge branches 'acpi-video' and 'acpi-processor' into acpi · e3747062
      Rafael J. Wysocki authored
      Merge ACPI backlight driver fixes and an ACPI processor driver fix for
      6.7-rc3:
      
       - Avoid powering up GPUs while attempting to fix up power for their
         children (Hans de Goede).
      
       - Use raw_safe_halt() instead of safe_halt() in acpi_idle_play_dead()
         so as to avoid triple-faults during CPU online in Xen HVM guests due
         to the setting of the hardirqs_enabled flag in safe_halt() (David
         Woodhouse).
      
      * acpi-video:
        ACPI: video: Use acpi_device_fix_up_power_children()
        ACPI: PM: Add acpi_device_fix_up_power_children() function
      
      * acpi-processor:
        ACPI: processor_idle: use raw_safe_halt() in acpi_idle_play_dead()
    • Merge tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs · fa2b906f
      Linus Torvalds authored
      Pull vfs fixes from Christian Brauner:
      
       - Avoid calling back into LSMs from vfs_getattr_nosec() calls.
      
         IMA used to query inode properties accessing raw inode fields without
         dedicated helpers. That was finally fixed a few releases ago by
         forcing IMA to use vfs_getattr_nosec() helpers.
      
         The goal of the vfs_getattr_nosec() helper is to query for attributes
         without calling into the LSM layer which would be quite problematic
         because incredibly IMA is called from __fput()...
      
           __fput()
             -> ima_file_free()
      
         What it does is to call back into the filesystem to update the file's
         IMA xattr. Querying the inode without using vfs_getattr_nosec() meant
         that IMA didn't handle stacking filesystems such as overlayfs
         correctly. So the switch to vfs_getattr_nosec() is quite correct. But
         the switch to vfs_getattr_nosec() revealed another bug when used on
         stacking filesystems:
      
           __fput()
             -> ima_file_free()
                -> vfs_getattr_nosec()
                   -> i_op->getattr::ovl_getattr()
                      -> vfs_getattr()
                         -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr()
                            -> security_inode_getattr() # calls back into LSMs
      
         Now, if that __fput() happens from task_work_run() of an exiting task
         current->fs and various other pointers could already be NULL. So
         anything in the LSM layer relying on that not being NULL would be
         quite surprised.
      
         Fix that by passing the information that this is a security request
         through to the stacking filesystem by adding a new internal
         AT_GETATTR_NOSEC flag. Now the callchain becomes:
      
           __fput()
             -> ima_file_free()
                -> vfs_getattr_nosec()
                   -> i_op->getattr::ovl_getattr()
                      -> if (AT_GETATTR_NOSEC)
                                vfs_getattr_nosec()
                         else
                                vfs_getattr()
                         -> i_op->getattr::$WHATEVER_UNDERLYING_FS_getattr()
      
       - Fix a bug introduced with the iov_iter rework from last cycle.
      
         This broke /proc/kcore by copying too much and without the correct
         offset.
      
       - Add a missing NULL check when allocating the root inode in
         autofs_fill_super().
      
       - Fix stable writes for multi-device filesystems (xfs, btrfs etc) and
         the block device pseudo filesystem.
      
         Stable writes used to be a superblock flag only, making it a per
         filesystem property. Add an additional AS_STABLE_WRITES mapping flag
         to allow for fine-grained control.
      
       - Ensure that offset_iterate_dir() returns 0 after reaching the end of
         a directory so it adheres to getdents() convention.
      
      * tag 'vfs-6.7-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
        libfs: getdents() should return 0 after reaching EOD
        xfs: respect the stable writes flag on the RT device
        xfs: clean up FS_XFLAG_REALTIME handling in xfs_ioctl_setattr_xflags
        block: update the stable_writes flag in bdev_add
        filemap: add a per-mapping stable writes flag
        autofs: add: new_inode check in autofs_fill_super()
        iov_iter: fix copy_page_to_iter_nofault()
        fs: Pass AT_GETATTR_NOSEC flag to getattr interface function
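      The flag-passing idea behind the AT_GETATTR_NOSEC change can be modelled
      in a few lines of userspace C. This is a sketch under stated
      assumptions, not kernel code: every demo_* name and the
      DEMO_GETATTR_NOSEC value are hypothetical, and counters replace the
      real security hook and lower-filesystem calls.

```c
#include <assert.h>

#define DEMO_GETATTR_NOSEC 0x1u  /* hypothetical stand-in for AT_GETATTR_NOSEC */

/* Counts which layers were entered during one getattr request. */
struct demo_trace {
    int security_hook_calls;
    int lower_getattr_calls;
};

/* Stand-in for the underlying filesystem's getattr. */
static void demo_lower_getattr(struct demo_trace *t)
{
    t->lower_getattr_calls++;
}

/* Stand-in for vfs_getattr_nosec(): skips the security hook. */
static void demo_getattr_nosec(struct demo_trace *t)
{
    demo_lower_getattr(t);
}

/* Stand-in for vfs_getattr(): runs the security hook first. */
static void demo_getattr(struct demo_trace *t)
{
    t->security_hook_calls++;
    demo_getattr_nosec(t);
}

/* Stand-in for a stacking filesystem's getattr, such as ovl_getattr(). */
static void demo_stacking_getattr(struct demo_trace *t, unsigned int flags)
{
    if (flags & DEMO_GETATTR_NOSEC)
        demo_getattr_nosec(t);  /* caller asked to bypass the LSM layer */
    else
        demo_getattr(t);
}
```

      Because the flag travels with the request, the stacking layer can
      honour the caller's choice instead of always taking the path that
      calls back into the security hooks.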