1. 15 Oct, 2023 1 commit
    • Douglas Anderson's avatar
      kgdb: Flush console before entering kgdb on panic · dd712d3d
      Douglas Anderson authored
      When entering kdb/kgdb on a kernel panic, it was be observed that the
      console isn't flushed before the `kdb` prompt came up. Specifically,
      when using the buddy lockup detector on arm64 and running:
        echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
      
      I could see:
        [   26.161099] lkdtm: Performing direct entry HARDLOCKUP
        [   32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
        [   32.552865] Sending NMI from CPU 5 to CPUs 6:
        [   32.557359] NMI backtrace for cpu 6
        ... [backtrace for cpu 6] ...
        [   32.558353] NMI backtrace for cpu 5
        ... [backtrace for cpu 5] ...
        [   32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
        [   32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
      
        Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
        [5]kdb>
      
      As you can see, backtraces for the other CPUs start printing and get
      interleaved with the kdb PANIC print.
      
      Let's replicate the commands to flush the console in the kdb panic
      entry point to avoid this.
      Signed-off-by: default avatarDouglas Anderson <dianders@chromium.org>
      Link: https://lore.kernel.org/r/20230822131945.1.I5b460ae8f954e4c4f628a373d6e74713c06dd26f@changeidSigned-off-by: default avatarDaniel Thompson <daniel.thompson@linaro.org>
      dd712d3d
  2. 08 Oct, 2023 4 commits
  3. 07 Oct, 2023 10 commits
  4. 06 Oct, 2023 22 commits
  5. 05 Oct, 2023 3 commits
    • Jeff Moyer's avatar
      io-wq: fully initialize wqe before calling cpuhp_state_add_instance_nocalls() · 0f8baa3c
      Jeff Moyer authored
      I received a bug report with the following signature:
      
      [ 1759.937637] BUG: unable to handle page fault for address: ffffffffffffffe8
      [ 1759.944564] #PF: supervisor read access in kernel mode
      [ 1759.949732] #PF: error_code(0x0000) - not-present page
      [ 1759.954901] PGD 7ab615067 P4D 7ab615067 PUD 7ab617067 PMD 0
      [ 1759.960596] Oops: 0000 1 PREEMPT SMP PTI
      [ 1759.964804] CPU: 15 PID: 109 Comm: cpuhp/15 Kdump: loaded Tainted: G X ------- — 5.14.0-362.3.1.el9_3.x86_64 #1
      [ 1759.976609] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 06/20/2018
      [ 1759.985181] RIP: 0010:io_wq_for_each_worker.isra.0+0x24/0xa0
      [ 1759.990877] Code: 90 90 90 90 90 90 0f 1f 44 00 00 41 56 41 55 41 54 55 48 8d 6f 78 53 48 8b 47 78 48 39 c5 74 4f 49 89 f5 49 89 d4 48 8d 58 e8 <8b> 13 85 d2 74 32 8d 4a 01 89 d0 f0 0f b1 0b 75 5c 09 ca 78 3d 48
      [ 1760.009758] RSP: 0000:ffffb6f403603e20 EFLAGS: 00010286
      [ 1760.015013] RAX: 0000000000000000 RBX: ffffffffffffffe8 RCX: 0000000000000000
      [ 1760.022188] RDX: ffffb6f403603e50 RSI: ffffffffb11e95b0 RDI: ffff9f73b09e9400
      [ 1760.029362] RBP: ffff9f73b09e9478 R08: 000000000000000f R09: 0000000000000000
      [ 1760.036536] R10: ffffffffffffff00 R11: ffffb6f403603d80 R12: ffffb6f403603e50
      [ 1760.043712] R13: ffffffffb11e95b0 R14: ffffffffb28531e8 R15: ffff9f7a6fbdf548
      [ 1760.050887] FS: 0000000000000000(0000) GS:ffff9f7a6fbc0000(0000) knlGS:0000000000000000
      [ 1760.059025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1760.064801] CR2: ffffffffffffffe8 CR3: 00000007ab610002 CR4: 00000000007706e0
      [ 1760.071976] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1760.079150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1760.086325] PKRU: 55555554
      [ 1760.089044] Call Trace:
      [ 1760.091501] <TASK>
      [ 1760.093612] ? show_trace_log_lvl+0x1c4/0x2df
      [ 1760.097995] ? show_trace_log_lvl+0x1c4/0x2df
      [ 1760.102377] ? __io_wq_cpu_online+0x54/0xb0
      [ 1760.106584] ? __die_body.cold+0x8/0xd
      [ 1760.110356] ? page_fault_oops+0x134/0x170
      [ 1760.114479] ? kernelmode_fixup_or_oops+0x84/0x110
      [ 1760.119298] ? exc_page_fault+0xa8/0x150
      [ 1760.123247] ? asm_exc_page_fault+0x22/0x30
      [ 1760.127458] ? __pfx_io_wq_worker_affinity+0x10/0x10
      [ 1760.132453] ? __pfx_io_wq_worker_affinity+0x10/0x10
      [ 1760.137446] ? io_wq_for_each_worker.isra.0+0x24/0xa0
      [ 1760.142527] __io_wq_cpu_online+0x54/0xb0
      [ 1760.146558] cpuhp_invoke_callback+0x109/0x460
      [ 1760.151029] ? __pfx_io_wq_cpu_offline+0x10/0x10
      [ 1760.155673] ? __pfx_smpboot_thread_fn+0x10/0x10
      [ 1760.160320] cpuhp_thread_fun+0x8d/0x140
      [ 1760.164266] smpboot_thread_fn+0xd3/0x1a0
      [ 1760.168297] kthread+0xdd/0x100
      [ 1760.171457] ? __pfx_kthread+0x10/0x10
      [ 1760.175225] ret_from_fork+0x29/0x50
      [ 1760.178826] </TASK>
      [ 1760.181022] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc vfat fat dm_multipath intel_rapl_msr intel_rapl_common isst_if_common ipmi_ssif nfit libnvdimm mgag200 i2c_algo_bit ioatdma drm_shmem_helper drm_kms_helper acpi_ipmi syscopyarea x86_pkg_temp_thermal sysfillrect ipmi_si intel_powerclamp sysimgblt ipmi_devintf coretemp acpi_power_meter ipmi_msghandler rapl pcspkr dca intel_pch_thermal intel_cstate ses lpc_ich intel_uncore enclosure hpilo mei_me mei acpi_tad fuse drm xfs sd_mod sg bnx2x nvme nvme_core crct10dif_pclmul crc32_pclmul nvme_common ghash_clmulni_intel smartpqi tg3 t10_pi mdio uas libcrc32c crc32c_intel scsi_transport_sas usb_storage hpwdt wmi dm_mirror dm_region_hash dm_log dm_mod
      [ 1760.248623] CR2: ffffffffffffffe8
      
      A cpu hotplug callback was issued before wq->all_list was initialized.
      This results in a null pointer dereference.  The fix is to fully setup
      the io_wq before calling cpuhp_state_add_instance_nocalls().
      Signed-off-by: default avatarJeff Moyer <jmoyer@redhat.com>
      Link: https://lore.kernel.org/r/x49y1ghnecs.fsf@segfault.boston.devel.redhat.comSigned-off-by: default avatarJens Axboe <axboe@kernel.dk>
      0f8baa3c
    • Xuewen Yan's avatar
      cpufreq: schedutil: Update next_freq when cpufreq_limits change · 9e0bc36a
      Xuewen Yan authored
      When cpufreq's policy is 'single', there is a scenario that will
      cause sg_policy's next_freq to be unable to update.
      
      When the CPU's util is always max, the cpufreq will be max,
      and then if we change the policy's scaling_max_freq to be a
      lower freq, indeed, the sg_policy's next_freq need change to
      be the lower freq, however, because the cpu_is_busy, the next_freq
      would keep the max_freq.
      
      For example:
      
      The cpu7 is a single CPU:
      
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # while true;do done& [1] 4737
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # taskset -p 80 4737
        pid 4737's current affinity mask: ff
        pid 4737's new affinity mask: 80
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
        2301000
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_cur_freq
        2301000
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # echo 2171000 > scaling_max_freq
        unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
        2171000
      
      At this time, the sg_policy's next_freq would stay at 2301000, which
      is wrong.
      
      To fix this, add a check for the ->need_freq_update flag.
      
      [ mingo: Clarified the changelog. ]
      Co-developed-by: default avatarGuohua Yan <guohua.yan@unisoc.com>
      Signed-off-by: default avatarXuewen Yan <xuewen.yan@unisoc.com>
      Signed-off-by: default avatarGuohua Yan <guohua.yan@unisoc.com>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Acked-by: default avatar"Rafael J. Wysocki" <rafael@kernel.org>
      Link: https://lore.kernel.org/r/20230719130527.8074-1-xuewen.yan@unisoc.com
      9e0bc36a
    • Renan Guilherme Lebre Ramos's avatar