1. 12 Sep, 2022 1 commit
  2. 08 Sep, 2022 3 commits
  3. 05 Sep, 2022 6 commits
  4. 04 Sep, 2022 2 commits
  5. 02 Sep, 2022 1 commit
  6. 01 Sep, 2022 1 commit
  7. 26 Aug, 2022 1 commit
  8. 24 Aug, 2022 1 commit
  9. 23 Aug, 2022 1 commit
    • sbitmap: fix possible io hung due to lost wakeup · 040b83fc
      Yu Kuai authored
      There are two problems that can lead to a lost wakeup:
      
      1) invalid wakeup on the wrong waitqueue:
      
      For example, 2 * wake_batch tags are put, while only wake_batch threads
      are woken:
      
      __sbq_wake_up
       atomic_cmpxchg -> reset wait_cnt
      			__sbq_wake_up -> decrease wait_cnt
      			...
      			__sbq_wake_up -> wait_cnt is decreased to 0 again
      			 atomic_cmpxchg
      			 sbq_index_atomic_inc -> increase wake_index
      			 wake_up_nr -> wake up and waitqueue might be empty
       sbq_index_atomic_inc -> increase again, one waitqueue is skipped
       wake_up_nr -> invalid wake up because old waitqueue might be empty
      
      To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.
      
      2) 'wait_cnt' can be decreased while waitqueue is empty
      
      As pointed out by Jan Kara, the following race is possible:
      
      CPU1				CPU2
      __sbq_wake_up			 __sbq_wake_up
       sbq_wake_ptr()			 sbq_wake_ptr() -> the same
       wait_cnt = atomic_dec_return()
       /* decreased to 0 */
       sbq_index_atomic_inc()
       /* move to next waitqueue */
       atomic_set()
       /* reset wait_cnt */
       wake_up_nr()
       /* wake up on the old waitqueue */
      				 wait_cnt = atomic_dec_return()
      				 /*
      				  * decrease wait_cnt in the old
      				  * waitqueue, while it can be
      				  * empty.
      				  */
      
      Fix the problem by waking up before updating 'wake_index' and
      'wait_cnt'.
      
      Note that with this patch 'wait_cnt' is still decreased in the old,
      empty waitqueue. However, the wakeup is redirected to an active waitqueue,
      and the extra decrement on the old empty waitqueue is not handled.
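
The ordering described above (wake up first, then advance 'wake_index', then reset 'wait_cnt') can be sketched as a small user-space model with C11 atomics. This is an illustration only, not the kernel code: every name here (model_sbq, model_ws, model_wake_up, NR_QUEUES, WAKE_BATCH) is hypothetical, a waitqueue is reduced to a counter of sleepers, and picking an active waitqueue and retrying are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_QUEUES  4   /* assumed number of waitqueues in the model */
#define WAKE_BATCH 2   /* assumed wake batch size */

/* Hypothetical stand-in for one waitqueue plus its counter. */
struct model_ws {
    atomic_int wait_cnt;  /* frees left before this queue is woken */
    atomic_int waiters;   /* sleepers currently parked on this queue */
};

struct model_sbq {
    struct model_ws ws[NR_QUEUES];
    atomic_int wake_index;
};

static void model_init(struct model_sbq *q, int waiters_per_queue)
{
    atomic_init(&q->wake_index, 0);
    for (int i = 0; i < NR_QUEUES; i++) {
        atomic_init(&q->ws[i].wait_cnt, WAKE_BATCH);
        atomic_init(&q->ws[i].waiters, waiters_per_queue);
    }
}

/* Advance the index with a CAS loop so concurrent increments are not lost. */
static void model_index_inc(atomic_int *index)
{
    int old = atomic_load(index);
    while (!atomic_compare_exchange_weak(index, &old, (old + 1) % NR_QUEUES))
        ;
}

/* Wake up to nr sleepers; returns how many were actually woken. */
static int model_wake_up_nr(struct model_ws *ws, int nr)
{
    assert(nr > 0);
    int woken = 0;
    int cur = atomic_load(&ws->waiters);
    while (woken < nr && cur > 0) {
        if (atomic_compare_exchange_weak(&ws->waiters, &cur, cur - 1)) {
            woken++;
            cur--;
        }
    }
    return woken;
}

/* One tag was freed. Returns true if this call woke a batch. */
static bool model_wake_up(struct model_sbq *q)
{
    struct model_ws *ws = &q->ws[atomic_load(&q->wake_index)];
    int cnt = atomic_fetch_sub(&ws->wait_cnt, 1) - 1;

    if (cnt != 0)
        return false;  /* not the call that completes the batch */

    model_wake_up_nr(ws, WAKE_BATCH);         /* 1. wake the old queue first */
    model_index_inc(&q->wake_index);          /* 2. then move to the next queue */
    atomic_store(&ws->wait_cnt, WAKE_BATCH);  /* 3. only now reset the counter */
    return false || true;
}
```

In this model a caller that reads 'wake_index' after step 2 already lands on the next queue, and the old queue's counter only becomes valid again in step 3, after its sleepers were woken.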
      
      Fixes: 88459642 ("blk-mq: abstract tag allocation out into sbitmap library")
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 22 Aug, 2022 11 commits
  11. 21 Aug, 2022 12 commits
    • Merge tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4daa6a81
      Linus Torvalds authored
      Pull irq fixes from Ingo Molnar:
       "Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"
      
      * tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/loongson-liointc: Fix an error handling path in liointc_init()
        irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
        irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
        irqchip/loongson-eiointc: Fix a build warning
        irqchip/loongson-eiointc: Fix irq affinity setting
        iommu/hyper-v: Use helper instead of directly accessing affinity
    • Merge tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4f61f842
      Linus Torvalds authored
      Pull x86 kprobes fix from Ingo Molnar:
       "Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at
        such instructions, possibly resulting in incorrect execution (the
        wrong branch taken)"
      
      * tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/kprobes: Fix JNG/JNLE emulation
    • Merge tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 7fb312d2
      Linus Torvalds authored
      Pull tracing fixes from Steven Rostedt:
       "Various fixes for tracing:
      
         - Fix a return value of traceprobe_parse_event_name()
      
         - Fix NULL pointer dereference from failed ftrace enabling
      
         - Fix NULL pointer dereference when asking for registers from eprobes
      
         - Make eprobes consistent with kprobes/uprobes, filters and
           histograms"
      
      * tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Have filter accept "common_cpu" to be consistent
        tracing/probes: Have kprobes and uprobes use $COMM too
        tracing/eprobes: Have event probes be consistent with kprobes and uprobes
        tracing/eprobes: Fix reading of string fields
        tracing/eprobes: Do not hardcode $comm as a string
        tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
        ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
        tracing/perf: Fix double put of trace event when init fails
        tracing: React to error return from traceprobe_parse_event_name()
    • tracing: Have filter accept "common_cpu" to be consistent · b2380577
      Steven Rostedt (Google) authored
      Make filtering consistent with histograms. As "cpu" can be a field of an
      event, allow for "common_cpu" to keep it from being confused with the
      "cpu" field of the event.
      
      Link: https://lkml.kernel.org/r/20220820134401.513062765@goodmis.org
      Link: https://lore.kernel.org/all/20220820220920.e42fa32b70505b1904f0a0ad@kernel.org/
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 1e3bac71 ("tracing/histogram: Rename "cpu" to "common_cpu"")
      Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/probes: Have kprobes and uprobes use $COMM too · ab838444
      Steven Rostedt (Google) authored
      Both $comm and $COMM can be used to get current->comm in eprobes and the
      filtering and histogram logic. Make kprobes and uprobes consistent in this
      regard and allow both $comm and $COMM as well. Currently kprobes and
      uprobes only handle $comm, which is inconsistent with the other utilities,
      and can be confusing to users.
      
      Link: https://lkml.kernel.org/r/20220820134401.317014913@goodmis.org
      Link: https://lore.kernel.org/all/20220820220442.776e1ddaf8836e82edb34d01@kernel.org/
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 53305928 ("tracing: probeevent: Introduce new argument fetching code")
      Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/eprobes: Have event probes be consistent with kprobes and uprobes · 6a832ec3
      Steven Rostedt (Google) authored
      Currently, attempting to use a symbol ("@") with an event probe (eprobe)
      causes a NULL pointer dereference crash.
      
      Both kprobes and uprobes can reference data other than the main registers,
      such as immediate addresses, symbols and the current task name. Have eprobes
      do the same thing.
      
      For "comm", if "comm" is used and the event being attached to does not
      have a "comm" field, then make it the "$comm" that kprobes has. This is
      consistent with the way histograms and filters work.
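
The "comm" fallback described above can be sketched as a name-resolution step: an event field of that name wins, and only when the event has no such field does the name fall back to the current task name. This is a minimal illustration under assumptions, not the kernel implementation. The names sk_src and sk_resolve are hypothetical, and accepting the upper-case "COMM" spelling here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of resolving a "$name" argument on an event probe. */
enum sk_src {
    SK_SRC_FIELD,         /* a field of the event being attached to */
    SK_SRC_CURRENT_COMM,  /* the current task name (current->comm) */
    SK_SRC_NONE           /* nothing matches */
};

static enum sk_src sk_resolve(const char *name,
                              const char *const *fields, size_t nr_fields)
{
    assert(name != NULL);

    /* An event field with this name always takes precedence. */
    for (size_t i = 0; i < nr_fields; i++) {
        if (strcmp(fields[i], name) == 0)
            return SK_SRC_FIELD;
    }

    /* No such field: "comm" falls back to the current task name. */
    if (strcmp(name, "comm") == 0 || strcmp(name, "COMM") == 0)
        return SK_SRC_CURRENT_COMM;

    return SK_SRC_NONE;
}
```

With an event whose fields are "prev_comm" and "next_comm", the name "comm" resolves to the task name. With an event that has its own "comm" field, the same name resolves to that field.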
      
      Link: https://lkml.kernel.org/r/20220820134401.136924220@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/eprobes: Fix reading of string fields · f04dec93
      Steven Rostedt (Google) authored
      Currently when an event probe (eprobe) hooks to a string field, it does
      not display it as a string, but instead as a number. This makes the field
      rather useless. Handle the different kinds of strings, dynamic, static,
      relational/dynamic etc.
      
      Now when a string field is used, the ":string" type can be used to display
      it:
      
        echo 'e:sw sched/sched_switch comm=$next_comm:string' > dynamic_events
      
      Link: https://lkml.kernel.org/r/20220820134400.959640191@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/eprobes: Do not hardcode $comm as a string · 02333de9
      Steven Rostedt (Google) authored
      The variable $comm is hard coded as a string, which is true for both
      kprobes and uprobes, but for event probes (eprobes) it is a field name. In
      most cases the "comm" field would be a string, but there's no guarantee of
      that fact.
      
      Do not assume that comm is a string. Not to mention, it currently forces
      comm fields to fault, as string processing for event probes is currently
      broken.
      
      Link: https://lkml.kernel.org/r/20220820134400.756152112@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/eprobes: Do not allow eprobes to use $stack, or % for regs · 2673c60e
      Steven Rostedt (Google) authored
      While playing with event probes (eprobes), I tried to see what would
      happen if I attempted to retrieve the instruction pointer (%rip) knowing
      that event probes do not use pt_regs. The result was:
      
       BUG: kernel NULL pointer dereference, address: 0000000000000024
       #PF: supervisor read access in kernel mode
       #PF: error_code(0x0000) - not-present page
       PGD 0 P4D 0
       Oops: 0000 [#1] PREEMPT SMP PTI
       CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:get_event_field.isra.0+0x0/0x50
       Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8 50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24 8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
       RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
       RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
       RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
       RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
       R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
       R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff916c9ea40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
       Call Trace:
        <TASK>
        get_eprobe_size+0xb4/0x640
        ? __mod_node_page_state+0x72/0xc0
        __eprobe_trace_func+0x59/0x1a0
        ? __mod_lruvec_page_state+0xaa/0x1b0
        ? page_remove_file_rmap+0x14/0x230
        ? page_remove_rmap+0xda/0x170
        event_triggers_call+0x52/0xe0
        trace_event_buffer_commit+0x18f/0x240
        trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
        try_to_wake_up+0x260/0x4c0
        __wake_up_common+0x80/0x180
        __wake_up_common_lock+0x7c/0xc0
        do_notify_parent+0x1c9/0x2a0
        exit_notify+0x1a9/0x220
        do_exit+0x2ba/0x450
        do_group_exit+0x2d/0x90
        __x64_sys_exit_group+0x14/0x20
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Obviously this is not the desired result.
      
      Move the testing for TPARG_FL_TPOINT which is only used for event probes
      to the top of the "$" variable check, as all the other variables are not
      used for event probes. Also add a check in the register parsing "%" to
      fail if an event probe is used.
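
The parsing order described above can be sketched as follows: for an event probe, a "$" argument is checked against the event's fields before any other "$" variable is considered, and a "%" register argument is rejected outright. This is a simplified illustration, not the kernel parser. SK_FL_TPOINT is a hypothetical stand-in for TPARG_FL_TPOINT, and sk_fetch and sk_parse_arg are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SK_FL_TPOINT 0x1u  /* hypothetical stand-in for TPARG_FL_TPOINT */

enum sk_fetch { SK_FETCH_FIELD, SK_FETCH_REG, SK_FETCH_STACK, SK_FETCH_COMM };

static int sk_parse_arg(const char *arg, unsigned int flags,
                        const char *const *fields, size_t nr_fields,
                        enum sk_fetch *out)
{
    assert(arg != NULL && out != NULL);

    if (arg[0] == '$') {
        /* Event probes first: "$name" can only be an event field there. */
        if (flags & SK_FL_TPOINT) {
            for (size_t i = 0; i < nr_fields; i++) {
                if (strcmp(fields[i], arg + 1) == 0) {
                    *out = SK_FETCH_FIELD;
                    return 0;
                }
            }
            return -EINVAL;  /* e.g. "$stack" on an event without that field */
        }
        /* The remaining "$" variables need pt_regs or task context. */
        if (strcmp(arg + 1, "stack") == 0) {
            *out = SK_FETCH_STACK;
            return 0;
        }
        if (strcmp(arg + 1, "comm") == 0) {
            *out = SK_FETCH_COMM;
            return 0;
        }
        return -EINVAL;
    }

    if (arg[0] == '%') {
        /* Event probes have no pt_regs, so registers cannot be read. */
        if (flags & SK_FL_TPOINT)
            return -EINVAL;
        *out = SK_FETCH_REG;
        return 0;
    }

    return -EINVAL;
}
```

Rejecting the argument at parse time means the probe is never created, so the handler never runs with a missing register set.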
      
      Link: https://lkml.kernel.org/r/20220820134400.564426983@goodmis.org
      
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com>
      Cc: Tom Zanussi <zanussi@kernel.org>
      Fixes: 7491e2c4 ("tracing: Add a probe that attaches to trace events")
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead · c3b0f72e
      Yang Jihong authored
      ftrace_startup does not remove ops from ftrace_ops_list when
      ftrace_startup_enable fails:
      
      register_ftrace_function
        ftrace_startup
          __register_ftrace_function
            ...
            add_ftrace_ops(&ftrace_ops_list, ops)
            ...
          ...
          ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
          ...
        return 0 // ops is in the ftrace_ops_list.
      
      When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
      unregister_ftrace_function
        ftrace_shutdown
          if (unlikely(ftrace_disabled))
                  return -ENODEV;  // return here, __unregister_ftrace_function is not executed,
                                   // as a result, ops is still in the ftrace_ops_list
          __unregister_ftrace_function
          ...
      
      If ops is dynamically allocated, it will be freed later; in this case,
      is_ftrace_trampoline accesses a NULL pointer:
      
      is_ftrace_trampoline
        ftrace_ops_trampoline
          do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
      
      Syzkaller reports as follows:
      [ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
      [ 1203.508039] #PF: supervisor read access in kernel mode
      [ 1203.508798] #PF: error_code(0x0000) - not-present page
      [ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
      [ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
      [ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G    B   W         5.10.0 #8
      [ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
      [ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
      [ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
      [ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
      [ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
      [ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
      [ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
      [ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
      [ 1203.525634] FS:  00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
      [ 1203.526801] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
      [ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      Therefore, when ftrace_startup_enable fails, we need to roll back the
      registration process and remove ops from ftrace_ops_list.
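
The rollback described above can be sketched with a plain singly linked list: registration adds the ops to the list, and if enabling fails the ops is taken off again before the error is returned, so the caller can safely free it. This is a user-space illustration under assumptions, not the ftrace code. The sk_ prefixed names are hypothetical, and the enable_fails flag exists only to simulate a failed code modification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_ops {
    struct sk_ops *next;
    bool enable_fails;  /* test knob: simulate a failed code modification */
};

static struct sk_ops *sk_ops_list;  /* models ftrace_ops_list */
static bool sk_disabled;            /* models ftrace_disabled */

static void sk_add_ops(struct sk_ops *ops)
{
    ops->next = sk_ops_list;
    sk_ops_list = ops;
}

static void sk_remove_ops(struct sk_ops *ops)
{
    for (struct sk_ops **p = &sk_ops_list; *p; p = &(*p)->next) {
        if (*p == ops) {
            *p = ops->next;
            ops->next = NULL;
            return;
        }
    }
}

static bool sk_on_list(const struct sk_ops *ops)
{
    for (const struct sk_ops *p = sk_ops_list; p; p = p->next) {
        if (p == ops)
            return true;
    }
    return false;
}

/* Models the enable step: on failure it only sets the global flag. */
static void sk_startup_enable(const struct sk_ops *ops)
{
    if (ops->enable_fails)
        sk_disabled = true;
}

static int sk_register(struct sk_ops *ops)
{
    assert(ops != NULL);
    sk_add_ops(ops);
    sk_startup_enable(ops);
    if (sk_disabled) {
        /* Roll back: do not leave ops on the list after a failure. */
        sk_remove_ops(ops);
        return -ENODEV;
    }
    return 0;
}
```

Because the failed ops is removed before the error is returned, a later walk of the list never reaches memory the caller has already freed.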
      
      Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing/perf: Fix double put of trace event when init fails · 7249921d
      Steven Rostedt (Google) authored
      If in perf_trace_event_init(), the perf_trace_event_open() fails, then it
      will call perf_trace_event_unreg() which will not only unregister the perf
      trace event, but will also call the put() function of the tp_event.
      
      The problem here is that the trace_event_try_get_ref() is called by the
      caller of perf_trace_event_init() and if perf_trace_event_init() returns a
      failure, it will then call trace_event_put(). But since the
      perf_trace_event_unreg() already called the trace_event_put() function, it
      triggers a WARN_ON().
      
       WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20
      
      If perf_trace_event_reg() does not call the trace_event_try_get_ref() then
      the perf_trace_event_unreg() should not be calling trace_event_put(). This
      breaks symmetry and causes bugs like these.
      
      Pull out the trace_event_put() from perf_trace_event_unreg() and call it
      in the locations that perf_trace_event_unreg() is called. This not only
      fixes this bug, but also brings back the proper symmetry of the reg/unreg
      vs get/put logic.
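
The reg/unreg versus get/put symmetry described above can be sketched with a plain counter: the function that takes the reference is the one that drops it, and unregistering never touches the count. This is an illustration under assumptions, not the perf tracing code. The sk_ prefixed names are hypothetical, and the open_fails parameter only simulates a failing open step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct sk_event {
    int refs;         /* reference count on the event */
    bool registered;  /* whether the event is currently registered */
};

static bool sk_try_get_ref(struct sk_event *ev)
{
    ev->refs++;
    return true;
}

static void sk_put_ref(struct sk_event *ev)
{
    assert(ev->refs > 0);  /* a double put would trip here */
    ev->refs--;
}

static void sk_reg(struct sk_event *ev)   { ev->registered = true; }
static void sk_unreg(struct sk_event *ev) { ev->registered = false; }  /* no put */

/* Init registers, and undoes only its own registration on failure. */
static int sk_event_init(struct sk_event *ev, bool open_fails)
{
    sk_reg(ev);
    if (open_fails) {
        sk_unreg(ev);
        return -EINVAL;
    }
    return 0;
}

/* The caller took the reference, so the caller drops it on failure. */
static int sk_perf_create(struct sk_event *ev, bool open_fails)
{
    if (!sk_try_get_ref(ev))
        return -ENODEV;
    int ret = sk_event_init(ev, open_fails);
    if (ret)
        sk_put_ref(ev);
    return ret;
}

/* Teardown mirrors creation: unregister, then put. */
static void sk_perf_destroy(struct sk_event *ev)
{
    sk_unreg(ev);
    sk_put_ref(ev);
}
```

If sk_unreg() also dropped the reference, the failure path in sk_perf_create() would put twice and the assertion in sk_put_ref() would fire, which mirrors the WARN_ON() reported above.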
      
      Link: https://lore.kernel.org/all/cover.1660347763.git.kjlx@templeofstupid.com/
      Link: https://lkml.kernel.org/r/20220816192817.43d5e17f@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 1d18538e ("tracing: Have dynamic events have a ref counter")
      Reported-by: Krister Johansen <kjlx@templeofstupid.com>
      Reviewed-by: Krister Johansen <kjlx@templeofstupid.com>
      Tested-by: Krister Johansen <kjlx@templeofstupid.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    • tracing: React to error return from traceprobe_parse_event_name() · d8a64313
      Lukas Bulwahn authored
      The function traceprobe_parse_event_name() may set the first two function
      arguments to a non-null value and still return -EINVAL to indicate an
      unsuccessful completion of the function. Hence, it is not sufficient to
      just check the result of the two function arguments for being not null,
      but the return value also needs to be checked.
      
      Commit 95c104c3 ("tracing: Auto generate event name when creating a
      group of events") changed the error-return-value checking of the second
      traceprobe_parse_event_name() invocation in __trace_eprobe_create() and
      removed checking the return value to jump to the error handling case.
      
      Reinstate using the return value in the error-return-value checking.
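
The pitfall described above can be sketched with a parser that fills its output pointers before it validates the name, so a non-null output does not imply success. This is a simplified illustration, not traceprobe_parse_event_name() itself. The functions sk_parse_event_name and sk_eprobe_create and the validation rule (the first character must be a letter or underscore) are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Splits "group/event" in place; may set the outputs and still fail. */
static int sk_parse_event_name(char *spec, const char **group,
                               const char **event)
{
    assert(spec != NULL && group != NULL && event != NULL);

    char *slash = strchr(spec, '/');
    if (slash) {
        *slash = '\0';
        *group = spec;
        *event = slash + 1;
    } else {
        *event = spec;
    }

    /* The outputs are already set, but the name can still be rejected. */
    unsigned char first = (unsigned char)(*event)[0];
    if (!(isalpha(first) || first == '_'))
        return -EINVAL;
    return 0;
}

static int sk_eprobe_create(char *spec)
{
    const char *group = NULL, *event = NULL;
    int ret = sk_parse_event_name(spec, &group, &event);

    /* Checking only "event != NULL" would accept a rejected name. */
    if (ret < 0 || !event)
        return -EINVAL;
    return 0;
}
```

For the input "grp/9evt" the parser sets both outputs and still returns -EINVAL, so only the return-value check catches the error.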
      
      Link: https://lkml.kernel.org/r/20220811071734.20700-1-lukas.bulwahn@gmail.com
      
      Fixes: 95c104c3 ("tracing: Auto generate event name when creating a group of events")
      Acked-by: Linyu Yuan <quic_linyyuan@quicinc.com>
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>