• Thomas Gleixner's avatar
    alarmtimer: Prevent starvation by small intervals and SIG_IGN · d125d134
    Thomas Gleixner authored
    syzbot reported a RCU stall which is caused by setting up an alarmtimer
    with a very small interval and ignoring the signal. The reproducer arms the
    alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
    problem per se, but that's an issue when the signal is ignored because then
    the timer is immediately rearmed because there is no way to delay that
    rearming to the signal delivery path.  See posix_timer_fn() and commit
    58229a18 ("posix-timers: Prevent softirq starvation by small intervals
    and SIG_IGN") for details.
    
    The reproducer does not set SIG_IGN explicitely, but it sets up the timers
    signal with SIGCONT. That has the same effect as explicitely setting
    SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
    the task is not ptraced.
    
    The log clearly shows that:
    
       [pid  5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
    
    It works because the tasks are traced and therefore the signal is queued so
    the tracer can see it, which delays the restart of the timer to the signal
    delivery path. But then the tracer is killed:
    
       [pid  5087] kill(-5102, SIGKILL <unfinished ...>
       ...
       ./strace-static-x86_64: Process 5107 detached
    
    and after it's gone the stall can be observed:
    
       syzkaller login: [   79.439102][    C0] hrtimer: interrupt took 68471 ns
       [  184.460538][    C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
       ...
       [  184.658237][    C1] rcu: Stack dump where RCU GP kthread last ran:
       [  184.664574][    C1] Sending NMI from CPU 1 to CPUs 0:
       [  184.669821][    C0] NMI backtrace for cpu 0
       [  184.669831][    C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
       ...
       [  184.670036][    C0] Call Trace:
       [  184.670041][    C0]  <IRQ>
       [  184.670045][    C0]  alarmtimer_fired+0x327/0x670
    
    posix_timer_fn() prevents that by checking whether the interval for
    timers which have the signal ignored is smaller than a jiffie and
    artifically delay it by shifting the next expiry out by a jiffie. That's
    accurate vs. the overrun accounting, but slightly inaccurate
    vs. timer_gettimer(2).
    
    The comment in that function says what needs to be done and there was a fix
    available for the regular userspace induced SIG_IGN mechanism, but that did
    not work due to the implicit ignore for SIGCONT and similar signals. This
    needs to be worked on, but for now the only available workaround is to do
    exactly what posix_timer_fn() does:
    
    Increase the interval of self-rearming timers, which have their signal
    ignored, to at least a jiffie.
    
    Interestingly this has been fixed before via commit ff86bf0c
    ("alarmtimer: Rate limit periodic intervals") already, but that fix got
    lost in a later rework.
    
    Reported-by: syzbot+b9564ba6e8e00694511b@syzkaller.appspotmail.com
    Fixes: f2c45807 ("alarmtimer: Switch over to generic set/get/rearm routine")
    Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
    Acked-by: default avatarJohn Stultz <jstultz@google.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
    d125d134
alarmtimer.c 24.5 KB