    timers/migration: Fix endless timer requeue after idle interrupts · f55acb1e
    Frederic Weisbecker authored
    When a CPU is an idle migrator, but another CPU wakes up before it,
    becomes an active migrator and handles the queue, the initial idle
    migrator may end up endlessly reprogramming its clockevent, chasing ghost
    timers forever such as in the following scenario:
    
                   [GRP0:0]
                 migrator = 0
                 active   = 0
                 nextevt  = T1
                  /         \
                 0           1
              active        idle (T1)
    
    0) CPU 1 is idle and has a timer queued (T1), CPU 0 is active and is
    the active migrator.
    
                   [GRP0:0]
                 migrator = NONE
                 active   = NONE
                 nextevt  = T1
                  /         \
                 0           1
              idle        idle (T1)
              wakeup = T1
    
    1) CPU 0 is now idle and is therefore the idle migrator. It has
    programmed its next timer interrupt to handle T1.
    
                   [GRP0:0]
                 migrator = 1
                 active   = 1
                 nextevt  = KTIME_MAX
                  /         \
                 0           1
              idle        active
              wakeup = T1
    
    2) CPU 1 has woken up. It is now active and has just handled its own
    timer T1.
    
    3) CPU 0 gets a timer interrupt to handle T1 but tmigr_handle_remote()
    realizes it is not the migrator anymore. So it returns early without
    observing that T1 has already expired and therefore without updating
    its ->wakeup value.
    
    4) CPU 0 goes into tmigr_cpu_new_timer() which also returns early
    because it doesn't queue a timer of its own. So ->wakeup is left
    unchanged and the next timer is programmed to fire now.
    
    5) goto 3) forever
    
    This results in timer interrupt storms in idle and also in nohz_full (as
    observed in rcutorture's TREE07 scenario).
    
    Fix this by forcing a re-evaluation of tmc->wakeup while trying
    remote timer handling when the CPU isn't the migrator anymore. The
    check is inherently racy but in the worst case the CPU just races setting
    the KTIME_MAX value that a remote expiry also tries to set.
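
    The loop in steps 3) to 5), and the effect of re-evaluating the wakeup
    value, can be illustrated with a small userspace toy model. This is only
    a sketch and not the kernel code: struct toy_group, struct toy_cpu,
    toy_handle_remote(), toy_count_interrupts() and TOY_KTIME_MAX are
    hypothetical names made up for illustration, and the re-evaluation is
    simplified to copying the group's next event.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_KTIME_MAX INT64_MAX	/* stand-in for KTIME_MAX */

/* Hypothetical, heavily simplified stand-in for a migration group. */
struct toy_group {
	int migrator;		/* CPU id of the migrator, -1 for NONE */
	int64_t nextevt;	/* earliest event still queued in the group */
};

/* Hypothetical stand-in for the per-CPU state; wakeup models tmc->wakeup. */
struct toy_cpu {
	int id;
	int64_t wakeup;
};

/*
 * Models the remote handling attempt of step 3). A CPU that is not the
 * migrator returns early. With fix_enabled, its possibly stale wakeup is
 * re-evaluated first (assumption: simplified to the group's next event).
 */
static void toy_handle_remote(struct toy_cpu *cpu,
			      const struct toy_group *grp, bool fix_enabled)
{
	if (grp->migrator != cpu->id) {
		if (fix_enabled)
			cpu->wakeup = grp->nextevt;
		return;
	}
	/* The migrator path (expiring remote timers) is left out. */
}

/*
 * Starts from the state of step 2) and counts the timer interrupts taken
 * by idle CPU 0, giving up after max_irqs.
 */
static int toy_count_interrupts(bool fix_enabled, int max_irqs)
{
	const int64_t t1 = 100;
	/* CPU 1 is the active migrator and has already expired T1. */
	struct toy_group grp = { .migrator = 1, .nextevt = TOY_KTIME_MAX };
	/* CPU 0 still carries the wakeup it programmed as idle migrator. */
	struct toy_cpu cpu0 = { .id = 0, .wakeup = t1 };
	int irqs = 0;

	while (cpu0.wakeup != TOY_KTIME_MAX && irqs < max_irqs) {
		irqs++;					/* clockevent fires */
		toy_handle_remote(&cpu0, &grp, fix_enabled);	/* step 3) */
		/* Step 4): no timer of its own, wakeup is left untouched. */
	}
	return irqs;
}
```

    In this model, toy_count_interrupts(false, 1000) keeps going until it
    hits the cap of 1000 interrupts, while toy_count_interrupts(true, 1000)
    stops after a single interrupt because the stale wakeup is replaced by
    TOY_KTIME_MAX.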
    
    Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20240318230729.15497-2-frederic@kernel.org
timer_migration.c 55 KB