    timers/migration: Fix ignored event due to missing CPU update · 61f7fdf8
    Frederic Weisbecker authored
    When a group event is updated with its expiry unchanged but a different
    CPU, that target change may go unnoticed and the event may be propagated
    up with a stale CPU value. The following depicts a scenario that has
    actually been observed:
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = TGRP1:0 (T0)
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T0
          /         \
        0 (T0)       1 (T1)
        idle         idle
    
    0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
    including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
    the same expiry value.
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = KTIME_MAX
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T0
          /         \
        0 (T0)       1 (T1)
        idle         idle
    
    1) The migrator in GRP1:1 remotely handles T0. The event is dequeued
    from the top and T0 is executed.
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = KTIME_MAX
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T1
          /         \
        0            1 (T1)
        idle         idle
    
    2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
    none. But it updates the events from its groups, starting with GRP0:0
    which now has T1 as its next event. So far so good.
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = KTIME_MAX
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T1
          /         \
        0            1 (T1)
        idle         idle
    
    3) The migrator in GRP1:1 proceeds upward and updates the events in
    GRP1:0. The child event TGRP0:0 is found queued with the same expiry
    as before and is therefore left unchanged. However, the target CPU
    has changed, and that fact is ignored: TGRP0:0 still points to
    CPU 0 when it should point to CPU 1.
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = TGRP1:0 (T0)
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T1
          /         \
        0            1 (T1)
        idle         idle
    
    4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
    as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
    the top level group.
    
                           [GRP2:0]
                       migrator = GRP1:1
                       active   = GRP1:1
                       nextevt  = KTIME_MAX
                        /              \
                   [GRP1:0]           [GRP1:1]
                migrator = NONE       [...]
                active   = NONE
                nextevt  = TGRP0:0 (T0)
                /           \
            [GRP0:0]       [...]
          migrator = NONE
          active   = NONE
          nextevt  = T1
          /         \
        0            1 (T1)
        idle         idle
    
    5) The migrator in GRP1:1 dequeues the next event in the top level,
    which points to CPU 0. But since it doesn't actually see any real event
    on CPU 0, it returns early.
    
    6) T1 is left unhandled until either CPU 0 or CPU 1 wakes up.
    
    Other bad scenarios may involve trees with just two levels.
    
    Fix this by unconditionally updating the CPU of the child event before
    considering an early return while updating a queued event with an
    unchanged expiry value.
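
    The difference between the buggy and the fixed update can be modeled
    with the minimal, self-contained sketch below. It is not the code from
    timer_migration.c: struct model_event, model_update_event() and the
    apply_fix parameter are hypothetical names made up for illustration,
    and the real update path handles more state than an expiry and a CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a queued group event. */
struct model_event {
	uint64_t expires;	/* expiry of the earliest child event */
	unsigned int cpu;	/* CPU owning the earliest child timer */
	bool queued;		/* already queued in the parent group */
};

/*
 * Update the parent's view of a child event.
 *
 * Returns true when the event had to be (re)queued, false when the
 * early return path for an unchanged expiry was taken.
 *
 * apply_fix is an illustration-only switch: false models the behavior
 * described in the scenario, true models the fix.
 */
static bool model_update_event(struct model_event *evt, uint64_t nextexp,
			       unsigned int first_child_cpu, bool apply_fix)
{
	if (evt->queued && evt->expires == nextexp) {
		/*
		 * Same expiry as before. Without the fix the CPU is left
		 * stale; with the fix it is refreshed before returning.
		 */
		if (apply_fix)
			evt->cpu = first_child_cpu;
		return false;
	}

	/* Expiry changed or event not queued: refresh everything. */
	evt->expires = nextexp;
	evt->cpu = first_child_cpu;
	evt->queued = true;
	return true;
}
```

    In the scenario above, T0 and T1 share the same expiry. Starting from
    an event queued for CPU 0, calling model_update_event() with the same
    expiry and CPU 1 leaves evt->cpu at 0 when apply_fix is false, and
    sets it to 1 when apply_fix is true.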
    
    Fixes: 7ee98877 ("timers: Implement the hierarchical pull model")
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
    Link: https://lore.kernel.org/r/Zg2Ct6M2RJAYHgCB@localhost.localdomain