    posix-cpu-timers: Implement the missing timer_wait_running callback · f7abf14f
    Thomas Gleixner authored
    For some unknown reason the introduction of the timer_wait_running callback
    failed to fix up posix CPU timers, which went unnoticed for almost four years.
    Marco recently reported that the WARN_ON() in timer_wait_running()
    triggers with a posix CPU timer test case.
    
    Posix CPU timers have two execution models for expiring timers depending on
    CONFIG_POSIX_CPU_TIMERS_TASK_WORK:
    
    1) If not enabled, the expiry happens in hard interrupt context so
       spin waiting on the remote CPU is reasonably time bound.
    
       Implement an empty stub function for that case.
    
    2) If enabled, the expiry happens in task work before returning to user
       space or guest mode. The expired timers are marked as firing and moved
       from the timer queue to a local list head with sighand lock held. Once
       the timers are moved, sighand lock is dropped and the expiry happens in
       fully preemptible context. That means the expiring task can be scheduled
       out, migrated, interrupted etc. So spin waiting on it is more than
       suboptimal.
    
       The timer wheel has a timer_wait_running() mechanism for RT, which uses
       a per CPU timer-base expiry lock. The expiry code holds that lock, and
       the task waiting for the timer function to complete blocks on it.
    
       This does not work in the same way for posix CPU timers as there is no
       timer base and expiry for process wide timers can run on any task
       belonging to that process, but the concept of waiting on an expiry lock
       can be used too in a slightly different way:
    
        - Add a mutex to struct posix_cputimers_work. This struct is per task
          and used to schedule the expiry task work from the timer interrupt.
    
        - Add a task_struct pointer to struct cpu_timer which is used to store
          the task which runs the expiry. That's filled in when the task
          moves the expired timers to the local expiry list. That's not
          affecting the size of the k_itimer union as there are bigger union
          members already.
    
        - Let the task take the expiry mutex around the expiry function
    
        - Let the waiter acquire a task reference with rcu_read_lock() held and
          block on the expiry mutex
    
       This avoids spin-waiting on a task which might not even be on a CPU and
       works nicely for RT too.
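
    The blocking scheme described above can be illustrated with a small user
    space analogy. This is only a sketch using POSIX threads, not the kernel
    implementation: struct expiry_work, struct demo_timer and all demo_*
    functions are hypothetical names invented for the illustration, and the
    task reference and RCU handling the kernel needs for lifetime safety are
    left out because the demo objects outlive both threads.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the per-task work struct carrying the expiry mutex. */
struct expiry_work {
	pthread_mutex_t mutex;
};

/* Hypothetical stand-in for a CPU timer that records who is expiring it. */
struct demo_timer {
	_Atomic(struct expiry_work *) handling;	/* NULL when no expiry is in flight */
	atomic_bool fired;
};

struct demo_ctx {
	struct expiry_work work;
	struct demo_timer timer;
};

/* Expiry side: hold the mutex around the whole (preemptible) expiry. */
static void *demo_expire(void *arg)
{
	struct demo_ctx *c = arg;
	struct timespec delay = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000 };

	pthread_mutex_lock(&c->work.mutex);
	/* Mark the timer as being handled by this expiry context. */
	atomic_store(&c->timer.handling, &c->work);
	/* Simulate the expiring task being scheduled out for a while. */
	nanosleep(&delay, NULL);
	atomic_store(&c->timer.fired, true);
	atomic_store(&c->timer.handling, (struct expiry_work *)NULL);
	pthread_mutex_unlock(&c->work.mutex);
	return NULL;
}

/* Waiter side: block on the expiry mutex instead of spin waiting. */
static void demo_wait_running(struct demo_timer *t)
{
	struct expiry_work *w = atomic_load(&t->handling);

	if (!w)
		return;		/* No expiry in flight, nothing to wait for. */
	pthread_mutex_lock(&w->mutex);	/* Sleeps until the expiry is done. */
	pthread_mutex_unlock(&w->mutex);
}

/* Returns 1 if the timer had fired by the time the waiter was released. */
int demo_run(void)
{
	struct demo_ctx c;
	pthread_t thr;
	int rc, fired_after_wait;

	pthread_mutex_init(&c.work.mutex, NULL);
	atomic_init(&c.timer.handling, NULL);
	atomic_init(&c.timer.fired, false);

	rc = pthread_create(&thr, NULL, demo_expire, &c);
	assert(rc == 0);

	/* Demo setup only: wait until the expiry has started (or finished). */
	while (!atomic_load(&c.timer.handling) && !atomic_load(&c.timer.fired))
		sched_yield();

	demo_wait_running(&c.timer);
	fired_after_wait = atomic_load(&c.timer.fired);

	pthread_join(thr, NULL);
	pthread_mutex_destroy(&c.work.mutex);
	return fired_after_wait;
}
```

    Calling demo_run() from a main() function and linking with the pthread
    library exercises both sides. Because the expiry side takes the mutex
    before it publishes itself in the timer, a waiter which observes a
    non-NULL pointer cannot get the mutex until the expiry has completed.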
    
    Fixes: ec8f954a ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
    Reported-by: Marco Elver <elver@google.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Marco Elver <elver@google.com>
    Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/87zg764ojw.ffs@tglx