    rcu/tree: Handle VM stoppage in stall detection · ccfc9dd6
    Sergey Senozhatsky authored
    The soft watchdog timer function checks whether the virtual
    machine was suspended, in which case what looks like a lockup
    is in fact a false positive.
    
    This is what kvm_check_and_clear_guest_paused() does: it
    tests the guest's PVCLOCK_GUEST_STOPPED flag (which is set by
    the host) and, if it is set, clears it, touches all watchdogs
    and bails out.
    
    The watchdog timer function runs from IRQ context, so the
    PVCLOCK_GUEST_STOPPED check works fine there.
    
    There is, however, one more watchdog that runs from IRQ
    context and races with the watchdog timer function: the RCU
    stall detector, which is not aware of PVCLOCK_GUEST_STOPPED.
    It is invoked from the timer interrupt path:
    
    apic_timer_interrupt()
     smp_apic_timer_interrupt()
      hrtimer_interrupt()
       __hrtimer_run_queues()
        tick_sched_timer()
         tick_sched_handle()
          update_process_times()
           rcu_sched_clock_irq()
    
    This triggers false RCU stall reports on our devices during
    VM resume.
    
    If tick_sched_handle()->rcu_sched_clock_irq() runs on a VCPU
    before watchdog_timer_fn()->kvm_check_and_clear_guest_paused(),
    then nothing on this VCPU has touched the watchdogs yet, so RCU
    reads a stale grace-period stall timestamp together with the
    new, post-resume jiffies value and concludes that RCU has
    stalled.
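The arithmetic behind this false positive can be sketched as follows, assuming the stall check reduces to a wraparound-safe "is now past the deadline" comparison in the style of the kernel's `time_after()`. `sim_time_after()` and `sim_stall_due()` are hypothetical names, this is not RCU's actual stall-check logic, and the HZ and timeout values below are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is after b", in the style of the kernel's
 * time_after(): the unsigned difference is reinterpreted as signed,
 * so the comparison stays correct when the counter wraps. */
static bool sim_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stall check: a grace period that started at gp_start
 * is reported once "now" passes gp_start + timeout. */
static bool sim_stall_due(unsigned long now, unsigned long gp_start,
			  unsigned long timeout)
{
	unsigned long deadline = gp_start + timeout;

	assert(timeout > 0);
	return sim_time_after(now, deadline);
}
```

For example, with an assumed HZ of 1000 and a 21000-jiffy timeout, a grace period starting at jiffies 5000 has its deadline at 26000. At jiffies 5100 no stall is due, but if the VM is then paused for 60 seconds, the first tick after resume sees jiffies 65100 against the stale deadline and reports a stall, although the guest itself ran for only 100 ticks.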
    
    Make the RCU stall watchdog aware of PVCLOCK_GUEST_STOPPED so
    that it does not report RCU stalls when the VM is resumed.

    Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>