    stop_machine: make stop_machine_run more virtualization friendly · 3401a61e
    Christian Borntraeger authored
    On kvm I have seen some rare hangs in stop_machine when I used more guest
    cpus than host cpus. For example, 32 guest cpus on 1 host cpu triggered the
    hang quite often. I could also reproduce the problem on a 4-way z/VM host
    with a 64-way guest.
    
    It turned out that the guest was consuming all available cpus mostly for
    spinning on scheduler locks like rq->lock. This is expected as the threads are
    calling yield all the time.
    The problem is that the host scheduling decisions, together with the guest
    scheduling decisions and spinlocks not being fair, managed to create an
    interesting scenario similar to a live lock. (Sometimes the hang resolved
    itself after some minutes.)
    
    Changing stop_machine to yield the cpu to the hypervisor when yielding inside
    the guest fixed the problem for me. While I am not completely happy with this
    patch, I think it causes no harm and it really improves the situation for me.
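
    A minimal userspace sketch of the idea, not the actual patch: a waiter
    polls a shared state and calls a relax hook on every failed check instead
    of going back through the guest scheduler. All names here
    (cpu_relax_sketch, relax_calls, wait_for_state) are hypothetical. In the
    kernel the hook would be cpu_relax(), and whether it really hands the cpu
    back to the hypervisor depends on the architecture.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical counter so the sketch can show how often the waiter relaxed. */
unsigned long relax_calls;

/*
 * Hypothetical stand-in for the kernel's cpu_relax(). In a guest, the real
 * call may give the physical cpu back to the hypervisor (assumption: this
 * is architecture specific). Here it only counts invocations.
 */
void cpu_relax_sketch(void)
{
    relax_calls++;
}

/*
 * Poll *state until it equals wanted, relaxing after every failed check.
 * Returns 1 once the state matches, or 0 after max_spins failed checks.
 * The bound exists only to keep the sketch finite.
 */
int wait_for_state(atomic_int *state, int wanted, unsigned long max_spins)
{
    assert(state != NULL);

    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load(state) == wanted)
            return 1;
        cpu_relax_sketch();
    }
    return 0;
}
```

    The point of the pattern is that a waiting guest cpu stops competing for
    guest scheduler locks such as rq->lock while it has nothing useful to do.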
    
    I used cpu_relax for yielding to the hypervisor. Does that work on all
    architectures?
    
    p.s.: If you want to reproduce the problem, cpu hotplug and kprobes use
    stop_machine_run and both triggered the problem after some retries.
    Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    CC: Ingo Molnar <mingo@elte.hu>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
stop_machine.c 4.95 KB