    Execute AML Notify() requests on stack. · 5f7748cf
    Alexey Starikovskiy authored
    HP nx6125/nx6325/... machines have a _GPE handler with an infinite
    loop sending Notify() events to different ACPI subsystems.
    
    The notify handler in the ACPI thermal driver is a C-routine,
    which may invoke the ACPI interpreter again to get access
    to some ACPI variables such as temperature.  (acpi_evaluate_xxx)
    On these HP machines such an evaluation changes the state of an ASL
    variable, which lets the loop above exit.
    
    In the current ACPI implementation, Notify requests are deferred
    to the same kacpid workqueue on which the above GPE handler with the
    infinite loop is executing. Thus we have a deadlock -- the loop
    continues to spin, sending notify events, while at the same time
    preventing those notify events from running on the workqueue. All
    notify events are deferred, so we see an explosion in memory consumption.
    
    Also, as GPE handling is blocked, machines overheat because ACPI-based
    fan control is stalled.  Eventually an external poll of the same
    acpi_evaluate releases kacpid, and all the queued notify events are
    free to run, hence 100% CPU utilization by kacpid for several seconds
    or more.
    
    To prevent this failure, Linux must not send notify events to the
    kacpid workqueue -- it must either execute them immediately or put them
    on some other thread.
    
    The first attempt to create a new thread was made by Peter Wainwright.
    He created a bunch of threads which stole work from the kacpid
    workqueue.
    This patch appeared in the 2.6.15-based kernel shipped with Ubuntu 6.06 LTS.
    
    The second attempt was made by Alexey Starikovskiy, who created a new
    thread for each Notify event. This worked OK on HP nx machines,
    but broke Linus' Compaq n620c by producing threads at such a rate that
    they stopped the machine completely.
    Thus this patch was reverted from 2.6.18-rc2.
    
    Alexey re-made the patch to create a second workqueue just for notify
    events, hoping it would not break Linus' machine. The patch was tested on
    the same HP nx machines in #5534 and #7122, but it broke Linus' machine
    as well and was reverted from 2.6.19-rc with much fanfare.
    
    The 4th patch inserted schedule_timeout(1) into the deferred
    execution of kacpid whenever notify requests were pending, but Linus
    decided that it was too complex (it involved either changes to the
    workqueue to see if it's empty, or an atomic inc/dec).  Then a 5th attempt
    added a yield() to every GPE execution.
    
    Finally, this 6th generation patch simply executes the notify handler
    on the stack.  Previous attempts at this simple solution failed
    because of issues in AML mutex re-entrancy, which are now fixed
    by the previous patch in this series.
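
To illustrate why on-stack execution breaks the loop while deferral does not, here is a toy model in Python. It is only a sketch under loud assumptions: ToyFirmware, thermal_notify_handler, run_gpe_handler and the max_spins cap are all hypothetical names invented for this illustration, not kernel or ACPICA code, and the single deque merely stands in for the kacpid workqueue whose only worker is busy running the GPE handler.

```python
from collections import deque


class ToyFirmware:
    """Hypothetical stand-in for the AML state the _GPE loop polls."""

    def __init__(self):
        self.acknowledged = False  # the variable the loop waits on
        self.notifies_sent = 0


def thermal_notify_handler(fw):
    # Models the C notify handler re-entering the interpreter; on the
    # affected machines that evaluation flips the variable the loop polls.
    fw.acknowledged = True


def run_gpe_handler(fw, dispatch, max_spins):
    """Spin like the GPE method, capped at max_spins so the model halts.

    Returns True if the loop exits on its own, False if it would have
    kept spinning.
    """
    for _ in range(max_spins):
        if fw.acknowledged:
            return True
        fw.notifies_sent += 1
        dispatch(fw)
    return fw.acknowledged


def simulate_deferred(max_spins=1000):
    # Notifies are queued for the same worker that is stuck in the loop,
    # so nothing drains the queue while the loop runs.
    fw = ToyFirmware()
    queue = deque()
    broke_out = run_gpe_handler(
        fw, lambda f: queue.append(thermal_notify_handler), max_spins)
    return broke_out, fw.notifies_sent, len(queue)


def simulate_on_stack(max_spins=1000):
    # The handler runs immediately, in the caller's context.
    fw = ToyFirmware()
    broke_out = run_gpe_handler(fw, thermal_notify_handler, max_spins)
    return broke_out, fw.notifies_sent, 0
```

In this model simulate_deferred() never leaves the loop and its queue grows by one entry per spin, mirroring the memory explosion described above, whereas simulate_on_stack() exits after a single Notify. The model deliberately ignores locking, so it says nothing about the AML mutex re-entrancy issue.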
    
    http://bugzilla.kernel.org/show_bug.cgi?id=5534

    Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@linux.intel.com>
    Signed-off-by: Len Brown <len.brown@intel.com>
evmisc.c 16.7 KB