    xfs: prevent NMI timeouts in cmn_err · 73efe4a4
    Dave Chinner authored
    We currently have a global error message buffer in cmn_err that is
    protected by a spin lock that disables interrupts.  Recently there
    have been reports of NMI timeouts occurring when the console is
    being flooded by SCSI error reports due to cmn_err() getting stuck
    trying to print to the console while holding this lock (i.e. with
    interrupts disabled). The NMI watchdog is seeing this CPU as
    non-responding and so is triggering a panic.  While the trigger for
    the reported case is SCSI errors, pretty much anything that spams
    the kernel log could cause this to occur.
    
    Realistically the only reason that we have the intermediate message
    buffer is to prepend the correct kernel log level prefix to the log
    message. The only reason we have the lock is to protect the global
    message buffer and the only reason the message buffer is global is
    to keep it off the stack. Hence if we can avoid needing a global
    message buffer we avoid needing the lock, and we can do this with a
    small amount of cleanup and some preprocessor tricks:
    
    	1. clean up xfs_cmn_err() panic mask functionality to avoid
    	   needing debug code in xfs_cmn_err()
    	2. remove the few remaining "!" message prefixes that the
    	   existing cmn_err() code steps over.
    	3. redefine CE_* levels directly to KERN_*
    	4. redefine cmn_err() and friends to use printk() directly
    	   via variable argument length macros.
    
    By doing this, we can completely remove the cmn_err() code and the
    lock that is causing the problems, and rely solely on printk()
    serialisation to ensure that we don't get garbled messages.
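    
    As a rough illustration of steps 3 and 4 only, here is a minimal
    userspace sketch, not the code from this patch. The "<N>" prefix
    values, the CE_* to KERN_* mapping and printk_stub() are stand-ins
    assumed for the example. The point it shows is that the level and the
    format are adjacent string literals, so the compiler joins them and no
    intermediate buffer (and hence no lock) is needed:

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed stand-ins for the kernel's log level prefix strings. */
#define KERN_EMERG	"<0>"
#define KERN_ALERT	"<1>"
#define KERN_WARNING	"<4>"
#define KERN_NOTICE	"<5>"
#define KERN_INFO	"<6>"
#define KERN_DEBUG	"<7>"

/* Step 3: each CE_* level is the prefix string itself (illustrative mapping). */
#define CE_DEBUG	KERN_DEBUG
#define CE_CONT		KERN_INFO
#define CE_NOTE		KERN_NOTICE
#define CE_WARN		KERN_WARNING
#define CE_ALERT	KERN_ALERT
#define CE_PANIC	KERN_EMERG

/*
 * Hypothetical stand-in for printk() so the sketch runs outside the
 * kernel. Returns the number of characters written, like vprintf().
 */
static int printk_stub(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Step 4: a variable argument macro. The first variadic argument must be
 * a string literal format; "lvl" is pasted in front of it at compile
 * time, e.g. cmn_err(CE_WARN, "x %d", 1) -> printk_stub("<4>" "x %d", 1).
 */
#define cmn_err(lvl, ...)	printk_stub(lvl __VA_ARGS__)
```

    With these definitions, cmn_err(CE_WARN, "disk %d\n", 3) expands to a
    single call with the format "<4>disk %d\n". The trade-off is that the
    format has to be a string literal at every call site.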
    
    A series of followup patches is really needed to clean up all the
    cmn_err() calls and related messages properly, but that results in a
    series that is not easily backportable to enterprise kernels. Hence
    this initial fix is only to address the direct problem in the lowest
    impact way possible.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Alex Elder <aelder@sgi.com>
debug.h 1.81 KB