1. 03 Aug, 2011 11 commits
    • Merge branch 'apei' into apei-release · d0e323b4
      Len Brown authored
      Some trivial conflicts due to other various merges
      adding to the end of common lists sooner than this one.
      
      	arch/ia64/Kconfig
      	arch/powerpc/Kconfig
      	arch/x86/Kconfig
      	lib/Kconfig
      	lib/Makefile
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, EINJ Param support is disabled by default · c3e6088e
      Huang Ying authored
      EINJ parameter support is usable only with certain BIOSes.
      It was originally expected to be harmless on BIOSes that do not
      support it, but we have found that it causes problems (memory
      overwriting) on some of them.  So param support is disabled by
      default and enabled only when the newly added module parameter
      "param_extension" is explicitly specified.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • APEI GHES: 32-bit buildfix · 70cb6e1d
      Len Brown authored
      drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression
      drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression
      
      ghes.c:(.text+0x46289): undefined reference to `__udivdi3'
        in function ghes_estatus_cache_add().
      Reported-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI: APEI build fix · a7e09d45
      Len Brown authored
      as GHES is optional...
      
      When # CONFIG_ACPI_APEI_GHES is not set:
      
      (.init.text+0x4c22): undefined reference to `ghes_disable'
      Reported-by: Randy Dunlap <rdunlap@xenotime.net>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • ACPI, APEI, GHES: Add hardware memory error recovery support · ba61ca4a
      Huang Ying authored
      memory_failure_queue() is called to do the recovery work when
      firmware reports recoverable memory errors.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • HWPoison: add memory_failure_queue() · ea8f5fb8
      Huang Ying authored
      memory_failure() is the entry point for HWPoison memory error
      recovery.  It must be called in process context.  But hardware
      memory errors are commonly reported via MCE or NMI, so some delayed
      execution mechanism must be used.  In the MCE handler, a work queue +
      ring buffer mechanism is used.

      In addition to MCE, APEI (ACPI Platform Error Interface) GHES
      (Generic Hardware Error Source) can now be used to report memory
      errors too.  To add support for APEI GHES memory recovery, a mechanism
      similar to that of MCE is implemented.  memory_failure_queue() is the
      new entry point that can be called in IRQ context.  The next step is
      to make the MCE handler use this interface too.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
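The "work queue + ring buffer" idea in the HWPoison commit above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: `demo_mf_ring`, `demo_mf_queue()`, `demo_mf_drain()`, `demo_mf_record()` and the ring size are hypothetical names and values, and the sketch is single-threaded, so it shows the shape of the mechanism (a non-blocking producer, a deferred consumer) rather than its synchronization.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MF_RING_SIZE 16u   /* hypothetical capacity; a power of two */

/* One pending memory error: the page frame number that failed. */
struct demo_mf_entry {
    unsigned long pfn;
};

/* Fixed-size ring buffer, so queueing an error never allocates memory. */
struct demo_mf_ring {
    struct demo_mf_entry slots[DEMO_MF_RING_SIZE];
    unsigned int head;  /* count of entries ever written (producer side) */
    unsigned int tail;  /* count of entries ever read (consumer side) */
};

/* Producer: stands in for the caller in interrupt context. Never blocks. */
static bool demo_mf_queue(struct demo_mf_ring *r, unsigned long pfn)
{
    if (r->head - r->tail == DEMO_MF_RING_SIZE)
        return false;   /* ring full: this event is dropped */
    r->slots[r->head % DEMO_MF_RING_SIZE].pfn = pfn;
    r->head++;
    return true;
}

/*
 * Consumer: stands in for the deferred work that later runs in process
 * context, where the real recovery routine may sleep.
 * Returns the number of entries handled.
 */
static unsigned int demo_mf_drain(struct demo_mf_ring *r,
                                  void (*handle)(unsigned long pfn))
{
    unsigned int handled = 0;

    while (r->tail != r->head) {
        handle(r->slots[r->tail % DEMO_MF_RING_SIZE].pfn);
        r->tail++;
        handled++;
    }
    return handled;
}

/* Example handler that only records the last pfn it was given. */
static unsigned long demo_mf_last_pfn;

static void demo_mf_record(unsigned long pfn)
{
    demo_mf_last_pfn = pfn;
}
```

In this sketch a full ring drops the new event instead of waiting, because the producer side is assumed to run where blocking is not allowed.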
    • ACPI, APEI, GHES, Error records content based throttle · 152cef40
      Huang Ying authored
      printk is used by GHES to report hardware errors.  The printk is
      ratelimited to avoid too many hardware error reports in the kernel
      log, because there may be thousands or even millions of corrected
      hardware errors while the system is running.

      Currently, a simple scheme is used: the total number of hardware
      error reports is ratelimited.  This may cause some issues in
      practice.

      For example, suppose two kinds of hardware errors occur in a system.
      One is a corrected memory error; because the faulty memory address
      is accessed frequently, there may be hundreds of error reports per
      second.  The other is a corrected PCIe AER error, which is reported
      once per second.  Because they share one ratelimit control
      structure, it is highly likely that only the memory error is
      reported.

      To avoid this issue, an error record content based throttle
      algorithm is implemented in this patch.  After the first successful
      report, all identical error records are throttled for some time, to
      give other kinds of error records the opportunity to be reported.

      In the above example, the memory errors will be throttled for some
      time after being printed.  Then the PCIe AER error will be printed
      successfully.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
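A content-based throttle of the kind described in the commit above might look roughly like the following user-space sketch. It assumes a small fixed cache of recently reported records compared byte for byte. The names (`demo_throttle_cache`, `demo_should_report()`), the cache size, the window length and the abstract time unit are all hypothetical and are not taken from ghes.c.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_SLOTS      4    /* hypothetical number of cached records */
#define DEMO_WINDOW     10   /* hypothetical throttle window, abstract time units */
#define DEMO_RECORD_MAX 64   /* hypothetical maximum cached record size */

struct demo_throttle_slot {
    bool used;
    size_t len;
    unsigned char data[DEMO_RECORD_MAX];
    unsigned long reported_at;   /* time this content was last reported */
};

struct demo_throttle_cache {
    struct demo_throttle_slot slots[DEMO_SLOTS];
};

/* Returns true if the record should be reported now, false if throttled. */
static bool demo_should_report(struct demo_throttle_cache *c,
                               const void *rec, size_t len, unsigned long now)
{
    struct demo_throttle_slot *victim = &c->slots[0];
    int i;

    if (len > DEMO_RECORD_MAX)
        return true;   /* too large to cache: always report */

    for (i = 0; i < DEMO_SLOTS; i++) {
        struct demo_throttle_slot *s = &c->slots[i];

        if (s->used && s->len == len && memcmp(s->data, rec, len) == 0) {
            if (now - s->reported_at < DEMO_WINDOW)
                return false;        /* same content, still inside the window */
            s->reported_at = now;    /* window expired: report again */
            return true;
        }
        /* Track a slot to reuse: prefer an empty one, else the oldest. */
        if (victim->used && (!s->used || s->reported_at < victim->reported_at))
            victim = s;
    }

    /* Content not seen recently: remember it and report it. */
    victim->used = true;
    victim->len = len;
    memcpy(victim->data, rec, len);
    victim->reported_at = now;
    return true;
}
```

With this arrangement a burst of identical memory error records occupies one slot and is suppressed for the window, while a record with different content (the PCIe AER error in the example) is still reported the first time it appears.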
    • ACPI, APEI, GHES, printk support for recoverable error via NMI · 67eb2e99
      Huang Ying authored
      Some APEI GHES recoverable errors are reported via NMI, but printk is
      not safe in NMI context.

      To solve the issue, a lock-less memory allocator is used to allocate
      memory in the NMI handler, the error record is saved into the
      allocated memory, and the record is put on a lock-less list.  An
      irq_work is then used to defer the operation from NMI context to IRQ
      context.  The irq_work IRQ handler removes nodes from the lock-less
      list, printks the error record, does some further processing
      including the recovery operation, and then frees the memory.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • lib, Make gen_pool memory allocator lockless · 7f184275
      Huang Ying authored
      This version of the gen_pool memory allocator supports lockless
      operation.
      
      This makes it safe to use in NMI handlers and other special
      unblockable contexts that could otherwise deadlock on locks.  This is
      implemented by using atomic operations and retries on any conflicts.
      The disadvantage is that there may be livelocks in extreme cases.  For
      better scalability, one gen_pool allocator can be used for each CPU.
      
      The lockless operation only works if there is enough memory available.
      If new memory is added to the pool, a lock still has to be taken.  So
      any user relying on locklessness has to ensure that sufficient memory
      is preallocated.

      The basic atomic operation of this allocator is cmpxchg on long.  On
      architectures that don't have an NMI-safe cmpxchg implementation, the
      allocator can NOT be used in NMI handlers.  So code that uses the
      allocator in an NMI handler should depend on
      CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
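The "atomic operations and retries on any conflicts" approach from the gen_pool commit above can be illustrated with a much simplified user-space allocator. This sketch uses C11 atomics in place of the kernel's cmpxchg, hands out single fixed-size chunks from one preallocated region, and tracks them in one bitmap word. The names (`demo_pool`, `demo_pool_alloc()`, `demo_pool_free()`) and sizes are hypothetical, and the real gen_pool is more general than this.

```c
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_POOL_CHUNKS     32   /* hypothetical: chunks tracked in one word */
#define DEMO_POOL_CHUNK_SIZE 64   /* hypothetical: bytes per chunk */

struct demo_pool {
    atomic_ulong bitmap;   /* bit i set means chunk i is in use */
    /* Backing memory is preallocated, so allocation never has to grow the pool. */
    unsigned char mem[DEMO_POOL_CHUNKS][DEMO_POOL_CHUNK_SIZE];
};

static void demo_pool_init(struct demo_pool *p)
{
    atomic_init(&p->bitmap, 0UL);
}

/* Lock-free allocation: claim a free bit with compare-and-swap, retry on conflict. */
static void *demo_pool_alloc(struct demo_pool *p)
{
    unsigned long old = atomic_load(&p->bitmap);

    for (;;) {
        int i;

        /* Find the first clear bit in our snapshot of the bitmap. */
        for (i = 0; i < DEMO_POOL_CHUNKS; i++)
            if (!(old & (1UL << i)))
                break;
        if (i == DEMO_POOL_CHUNKS)
            return NULL;   /* pool exhausted; there is no blocking fallback */

        /*
         * Try to publish the claim. On failure, `old` is refreshed with
         * the current bitmap value and the search is repeated.
         */
        if (atomic_compare_exchange_weak(&p->bitmap, &old, old | (1UL << i)))
            return p->mem[i];
    }
}

/* Lock-free release: clear the chunk's bit atomically. */
static void demo_pool_free(struct demo_pool *p, void *ptr)
{
    size_t i = (size_t)((unsigned char *)ptr - &p->mem[0][0]) / DEMO_POOL_CHUNK_SIZE;

    atomic_fetch_and(&p->bitmap, ~(1UL << i));
}
```

The retry loop is where the livelock risk mentioned in the commit message comes from: under heavy contention a caller can keep losing the compare-and-swap race, although some other caller makes progress each time.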
    • lib, Add lock-less NULL terminated single list · f49f23ab
      Huang Ying authored
      Cmpxchg is used to implement adding a new entry to the list, deleting
      all entries from the list, deleting the first entry of the list and
      some other operations.

      Because this is a singly linked list, the tail can not be accessed in
      O(1).

      If there are multiple producers and multiple consumers, llist_add can
      be used in producers and llist_del_all can be used in consumers.  They
      can work simultaneously without a lock.  But llist_del_first can not be
      used here, because llist_del_first depends on list->first->next not
      changing if list->first is not changed during its operation, but a
      llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
      llist_add) sequence in another consumer may violate that.

      If there are multiple producers and one consumer, llist_add can be
      used in producers and llist_del_all or llist_del_first can be used in
      the consumer.

      This can be summarized as follows:
      
                 |   add    | del_first |  del_all
       add       |    -     |     -     |     -
       del_first |          |     L     |     L
       del_all   |          |           |     -
      
      Where "-" means no lock is needed, while "L" means a lock is
      needed.

      The list entries deleted via llist_del_all can be traversed with a
      traversing function such as llist_for_each.  But the list entries
      can not be traversed safely before they are deleted from the list.
      The order of deleted entries is from the newest to the oldest added
      one.  If you want to traverse from the oldest to the newest, you must
      reverse the order yourself before traversing.

      The basic atomic operation of this list is cmpxchg on long.  On
      architectures that don't have an NMI-safe cmpxchg implementation, the
      list can NOT be used in NMI handlers.  So code that uses the list in
      an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
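The add and delete-all operations described in the lock-less list commit above can be sketched in user-space C11. This is an approximation for illustration, not the kernel's llist implementation: the `demo_` names are hypothetical, C11 atomics stand in for the kernel's cmpxchg, and delete-all is written here as a plain atomic exchange. The reversal helper shows the "reverse the order yourself" step for oldest-to-newest traversal.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_lnode {
    struct demo_lnode *next;
};

struct demo_lhead {
    _Atomic(struct demo_lnode *) first;   /* NULL when the list is empty */
};

static void demo_llist_init(struct demo_lhead *h)
{
    atomic_init(&h->first, (struct demo_lnode *)NULL);
}

/* Push a node at the head. Safe for concurrent producers: retries on conflict. */
static void demo_llist_add(struct demo_lhead *h, struct demo_lnode *n)
{
    struct demo_lnode *old = atomic_load(&h->first);

    do {
        n->next = old;   /* link to the head we expect to replace */
    } while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Detach the whole list in one atomic step; the caller then owns the chain. */
static struct demo_lnode *demo_llist_del_all(struct demo_lhead *h)
{
    return atomic_exchange(&h->first, (struct demo_lnode *)NULL);
}

/*
 * Detached entries come out newest first. Reverse the private chain to
 * walk them oldest first. No atomics are needed once the chain is detached.
 */
static struct demo_lnode *demo_llist_reverse(struct demo_lnode *n)
{
    struct demo_lnode *prev = NULL;

    while (n) {
        struct demo_lnode *next = n->next;

        n->next = prev;
        prev = n;
        n = next;
    }
    return prev;
}
```

Because delete-all swaps the head for NULL without reading `first->next`, it does not rely on the next pointer staying stable, which is consistent with the table marking add and del_all as needing no lock.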
    • Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG · df013ffb
      Huang Ying authored
      cmpxchg() is widely used by lockless code, including NMI-safe lockless
      code.  But on some architectures, the cmpxchg() implementation is not
      NMI-safe; on these architectures the lockless code may need a
      spin_trylock_irqsave() based implementation.

      This patch adds a Kconfig option, ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
      NMI-safe lockless code can depend on it or provide a different
      implementation according to it.

      On many architectures, cmpxchg is only NMI-safe for several specific
      operand sizes.  So ARCH_HAVE_NMI_SAFE_CMPXCHG as defined in this patch
      only guarantees that cmpxchg is NMI-safe for sizeof(unsigned long).
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: Richard Henderson <rth@twiddle.net>
      CC: Mikael Starvik <starvik@axis.com>
      Acked-by: David Howells <dhowells@redhat.com>
      CC: Yoshinori Sato <ysato@users.sourceforge.jp>
      CC: Tony Luck <tony.luck@intel.com>
      CC: Hirokazu Takata <takata@linux-m32r.org>
      CC: Geert Uytterhoeven <geert@linux-m68k.org>
      CC: Michal Simek <monstr@monstr.eu>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      CC: Kyle McMartin <kyle@mcmartin.ca>
      CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
      CC: Chen Liqin <liqin.chen@sunplusct.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Ingo Molnar <mingo@redhat.com>
      CC: Chris Zankel <chris@zankel.net>
      Signed-off-by: Len Brown <len.brown@intel.com>
  2. 02 Aug, 2011 2 commits
    • oom: task->mm == NULL doesn't mean the memory was freed · c027a474
      Oleg Nesterov authored
      exit_mm() sets ->mm = NULL and then does mmput()->exit_mmap(), which
      frees the memory.

      However, select_bad_process() checks ->mm != NULL before TIF_MEMDIE,
      so it continues to kill other tasks even while the oom-killed task
      is still freeing its memory.

      Change select_bad_process() to check ->mm after TIF_MEMDIE, but skip
      the tasks which have already passed exit_notify() to ensure a zombie
      with TIF_MEMDIE set can't block the oom-killer.  Alternatively we
      could probably clear TIF_MEMDIE after exit_mmap().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
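The check ordering that the oom commit above describes can be modelled with a small stand-alone function. This is a sketch of the decision order only, under the assumption that finding a task with TIF_MEMDIE makes the scan hold off on choosing another victim. `demo_task`, `demo_classify()` and the verdict names are hypothetical and do not mirror the kernel's data structures.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the task state the checks look at. */
struct demo_task {
    bool exited;   /* already passed exit_notify() (e.g. a zombie) */
    bool memdie;   /* stand-in for the TIF_MEMDIE flag */
    bool has_mm;   /* stand-in for ->mm != NULL */
};

enum demo_verdict {
    DEMO_SKIP,        /* ignore this task and keep scanning */
    DEMO_HOLD_OFF,    /* an oom-killed task is still exiting: pick nobody new */
    DEMO_CANDIDATE    /* task may be considered as a victim */
};

static enum demo_verdict demo_classify(const struct demo_task *t)
{
    /* Tasks past exit_notify() are skipped first, so a zombie cannot block. */
    if (t->exited)
        return DEMO_SKIP;

    /* TIF_MEMDIE is checked before ->mm: the task may still be freeing memory. */
    if (t->memdie)
        return DEMO_HOLD_OFF;

    /* Only now does a missing mm mean there is nothing to reclaim here. */
    if (!t->has_mm)
        return DEMO_SKIP;

    return DEMO_CANDIDATE;
}
```

The case the commit targets is a task with `memdie` set and `has_mm` already false: checking the mm first would skip it and go on to kill another task, while this ordering holds off instead.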
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6 · cfe22345
      Linus Torvalds authored
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (23 commits)
        regulator: Improve WM831x DVS VSEL selection algorithm
        regulator: Bootstrap wm831x DVS VSEL value from ON VSEL if not already set
        regulator: Set up GPIO for WM831x VSEL before enabling VSEL mode
        regulator: Add EPEs to the MODULE_ALIAS() for wm831x-dcdc
        regulator: Fix WM831x DCDC DVS VSEL bootstrapping
        regulator: Fix WM831x regulator ID lookups for multiple WM831xs
        regulator: Fix argument format type errors in error prints
        regulator: Fix memory leak in set_machine_constraints() error paths
        regulator: Make core more chatty about some errors
        regulator: tps65910: Fix array access out of bounds bug
        regulator: tps65910: Add missing breaks in switch/case
        regulator: tps65910: Fix a memory leak in tps65910_probe error path
        regulator: TWL: Remove entry of RES_ID for 6030 macros
        ASoC: tlv320aic3x: Add correct hw registers to Line1 cross connect muxes
        regulator: Add basic per consumer debugfs
        regulator: Add rdev_crit() macro
        regulator: Refactor supply implementation to work as regular consumers
        regulator: Include the device name in the microamps_requested_ file
        regulator: Increase the limit on sysfs file names
        regulator: Properly register dummy regulator driver
        ...
  3. 01 Aug, 2011 27 commits