    HWPOISON: Add soft page offline support · facb6011
    Andi Kleen authored
    This is a simpler, gentler variant of memory_failure() for soft page
    offlining controlled from user space.  It doesn't kill anything; it just
    tries to invalidate the page and, if that doesn't work, migrates it
    away.
    
    This is useful for predictive failure analysis, where a page has
    a high rate of corrected errors, but hasn't gone bad yet. Instead
    it can be offlined early and avoided.
    
    The offlining is controlled from sysfs, including a new generic
    entry point for hard page offlining for symmetry too.
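
As a rough illustration of the user-space side, the sketch below writes a physical address to one of the sysfs entry points. It assumes the entries are exposed as /sys/devices/system/memory/soft_offline_page and /sys/devices/system/memory/hard_offline_page and that they accept a physical address as text; the commit message itself does not spell out the paths, so check them against the kernel's ABI documentation. The helper name request_page_offline() is made up for this example.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed sysfs entry points; verify against your kernel's ABI docs. */
#define SOFT_OFFLINE_PATH "/sys/devices/system/memory/soft_offline_page"
#define HARD_OFFLINE_PATH "/sys/devices/system/memory/hard_offline_page"

/*
 * Hypothetical helper (not part of the kernel or any library):
 * write a physical address, formatted as hex text, to the given
 * offlining entry. Returns 0 on success or a negative errno value.
 */
static int request_page_offline(const char *path, uint64_t phys_addr)
{
    FILE *f = fopen(path, "w");
    int rc = 0;

    if (!f)
        return -errno;

    if (fprintf(f, "0x%" PRIx64 "\n", phys_addr) < 0)
        rc = -EIO;

    /* Buffered data is flushed on close, so a rejected write can surface here. */
    if (fclose(f) != 0 && rc == 0)
        rc = -errno;

    return rc;
}
```

A caller running as root would typically invoke request_page_offline(SOFT_OFFLINE_PATH, addr) for a page with a rising corrected-error count, and reserve HARD_OFFLINE_PATH for pages that should be treated like a real memory failure.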
    
    We use the page isolate facility to prevent re-allocation
    races. Normally this is only used by memory hotplug. To avoid
    races with memory allocation I am using lock_system_sleep().
    This avoids the situation where memory hotplug is about
    to isolate a page range and then hwpoison undoes that work.
    This is currently a big hammer, but it is the simplest
    solution.
    
    When the page is not free or LRU we try to free pages
    from slab and other caches. The slab freeing is currently
    quite dumb and does not try to focus on the specific slab
    cache which might own the page. This could potentially be
    improved later.
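
The order of attempts described in this message can be modelled roughly as below. This is a toy user-space model, not the code in memory-failure.c: struct page_model, enum offline_action and soft_offline_plan() are invented for illustration, and the "give up" outcome is an assumption about what happens when shaking the caches does not help.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a page; not a kernel structure. */
struct page_model {
    bool is_free;        /* page is already free */
    bool on_lru;         /* page is on an LRU list */
    bool freed_by_shake; /* freeing slab/other caches would release it */
    bool invalidate_ok;  /* invalidating the page would succeed */
};

enum offline_action {
    ACT_ISOLATE_FREE, /* page is free: isolate it so it is not reallocated */
    ACT_INVALIDATE,   /* drop the page by invalidating it */
    ACT_MIGRATE,      /* move the contents to another page */
    ACT_GIVE_UP,      /* assumed outcome when nothing applies */
};

/* Decide which step a soft offline request would end with in this model. */
static enum offline_action soft_offline_plan(const struct page_model *p)
{
    bool is_free = p->is_free;

    /* Neither free nor LRU: try freeing caches, then look again. */
    if (!is_free && !p->on_lru)
        is_free = p->freed_by_shake;

    if (is_free)
        return ACT_ISOLATE_FREE;
    if (!p->on_lru)
        return ACT_GIVE_UP;

    /* Gentle path: invalidate first, migrate only if that fails. */
    return p->invalidate_ok ? ACT_INVALIDATE : ACT_MIGRATE;
}
```

Nothing in this model kills a process, which mirrors the stated difference from memory_failure().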
    
    Thanks to Fengguang Wu and Haicheng Li for some fixes.
    
    [Added fix from Andrew Morton to adapt to new migrate_pages prototype]
    Signed-off-by: Andi Kleen <ak@linux.intel.com>
memory-failure.c 33.5 KB