    ksm: remove old stable nodes more thoroughly · cbf86cfe
    Hugh Dickins authored
    Switching merge_across_nodes after running KSM is liable to oops on stale
    nodes still left over from the previous stable tree.  It's not something
    that people will often want to do, but it would be lame to demand a reboot
    when they're trying to determine which merge_across_nodes setting is best.
    
    How can this happen?  We only permit switching merge_across_nodes when
    pages_shared is 0, and usually set run 2 to force that beforehand, which
    ought to unmerge everything: yet oopses still occur when you then run 1.
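
The switching sequence described above can be sketched as a small userspace helper. This is only an illustration, not part of the patch: the function names `write_knob` and `switch_merge_across_nodes` are hypothetical, and the directory is a parameter so the sketch can be pointed at `/sys/kernel/mm/ksm` (as root) or at a scratch directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` to the file `dir`/`name`.
 * Returns 0 on success, -1 on any failure. sysfs reports errors such as
 * EBUSY when the buffered data is flushed, so fclose() is checked too. */
static int write_knob(const char *dir, const char *name, const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: unmerge everything (run = 2), switch the
 * merge_across_nodes setting, then restart ksmd (run = 1). */
static int switch_merge_across_nodes(const char *dir, int across)
{
    if (write_knob(dir, "run", "2") != 0)
        return -1;
    if (write_knob(dir, "merge_across_nodes", across ? "1" : "0") != 0)
        return -1;
    return write_knob(dir, "run", "1");
}
```

Calling `switch_merge_across_nodes("/sys/kernel/mm/ksm", 0)` would perform the three writes in order and stop at the first one that fails.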
    
    Three causes:
    
    1. The old stable tree (built according to the inverse
       merge_across_nodes) has not been fully torn down.  A stable node
       lingers until get_ksm_page() notices that the page it references no
       longer references it: but the page is not necessarily freed as soon as
       expected, particularly when it is in swapcache.
    
       Fix this with a pass through the old stable tree, applying
       get_ksm_page() to each of the remaining nodes (most found stale and
       removed immediately), with forced removal of any left over.  Unless the
       page is still mapped: I've not seen that case, it shouldn't occur, but
       better to WARN_ON_ONCE and EBUSY than BUG.
    
    2. __ksm_enter() has a nice little optimization, to insert the new mm
       just behind ksmd's cursor, so there's a full pass for it to stabilize
       (or be removed) before ksmd addresses it.  Nice when ksmd is running,
       but not so nice when we're trying to unmerge all mms: we were missing
       those mms forked and inserted behind the unmerge cursor.  Easily fixed
       by inserting at the end when KSM_RUN_UNMERGE.
    
    3. It is possible for a KSM page to be faulted back from swapcache
       into an mm, just after unmerge_and_remove_all_rmap_items() scanned past
       it.  Fix this by copying on fault when KSM_RUN_UNMERGE: but that is
       private to ksm.c, so dissolve the distinction between
       ksm_might_need_to_copy() and ksm_does_need_to_copy(), doing it all in
       the one call into ksm.c.
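
The second cause can be illustrated with a small userspace model. It is a sketch under stated assumptions rather than the kernel code: `struct slot`, `enter()` and `remaining()` are hypothetical names, and the list is a plain circular list with a sentinel head, loosely in the style of the kernel's list_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered mm; not the kernel's structure. */
struct slot {
    int id;
    struct slot *prev, *next;
};

/* Make `head` an empty circular list (the sentinel points at itself). */
static void list_init(struct slot *head)
{
    head->prev = head->next = head;
}

/* Link `node` immediately before `pos`. Inserting before the sentinel
 * head therefore appends at the tail of the list. */
static void insert_before(struct slot *node, struct slot *pos)
{
    node->prev = pos->prev;
    node->next = pos;
    pos->prev->next = node;
    pos->prev = node;
}

/* Register a new slot. Normally it goes just behind the scan cursor, so
 * a full pass elapses before the scan reaches it. While unmerging it
 * goes at the tail instead, so the forward walk still visits it. */
static void enter(struct slot *head, struct slot *cursor,
                  struct slot *node, bool unmerging)
{
    if (unmerging)
        insert_before(node, head);
    else
        insert_before(node, cursor);
}

/* Number of slots a forward walk from `cursor` visits before the head. */
static int remaining(const struct slot *head, const struct slot *cursor)
{
    int n = 0;
    for (const struct slot *s = cursor; s != head; s = s->next)
        n++;
    return n;
}
```

In this model, a slot entered behind the cursor leaves `remaining()` unchanged, which is the slot an unmerge pass would miss; a slot entered at the tail increases it by one.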
    
    A long outstanding, unrelated bugfix sneaks in with that third fix:
    ksm_does_need_to_copy() would copy from a !PageUptodate page (implying I/O
    error when read in from swap) to a page which it then marks Uptodate.  Fix
    this case by not copying, letting do_swap_page() discover the error.
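
The combined decision from the third fix and the !PageUptodate fix can be modelled in userspace as below. This is a hedged sketch, not the code in ksm.c: `struct page_model`, `enum action` and `swapin_action()` are hypothetical, and the `has_stable_node` and `anon_matches` conditions are assumptions standing in for whatever checks the single call into ksm.c performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the page state relevant to the decision. */
struct page_model {
    bool is_ksm;          /* page is a KSM page */
    bool has_stable_node; /* assumption: still linked to a stable node */
    bool anon_matches;    /* assumption: non-KSM page already fits this mm */
    bool uptodate;        /* false implies an I/O error on swap-in */
};

enum action { USE_AS_IS, COPY };

/* Decide whether a page faulted back from swapcache should be copied. */
static enum action swapin_action(const struct page_model *p, bool unmerging)
{
    if (p->is_ksm) {
        /* A live KSM page can be reused, except while unmerging. */
        if (p->has_stable_node && !unmerging)
            return USE_AS_IS;
    } else if (p->anon_matches) {
        return USE_AS_IS;
    }

    /* Never copy from a page that is not up to date: return it as is
     * and let the caller discover the error. */
    if (!p->uptodate)
        return USE_AS_IS;

    return COPY;
}
```

With `unmerging` set, a KSM page that would otherwise be reused is copied, so it cannot slip back into an mm behind the unmerge scan; a page that is not up to date is never copied in either mode.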
    
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Petr Holasek <pholasek@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Izik Eidus <izik.eidus@ravellosystems.com>
    Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>