• Hugh Dickins's avatar
    ksm: make !merge_across_nodes migration safe · 4146d2d6
    Hugh Dickins authored
    The new KSM NUMA merge_across_nodes knob introduces a problem, when it's
    set to non-default 0: if a KSM page is migrated to a different NUMA node,
    how do we migrate its stable node to the right tree?  And what if that
    collides with an existing stable node?
    
    ksm_migrate_page() can do no more than it's already doing, updating
    stable_node->kpfn: the stable tree itself cannot be manipulated without
    holding ksm_thread_mutex.  So accept that a stable tree may temporarily
    indicate a page belonging to the wrong NUMA node, leave updating until the
    next pass of ksmd, just be careful not to merge other pages on to a
    misplaced page.  Note nid of holding tree in stable_node, and recognize
    that it will not always match nid of kpfn.
    
    A misplaced KSM page is discovered, either when ksm_do_scan() next comes
    around to one of its rmap_items (we now have to go to cmp_and_merge_page
    even on pages in a stable tree), or when stable_tree_search() arrives at a
    matching node for another page, and this node page is found misplaced.
    
    In each case, move the misplaced stable_node to a list of migrate_nodes
    (and use the address of migrate_nodes as magic by which to identify them):
    we don't need them in a tree.  If stable_tree_search() finds no match for
    a page, but it's currently exiled to this list, then slot its stable_node
    right there into the tree, bringing all of its mappings with it; otherwise
    they get migrated one by one to the original page of the colliding node.
    stable_tree_search() is now modelled more like stable_tree_insert(), in
    order to handle these insertions of migrated nodes.
    
    remove_node_from_stable_tree(), remove_all_stable_nodes() and
    ksm_check_stable_tree() have to handle the migrate_nodes list as well as
    the stable tree itself.  Less obviously, we do need to prune the list of
    stale entries from time to time (scan_get_next_rmap_item() does it once
    each full scan): whereas stale nodes in the stable tree get naturally
    pruned as searches try to brush past them, these migrate_nodes may get
    forgotten and accumulate.
    Signed-off-by: default avatarHugh Dickins <hughd@google.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Petr Holasek <pholasek@redhat.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Izik Eidus <izik.eidus@ravellosystems.com>
    Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    4146d2d6
ksm.c 64.1 KB