    mm, page_alloc: rip out ZONELIST_ORDER_ZONE · c9bff3ee
    Michal Hocko authored
    Patch series "cleanup zonelists initialization", v1.
    
    This series cleans up the zonelists initialization code.  The primary
    motivation was bug report [2], which has been resolved, but the usage
    of stop_machine is just too ugly to live.  Most patches are
    straightforward but 3 of them need special consideration.
    
    Patch 1 removes zone ordered zonelists completely.  I am CCing linux-api
    because this is a user visible change.  As I argue in the patch
    description I do not think we have a strong usecase for it these days.
    I have kept the sysctl in place and warn in the log if somebody tries to
    configure zone lists ordering.  If somebody has a real usecase for it we
    can revert this patch but I do not expect anybody will actually notice
    runtime differences.  This patch is not strictly needed for the rest but
    it made patch 6 easier to implement.
    
    Patch 7 removes stop_machine from build_all_zonelists without adding any
    special synchronization between iterators and updater which I _believe_
    is acceptable as explained in the changelog.  I hope I am not missing
    anything.
    
    Patch 8 then removes zonelists_mutex which is kind of ugly as well and
    not really needed AFAICS but care should be taken when double checking
    my thinking.
    
    This patch (of 9):
    
    Supporting zone ordered zonelists costs us a lot of code while the
    usefulness is arguable, if existent at all.  Mel has already made node
    ordering the default on 64b systems.  32b systems are still using
    ZONELIST_ORDER_ZONE because it is considered better to fall back to a
    different NUMA node rather than consume precious lowmem zones.
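The difference between the two orderings can be illustrated with a small userspace sketch. This is not the kernel implementation: the `struct zoneref` layout, the fixed two-node topology, the assumption that every node has every zone, and the round-robin node walk (instead of NUMA distance) are all simplifications made up for illustration.

```c
#include <stddef.h>

/* Simplified model: zone types from lowest to highest. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

#define NR_NODES 2	/* hypothetical fixed topology */

/* Hypothetical fallback-list entry, not the kernel's struct zoneref. */
struct zoneref {
	int node;
	enum zone_type zone;
};

/*
 * Node ordering: walk every zone of the local node (highest first)
 * before falling back to the next node.
 */
size_t build_node_order(int local_node, struct zoneref *out)
{
	size_t n = 0;

	for (int i = 0; i < NR_NODES; i++) {
		int node = (local_node + i) % NR_NODES;

		for (int z = MAX_NR_ZONES - 1; z >= 0; z--)
			out[n++] = (struct zoneref){ node, (enum zone_type)z };
	}
	return n;
}

/*
 * Zone ordering: walk the highest zone on every node before touching
 * any lower zone, so lowmem is consumed last.
 */
size_t build_zone_order(int local_node, struct zoneref *out)
{
	size_t n = 0;

	for (int z = MAX_NR_ZONES - 1; z >= 0; z--) {
		for (int i = 0; i < NR_NODES; i++) {
			int node = (local_node + i) % NR_NODES;

			out[n++] = (struct zoneref){ node, (enum zone_type)z };
		}
	}
	return n;
}
```

With node 0 as the local node, the node-ordered list tries node 0's lower zones before leaving the node, while the zone-ordered list tries node 1's highmem zone second and only reaches the lowest zones at the very end.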
    
    This argument is, however, weakened by the fact that the memory reclaim
    has been reworked to be node rather than zone oriented.  This means that
    lowmem requests have to skip over all highmem pages on LRUs already and
    so zone ordering doesn't save much reclaim time.  So the only
    advantage of the zone ordering is under light memory pressure, when
    highmem requests never reach lowmem zones and the lowmem
    pressure doesn't need to reclaim.
    
    Considering that 32b NUMA systems are rather suboptimal already and it
    is generally advisable to use a 64b kernel on such HW, I believe we
    should rather care about the code maintainability and just get rid of
    ZONELIST_ORDER_ZONE altogether.  Keep the sysctl in place and warn if
    somebody tries to set zone ordering either from the kernel command line
    or the sysctl.
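A minimal userspace sketch of the "keep the knob, warn on unsupported values" behaviour described above. The function name `parse_zonelist_order`, the first-letter matching, and the use of `fprintf(stderr, ...)` in place of a kernel log call are assumptions for illustration, not the actual patch.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical model of parsing a numa_zonelist_order value.
 * Values starting with 'd'/'D' (default) or 'n'/'N' (node) are accepted;
 * anything else, including "zone", is rejected with a warning so that
 * the request does not fail silently.
 */
int parse_zonelist_order(const char *s)
{
	if (s == NULL)
		return -EINVAL;

	if (!(*s == 'd' || *s == 'D' || *s == 'n' || *s == 'N')) {
		/* Stand-in for a kernel log warning. */
		fprintf(stderr,
			"Ignoring unsupported numa_zonelist_order value: %s\n",
			s);
		return -EINVAL;
	}
	return 0;
}
```

In this sketch a caller would ignore the requested ordering on `-EINVAL` and continue with node ordering, which is the only mode left.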
    
    [mhocko@suse.com: reading vm.numa_zonelist_order will never terminate]
    Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Mel Gorman <mgorman@suse.de>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Cc: Shaohua Li <shaohua.li@intel.com>
    Cc: Toshi Kani <toshi.kani@hpe.com>
    Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
    Cc: <linux-api@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>