    mm: use mm_zero_struct_page from SPARC on all 64b architectures · 5470dea4
    Alexander Duyck authored
    Patch series "Deferred page init improvements", v7.
    
    This patchset is essentially a refactor of the page initialization logic
    that is meant to provide better code reuse while also significantly
    improving deferred page initialization performance.
    
    In my testing on an x86_64 system with 384GB of RAM I have seen the
    following.  For regular memory initialization, the average deferred init
    time decreased from 3.75s to 1.38s.  This amounts to a 172% improvement
    (about 2.7x faster) in deferred memory initialization performance.
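    The 172% figure is a throughput-style gain (old time divided by new time,
    minus one), not a reduction in time; the same numbers mean roughly 63%
    less wall-clock time. This small sketch shows both conventions side by
    side; the helper names are illustrative only.

```c
/*
 * Two ways to express the same before/after timing:
 *   speedup_percent():        extra work per unit time, (old / new - 1).
 *   time_reduction_percent(): share of the original time removed, (1 - new / old).
 */
static double speedup_percent(double old_time, double new_time)
{
	return (old_time / new_time - 1.0) * 100.0;
}

static double time_reduction_percent(double old_time, double new_time)
{
	return (1.0 - new_time / old_time) * 100.0;
}
```

    With the quoted numbers, speedup_percent(3.75, 1.38) is about 171.7
    (reported as 172%), while time_reduction_percent(3.75, 1.38) is 63.2.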
    
    I have called out the improvement observed with each patch.
    
    This patch (of 4):
    
    Use the same approach that was already in use on SPARC on all the
    architectures that support a 64-bit long.
    
    This is mostly motivated by the fact that 7 to 10 store/move instructions
    (one per 8-byte word of a 56- to 80-byte struct page) are likely always
    going to be faster than calling into a function that is not specialized
    for handling page init.
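    A minimal user-space sketch of the open-coded zeroing, modeled on the
    SPARC helper this patch generalizes. struct fake_page and
    zero_fake_page() are hypothetical stand-ins for struct page and
    mm_zero_struct_page(); _Static_assert plays the role of BUILD_BUG_ON,
    the 56 to 80 byte bounds mirror the kernel helper's size checks, and the
    memset() fallback stands in for the path used when long is not 64 bits.

```c
#include <limits.h>
#include <string.h>

/*
 * Hypothetical 64-byte stand-in for struct page on an LP64 system.  The
 * kernel helper writes through an unsigned long pointer over the real
 * struct page; here every member is unsigned long so the word-wise stores
 * match the members' declared type.
 */
struct fake_page {
	unsigned long flags;
	unsigned long lru_next;
	unsigned long lru_prev;
	unsigned long mapping;
	unsigned long index;
	unsigned long private_data;
	unsigned long counts;	/* stands in for _mapcount + _refcount */
	unsigned long memcg;
};

#if ULONG_MAX == 0xffffffffffffffffUL
/*
 * 64-bit long: zero the structure with one store per 8-byte word.
 * sizeof() is a compile-time constant, so an optimizing compiler keeps only
 * the stores for the matching case, and no function call is emitted.
 */
static inline void zero_fake_page(struct fake_page *page)
{
	unsigned long *pp = (unsigned long *)page;

	_Static_assert((sizeof(struct fake_page) & 7) == 0,
		       "size must be a whole number of 8-byte words");
	_Static_assert(sizeof(struct fake_page) >= 56, "fewer than 7 words");
	_Static_assert(sizeof(struct fake_page) <= 80, "more than 10 words");

	switch (sizeof(struct fake_page)) {
	case 80:
		pp[9] = 0;	/* fallthrough */
	case 72:
		pp[8] = 0;	/* fallthrough */
	case 64:
		pp[7] = 0;	/* fallthrough */
	case 56:
		pp[6] = 0;
		pp[5] = 0;
		pp[4] = 0;
		pp[3] = 0;
		pp[2] = 0;
		pp[1] = 0;
		pp[0] = 0;
	}
}
#else
/* Other word sizes keep the generic memset() path. */
static inline void zero_fake_page(struct fake_page *page)
{
	memset(page, 0, sizeof(*page));
}
#endif
```

    For this 64-byte layout only the case 64 path is live, so the zeroing
    becomes eight plain stores; a layout outside 56 to 80 bytes fails to
    compile instead of silently taking a slower path.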
    
    An added advantage to doing it this way is that the compiler can combine
    writes in the __init_single_page call.  As a result the memset call is
    reduced to only about 4 write operations, or at least that is what I am
    seeing with GCC 6.2: the stores to flags, the LRU pointers, and
    count/mapcount seem to cancel out at least 4 of the 8 zeroing assignments
    on my system.
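    To illustrate why the zeroing can shrink to about 4 writes, here is a
    hedged sketch of an __init_single_page()-style routine. struct demo_page,
    its field layout, and init_demo_page() are hypothetical; the layout only
    assumes a 64-byte descriptor on an LP64 target in which flags, the two
    LRU pointers, and the word shared by the two counters are all rewritten
    after zeroing. Whether the dead zero stores are actually dropped depends
    on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical list node, similar in shape to the kernel's struct list_head. */
struct demo_list {
	struct demo_list *next;
	struct demo_list *prev;
};

/* Hypothetical 64-byte page descriptor on LP64: eight 8-byte words. */
struct demo_page {
	unsigned long flags;		/* word 0 */
	struct demo_list lru;		/* words 1-2 */
	void *mapping;			/* word 3 */
	unsigned long index;		/* word 4 */
	unsigned long private_data;	/* word 5 */
	int mapcount;			/* word 6, shared with refcount */
	int refcount;			/* word 6 */
	void *memcg;			/* word 7 */
};

static void init_demo_page(struct demo_page *page, unsigned long flags)
{
	/* Zero every field: eight 8-byte words in total. */
	page->flags = 0;
	page->lru.next = NULL;
	page->lru.prev = NULL;
	page->mapping = NULL;
	page->index = 0;
	page->private_data = 0;
	page->mapcount = 0;
	page->refcount = 0;
	page->memcg = NULL;

	/*
	 * These stores fully overwrite words 0, 1, 2 and 6, so an optimizing
	 * compiler may treat the zeroing stores to those words as dead and
	 * drop them, leaving zeroing writes only for words 3, 4, 5 and 7.
	 */
	page->flags = flags;
	page->lru.next = &page->lru;	/* an empty list points to itself */
	page->lru.prev = &page->lru;
	page->mapcount = -1;
	page->refcount = 1;
}
```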
    
    One change I had to make to the function was to reduce the minimum
    struct page size it accepts to 56 bytes to support some powerpc64
    configurations.
    
    This patch should not change anything on SPARC since it already had this
    code.  In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
    initializing 384GB of RAM per node.  Pavel Tatashin tested on a system
    with Broadcom's Stingray CPU and 48GB of RAM and found that
    __init_single_page() takes 19.30ns per 64-byte struct page before this
    patch and 17.33ns per 64-byte struct page with it.  Mike Rapoport ran a
    similar test on an OpenPower (S812LC 8348-21C) with a Power8 processor
    and 128GB of RAM.  His results per 64-byte struct page were 4.68ns
    before and 4.59ns after this patch.
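    To put the measurements above on a common scale, this sketch tabulates
    them and computes each speedup with the same (old / new - 1) convention
    that yields the 172% figure in the cover letter. The struct and function
    names are hypothetical; the numbers are the ones quoted in this message.

```c
/* Before/after numbers quoted in the commit message. */
struct measurement {
	const char *system;
	const char *unit;
	double before;
	double after;
};

static const struct measurement results[] = {
	{ "x86_64, 384GB per node (deferred init)",    "s",  3.75,  2.80 },
	{ "Broadcom Stingray, 48GB (per struct page)", "ns", 19.30, 17.33 },
	{ "Power8 S812LC, 128GB (per struct page)",    "ns", 4.68,  4.59 },
};

/* Speedup as a percentage: how much faster the "after" run is. */
static double measurement_speedup_percent(const struct measurement *m)
{
	return (m->before / m->after - 1.0) * 100.0;
}
```

    With these inputs the speedups come out to roughly 34% (x86_64), 11%
    (Stingray), and 2% (Power8).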
    
    Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomain
    Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Mike Rapoport <rppt@linux.ibm.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Jiang <dave.jiang@intel.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Ingo Molnar <mingo@kernel.org>
    Cc: Khalid Aziz <khalid.aziz@oracle.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <yi.z.zhang@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>