    mm: change the call sites of numa statistics items · 3a321d2a
    Kemi Wang authored
    Patch series "Separate NUMA statistics from zone statistics", v2.
    
    Each page allocation updates a set of per-zone statistics with a call to
    zone_statistics().  As discussed at the 2017 MM summit, these updates are a
    substantial source of overhead in the page allocator, and the statistics
    are very rarely consumed.  Much of the overhead comes from cache bouncing
    when the zone counters (the NUMA-associated counters) are updated in
    parallel during multi-threaded page allocation (pointed out by Dave Hansen).
    
    A link to the MM summit slides:
      http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf
    
    To mitigate this overhead, this patchset separates NUMA statistics from
    the zone statistics framework and sets the NUMA counter threshold to a
    fixed size of MAX_U16 - 2, since a small threshold greatly increases how
    often the local per-cpu counter is folded into the global counter
    (suggested by Ying Huang).  The rationale is that these statistics
    counters don't need to be read often, unlike other VM counters, so it's
    not a problem to use a large threshold and make readers more expensive.
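
    The threshold trade-off described above can be sketched in userspace C.
    This is a simplified model under stated assumptions, not the kernel
    implementation: the model_* names, the 4-CPU layout and the
    global_updates field are hypothetical and exist only for illustration.
    It shows that a larger threshold means fewer writes to the shared
    counter, while an exact read has to sum every per-cpu delta.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_NR_CPUS 4

struct model_counter {
    long global;                    /* shared value, expensive to write */
    uint16_t local[MODEL_NR_CPUS];  /* per-cpu deltas, cheap to write */
    uint16_t threshold;             /* fold a delta once it exceeds this */
    unsigned long global_updates;   /* number of writes to the shared value */
};

static void model_init(struct model_counter *c, uint16_t threshold)
{
    c->global = 0;
    c->threshold = threshold;
    c->global_updates = 0;
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        c->local[i] = 0;
}

/* Count one event on `cpu`; touch the shared value only past the threshold. */
static void model_inc(struct model_counter *c, int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    if (++c->local[cpu] > c->threshold) {
        c->global += c->local[cpu];
        c->local[cpu] = 0;
        c->global_updates++;
    }
}

/* Exact read: the reader pays by walking all per-cpu deltas. */
static long model_read(const struct model_counter *c)
{
    long sum = c->global;

    for (int i = 0; i < MODEL_NR_CPUS; i++)
        sum += c->local[i];
    return sum;
}
```

    In this model, 1000 increments on one CPU write the shared value 7
    times with a threshold of 125 and not at all with a threshold of 65533,
    yet model_read() returns 1000 in both cases.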
    
    With this patchset, we see a 31.3% drop in CPU cycles (537 --> 369, see
    below) per single page allocation and reclaim on Jesper's page_bench03
    benchmark.  Meanwhile, this patchset keeps the same style of virtual
    memory statistics with little end-user-visible effect (it only moves the
    NUMA stats to appear after the zone page stats; see the first patch for
    details).
    
    I ran an experiment of concurrent single page allocation and reclaim
    using Jesper's page_bench03 benchmark on a 2-socket Broadwell-based
    server (88 processors with 126G memory) with different pcp counter
    threshold sizes.
    
    Benchmark provided by Jesper D Brouer (loop count increased to 10000000):
      https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
    
       Threshold   CPU cycles    Throughput(88 threads)
          32        799         241760478
          64        640         301628829
          125       537         358906028 <==> system by default
          256       468         412397590
          512       428         450550704
          4096      399         482520943
          20000     394         489009617
          30000     395         488017817
          65533     369(-31.3%) 521661345(+45.3%) <==> with this patchset
          N/A       342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
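
    The percentages in the table are relative to the default threshold of
    125 (537 cycles, 358906028 throughput).  A minimal sketch of that
    arithmetic, where percent_change() is a hypothetical helper name and
    not part of the benchmark:

```c
#include <assert.h>

/* Relative change of `value` against `base`, in percent. */
static double percent_change(double base, double value)
{
    assert(base != 0.0);
    return (value - base) / base * 100.0;
}
```

    For example, percent_change(537, 369) is about -31.3 and
    percent_change(358906028, 521661345) is about +45.3, matching the
    "with this patchset" row.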
    
    This patch (of 3):
    
    In this patch, NUMA statistics are separated from the zone statistics
    framework, and all the call sites of NUMA stats are changed to use
    numa-stats-specific functions.  There is no functional change except
    that the NUMA stats are shown after the zone page stats when users
    *read* the zone info.
    
    E.g. cat /proc/zoneinfo
        ***Base***                           ***With this patch***
    nr_free_pages 3976                         nr_free_pages 3976
    nr_zone_inactive_anon 0                    nr_zone_inactive_anon 0
    nr_zone_active_anon 0                      nr_zone_active_anon 0
    nr_zone_inactive_file 0                    nr_zone_inactive_file 0
    nr_zone_active_file 0                      nr_zone_active_file 0
    nr_zone_unevictable 0                      nr_zone_unevictable 0
    nr_zone_write_pending 0                    nr_zone_write_pending 0
    nr_mlock     0                             nr_mlock     0
    nr_page_table_pages 0                      nr_page_table_pages 0
    nr_kernel_stack 0                          nr_kernel_stack 0
    nr_bounce    0                             nr_bounce    0
    nr_zspages   0                             nr_zspages   0
    numa_hit 0                                *nr_free_cma  0*
    numa_miss 0                                numa_hit     0
    numa_foreign 0                             numa_miss    0
    numa_interleave 0                          numa_foreign 0
    numa_local   0                             numa_interleave 0
    numa_other   0                             numa_local   0
    *nr_free_cma 0*                            numa_other 0
        ...                                        ...
    vm stats threshold: 10                     vm stats threshold: 10
        ...                                        ...
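
    The reordering above follows from keeping the two groups of counters
    in separate tables and printing the zone table first.  A minimal
    userspace sketch, assuming hypothetical model_* names and only the
    last two zone items from the listing (this is not the kernel's
    /proc/zoneinfo code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical tables: zone items and NUMA items live apart. */
static const char *const model_zone_names[] = {
    "nr_zspages", "nr_free_cma",
};
static const char *const model_numa_names[] = {
    "numa_hit", "numa_miss", "numa_foreign",
    "numa_interleave", "numa_local", "numa_other",
};

/* Append "name value" lines for one table; returns new offset or -1. */
static int model_append(char *buf, size_t len, size_t off,
                        const char *const *names, const long *vals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(buf + off, len - off, "%s %ld\n", names[i], vals[i]);

        if (w < 0 || (size_t)w >= len - off)
            return -1;          /* output would not fit */
        off += (size_t)w;
    }
    return (int)off;
}

/* Zone stats first, NUMA stats after them. */
static int model_format_zoneinfo(char *buf, size_t len,
                                 const long *zone_vals, const long *numa_vals)
{
    int off;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    off = model_append(buf, len, 0, model_zone_names, zone_vals,
                       MODEL_ARRAY_SIZE(model_zone_names));
    if (off < 0)
        return -1;
    return model_append(buf, len, (size_t)off, model_numa_names, numa_vals,
                        MODEL_ARRAY_SIZE(model_numa_names));
}
```

    Because nr_free_cma belongs to the zone table, it is printed before
    numa_hit in this model, which is the only visible difference shown in
    the comparison.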
    
    The next patch updates the numa stats counter size and threshold.
    
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1503568801-21305-2-git-send-email-kemi.wang@intel.com
    Signed-off-by: Kemi Wang <kemi.wang@intel.com>
    Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Christopher Lameter <cl@linux.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Andi Kleen <andi.kleen@intel.com>
    Cc: Ying Huang <ying.huang@intel.com>
    Cc: Aaron Lu <aaron.lu@intel.com>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>