    mm/mempolicy: use numa_node_id() instead of cpu_to_node() · f8fd525b
    Donet Tom authored
    Patch series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY
    policy", v4.
    
    This patchset optimizes cross-socket memory access under the
    MPOL_PREFERRED_MANY policy.
    
    To test this patchset, we ran the following test on a 3-node system.
     Node 0 - 2GB   - Tier 1
     Node 1 - 11GB  - Tier 1
     Node 6 - 10GB  - Tier 2
    
    The following changes were made to memcached to set the memory policy.
    They select Node 0 and Node 1 as the preferred nodes.
    
        #include <errno.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <numaif.h>
        #include <numa.h>
    
        unsigned long nodemask;
        int ret;
    
        /* Bits 0 and 1 set: prefer Node 0 and Node 1 */
        nodemask = 0x03;
        ret = set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_NUMA_BALANCING,
                            &nodemask, 10);
        /* If MPOL_F_NUMA_BALANCING isn't supported,
         * fall back to MPOL_PREFERRED_MANY */
        if (ret < 0 && errno == EINVAL) {
            printf("set mem policy normal\n");
            ret = set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, 10);
        }
        if (ret < 0) {
            perror("Failed to call set_mempolicy");
            exit(-1);
        }
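    
    The mask value 0x03 has bits 0 and 1 set, which is how the snippet above
    selects Node 0 and Node 1. As a minimal sketch of that encoding, the helper
    below builds a single-word mask from a list of node IDs. The name
    nodemask_from_nodes() is hypothetical and is not part of memcached,
    libnuma, or this patchset.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): build a single-word nodemask
 * with one bit set per requested NUMA node ID.
 * Returns 0 if any node ID does not fit in one unsigned long.
 */
unsigned long nodemask_from_nodes(const int *nodes, size_t count)
{
    const int bits = (int)(sizeof(unsigned long) * CHAR_BIT);
    unsigned long mask = 0;

    for (size_t i = 0; i < count; i++) {
        if (nodes[i] < 0 || nodes[i] >= bits)
            return 0;
        mask |= 1UL << nodes[i];
    }
    return mask;
}
```

    Passing the node list {0, 1} yields 0x03, the value used in the memcached
    change. The result can be handed to set_mempolicy() together with a
    maxnode value that covers the highest bit in use.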
    
    Test Procedure:
    ===============
    1. Make sure memory tiering and demotion are enabled.
    2. Start memcached.
    
       # ./memcached -b 100000 -m 204800 -u root -c 1000000 -t 7
           -d -s "/tmp/memcached.sock"
    
    3. Run memtier_benchmark to store 3200000 keys.
    
      #./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary
        --threads=1 --pipeline=1 --ratio=1:0 --key-pattern=S:S --key-minimum=1
        --key-maximum=3200000 -n allkeys -c 1 -R -x 1 -d 1024
    
    4. Start a memory eater on node 0 and 1. This will demote all memcached
       pages to node 6.
    5. Make sure all the memcached pages got demoted to the lower tier by
       reading /proc/<memcached PID>/numa_maps.
    
        # cat /proc/2771/numa_maps
         ---
        default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
        default anon=1009 dirty=1009 active=0 N6=1009 kernelpagesize_kB=64
         ---
    
    6. Kill memory eater.
    7. Read the pgpromote_success counter.
    8. Start reading the keys by running memtier_benchmark.
    
      #./memtier_benchmark -S "/tmp/memcached.sock" --protocol=memcache_binary
       --pipeline=1 --distinct-client-seed --ratio=0:3 --key-pattern=R:R
       --key-minimum=1 --key-maximum=3200000 -n allkeys
       --threads=64 -c 1 -R -x 6
    
    9. Read the pgpromote_success counter.
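    
    Steps 7 and 9 read the pgpromote_success counter for each node. The
    procedure does not say where the counter was read from. The sketch below
    assumes the per-node file /sys/devices/system/node/nodeN/vmstat, which
    lists one "name value" pair per line. The function names read_counter()
    and read_node_counter() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan "name value" pairs from an already-open stream and store the value
 * of the requested counter. Returns 0 on success, -1 if it is not found.
 */
int read_counter(FILE *fp, const char *name, unsigned long long *value)
{
    char key[64];
    unsigned long long val;

    while (fscanf(fp, "%63s %llu", key, &val) == 2) {
        if (strcmp(key, name) == 0) {
            *value = val;
            return 0;
        }
    }
    return -1;
}

/*
 * Assumption: per-node counters are exposed in
 * /sys/devices/system/node/node<N>/vmstat.
 * Returns 0 on success, -1 if the file or the counter is missing.
 */
int read_node_counter(int node, const char *name, unsigned long long *value)
{
    char path[64];
    FILE *fp;
    int ret;

    snprintf(path, sizeof(path),
             "/sys/devices/system/node/node%d/vmstat", node);
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    ret = read_counter(fp, name, value);
    fclose(fp);
    return ret;
}
```

    Calling read_node_counter(0, "pgpromote_success", &v) before and after the
    read phase and subtracting the two values gives the number of pages
    promoted to Node 0 during the test.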
    
    Test Results:
    =============
    Without Patch
    ------------------
    1. pgpromote_success  before test
    Node 0:  pgpromote_success 11
    Node 1:  pgpromote_success 140974
    
    pgpromote_success  after test
    Node 0:  pgpromote_success 11
    Node 1:  pgpromote_success 140974
    
    2. Memtier-benchmark result.
    AGGREGATED AVERAGE RESULTS (6 runs)
    ==================================================================
    Type    Ops/sec   Hits/sec   Misses/sec  Avg. Latency  p50 Latency
    ------------------------------------------------------------------
    Sets     0.00       ---         ---        ---          ---
    Gets    305792.03  305791.93   0.10       0.18949       0.16700
    Waits    0.00       ---         ---        ---          ---
    Totals  305792.03  305791.93   0.10       0.18949       0.16700
    
    ======================================
    p99 Latency  p99.9 Latency  KB/sec
    -------------------------------------
    ---          ---            0.00
    0.44700     1.71100        11542.69
    ---           ---            ---
    0.44700     1.71100        11542.69
    
    With Patch
    ---------------
    1. pgpromote_success  before test
    Node 0:  pgpromote_success 5
    Node 1:  pgpromote_success 89386
    
    pgpromote_success  after test
    Node 0:  pgpromote_success 57895
    Node 1:  pgpromote_success 141463
    
    2. Memtier-benchmark result.
    AGGREGATED AVERAGE RESULTS (6 runs)
    ====================================================================
    Type    Ops/sec    Hits/sec  Misses/sec  Avg. Latency  p50 Latency
    --------------------------------------------------------------------
    Sets     0.00        ---       ---        ---           ---
    Gets    521942.24  521942.07  0.17       0.11459        0.10300
    Waits    0.00        ---       ---         ---          ---
    Totals  521942.24  521942.07  0.17       0.11459        0.10300
    
    =======================================
    p99 Latency  p99.9 Latency  KB/sec
    ---------------------------------------
     ---          ---            0.00
    0.23100      0.31900        19701.68
    ---          ---             ---
    0.23100      0.31900        19701.68
    
    
    Test Result Analysis:
    =====================
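    1. With the patch, pgpromote_success increases during the test, showing
       that pages are promoted. Without the patch, the counter does not change.
    2. Memtier-benchmark results show that, with the patch,
       performance has increased by more than 50% (about 70% in Ops/sec).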
    
     Ops/sec without fix -  305792.03
     Ops/sec with fix    -  521942.24
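    
    The gain follows directly from the two Ops/sec figures above. As a small
    worked example, the hypothetical helper percent_gain() below computes the
    relative change; for these two values it comes out to roughly 70%.

```c
/* Percentage change from a baseline value to an updated value. */
double percent_gain(double baseline, double updated)
{
    return (updated - baseline) / baseline * 100.0;
}
```

    Here the baseline is 305792.03 (without fix) and the updated value is
    521942.24 (with fix).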
    
    
    This patch (of 2):
    
    Instead of using 'cpu_to_node()', we use 'numa_node_id()', which is
    quicker.  smp_processor_id() is guaranteed to be stable in the
    'mpol_misplaced()' function because it is called with the ptl held.
    A lockdep_assert_held() was added to verify that.
    
    No functional change in this patch.
    
    [donettom@linux.ibm.com: add "* @vmf: structure describing the fault" comment]
      Link: https://lkml.kernel.org/r/d8b993ea9dccfac0bc3ed61d3a81f4ac5f376e46.1711002865.git.donettom@linux.ibm.com
    Link: https://lkml.kernel.org/r/cover.1711373653.git.donettom@linux.ibm.com
    Link: https://lkml.kernel.org/r/6059f034f436734b472d066db69676fb3a459864.1711373653.git.donettom@linux.ibm.com
    Link: https://lkml.kernel.org/r/cover.1709909210.git.donettom@linux.ibm.com
    Link: https://lkml.kernel.org/r/744646531af02cc687cde8ae788fb1779e99d02c.1709909210.git.donettom@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V (IBM) <aneesh.kumar@kernel.org>
    Signed-off-by: Donet Tom <donettom@linux.ibm.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Feng Tang <feng.tang@intel.com>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>