    mm/vmalloc: rework vmap_area_lock · e36176be
    Uladzislau Rezki (Sony) authored
    With the new allocation approach introduced in the 5.2 kernel, it
    becomes possible to get rid of the one global spinlock.  By doing that
    we can further improve the performance of the KVA allocator.
    
    We can have two independent locks, one for the allocation path and
    another one for the deallocation path, because they work on two
    different entities: "free data structures" and "busy data structures".
    
    Allocation and deallocation operations running simultaneously on
    different CPUs can still interfere with each other, so there is still
    a dependency between them, but with two locks it becomes weaker.
    
    Summarizing:
      - it reduces high lock contention;
      - it allows operations on the "free" and "busy" trees
        to be performed in parallel on different CPUs. Please
        note that it does not solve the scalability issue.
    
    Test results:
    
    To evaluate this patch, we can run the "vmalloc test driver" to see
    how many CPU cycles it takes to complete all test cases run
    sequentially.  All online CPUs run it, which causes high lock
    contention.
    
    HiKey 960, ARM64, 8xCPUs, big.LITTLE:
    
    <snip>
        sudo ./test_vmalloc.sh sequential_test_order=1
    <snip>
    
    <default>
    [  390.950557] All test took CPU0=457126382 cycles
    [  391.046690] All test took CPU1=454763452 cycles
    [  391.128586] All test took CPU2=454539334 cycles
    [  391.222669] All test took CPU3=455649517 cycles
    [  391.313946] All test took CPU4=388272196 cycles
    [  391.410425] All test took CPU5=384036264 cycles
    [  391.492219] All test took CPU6=387432964 cycles
    [  391.578433] All test took CPU7=387201996 cycles
    <default>
    
    <patched>
    [  304.721224] All test took CPU0=391521310 cycles
    [  304.821219] All test took CPU1=393533002 cycles
    [  304.917120] All test took CPU2=392243032 cycles
    [  305.008986] All test took CPU3=392353853 cycles
    [  305.108944] All test took CPU4=297630721 cycles
    [  305.196406] All test took CPU5=297548736 cycles
    [  305.288602] All test took CPU6=297092392 cycles
    [  305.381088] All test took CPU7=297293597 cycles
    <patched>
    
    The patched variant takes ~14%-23% fewer CPU cycles.
    
    Link: http://lkml.kernel.org/r/20191022155800.20468-1-urezki@gmail.com
    Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
    Acked-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>