    mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check · 879c6000
    Lance Yang authored
    khugepaged scans the entire address space in the background for each
    given mm, looking for opportunities to merge sequences of basic pages
    into huge pages.  However, when an mm is inserted into the mm_slots
    list and the MMF_DISABLE_THP flag is set afterwards, scanning that mm
    is unnecessary and can be skipped to avoid redundant work, especially
    when the address space is large.
    
    On an Intel Core i5 CPU, the time taken by khugepaged to scan the
    address space of the process, which has been set with the
    MMF_DISABLE_THP flag after being added to the mm_slots list, is as
    follows (shorter is better):
    
    VMA Count |   Old   |   New   |  Change
    ---------------------------------------
        50    |   23us  |    9us  |  -60.9%
       100    |   32us  |    9us  |  -71.9%
       200    |   44us  |    9us  |  -79.5%
       400    |   75us  |    9us  |  -88.0%
       800    |   98us  |    9us  |  -90.8%
    
    Once the count of VMAs for the process exceeds pages_to_scan,
    khugepaged needs to wait for scan_sleep_millisecs ms before scanning
    the next process.  IMO, unnecessary scans could actually be skipped
    with a very inexpensive mm->flags check in this case.
    
    This commit introduces a check before each scanning process to test the
    MMF_DISABLE_THP flag for the given mm; if the flag is set, the scanning
    process is bypassed, thereby improving the efficiency of khugepaged.
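
The check described above can be sketched as a small userspace model. This is not the kernel patch itself: `struct mm_model`, `MMF_DISABLE_THP_BIT`, `thp_disabled`, and `scan_mm` are hypothetical stand-ins chosen for illustration. Only the idea, testing one flag bit before walking any VMA, comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's MMF_DISABLE_THP bit;
 * the real bit number in mm->flags is not assumed here. */
#define MMF_DISABLE_THP_BIT 0UL

/* Hypothetical, simplified model of an mm: a flags word and a VMA count. */
struct mm_model {
    unsigned long flags;
    size_t vma_count;
};

/* Cheap test of a single flag bit. */
static bool thp_disabled(const struct mm_model *mm)
{
    return (mm->flags >> MMF_DISABLE_THP_BIT) & 1UL;
}

/* Returns how many VMAs the scan visited. */
static size_t scan_mm(const struct mm_model *mm)
{
    size_t visited = 0;

    /* Bypass the whole walk when THP is disabled for this mm. */
    if (thp_disabled(mm))
        return 0;

    for (size_t i = 0; i < mm->vma_count; i++)
        visited++; /* a real scan would examine the VMA here */

    return visited;
}
```

In this model, an mm with 800 VMAs and the flag set costs one bit test instead of 800 per-VMA visits, which matches the flat cost in the "New" column of the table.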
    
    This optimization does not fix a correctness issue; it is an
    enhancement that saves expensive checks on each VMA in cases where
    userspace cannot call prctl on itself before spawning the new process.
    
    On some servers within our company, we deploy a daemon responsible for
    monitoring and updating local applications.  Some applications prefer
    not to use THP, so the daemon calls prctl to disable THP before
    fork/exec.  Conversely, for other applications, the daemon calls prctl
    to enable THP before fork/exec.
    
    Ideally, the daemon should invoke prctl after the fork, but its current
    implementation follows the described approach.  In the Go standard
    library, there is no direct encapsulation of the fork system call;
    instead, fork and execve are combined into one through
    syscall.ForkExec.
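
For reference, a minimal C sketch of the prctl usage discussed above, assuming a Linux system whose headers define `PR_SET_THP_DISABLE` and `PR_GET_THP_DISABLE`. The helper names (`disable_thp_for_self`, `thp_disabled_for_self`, `spawn_without_thp`) are hypothetical and are not taken from the daemon described here. `spawn_without_thp` shows the "prctl after fork" ordering that the text calls ideal.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Set the "THP disable" flag for the caller.
 * Returns 0 on success, -1 on error (errno is set). */
static int disable_thp_for_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);
}

/* Returns 1 if the flag is set, 0 if it is clear, -1 on error. */
static int thp_disabled_for_self(void)
{
    return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}

/* Hypothetical helper: fork, disable THP in the child only, then exec.
 * Returns the child's pid in the parent, or -1 if fork failed. */
static pid_t spawn_without_thp(const char *path, char *const argv[])
{
    pid_t pid = fork();

    if (pid == 0) {
        if (disable_thp_for_self() != 0)
            _exit(126); /* could not set the flag */
        execv(path, argv);
        _exit(127);     /* exec failed */
    }
    return pid;
}
```

According to prctl(2), the flag is inherited by children created via fork and preserved across execve, which is why setting it in the parent before fork/exec, as the daemon does, also takes effect in the spawned application.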
    
    Link: https://lkml.kernel.org/r/20240129054551.57728-1-ioworker0@gmail.com
    Signed-off-by: Lance Yang <ioworker0@gmail.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Zach O'Keefe <zokeefe@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>