    KVM: x86/mmu: Split huge pages mapped by the TDP MMU on fault · c4b33d28
    David Matlack authored
    Now that the TDP MMU has a mechanism to split huge pages, use it in the
    fault path when a huge page needs to be replaced with a mapping at a
    lower level.
    
    This change reduces the negative performance impact of NX HugePages.
    Prior to this change, if a vCPU executed from a huge page and NX
    HugePages was enabled, the vCPU would take a fault, zap the huge page,
    and map the faulting address at 4KiB with execute permissions
    enabled. The rest of the memory would be left *unmapped* and have to be
    faulted back in by the guest upon access (read, write, or execute). If
    the guest is backed by 1GiB huge pages, a single executed instruction
    can zap an entire GiB of its physical address space.
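
    To put a rough number on that cost, the following is a toy model, not
    KVM code. The helper names are hypothetical, and it assumes the worst
    case in which every page of the zapped range is later faulted back in
    at 4KiB (as happens when the guest executes from all of it).

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Toy model, not KVM code: number of 4KiB pages covered by one huge page. */
static inline uint64_t small_pages_per_huge(uint64_t huge_size)
{
	assert(huge_size >= SZ_4K && huge_size % SZ_4K == 0);
	return huge_size / SZ_4K;
}

/*
 * Zap model: only the faulting 4KiB page is mapped after the fault, so in
 * the worst case every other 4KiB page takes one more fault when accessed.
 */
static inline uint64_t max_refaults_after_zap(uint64_t huge_size)
{
	return small_pages_per_huge(huge_size) - 1;
}

/*
 * Split model: the huge page is replaced by lower-level mappings that still
 * cover the whole range, so no non-present faults follow.
 */
static inline uint64_t refaults_after_split(uint64_t huge_size)
{
	(void)huge_size;
	return 0;
}
```

    Under these assumptions, max_refaults_after_zap(SZ_1G) is 262143,
    while refaults_after_split(SZ_1G) is 0.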
    
    For example, it can take a VM longer to execute from its memory than to
    populate that memory in the first place:
    
    $ ./execute_perf_test -s anonymous_hugetlb_1gb -v96
    
    Populating memory             : 2.748378795s
    Executing from memory         : 2.899670885s
    
    With this change, such faults split the huge page instead of zapping it,
    which avoids the non-present faults on the rest of the huge page:
    
    $ ./execute_perf_test -s anonymous_hugetlb_1gb -v96
    
    Populating memory             : 2.729544474s
    Executing from memory         : 0.111965688s   <---
    
    This change also reduces the performance impact of dirty logging when
    eager_page_split=N. eager_page_split=N (abbreviated "eps=N" below) can
    be desirable for read-heavy workloads, as it avoids allocating memory to
    split huge pages that are never written and avoids increasing the TLB
    miss cost on reads of those pages.
    
                 | Config: ept=Y, tdp_mmu=Y, 5% writes           |
                 | Iteration 1 dirty memory time                 |
                 | --------------------------------------------- |
    vCPU Count   | eps=N (Before) | eps=N (After) | eps=Y        |
    ------------ | -------------- | ------------- | ------------ |
    2            | 0.332305091s   | 0.019615027s  | 0.006108211s |
    4            | 0.353096020s   | 0.019452131s  | 0.006214670s |
    8            | 0.453938562s   | 0.019748246s  | 0.006610997s |
    16           | 0.719095024s   | 0.019972171s  | 0.007757889s |
    32           | 1.698727124s   | 0.021361615s  | 0.012274432s |
    64           | 2.630673582s   | 0.031122014s  | 0.016994683s |
    96           | 3.016535213s   | 0.062608739s  | 0.044760838s |
    
    Eager page splitting remains beneficial for write-heavy workloads, but
    the gap is now reduced.
    
                 | Config: ept=Y, tdp_mmu=Y, 100% writes         |
                 | Iteration 1 dirty memory time                 |
                 | --------------------------------------------- |
    vCPU Count   | eps=N (Before) | eps=N (After) | eps=Y        |
    ------------ | -------------- | ------------- | ------------ |
    2            | 0.317710329s   | 0.296204596s  | 0.058689782s |
    4            | 0.337102375s   | 0.299841017s  | 0.060343076s |
    8            | 0.386025681s   | 0.297274460s  | 0.060399702s |
    16           | 0.791462524s   | 0.298942578s  | 0.062508699s |
    32           | 1.719646014s   | 0.313101996s  | 0.075984855s |
    64           | 2.527973150s   | 0.455779206s  | 0.079789363s |
    96           | 2.681123208s   | 0.673778787s  | 0.165386739s |
    
    Further study is needed to determine if the remaining gap is acceptable
    for customer workloads or if eager_page_split=N still requires a priori
    knowledge of the VM workload, especially when considering these costs
    extrapolated out to large VMs with e.g. 416 vCPUs and 12TB RAM.

    Signed-off-by: David Matlack <dmatlack@google.com>
    Reviewed-by: Mingwei Zhang <mizhang@google.com>
    Message-Id: <20221109185905.486172-3-dmatlack@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>