    KVM: MMU: speedup update_permission_bitmask · 09f037aa
    Paolo Bonzini authored
    update_permission_bitmask currently does a 128-iteration loop to,
    essentially, compute a constant array.  Computing the 8 bits in parallel
    reduces it to 16 iterations, and is enough to speed it up substantially
    because many boolean operations in the inner loop become constants or
    simplify noticeably.
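
The idea of computing the 8 bits in parallel can be illustrated with a minimal sketch. It is not the code from this commit: the permission rule is deliberately simplified (it ignores CR0.WP, CR4.SMAP, CR4.SMEP and NX), and every name in it (`ACC_*`, `PF_*`, `byte_mask`, `build_bitwise`, `build_bytewise`, `check_tables_match`) is a hypothetical chosen for illustration. Each table entry is a byte whose bit `i` says whether access combination `i` faults. The reference version fills one bit per inner iteration, while the parallel version derives the whole byte from three precomputed masks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical access bits of a page-table entry: 3 bits, 8 combinations. */
enum { ACC_X = 1, ACC_W = 2, ACC_U = 4 };

/* Hypothetical fault-kind bits; a real page-fault error code has more. */
enum { PF_WRITE = 1, PF_USER = 2, PF_FETCH = 4, PF_KINDS = 8 };

/* Byte whose bit i is set iff access combination i contains `acc`. */
static uint8_t byte_mask(unsigned acc)
{
    uint8_t m = 0;
    for (unsigned i = 0; i < 8; i++)
        if (i & acc)
            m |= (uint8_t)(1u << i);
    return m;
}

/* Reference: one bit per inner iteration, PF_KINDS * 8 iterations total. */
static void build_bitwise(uint8_t table[PF_KINDS])
{
    for (unsigned pf = 0; pf < PF_KINDS; pf++) {
        table[pf] = 0;
        for (unsigned acc = 0; acc < 8; acc++) {
            int fault = ((pf & PF_WRITE) && !(acc & ACC_W)) ||
                        ((pf & PF_USER)  && !(acc & ACC_U)) ||
                        ((pf & PF_FETCH) && !(acc & ACC_X));
            table[pf] |= (uint8_t)(fault << acc);
        }
    }
}

/* Parallel: all 8 bits of an entry at once, PF_KINDS iterations total. */
static void build_bytewise(uint8_t table[PF_KINDS])
{
    const uint8_t x = byte_mask(ACC_X);
    const uint8_t w = byte_mask(ACC_W);
    const uint8_t u = byte_mask(ACC_U);

    for (unsigned pf = 0; pf < PF_KINDS; pf++) {
        /* Combinations lacking the required access bit fault. */
        uint8_t wf = (pf & PF_WRITE) ? (uint8_t)~w : 0;
        uint8_t uf = (pf & PF_USER)  ? (uint8_t)~u : 0;
        uint8_t ff = (pf & PF_FETCH) ? (uint8_t)~x : 0;
        table[pf] = (uint8_t)(wf | uf | ff);
    }
}

/* Self-check: both constructions must produce identical tables. */
static void check_tables_match(void)
{
    uint8_t a[PF_KINDS], b[PF_KINDS];

    build_bitwise(a);
    build_bytewise(b);
    for (unsigned pf = 0; pf < PF_KINDS; pf++)
        assert(a[pf] == b[pf]);
}
```

Calling `check_tables_match()` from a small `main` compares the two constructions entry by entry. In the parallel version the masks `x`, `w` and `u` do not depend on `pf`, so most of the per-entry work reduces to selecting and OR-ing constants.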
    
    Because update_permission_bitmask is actually the top item in the profile
    for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
    clock cycles, or up to 30%:
    
                                             before     after
       cpuid                                 35173      25954
       vmcall                                35122      27079
       inl_from_pmtimer                      52635      42675
       inl_from_qemu                         53604      44599
       inl_from_kernel                       38498      30798
       outl_to_kernel                        34508      28816
       wr_tsc_adjust_msr                     34185      26818
       rd_tsc_adjust_msr                     37409      27049
       mmio-no-eventfd:pci-mem               50563      45276
       mmio-wildcard-eventfd:pci-mem         34495      30823
       mmio-datamatch-eventfd:pci-mem        35612      31071
       portio-no-eventfd:pci-io              44925      40661
       portio-wildcard-eventfd:pci-io        29708      27269
       portio-datamatch-eventfd:pci-io       31135      27164
    
    (I wrote a small C program to compare the tables for all values of CR0.WP,
    CR4.SMAP and CR4.SMEP, and they match).
    
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>