    bpf,x64: use shrx/sarx/shlx when available · 77d8f5d4
    Jie Meng authored
    BMI2 provides 3 shift instructions (shrx, sarx and shlx) that use VEX
    encoding but target general purpose registers [1]. They allow the shift
    count in any general purpose register and have the same performance as
    non BMI2 shift instructions [2].
    
    Instead of shr/sar/shl that implicitly use %cl (lowest 8 bits of %rcx),
    emit their more flexible alternatives provided in BMI2 when advantageous;
    keep using the non BMI2 instructions when shift count is already in
    BPF_REG_4/%rcx as non BMI2 instructions are shorter.
    
    To summarize, when BMI2 is available:
    -------------------------------------------------
                |   arbitrary dst
    =================================================
    src == ecx  |   shl dst, cl
    -------------------------------------------------
    src != ecx  |   shlx dst, dst, src
    -------------------------------------------------
    
    And no additional register shuffling is needed.
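
The selection rule in the table can be sketched in C as below. This is only an illustration under stated assumptions, not the code from bpf_jit_comp.c: the names `x86_reg`, `shift_strategy` and `pick_shift_strategy` are hypothetical, and the no-BMI2 branch simply stands for the save/move/shift/restore sequence shown in this message's example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hardware register numbers, for illustration only. */
enum x86_reg { X86_RAX = 0, X86_RCX = 1, X86_RDX = 2, X86_RSI = 6, X86_RDI = 7 };

enum shift_strategy {
    SHIFT_LEGACY_CL,      /* count already in %rcx: shl/shr/sar by %cl */
    SHIFT_BMI2,           /* shlx/shrx/sarx, count in any register */
    SHIFT_LEGACY_SHUFFLE  /* no BMI2: count must be moved into %rcx first */
};

/* Pick how to emit a variable shift whose count lives in count_reg. */
static enum shift_strategy pick_shift_strategy(bool has_bmi2, unsigned count_reg)
{
    assert(count_reg < 16);

    /* The legacy form is shorter, so prefer it when the count is in %rcx. */
    if (count_reg == (unsigned)X86_RCX)
        return SHIFT_LEGACY_CL;

    /* Otherwise BMI2 avoids any register shuffling. */
    if (has_bmi2)
        return SHIFT_BMI2;

    return SHIFT_LEGACY_SHUFFLE;
}
```

For the 64-bit example in this message, the legacy shuffle sequence occupies 8 bytes, while the single shlx occupies 5.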
    
    A concrete comparison of non BMI2 and BMI2 codegen.  To shift %rsi by
    %rdi:
    
    Without BMI2:
    
     ef3:   push   %rcx
            51
     ef4:   mov    %rdi,%rcx
            48 89 f9
     ef7:   shl    %cl,%rsi
            48 d3 e6
     efa:   pop    %rcx
            59
    
    With BMI2:
    
     f0b:   shlx   %rdi,%rsi,%rsi
            c4 e2 c1 f7 f6
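
The five bytes above follow the three-byte VEX layout (VEX.LZ.0F38.W1 F7 /r, with the count register in VEX.vvvv). The sketch below shows one way to build them for the 64-bit register-to-register case. The function and enum names are hypothetical and are not the helpers used in bpf_jit_comp.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hardware register numbers (0..15), for illustration only. */
enum { X86_RAX = 0, X86_RCX = 1, X86_RSI = 6, X86_RDI = 7 };

/* VEX.pp field: selects the implied mandatory prefix. */
enum bmi2_shift {
    BMI2_SHLX = 1, /* 66 */
    BMI2_SARX = 2, /* F3 */
    BMI2_SHRX = 3  /* F2 */
};

/*
 * Encode "op cnt, val, dst" (AT&T operand order) for 64-bit registers.
 * Writes 5 bytes to out and returns the length.
 */
static int encode_bmi2_shift64(uint8_t out[5], enum bmi2_shift op,
                               unsigned dst, unsigned val, unsigned cnt)
{
    assert(dst < 16 && val < 16 && cnt < 16);

    out[0] = 0xC4;                                    /* 3-byte VEX escape */
    out[1] = (uint8_t)((((~dst >> 3) & 1) << 7) |     /* inverted R: ModRM.reg ext */
                       (1u << 6) |                    /* inverted X: unused */
                       (((~val >> 3) & 1) << 5) |     /* inverted B: ModRM.rm ext */
                       0x02);                         /* opcode map 0F38 */
    out[2] = (uint8_t)(0x80 |                         /* W=1: 64-bit operands */
                       ((~cnt & 0xF) << 3) |          /* inverted vvvv: count */
                       (unsigned)op);                 /* L=0, pp */
    out[3] = 0xF7;                                    /* opcode */
    out[4] = (uint8_t)(0xC0 | ((dst & 7) << 3) | (val & 7)); /* ModRM, reg-reg */
    return 5;
}
```

Calling `encode_bmi2_shift64(buf, BMI2_SHLX, X86_RSI, X86_RSI, X86_RDI)` reproduces the `c4 e2 c1 f7 f6` bytes listed for `shlx %rdi,%rsi,%rsi`.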
    
    [1] https://en.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set
    [2] https://www.agner.org/optimize/instruction_tables.pdf

    Signed-off-by: Jie Meng <jmeng@fb.com>
    Link: https://lore.kernel.org/r/20221007202348.1118830-3-jmeng@fb.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_jit_comp.c 71.2 KB