    nfp: bpf: implement memory bulk copy for length within 32-bytes · 9879a381
    Jiong Wang authored
    For NFP, we want to regroup a sequence of load/store pairs lowered from
    memcpy/memmove into a single memory bulk operation, which can then be
    accelerated using the NFP CPP bus.
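    The C sketch below roughly illustrates the regrouping. It checks whether
    a run of lowered load/store pairs is contiguous and moves in a single
    direction, and it returns the signed gathered length. "struct
    ldst_pair" and gather_len() are hypothetical names, not the NFP JIT's
    code. The sketch assumes each pair loads from one base register and
    stores to another, and it ignores register allocation entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one lowered memcpy step:
 *   rX = *(uN *)(src + ld_off); *(uN *)(dst + st_off) = rX;
 */
struct ldst_pair {
	int16_t ld_off; /* load offset from the source base register */
	int16_t st_off; /* store offset from the destination base register */
	uint8_t size;   /* access width in bytes: 1, 2, 4 or 8 */
};

/* Returns the gathered length of pairs[0..n): positive for ascending
 * order, negative for descending order, 0 if the run cannot be merged.
 */
static int gather_len(const struct ldst_pair *pairs, size_t n)
{
	int len, dir = 0; /* +1 ascending, -1 descending, 0 not known yet */
	size_t i;

	if (n == 0)
		return 0;
	len = pairs[0].size;
	for (i = 1; i < n; i++) {
		const struct ldst_pair *prev = &pairs[i - 1];
		const struct ldst_pair *cur = &pairs[i];
		int d;

		if (cur->ld_off == prev->ld_off + prev->size)
			d = 1;  /* next load starts right after the previous one */
		else if (cur->ld_off + cur->size == prev->ld_off)
			d = -1; /* next load ends right before the previous one */
		else
			return 0; /* gap or overlap between loads */
		/* the store side must advance by the same amount */
		if (cur->st_off - prev->st_off != cur->ld_off - prev->ld_off)
			return 0;
		if (dir != 0 && d != dir)
			return 0; /* direction changed mid-run */
		dir = d;
		len += cur->size;
	}
	return dir < 0 ? -len : len;
}
```

    Consider a lowered 16-byte memcpy made of four 4-byte pairs at offsets 0,
    4, 8 and 12. It yields 16. The same pairs in reverse order yield -16.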
    
    This patch extends the existing load/store auxiliary information by adding
    two new fields:
    
    	struct bpf_insn *paired_st;
    	s16 ldst_gather_len;
    
    Both fields are carried by the load instruction at the head of the
    sequence: "paired_st" is the corresponding store instruction at the head,
    and "ldst_gather_len" is the gathered length.
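    The sketch below shows one way the two fields could be consumed. It
    derives the byte range that a gathered sequence covers from its head
    instruction. "struct insn" and "struct insn_meta" are hypothetical
    stand-ins for the real verifier structures. The descending case assumes
    that the head access sits at the highest address of the run.

```c
#include <stdint.h>

/* Hypothetical stand-in for a BPF load/store instruction. */
struct insn {
	int16_t off;  /* memory offset from the base register */
	uint8_t size; /* access width in bytes */
};

/* Hypothetical per-instruction metadata carrying the two new fields. */
struct insn_meta {
	struct insn insn;
	/* Only meaningful on the load at the head of a gathered sequence. */
	struct insn *paired_st;  /* store at the head of the sequence */
	int16_t ldst_gather_len; /* < 0: descending, > 0: ascending */
};

/* Computes the half-open range [*lo, *hi) touched by a gathered sequence,
 * relative to the base register, from its head instruction (load or
 * store) and the signed gathered length.
 */
static void gather_range(const struct insn *head, int16_t gather_len,
			 int *lo, int *hi)
{
	if (gather_len >= 0) {
		/* ascending: the head access is at the lowest address */
		*lo = head->off;
		*hi = head->off + gather_len;
	} else {
		/* descending: the head access is at the highest address */
		*hi = head->off + head->size;
		*lo = *hi + gather_len; /* gather_len is negative here */
	}
}
```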
    
    If "ldst_gather_len" is negative, the sequence performs its loads/stores
    in descending address order; otherwise it is in ascending order. We need
    this information to detect overlapping memory accesses.
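    The sketch below shows one way the direction can feed into overlap
    detection. It makes two assumptions. First, the source and destination
    share a base register, so their offsets are comparable. Second, the
    bulk copy reads the whole source before it writes, as memmove does. The
    original pairs interleave reads and writes, so the two forms agree
    unless a store clobbers source bytes that a later load still needs.
    bulk_copy_preserves_semantics() is a hypothetical helper and is not
    necessarily the rule this patch applies.

```c
#include <stdbool.h>
#include <stdlib.h>

/* src_lo/dst_lo: lowest offsets read/written (same base register assumed).
 * len: signed gathered length, > 0 ascending, < 0 descending.
 */
static bool bulk_copy_preserves_semantics(int src_lo, int dst_lo, int len)
{
	int n = abs(len);

	/* Disjoint ranges: the read/write order cannot matter. */
	if (dst_lo + n <= src_lo || src_lo + n <= dst_lo)
		return true;
	/* Ascending pairs only overwrite source bytes that were already
	 * read when the destination starts at or below the source. */
	if (len > 0)
		return dst_lo <= src_lo;
	/* Descending pairs need the destination at or above the source. */
	return dst_lo >= src_lo;
}
```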
    
    This patch then optimizes memory bulk copy when the copy length is within
    32 bytes.
    
    The read/write strategy used is:
    
      * Read.
        Use read32 (direct_ref), always.
    
      * Write.
        - length <= 8-bytes
          write8 (direct_ref).
        - length <= 32-bytes and is 4-byte aligned
          write32 (direct_ref).
        - length <= 32-bytes but is not 4-byte aligned
          write8 (indirect_ref).
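    The write-side selection above can be written as a small decision
    function. The enum and function names are hypothetical. "4-byte aligned"
    is read here as "the length is a multiple of 4". The read side is
    modelled only as the number of 32-bit words that read32 would need,
    assuming the length is rounded up to whole words.

```c
#include <assert.h>

/* Hypothetical labels for the CPP write commands listed above. */
enum cpp_write_op {
	WRITE8_DIRECT,   /* write8 (direct_ref) */
	WRITE32_DIRECT,  /* write32 (direct_ref) */
	WRITE8_INDIRECT, /* write8 (indirect_ref) */
};

/* Chooses the write command for a bulk copy of len bytes, 1 <= len <= 32. */
static enum cpp_write_op pick_write_op(unsigned int len)
{
	assert(len >= 1 && len <= 32);
	if (len <= 8)
		return WRITE8_DIRECT;
	if (len % 4 == 0)
		return WRITE32_DIRECT;
	return WRITE8_INDIRECT;
}

/* Reads always use read32 (direct_ref); assumes the length is rounded
 * up to whole 32-bit words. */
static unsigned int read32_words(unsigned int len)
{
	assert(len >= 1 && len <= 32);
	return (len + 3) / 4;
}
```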
    
    NOTE: the optimization should not change program semantics. The
    destination register of the last load instruction should hold the same
    value with or without this optimization.
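    The sketch below emulates both forms in plain C to make the NOTE
    concrete. It assumes non-overlapping buffers and a copy built from 4-byte
    loads in ascending order. run_pairs() mimics the original load/store
    pairs. run_bulk() performs one bulk copy and then sets the register to
    the last chunk. Both helpers are hypothetical. The point is only that
    the destination bytes and the final register value must match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Original form: for each 4-byte chunk, "rX = *(u32 *)(src + off);
 * *(u32 *)(dst + off) = rX". Returns the final value of rX. */
static uint64_t run_pairs(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint64_t reg = 0;
	uint32_t v;
	size_t off;

	assert(len > 0 && len % 4 == 0);
	for (off = 0; off < len; off += 4) {
		memcpy(&v, src + off, sizeof(v));
		reg = v; /* a 32-bit BPF load zero-extends into the 64-bit register */
		memcpy(dst + off, &v, sizeof(v));
	}
	return reg;
}

/* Optimized form: one bulk copy, after which rX must hold what the last
 * load of the original sequence would have produced. */
static uint64_t run_bulk(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t last;

	assert(len > 0 && len % 4 == 0);
	memcpy(&last, src + len - 4, sizeof(last));
	memcpy(dst, src, len); /* buffers assumed not to overlap */
	return last;
}
```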

    Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
    Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>