• Alexei Starovoitov's avatar
    bpf: improve verification speed by droping states · 9f4686c4
    Alexei Starovoitov authored
    Branch instructions, branch targets and calls in a bpf program are
    the places where the verifier remembers states that led to successful
    verification of the program.
    These states are used to prune brute force program analysis.
    For unprivileged programs there is a limit of 64 states per such
    'branching' instructions (maximum length is tracked by max_states_per_insn
    counter introduced in the previous patch).
    Simply reducing this threshold to 32 or lower increases insn_processed
    metric to the point that small valid programs get rejected.
    For root programs there is no limit and cilium programs can have
    max_states_per_insn to be 100 or higher.
    Walking 100+ states multiplied by number of 'branching' insns during
    verification consumes significant amount of cpu time.
    Turned out simple LRU-like mechanism can be used to remove states
    that unlikely will be helpful in future search pruning.
    This patch introduces hit_cnt and miss_cnt counters:
    hit_cnt - this many times this state successfully pruned the search
    miss_cnt - this many times this state was not equivalent to other states
    (and that other states were added to state list)
    
    The heuristic introduced in this patch is:
    if (sl->miss_cnt > sl->hit_cnt * 3 + 3)
      /* drop this state from future considerations */
    
    Higher numbers increase max_states_per_insn (allow more states to be
    considered for pruning) and slow verification speed, but do not meaningfully
    reduce insn_processed metric.
    Lower numbers drop too many states and insn_processed increases too much.
    Many different formulas were considered.
    This one is simple and works well enough in practice.
    (the analysis was done on selftests/progs/* and on cilium programs)
    
    The end result is this heuristic improves verification speed by 10 times.
    Large synthetic programs that used to take a second more now take
    1/10 of a second.
    In cases where max_states_per_insn used to be 100 or more, now it's ~10.
    
    There is a slight increase in insn_processed for cilium progs:
                           before   after
    bpf_lb-DLB_L3.o 	1831	1838
    bpf_lb-DLB_L4.o 	3029	3218
    bpf_lb-DUNKNOWN.o 	1064	1064
    bpf_lxc-DDROP_ALL.o	26309	26935
    bpf_lxc-DUNKNOWN.o	33517	34439
    bpf_netdev.o		9713	9721
    bpf_overlay.o		6184	6184
    bpf_lcx_jit.o		37335	39389
    And 2-3 times improvement in the verification speed.
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
    Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
    9f4686c4
verifier.c 231 KB