1. 01 Jun, 2017 31 commits
  2. 31 May, 2017 9 commits
    • Merge branch 'bpf-stack-tracker' · 6c21a2a6
      David S. Miller authored
      Alexei Starovoitov says:
      
      ====================
      bpf: stack depth tracking
      
      Introduce tracking of bpf program stack depth in the verifier and use that
      info to reduce bpf program stack consumption in the interpreter and x64 JIT.
      Other JITs can take advantage of it as well in the future.
      Most programs consume very little stack, so it's a good optimization
      in general, and it's the first step toward bpf-to-bpf function calls.
      
      Also use an internal opcode to mark bpf_tail_call(), to make clear
      that the jmp|call|x opcode is not uapi and may be used for an actual
      indirect call opcode in the future.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: take advantage of stack_depth tracking in x64 JIT · 2960ae48
      Alexei Starovoitov authored
      Take advantage of stack_depth tracking in x64 JIT.
      Round the allocated stack up to a multiple of 8 bytes so that it stays
      aligned for functions called from the JITed bpf program.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
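The rounding described in the x64 JIT commit above can be sketched in a few lines of C. This is a minimal illustration, not the JIT's actual code: the helper names `round_up_pow2` and `jit_stack_bytes` are hypothetical, and the only fact taken from the commit message is that the stack size is rounded up to a multiple of 8 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Round `depth` up to the next multiple of `align`; `align` must be a
 * power of two so the mask trick below is valid. */
static inline uint32_t round_up_pow2(uint32_t depth, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (depth + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of stack bytes a JIT prologue would reserve
 * for a program whose verifier-computed stack depth is `stack_depth`. */
static inline uint32_t jit_stack_bytes(uint32_t stack_depth)
{
    return round_up_pow2(stack_depth, 8);
}
```

With this sketch, a program that touches 12 bytes of stack reserves 16, and one that touches none reserves none.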
    • bpf: change x86 JITed program stack layout · 177366bf
      Alexei Starovoitov authored
      In order to JIT programs with different stack sizes, we need to
      make the epilogue and exception path independent of the stack size;
      hence, move the auxiliary stack space from the bottom of the stack
      to the top of the stack.
      A nice side effect is that the JITed function prologue becomes shorter,
      because offsets can be encoded as imm8 instead of imm32.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
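The imm8 versus imm32 remark in the stack layout commit above comes down to whether a frame-relative offset fits in a signed byte. The sketch below only illustrates that size check; the function names are hypothetical, and the example offsets in the note that follows are made up for illustration rather than taken from the JIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when `value` fits in a signed 8-bit field. */
static inline bool fits_imm8(int32_t value)
{
    return value >= -128 && value <= 127;
}

/* Hypothetical helper: bytes needed to encode a frame-pointer-relative
 * stack offset. Stack slots live below the frame pointer, so the offset
 * is expected to be negative. */
static inline unsigned frame_offset_bytes(int32_t off)
{
    assert(off < 0);
    return fits_imm8(off) ? 1 : 4;
}
```

Under these assumptions, auxiliary slots placed close to the frame pointer (for example at offset -8) need one offset byte each, while the same slots placed below a large program stack (for example at offset -520) need four.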
    • bpf: use different interpreter depending on required stack size · b870aa90
      Alexei Starovoitov authored
      The 16 __bpf_prog_run() interpreters for various stack sizes add to .text,
      but not much compared to the run-time stack savings:
      
         text    data     bss     dec     hex filename
        26350   10328     624   37302    91b6 kernel/bpf/core.o.before_split
        25777   10328     624   36729    8f79 kernel/bpf/core.o.after_split
        26970   10328     624   37922    9422 kernel/bpf/core.o.now
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
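Choosing one of the 16 interpreters from a program's stack depth, as the commit above describes, might look like the following sketch. It assumes a 512-byte maximum stack divided evenly among the 16 variants (32-byte steps); the commit message states only the count of 16, so the step size and the names `interpreter_index` and `interpreter_stack_bytes` are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BPF_STACK    512  /* assumed maximum program stack, in bytes */
#define NUM_INTERPRETERS 16   /* from the commit message */
#define STACK_STEP       (MAX_BPF_STACK / NUM_INTERPRETERS)  /* 32 bytes */

/* Map a stack depth to the index of the smallest interpreter variant
 * whose stack is large enough. A depth of 0 still uses variant 0. */
static inline unsigned interpreter_index(uint32_t stack_depth)
{
    assert(stack_depth <= MAX_BPF_STACK);
    if (stack_depth == 0)
        stack_depth = 1;
    return (stack_depth + STACK_STEP - 1) / STACK_STEP - 1;
}

/* Stack bytes that variant `index` would allocate in this sketch. */
static inline uint32_t interpreter_stack_bytes(unsigned index)
{
    assert(index < NUM_INTERPRETERS);
    return (index + 1) * STACK_STEP;
}
```

In this model a program using 8 bytes of stack runs on the 32-byte variant, and only programs needing more than 480 bytes pay for the full 512.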
    • bpf: fix stack_depth usage by test_bpf.ko · 105c0361
      Alexei Starovoitov authored
      test_bpf.ko doesn't call the verifier before selecting an interpreter or JITing,
      hence the tests need to manually specify the amount of stack they consume.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: track stack depth of classic bpf programs · 50bbfed9
      Alexei Starovoitov authored
      To track the stack depth of classic bpf programs we only need
      to analyze ST|STX instructions, since check_load_and_stores()
      verifies that programs can load from the stack only after a write.
      
      We also need to change the way cBPF stack slots map to the eBPF stack:
      typical classic programs use slots 0 and 1, so these slots need to
      map to stack offsets -4 and -8 respectively in order
      to take advantage of the small-stack interpreters and JITs.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
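The slot mapping in the classic bpf commit above (slot 0 at offset -4, slot 1 at offset -8) generalizes to a simple formula, sketched here together with the depth tracking over store instructions. The function names are hypothetical, and the input is simplified to a list of slot numbers written by ST|STX instructions rather than real cBPF instructions.

```c
#include <assert.h>
#include <stdint.h>

#define CBPF_MEMWORDS 16  /* classic BPF scratch memory: 16 32-bit slots */

/* Map a cBPF scratch slot to an eBPF stack offset: slot 0 -> -4,
 * slot 1 -> -8, so low-numbered slots sit nearest the frame pointer. */
static inline int32_t cbpf_slot_to_stack_off(uint32_t slot)
{
    assert(slot < CBPF_MEMWORDS);
    return -(int32_t)((slot + 1) * 4);
}

/* Stack depth implied by the slots that store instructions write to.
 * Loads can be ignored because a load is only valid after a store. */
static inline uint32_t cbpf_stack_depth(const uint32_t *store_slots, int count)
{
    uint32_t depth = 0;

    for (int i = 0; i < count; i++) {
        uint32_t need = (uint32_t)(-cbpf_slot_to_stack_off(store_slots[i]));

        if (need > depth)
            depth = need;
    }
    return depth;
}
```

A typical classic program that writes only slots 0 and 1 therefore needs 8 bytes of stack in this sketch, which keeps it in the smallest stack class.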
    • bpf: reconcile bpf_tail_call and stack_depth · 80a58d02
      Alexei Starovoitov authored
      The next set of patches will take advantage of stack_depth tracking,
      so make sure that a program that does bpf_tail_call() has
      a stack depth large enough for the callee.
      We could have tracked the stack depth of the prog_array owner program
      and only allowed insertion of programs with a stack depth less
      than the owner's, but that would break existing applications.
      Some of them have a trivial root bpf program that only does
      multiple bpf_tail_calls, and at init time the prog array is empty.
      In the future we may add a flag to do such tracking optionally,
      but for now play it simple and safe.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
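One conservative way to satisfy the requirement in the tail call commit above is to treat any program that contains a bpf_tail_call() as needing the maximum stack, since the callee is unknown at verification time. The sketch below models that reading; the struct, the field names, and the 512-byte limit are assumptions made for illustration, and the commit message itself says only that the caller's stack depth must be large enough for the callee.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BPF_STACK 512  /* assumed maximum program stack, in bytes */

/* Hypothetical summary of what a verifier learned about one program. */
struct prog_info {
    uint32_t stack_depth;     /* deepest stack access seen, in bytes */
    bool     uses_tail_call;  /* program contains bpf_tail_call() */
};

/* Stack depth to provision for the program. A tail call may jump to any
 * program inserted into the prog array later, so assume the worst case. */
static inline uint32_t effective_stack_depth(const struct prog_info *prog)
{
    assert(prog->stack_depth <= MAX_BPF_STACK);
    return prog->uses_tail_call ? MAX_BPF_STACK : prog->stack_depth;
}
```

This keeps trivial root programs that only dispatch through an initially empty prog array working, at the cost of giving them the full stack.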
    • bpf: teach verifier to track stack depth · 8726679a
      Alexei Starovoitov authored
      Teach the verifier to track bpf program stack depth.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
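Stack depth tracking of the kind named in the verifier commit above can be modeled as a running maximum over every frame-pointer-relative stack access. This is a simplified sketch, not the verifier's code: `struct verifier_state` and `record_stack_access` are hypothetical names, and the 512-byte bound is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BPF_STACK 512  /* assumed maximum program stack, in bytes */

/* Hypothetical per-program state kept while checking instructions. */
struct verifier_state {
    uint32_t stack_depth;  /* deepest stack byte touched so far */
};

/* Record one stack access at frame-pointer offset `off` (negative).
 * The depth is the largest distance below the frame pointer seen. */
static inline void record_stack_access(struct verifier_state *state, int32_t off)
{
    assert(off < 0 && off >= -MAX_BPF_STACK);

    uint32_t need = (uint32_t)(-off);

    if (state->stack_depth < need)
        state->stack_depth = need;
}
```

After accesses at offsets -8, -4 and -40, the recorded depth in this sketch is 40 bytes, which is the value later stages would use to size the stack.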
    • bpf: split bpf core interpreter · f696b8f4
      Alexei Starovoitov authored
      Split the __bpf_prog_run() interpreter into stack allocation and execution parts.
      The code section shrinks, which helps interpreter performance in some cases.
      
         text    data     bss     dec     hex filename
        26350   10328     624   37302    91b6 kernel/bpf/core.o.before
        25777   10328     624   36729    8f79 kernel/bpf/core.o.after
      
      Very short programs got slower (due to extra function call):
      Before:
      test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 7 PASS
      test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 8 PASS
      test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 7 PASS
      test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 11 PASS
      test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 7 PASS
      After:
      test_bpf: #89 ALU64_ADD_K: 1 + 2 = 3 jited:0 11 PASS
      test_bpf: #90 ALU64_ADD_K: 3 + 0 = 3 jited:0 11 PASS
      test_bpf: #91 ALU64_ADD_K: 1 + 2147483646 = 2147483647 jited:0 11 PASS
      test_bpf: #92 ALU64_ADD_K: 4294967294 + 2 = 4294967296 jited:0 14 PASS
      test_bpf: #93 ALU64_ADD_K: 2147483646 + -2147483647 = -1 jited:0 10 PASS
      
      Longer programs got faster:
      Before:
      test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 20286 20513 PASS
      test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 31853 31768 PASS
      test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9815 PASS
      test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 6 PASS
      test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13959 PASS
      test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 210 PASS
      test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 21724 PASS
      test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19118 PASS
      After:
      test_bpf: #266 BPF_MAXINSNS: Ctx heavy transformations jited:0 19008 18827 PASS
      test_bpf: #267 BPF_MAXINSNS: Call heavy transformations jited:0 29238 28450 PASS
      test_bpf: #268 BPF_MAXINSNS: Jump heavy test jited:0 9485 PASS
      test_bpf: #269 BPF_MAXINSNS: Very long jump backwards jited:0 12 PASS
      test_bpf: #270 BPF_MAXINSNS: Edge hopping nuthouse jited:0 13257 PASS
      test_bpf: #271 BPF_MAXINSNS: Jump, gap, jump, ... jited:0 213 PASS
      test_bpf: #272 BPF_MAXINSNS: ld_abs+get_processor_id jited:0 19389 PASS
      test_bpf: #273 BPF_MAXINSNS: ld_abs+vlan_push/pop jited:0 19583 PASS
      
      For real world production programs the difference is noise.
      
      This patch is the first step towards reducing interpreter stack consumption.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>