1. 25 May, 2019 18 commits
    • Alexei Starovoitov's avatar
      Merge branch 'optimize-zext' · 198ae936
      Alexei Starovoitov authored
      Jiong Wang says:
      
      ====================
      v9:
        - Split patch 5 in v8.
          make bpf uapi header file sync a separate patch. (Alexei)
      
      v8:
        - For stack slot read, mark them as REG_LIVE_READ64. (Alexei)
        - Change DEF_NOT_SUBREG from -1 to 0. (Alexei)
        - Rebased on top of latest bpf-next.
      
      v7:
        - Drop the first patch in v6, the one adding 32-bit return value and
          argument type. (Alexei)
        - Rename bpf_jit_hardware_zext to bpf_jit_needs_zext. (Alexei)
        - Use mov32 with imm == 1 to indicate it is zext. (Alexei)
        - JIT back-ends peephole next insn to optimize out unnecessary zext
          inserted by verifier. (Alexei)
        - Testing:
          + patch set tested (bpf selftest) on x64 host with llvm 9.0
            no regression observed no both JIT and interpreter modes.
          + patch set tested (bpf selftest) on x32 host.
            By Yanqing Wang, thanks!
            no regression observed on both JIT and interpreter modes.
          + patch set tested (bpf selftest) on RV64 host with llvm 9.0,
            By Björn Töpel, thanks!
            no regression observed before and after this set with JIT_ALWAYS_ON.
            test_progs_32 also enabled as LLVM 9.0 is used by Björn.
          + cross compiled the other affected targets, arm, PowerPC, SPARC, S390.
      
      v6:
        - Fixed s390 kbuild test robot error. (kbuild)
        - Make comment style in backends patches more consistent.
      
      v5:
        - Adjusted several test_verifier helpers to make them works on hosts
          w and w/o hardware zext. (Naveen)
        - Make sure zext flag not set when verifier by-passed, for example,
          libtest_bpf.ko. (Naveen)
        - Conservatively mark bpf main return value as 64-bit. (Alexei)
        - Make sure read flag is either READ64 or READ32, not the mix of both.
          (Alexei)
        - Merged patch 1 and 2 in v4. (Alexei)
        - Fixed kbuild test robot warning on NFP. (kbuild)
        - Proposed new BPF_ZEXT insn to have optimal code-gen for various JIT
          back-ends.
        - Conservately set zext flags for patched-insn.
        - Fixed return value zext for helper function calls.
        - Also adjusted test_verifier scalability unit test to avoid triggerring
          too many insn patch which will hang computer.
        - re-tested on x86 host with llvm 9.0, no regression on test_verifier,
          test_progs, test_progs_32.
        - re-tested offload target (nfp), no regression on local testsuite.
      
      v4:
        - added the two missing fixes which addresses two Jakub's reviewes in v3.
        - rebase on top of bpf-next.
      
      v3:
        - remove redundant check in "propagate_liveness_reg". (Jakub)
        - add extra check in "mark_reg_read" to prune more search. (Jakub)
        - re-implemented "prog_flags" passing mechanism, removed use of
          global switch inside libbpf.
        - enabled high 32-bit randomization beyond "test_verifier" and
          "test_progs". Now it should have been enabled for all possible
          tests. Re-run all tests, haven't noticed regression.
        - remove RFC tag.
      
      v2:
        - rebased on top of bpf-next master.
        - added comments for what is sub-register def index. (Edward, Alexei)
        - removed patch 1 which turns bit mask from enum to macro. (Alexei)
        - removed sysctl/bpf_jit_32bit_opt. (Alexei)
        - merged sub-register def insn index into reg state. (Alexei)
        - change test methodology (Alexei):
            + instead of simple unit tests on x86_64 for which this optimization
              doesn't enabled due to there is hardware support, poison high
              32-bit for whose def identified as safe to do so. this could let
              the correctness of this patch set checked when daily bpf selftest
              ran which delivers very stressful test on host machine like x86_64.
            + hi32 poisoning is gated by a new BPF_F_TEST_RND_HI32 prog flags.
            + BPF_F_TEST_RND_HI32 is enabled for all tests of "test_progs" and
              "test_verifier", the latter needs minor tweak on two unit tests,
              please see the patch for the change.
            + introduced a new global variable "libbpf_test_mode" into libbpf.
              once it is set to true, it will set BPF_F_TEST_RND_HI32 for all the
              later PROG_LOAD syscall, the goal is to easy the enable of hi32
              poison on exsiting testsuite.
              we could also introduce new APIs, for example "bpf_prog_test_load",
              then use -Dbpf_prog_load=bpf_prog_test_load to migrate tests under
              test_progs, but there are several load APIs, and such new API need
              some change on struture like "struct bpf_prog_load_attr".
            + removed old unit tests. it is based on insn scan and requires quite
              a few test_verifier generic code change. given hi32 randomization
              could offer good test coverage, the unit tests doesn't add much
              extra test value.
        - enhanced register width check ("is_reg64") when record sub-register
          write, now, it returns more accurate width.
        - Re-run all tests under "test_progs" and "test_verifier" on x86_64, no
          regression. Fixed a couple of bugs exposed:
            1. ctx field size transformation was not taken into account.
            2. insn patch could cause lost of original aux data which is
               important for ctx field conversion.
            3. return value for propagate_liveness was wrong and caused
               regression on processed insn number.
            4. helper call arg wasn't handled properly that path prune may cause
               64-bit read info in pruned path lost.
        - Re-run Cilium bpf prog for processed-insn-number benchmarking, no
          regression.
      
      v1:
        - Fixed the missing handling on callee-saved for bpf-to-bpf call,
          sub-register defs therefore moved to frame state. (Jakub Kicinski)
        - Removed redundant "cross_reg". (Jakub Kicinski)
        - Various coding styles & grammar fixes. (Jakub Kicinski, Quentin Monnet)
      
      eBPF ISA specification requires high 32-bit cleared when low 32-bit
      sub-register is written. This applies to destination register of ALU32 etc.
      JIT back-ends must guarantee this semantic when doing code-gen. x86_64 and
      AArch64 ISA has the same semantics, so the corresponding JIT back-end
      doesn't need to do extra work.
      
      However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches
      (PowerPC, SPARC etc) need to do explicit zero extension to meet this
      requirement, otherwise code like the following will fail.
      
        u64_value = (u64) u32_value
        ... other uses of u64_value
      
      This is because compiler could exploit the semantic described above and
      save those zero extensions for extending u32_value to u64_value, these JIT
      back-ends are expected to guarantee this through inserting extra zero
      extensions which however could be a significant increase on the code size.
      Some benchmarks show there could be ~40% sub-register writes out of total
      insns, meaning at least ~40% extra code-gen.
      
      One observation is these extra zero extensions are not always necessary.
      Take above code snippet for example, it is possible u32_value will never be
      casted into a u64, the value of high 32-bit of u32_value then could be
      ignored and extra zero extension could be eliminated.
      
      This patch implements this idea, insns defining sub-registers will be
      marked when the high 32-bit of the defined sub-register matters. For
      those unmarked insns, it is safe to eliminate high 32-bit clearnace for
      them.
      
      Algo
      ====
      We could use insn scan based static analysis to tell whether one
      sub-register def doesn't need zero extension. However, using such static
      analysis, we must do conservative assumption at branching point where
      multiple uses could be introduced. So, for any sub-register def that is
      active at branching point, we need to mark it as needing zero extension.
      This could introducing quite a few false alarms, for example ~25% on
      Cilium bpf_lxc.
      
      It will be far better to use dynamic data-flow tracing which verifier
      fortunately already has and could be easily extend to serve the purpose of
      this patch set.
      
       - Split read flags into READ32 and READ64.
      
       - Record index of insn that does sub-register write. Keep the index inside
         reg state and update it during verifier insn walking.
      
       - A full register read on a sub-register marks its definition insn as
         needing zero extension on dst register.
      
         A new sub-register write overrides the old one.
      
       - When propagating read64 during path pruning, also mark any insn defining
         a sub-register that is read in the pruned path as full-register.
      
      Benchmark
      =========
       - I estimate the JITed image could be 10% ~ 30% smaller on these affected
         arches (nfp, arm, x32, risv, ppc, sparc, s390), depending on the prog.
      
       - For Cilium bpf_lxc, there is ~11500 insns in the compiled binary (use
         latest LLVM snapshot, and with -mcpu=v3 -mattr=+alu32 enabled), 4460 of
         them has sub-register writes (~40%). Calculated by:
      
          cat dump | grep -P "\tw" | wc -l       (ALU32)
          cat dump | grep -P "r.*=.*u32" | wc -l (READ_W)
          cat dump | grep -P "r.*=.*u16" | wc -l (READ_H)
          cat dump | grep -P "r.*=.*u8" | wc -l  (READ_B)
      
         After this patch set enabled, > 25% of those 4460 could be identified as
         doesn't needing zero extension on the destination, and the percentage
         could go further up to more than 50% with some follow up optimizations
         based on the infrastructure offered by this set. This leads to
         significant save on JITed image.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      198ae936
    • Jiong Wang's avatar
      nfp: bpf: eliminate zero extension code-gen · 0b4de1ff
      Jiong Wang authored
      This patch eliminate zero extension code-gen for instructions including
      both alu and load/store. The only exception is for ctx load, because
      offload target doesn't go through host ctx convert logic so we do
      customized load and ignores zext flag set by verifier.
      
      Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      0b4de1ff
    • Jiong Wang's avatar
      riscv: bpf: eliminate zero extension code-gen · 66d0d5a8
      Jiong Wang authored
      Cc: Björn Töpel <bjorn.topel@gmail.com>
      Acked-by: default avatarBjörn Töpel <bjorn.topel@gmail.com>
      Tested-by: default avatarBjörn Töpel <bjorn.topel@gmail.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      66d0d5a8
    • Jiong Wang's avatar
      x32: bpf: eliminate zero extension code-gen · 836256bf
      Jiong Wang authored
      Cc: Wang YanQing <udknight@gmail.com>
      Tested-by: default avatarWang YanQing <udknight@gmail.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      836256bf
    • Jiong Wang's avatar
      sparc: bpf: eliminate zero extension code-gen · 3e2a33cf
      Jiong Wang authored
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3e2a33cf
    • Jiong Wang's avatar
      s390: bpf: eliminate zero extension code-gen · 591006b9
      Jiong Wang authored
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      591006b9
    • Jiong Wang's avatar
      powerpc: bpf: eliminate zero extension code-gen · a4c92773
      Jiong Wang authored
      Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a4c92773
    • Jiong Wang's avatar
      arm: bpf: eliminate zero extension code-gen · 163541e6
      Jiong Wang authored
      Cc: Shubham Bansal <illusionist.neo@gmail.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      163541e6
    • Jiong Wang's avatar
      selftests: bpf: enable hi32 randomization for all tests · 9d120b41
      Jiong Wang authored
      The previous libbpf patch allows user to specify "prog_flags" to bpf
      program load APIs. To enable high 32-bit randomization for a test, we need
      to set BPF_F_TEST_RND_HI32 in "prog_flags".
      
      To enable such randomization for all tests, we need to make sure all places
      are passing BPF_F_TEST_RND_HI32. Changing them one by one is not
      convenient, also, it would be better if a test could be switched to
      "normal" running mode without code change.
      
      Given the program load APIs used across bpf selftests are mostly:
        bpf_prog_load:      load from file
        bpf_load_program:   load from raw insns
      
      A test_stub.c is implemented for bpf seltests, it offers two functions for
      testing purpose:
      
        bpf_prog_test_load
        bpf_test_load_program
      
      The are the same as "bpf_prog_load" and "bpf_load_program", except they
      also set BPF_F_TEST_RND_HI32. Given *_xattr functions are the APIs to
      customize any "prog_flags", it makes little sense to put these two
      functions into libbpf.
      
      Then, the following CFLAGS are passed to compilations for host programs:
        -Dbpf_prog_load=bpf_prog_test_load
        -Dbpf_load_program=bpf_test_load_program
      
      They migrate the used load APIs to the test version, hence enable high
      32-bit randomization for these tests without changing source code.
      
      Besides all these, there are several testcases are using
      "bpf_prog_load_attr" directly, their call sites are updated to pass
      BPF_F_TEST_RND_HI32.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9d120b41
    • Jiong Wang's avatar
      selftests: bpf: adjust several test_verifier helpers for insn insertion · f3b55abb
      Jiong Wang authored
      - bpf_fill_ld_abs_vlan_push_pop:
          Prevent zext happens inside PUSH_CNT loop. This could happen because
          of BPF_LD_ABS (32-bit def) + BPF_JMP (64-bit use), or BPF_LD_ABS +
          EXIT (64-bit use of R0). So, change BPF_JMP to BPF_JMP32 and redefine
          R0 at exit path to cut off the data-flow from inside the loop.
      
        - bpf_fill_jump_around_ld_abs:
          Jump range is limited to 16 bit. every ld_abs is replaced by 6 insns,
          but on arches like arm, ppc etc, there will be one BPF_ZEXT inserted
          to extend the error value of the inlined ld_abs sequence which then
          contains 7 insns. so, set the dividend to 7 so the testcase could
          work on all arches.
      
        - bpf_fill_scale1/bpf_fill_scale2:
          Both contains ~1M BPF_ALU32_IMM which will trigger ~1M insn patcher
          call because of hi32 randomization later when BPF_F_TEST_RND_HI32 is
          set for bpf selftests. Insn patcher is not efficient that 1M call to
          it will hang computer. So , change to BPF_ALU64_IMM to avoid hi32
          randomization.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f3b55abb
    • Jiong Wang's avatar
      libbpf: add "prog_flags" to bpf_program/bpf_prog_load_attr/bpf_load_program_attr · 04656198
      Jiong Wang authored
      libbpf doesn't allow passing "prog_flags" during bpf program load in a
      couple of load related APIs, "bpf_load_program_xattr", "load_program" and
      "bpf_prog_load_xattr".
      
      It makes sense to allow passing "prog_flags" which is useful for
      customizing program loading.
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      04656198
    • Jiong Wang's avatar
      bpf: verifier: randomize high 32-bit when BPF_F_TEST_RND_HI32 is set · d6c2308c
      Jiong Wang authored
      This patch randomizes high 32-bit of a definition when BPF_F_TEST_RND_HI32
      is set.
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d6c2308c
    • Jiong Wang's avatar
      tools: bpf: sync uapi header bpf.h · 9ce33e33
      Jiong Wang authored
      Sync new bpf prog load flag "BPF_F_TEST_RND_HI32" to tools/.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9ce33e33
    • Jiong Wang's avatar
      bpf: introduce new bpf prog load flags "BPF_F_TEST_RND_HI32" · c240eff6
      Jiong Wang authored
      x86_64 and AArch64 perhaps are two arches that running bpf testsuite
      frequently, however the zero extension insertion pass is not enabled for
      them because of their hardware support.
      
      It is critical to guarantee the pass correction as it is supposed to be
      enabled at default for a couple of other arches, for example PowerPC,
      SPARC, arm, NFP etc. Therefore, it would be very useful if there is a way
      to test this pass on for example x86_64.
      
      The test methodology employed by this set is "poisoning" useless bits. High
      32-bit of a definition is randomized if it is identified as not used by any
      later insn. Such randomization is only enabled under testing mode which is
      gated by the new bpf prog load flags "BPF_F_TEST_RND_HI32".
      Suggested-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c240eff6
    • Jiong Wang's avatar
      bpf: verifier: insert zero extension according to analysis result · a4b1d3c1
      Jiong Wang authored
      After previous patches, verifier will mark a insn if it really needs zero
      extension on dst_reg.
      
      It is then for back-ends to decide how to use such information to eliminate
      unnecessary zero extension code-gen during JIT compilation.
      
      One approach is verifier insert explicit zero extension for those insns
      that need zero extension in a generic way, JIT back-ends then do not
      generate zero extension for sub-register write at default.
      
      However, only those back-ends which do not have hardware zero extension
      want this optimization. Back-ends like x86_64 and AArch64 have hardware
      zero extension support that the insertion should be disabled.
      
      This patch introduces new target hook "bpf_jit_needs_zext" which returns
      false at default, meaning verifier zero extension insertion is disabled at
      default. A back-end could override this hook to return true if it doesn't
      have hardware support and want verifier insert zero extension explicitly.
      
      Offload targets do not use this native target hook, instead, they could
      get the optimization results using bpf_prog_offload_ops.finalize.
      
      NOTE: arches could have diversified features, it is possible for one arch
      to have hardware zero extension support for some sub-register write insns
      but not for all. For example, PowerPC, SPARC have zero extended loads, but
      not for alu32. So when verifier zero extension insertion enabled, these JIT
      back-ends need to peephole insns to remove those zero extension inserted
      for insn that actually has hardware zero extension support. The peephole
      could be as simple as looking the next insn, if it is a special zero
      extension insn then it is safe to eliminate it if the current insn has
      hardware zero extension support.
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a4b1d3c1
    • Jiong Wang's avatar
      bpf: introduce new mov32 variant for doing explicit zero extension · 7d134041
      Jiong Wang authored
      The encoding for this new variant is based on BPF_X format. "imm" field was
      0 only, now it could be 1 which means doing zero extension unconditionally
      
        .code = BPF_ALU | BPF_MOV | BPF_X
        .dst_reg = DST
        .src_reg = SRC
        .imm  = 1
      
      We use this new form for doing zero extension for which verifier will
      guarantee SRC == DST.
      
      Implications on JIT back-ends when doing code-gen for
      BPF_ALU | BPF_MOV | BPF_X:
        1. No change if hardware already does zero extension unconditionally for
           sub-register write.
        2. Otherwise, when seeing imm == 1, just generate insns to clear high
           32-bit. No need to generate insns for the move because when imm == 1,
           dst_reg is the same as src_reg at the moment.
      
      Interpreter doesn't need change as well. It is doing unconditionally zero
      extension for mov32 already.
      
      One helper macro BPF_ZEXT_REG is added to help creating zero extension
      insn using this new mov32 variant.
      
      One helper function insn_is_zext is added for checking one insn is an
      zero extension on dst. This will be widely used by a few JIT back-ends in
      later patches in this set.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7d134041
    • Jiong Wang's avatar
      bpf: verifier: mark patched-insn with sub-register zext flag · b325fbca
      Jiong Wang authored
      Patched insns do not go through generic verification, therefore doesn't has
      zero extension information collected during insn walking.
      
      We don't bother analyze them at the moment, for any sub-register def comes
      from them, just conservatively mark it as needing zero extension.
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b325fbca
    • Jiong Wang's avatar
      bpf: verifier: mark verified-insn with sub-register zext flag · 5327ed3d
      Jiong Wang authored
      eBPF ISA specification requires high 32-bit cleared when low 32-bit
      sub-register is written. This applies to destination register of ALU32 etc.
      JIT back-ends must guarantee this semantic when doing code-gen. x86_64 and
      AArch64 ISA has the same semantics, so the corresponding JIT back-end
      doesn't need to do extra work.
      
      However, 32-bit arches (arm, x86, nfp etc.) and some other 64-bit arches
      (PowerPC, SPARC etc) need to do explicit zero extension to meet this
      requirement, otherwise code like the following will fail.
      
        u64_value = (u64) u32_value
        ... other uses of u64_value
      
      This is because compiler could exploit the semantic described above and
      save those zero extensions for extending u32_value to u64_value, these JIT
      back-ends are expected to guarantee this through inserting extra zero
      extensions which however could be a significant increase on the code size.
      Some benchmarks show there could be ~40% sub-register writes out of total
      insns, meaning at least ~40% extra code-gen.
      
      One observation is these extra zero extensions are not always necessary.
      Take above code snippet for example, it is possible u32_value will never be
      casted into a u64, the value of high 32-bit of u32_value then could be
      ignored and extra zero extension could be eliminated.
      
      This patch implements this idea, insns defining sub-registers will be
      marked when the high 32-bit of the defined sub-register matters. For
      those unmarked insns, it is safe to eliminate high 32-bit clearnace for
      them.
      
      Algo:
       - Split read flags into READ32 and READ64.
      
       - Record index of insn that does sub-register write. Keep the index inside
         reg state and update it during verifier insn walking.
      
       - A full register read on a sub-register marks its definition insn as
         needing zero extension on dst register.
      
         A new sub-register write overrides the old one.
      
       - When propagating read64 during path pruning, also mark any insn defining
         a sub-register that is read in the pruned path as full-register.
      Reviewed-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarJiong Wang <jiong.wang@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5327ed3d
  2. 24 May, 2019 19 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-send-sig' · a08acd11
      Daniel Borkmann authored
      Yonghong Song says:
      
      ====================
      This patch tries to solve the following specific use case.
      
      Currently, bpf program can already collect stack traces
      through kernel function get_perf_callchain()
      when certain events happens (e.g., cache miss counter or
      cpu clock counter overflows). But such stack traces are
      not enough for jitted programs, e.g., hhvm (jited php).
      To get real stack trace, jit engine internal data structures
      need to be traversed in order to get the real user functions.
      
      bpf program itself may not be the best place to traverse
      the jit engine as the traversing logic could be complex and
      it is not a stable interface either.
      
      Instead, hhvm implements a signal handler,
      e.g. for SIGALARM, and a set of program locations which
      it can dump stack traces. When it receives a signal, it will
      dump the stack in next such program location.
      
      This patch implements bpf_send_signal() helper to send
      a signal to hhvm in real time, resulting in intended stack traces.
      
      Patch #1 implemented the bpf_send_helper() in the kernel.
      Patch #2 synced uapi header bpf.h to tools directory.
      Patch #3 added a self test which covers tracepoint
      and perf_event bpf programs.
      
      Changelogs:
        v4 => v5:
          . pass the "current" task struct to irq_work as well
            since the current task struct may change between
            nmi and subsequent irq_work_interrupt.
            Discovered by Daniel.
        v3 => v4:
          . fix one typo and declare "const char *id_path = ..."
            to avoid directly use the long string in the func body
            in Patch #3.
        v2 => v3:
          . change the standalone test to be part of prog_tests.
        RFC v1 => v2:
          . previous version allows to send signal to an arbitrary
            pid. This version just sends the signal to current
            task to avoid unstable pid and potential races between
            sending signals and task state changes for the pid.
      ====================
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a08acd11
    • Yonghong Song's avatar
      tools/bpf: add selftest in test_progs for bpf_send_signal() helper · 16f0efc3
      Yonghong Song authored
      The test covered both nmi and tracepoint perf events.
        $ ./test_progs
        ...
        test_send_signal_tracepoint:PASS:tracepoint 0 nsec
        ...
        test_send_signal_common:PASS:tracepoint 0 nsec
        ...
        test_send_signal_common:PASS:perf_event 0 nsec
        ...
        test_send_signal:OK
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      16f0efc3
    • Yonghong Song's avatar
      tools/bpf: sync bpf uapi header bpf.h to tools directory · edaccf89
      Yonghong Song authored
      The bpf uapi header include/uapi/linux/bpf.h is sync'ed
      to tools/include/uapi/linux/bpf.h.
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      edaccf89
    • Yonghong Song's avatar
      bpf: implement bpf_send_signal() helper · 8b401f9e
      Yonghong Song authored
      This patch tries to solve the following specific use case.
      
      Currently, bpf program can already collect stack traces
      through kernel function get_perf_callchain()
      when certain events happens (e.g., cache miss counter or
      cpu clock counter overflows). But such stack traces are
      not enough for jitted programs, e.g., hhvm (jited php).
      To get real stack trace, jit engine internal data structures
      need to be traversed in order to get the real user functions.
      
      bpf program itself may not be the best place to traverse
      the jit engine as the traversing logic could be complex and
      it is not a stable interface either.
      
      Instead, hhvm implements a signal handler,
      e.g. for SIGALARM, and a set of program locations which
      it can dump stack traces. When it receives a signal, it will
      dump the stack in next such program location.
      
      Such a mechanism can be implemented in the following way:
        . a perf ring buffer is created between bpf program
          and tracing app.
        . once a particular event happens, bpf program writes
          to the ring buffer and the tracing app gets notified.
        . the tracing app sends a signal SIGALARM to the hhvm.
      
      But this method could have large delays and causing profiling
      results skewed.
      
      This patch implements bpf_send_signal() helper to send
      a signal to hhvm in real time, resulting in intended stack traces.
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarYonghong Song <yhs@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      8b401f9e
    • Alexei Starovoitov's avatar
      Merge branch 'btf2c-converter' · 5420f320
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      This patch set adds BTF-to-C dumping APIs to libbpf, allowing to output
      a subset of BTF types as a compilable C type definitions. This is useful by
      itself, as raw BTF output is not easy to inspect and comprehend. But it's also
      a big part of BPF CO-RE (compile once - run everywhere) initiative aimed at
      allowing to write relocatable BPF programs, that won't require on-the-host
      kernel headers (and would be able to inspect internal kernel structures, not
      exposed through kernel headers).
      
      This patch set consists of three groups of patches and one pre-patch, with the
      BTF-to-C dumper API depending on the first two groups.
      
      Pre-patch #1 fixes issue with libbpf_internal.h.
      
      btf__parse_elf() API patches:
      - patch #2 adds btf__parse_elf() API to libbpf, allowing to load BTF and/or
        BTF.ext from ELF file;
      - patch #3 utilizies btf__parse_elf() from bpftool for `btf dump file` command;
      - patch #4 switches test_btf.c to use btf__parse_elf() to check for presence
        of BTF data in object file.
      
      libbpf's internal hashmap patches:
      - patch #5 adds resizeable non-thread safe generic hashmap to libbpf;
      - patch #6 adds tests for that hashmap;
      - patch #7 migrates btf_dedup()'s dedup_table to use hashmap w/ APPEND.
      
      BTF-to-C dumper API patches:
      - patch #8 adds btf_dump APIs with all the logic for laying out type
        definitions in correct order and emitting C syntax for them;
      - patch #9 adds lots of tests for common and quirky parts of C type system;
      - patch #10 adds support for C-syntax btf dumping to bpftool;
      - patch #11 updates bpftool documentation to mention C-syntax dump option;
      - patch #12 update bash-completion for btf dump sub-command.
      
      v2->v3:
      - fix bpftool-btf.rst formatting (Quentin);
      - simplify bash autocompletion script (Quentin);
      - better error message in btf dump (Quentin);
      
      v1->v2:
      - removed unuseful file header (Jakub);
      - removed inlines in .c (Jakub);
      - added 'format {c|raw}' keyword/option (Jakub);
      - re-use i var for iteration in btf_dump_c() (Jakub);
      - bumped libbpf version to 0.0.4;
      
      v0->v1:
      - fix bug in hashmap__for_each_bucket_entry() not handling empty hashmap;
      - removed `btf dump`-specific libbpf logging hook up (Quentin has more generic
        patchset);
      - change btf__parse_elf() to always load .BTF and return it as a result, with
        .BTF.ext being optional and returned through struct btf_ext** arg (Alexei);
      - endianness check to use __BYTE_ORDER__ (Alexei);
      - bool:1 to __u8:1 in type_aux_state (Alexei);
      - added HASHMAP_APPEND strategy to hashmap, changed
        hashmap__for_each_key_entry() to also check for key equality during
        iteration (multimap iteration for key);
      - added new tests for empty hashmap and hashmap as a multimap;
      - tried to clarify weak/strong dependency ordering comments (Alexei)
      - btf dump test's expected output - support better commenting aproach (Alexei);
      - added bash-completion for a new "c" option (Alexei).
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5420f320
    • Andrii Nakryiko's avatar
      bpftool: update bash-completion w/ new c option for btf dump · 90eea408
      Andrii Nakryiko authored
      Add bash completion for new C btf dump option.
      
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      90eea408
    • Andrii Nakryiko's avatar
      bpftool/docs: add description of btf dump C option · 220ba451
      Andrii Nakryiko authored
      Document optional **c** option for btf dump subcommand.
      
      Cc: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      220ba451
    • Andrii Nakryiko's avatar
      bpftool: add C output format option to btf dump subcommand · 2119f218
      Andrii Nakryiko authored
      Utilize new libbpf's btf_dump API to emit BTF as a C definitions.
      Acked-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Reviewed-by: default avatarQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2119f218
    • Andrii Nakryiko's avatar
      selftests/bpf: add btf_dump BTF-to-C conversion tests · 2d2a3ad8
      Andrii Nakryiko authored
      Add new test_btf_dump set of tests, validating BTF-to-C conversion
      correctness. Tests rely on clang to generate BTF from provided C test
      cases.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2d2a3ad8
    • Andrii Nakryiko's avatar
      libbpf: add btf_dump API for BTF-to-C conversion · 351131b5
      Andrii Nakryiko authored
      BTF contains enough type information to allow generating valid
      compilable C header w/ correct layout of structs/unions and all the
      typedef/enum definitions. This patch adds a new "object" - btf_dump to
      facilitate dumping BTF as valid C. btf_dump__dump_type() is the main API
      which takes care of dumping out (through user-provided printf-like
      callback function) C definitions for given type ID and it's required
      dependencies. This allows for not just dumping out entirety of BTF types,
      but also selective filtering based on user-provided criterias w/ minimal
      set of dependent types.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      351131b5
    • Andrii Nakryiko's avatar
      libbpf: switch btf_dedup() to hashmap for dedup table · 2fc3fc0b
      Andrii Nakryiko authored
      Utilize libbpf's hashmap as a multimap fof dedup_table implementation.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      2fc3fc0b
    • Andrii Nakryiko's avatar
      selftests/bpf: add tests for libbpf's hashmap · 5d04ec68
      Andrii Nakryiko authored
      Test all APIs for internal hashmap implementation.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      5d04ec68
    • Andrii Nakryiko's avatar
      libbpf: add resizable non-thread safe internal hashmap · e3b92422
      Andrii Nakryiko authored
      There is a need for fast point lookups inside libbpf for multiple use
      cases (e.g., name resolution for BTF-to-C conversion, by-name lookups in
      BTF for upcoming BPF CO-RE relocation support, etc). This patch
      implements simple resizable non-thread safe hashmap using single linked
      list chains.
      
      Four different insert strategies are supported:
       - HASHMAP_ADD - only add key/value if key doesn't exist yet;
       - HASHMAP_SET - add key/value pair if key doesn't exist yet; otherwise,
         update value;
       - HASHMAP_UPDATE - update value, if key already exists; otherwise, do
         nothing and return -ENOENT;
       - HASHMAP_APPEND - always add key/value pair, even if key already exists.
         This turns hashmap into a multimap by allowing multiple values to be
         associated with the same key. Most useful read API for such hashmap is
         hashmap__for_each_key_entry() iteration. If hashmap__find() is still
         used, it will return last inserted key/value entry (first in a bucket
         chain).
      
      For HASHMAP_SET and HASHMAP_UPDATE, old key/value pair is returned, so
      that calling code can handle proper memory management, if necessary.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e3b92422
    • Andrii Nakryiko's avatar
      selftests/bpf: use btf__parse_elf to check presence of BTF/BTF.ext · 9db32431
      Andrii Nakryiko authored
      Switch test_btf.c to rely on btf__parse_elf to check presence of BTF and
      BTF.ext data, instead of implementing its own ELF parsing.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      9db32431
    • Andrii Nakryiko's avatar
      bpftool: use libbpf's btf__parse_elf API · 58650cc4
      Andrii Nakryiko authored
      Use btf__parse_elf() API, provided by libbpf, instead of implementing
      ELF parsing by itself.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      58650cc4
    • Andrii Nakryiko's avatar
      libbpf: add btf__parse_elf API to load .BTF and .BTF.ext · e6c64855
      Andrii Nakryiko authored
      Loading BTF and BTF.ext from ELF file is a common need. Instead of
      requiring every user to re-implement it, let's provide this API from
      libbpf itself. It's mostly copy/paste from `bpftool btf dump`
      implementation, which will be switched to libbpf's version in next
      patch. btf__parse_elf allows to load BTF and optionally BTF.ext.
      This is also useful for tests that need to load/work with BTF, loaded
      from test ELF files.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      e6c64855
    • Andrii Nakryiko's avatar
      libbpf: ensure libbpf.h is included along libbpf_internal.h · 1d7a08b3
      Andrii Nakryiko authored
      libbpf_internal.h expects a bunch of stuff defined in libbpf.h to be
      defined. This patch makes sure that libbpf.h is always included.
      Signed-off-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1d7a08b3
    • Michal Rostecki's avatar
      samples: bpf: Do not define bpf_printk macro · c87f60a7
      Michal Rostecki authored
      The bpf_printk macro was moved to bpf_helpers.h which is included in all
      example programs.
      Signed-off-by: default avatarMichal Rostecki <mrostecki@opensuse.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      c87f60a7
    • Michal Rostecki's avatar
      selftests: bpf: Move bpf_printk to bpf_helpers.h · 37739d1b
      Michal Rostecki authored
      bpf_printk is a macro which is commonly used to print out debug messages
      in BPF programs and it was copied in many selftests and samples. Since
      all of them include bpf_helpers.h, this change moves the macro there.
      Signed-off-by: default avatarMichal Rostecki <mrostecki@opensuse.org>
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      37739d1b
  3. 23 May, 2019 3 commits
    • Daniel Borkmann's avatar
      Merge branch 'bpf-explored-states' · 5762a20b
      Daniel Borkmann authored
      Alexei Starovoitov says:
      
      ====================
      Convert explored_states array into hash table and use simple hash
      to reduce verifier peak memory consumption for programs with bpf2bpf
      calls. More details in patch 3.
      
      v1->v2: fixed Jakub's small nit in patch 1
      ====================
      Acked-by: default avatarAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      5762a20b
    • Alexei Starovoitov's avatar
      bpf: convert explored_states to hash table · dc2a4ebc
      Alexei Starovoitov authored
      All prune points inside a callee bpf function most likely will have
      different callsites. For example, if function foo() is called from
      two callsites the half of explored states in all prune points in foo()
      will be useless for subsequent walking of one of those callsites.
      Fortunately explored_states pruning heuristics keeps the number of states
      per prune point small, but walking these states is still a waste of cpu
      time when the callsite of the current state is different from the callsite
      of the explored state.
      
      To improve pruning logic convert explored_states into hash table and
      use simple insn_idx ^ callsite hash to select hash bucket.
      This optimization has no effect on programs without bpf2bpf calls
      and drastically improves programs with calls.
      In the later case it reduces total memory consumption in 1M scale tests
      by almost 3 times (peak_states drops from 5752 to 2016).
      
      Care should be taken when comparing the states for equivalency.
      Since the same hash bucket can now contain states with different indices
      the insn_idx has to be part of verifier_state and compared.
      
      Different hash table sizes and different hash functions were explored,
      but the results were not significantly better vs this patch.
      They can be improved in the future.
      
      Hit/miss heuristic is not counting index miscompare as a miss.
      Otherwise verifier stats become unstable when experimenting
      with different hash functions.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      dc2a4ebc
    • Alexei Starovoitov's avatar
      bpf: split explored_states · a8f500af
      Alexei Starovoitov authored
      split explored_states into prune_point boolean mark
      and link list of explored states.
      This removes STATE_LIST_MARK hack and allows marks to be separate from states.
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      a8f500af