1. 11 Nov, 2022 7 commits
  2. 10 Nov, 2022 7 commits
    • Martin KaFai Lau's avatar
      Merge branch 'fix panic bringing up veth with xdp progs' · c7028aa2
      Martin KaFai Lau authored
      John Fastabend says:
      
      ====================
      
      Not sure if folks want to take this through BPF tree or networking tree.
      I took a quick look and didn't see any pending fixes so seems no one
      has noticed the panic yet. It reproducible and easy to repro.
      
      I put bpf in the title thinking it woudl be great to run through the
      BPF selftests given its XDP triggering the panic.
      
      Sorry maintainers resent with CC'ing actual lists. Had a scripting
      issue. Also dropped henqqi has they are bouncing.
      
      Thanks!
      ====================
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      c7028aa2
    • John Fastabend's avatar
      bpf: veth driver panics when xdp prog attached before veth_open · 5e5dc33d
      John Fastabend authored
      The following panic is observed when bringing up (veth_open) a veth device
      that has an XDP program attached.
      
      [   61.519185] kernel BUG at net/core/dev.c:6442!
      [   61.519456] invalid opcode: 0000 [#1] PREEMPT SMP PTI
      [   61.519752] CPU: 0 PID: 408 Comm: ip Tainted: G        W          6.1.0-rc2-185930-gd9095f92-dirty #26
      [   61.520288] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      [   61.520806] RIP: 0010:napi_enable+0x3d/0x40
      [   61.521077] Code: f6 f6 80 61 08 00 00 02 74 0d 48 83 bf 88 01 00 00 00 74 03 80 cd 01 48 89 d0 f0 48 0f b1 4f 10 48 39 c2 75 c8 c3 cc cc cc cc <0f> 0b 90 48 8b 87 b0 00 00 00 48 81 c7 b0 00 00 00 45 31 c0 48 39
      [   61.522226] RSP: 0018:ffffbc9800cc36f8 EFLAGS: 00010246
      [   61.522557] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000001
      [   61.523004] RDX: 0000000000000010 RSI: ffffffff8b0de852 RDI: ffff9f03848e5000
      [   61.523452] RBP: 0000000000000000 R08: 0000000000000800 R09: 0000000000000000
      [   61.523899] R10: ffff9f0384a96800 R11: ffffffffffa48061 R12: ffff9f03849c3000
      [   61.524345] R13: 0000000000000300 R14: ffff9f03848e5000 R15: 0000001000000100
      [   61.524792] FS:  00007f58cb64d2c0(0000) GS:ffff9f03bbc00000(0000) knlGS:0000000000000000
      [   61.525301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   61.525673] CR2: 00007f6cc629b498 CR3: 000000010498c000 CR4: 00000000000006f0
      [   61.526121] Call Trace:
      [   61.526284]  <TASK>
      [   61.526425]  __veth_napi_enable_range+0xd6/0x230
      [   61.526723]  veth_enable_xdp+0xd0/0x160
      [   61.526969]  veth_open+0x2e/0xc0
      [   61.527180]  __dev_open+0xe2/0x1b0
      [   61.527405]  __dev_change_flags+0x1a1/0x210
      [   61.527673]  dev_change_flags+0x1c/0x60
      
      This happens because we are calling veth_napi_enable() on already enabled
      queues. The root cause is in commit 2e0de636 changed the control logic
      dropping this case,
      
              if (priv->_xdp_prog) {
                      err = veth_enable_xdp(dev);
                      if (err)
                              return err;
      -       } else if (veth_gro_requested(dev)) {
      +               /* refer to the logic in veth_xdp_set() */
      +               if (!rtnl_dereference(peer_rq->napi)) {
      +                       err = veth_napi_enable(peer);
      +                       if (err)
      +                               return err;
      +               }
      
      so that now veth_napi_enable is called if the peer has not yet
      initialiazed its peer_rq->napi. The issue is this will happen
      even if the NIC is not up. Then in veth_enable_xdp just above
      we have similar path,
      
        veth_enable_xdp
         napi_already_on = (dev->flags & IFF_UP) && rcu_access_pointer(rq->napi)
          err = veth_enable_xdp_range(dev, 0, dev->real_num_rx_queues, napi_already_on);
      
      The trouble is an xdp prog is assigned before bringing the device up each
      of the veth_open path will enable the peers xdp napi structs. But then when
      we bring the peer up it will similar try to enable again because from
      veth_open the IFF_UP flag is not set until after the op in __dev_open so
      we believe napi_alread_on = false.
      
      To fix this just drop the IFF_UP test and rely on checking if the napi
      struct is enabled. This also matches the peer check in veth_xdp for
      disabling.
      
      To reproduce run ./test_xdp_meta.sh I found adding Cilium/Tetragon tests
      for XDP.
      
      Fixes: 2e0de636 ("veth: Avoid drop packets when xdp_redirect performs")
      Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20221108221650.808950-2-john.fastabend@gmail.comAcked-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      5e5dc33d
    • Domenico Cerasuolo's avatar
      selftests: Fix test group SKIPPED result · fd74b79d
      Domenico Cerasuolo authored
      When showing the result of a test group, if one
      of the subtests was skipped, while still having
      passing subtests, the group result was marked as
      SKIP. E.g.:
      
      223/1   usdt/basic:SKIP
      223/2   usdt/multispec:OK
      223/3   usdt/urand_auto_attach:OK
      223/4   usdt/urand_pid_attach:OK
      223     usdt:SKIP
      
      The test result of usdt in the example above
      should be OK instead of SKIP, because the test
      group did have passing tests and it would be
      considered in "normal" state.
      
      With this change, only if all of the subtests
      were skipped, the group test is marked as SKIP.
      When only some of the subtests are skipped, a
      more detailed result is given, stating how
      many of the subtests were skipped. E.g:
      
      223/1   usdt/basic:SKIP
      223/2   usdt/multispec:OK
      223/3   usdt/urand_auto_attach:OK
      223/4   usdt/urand_pid_attach:OK
      223     usdt:OK (SKIP: 1/4)
      Signed-off-by: default avatarDomenico Cerasuolo <dceras@meta.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20221109184039.3514033-1-cerasuolodomenico@gmail.com
      fd74b79d
    • Andrii Nakryiko's avatar
      Merge branch 'libbpf: Resolve unambigous forward declarations' · 15157d2e
      Andrii Nakryiko authored
      Eduard Zingerman says:
      
      ====================
      
      The patch-set is consists of the following parts:
      - An update for libbpf's hashmap interface from void* -> void* to a
        polymorphic one, allowing to use both long and void* keys and values
        w/o additional casts. Interface functions are hidden behind
        auxiliary macro that add casts as necessary.
      
        This simplifies many use cases in libbpf as hashmaps there are
        mostly integer to integer and required awkward looking incantations
        like `(void *)(long)off` previously. Also includes updates for perf
        as it copies map implementation from libbpf.
      
      - A change to `lib/bpf/btf.c:btf__dedup` that adds a new pass named
        "Resolve unambiguous forward declaration". This pass builds a
        hashmap `name_off -> uniquely named struct or union` and uses it to
        replace FWD types by structs or unions. This is necessary for corner
        cases when FWD is not used as a part of some struct or union
        definition de-duplicated by `btf_dedup_struct_types`.
      
      The goal of the patch-set is to resolve forward declarations that
      don't take part in type graphs comparisons if declaration name is
      unambiguous.
      
      Example:
      
      CU #1:
      
      struct foo;              // standalone forward declaration
      struct foo *some_global;
      
      CU #2:
      
      struct foo { int x; };
      struct foo *another_global;
      
      Currently the de-duplicated BTF for this example looks as follows:
      
      [1] STRUCT 'foo' size=4 vlen=1 ...
      [2] INT 'int' size=4 ...
      [3] PTR '(anon)' type_id=1
      [4] FWD 'foo' fwd_kind=struct
      [5] PTR '(anon)' type_id=4
      
      The goal of this patch-set is to simplify it as follows:
      
      [1] STRUCT 'foo' size=4 vlen=1
      	'x' type_id=2 bits_offset=0
      [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      [3] PTR '(anon)' type_id=1
      
      For defconfig kernel with BTF enabled this removes 63 forward
      declarations.
      
      For allmodconfig kernel with BTF enabled this removes ~5K out of ~21K
      forward declarations in ko objects. This unlocks some additional
      de-duplication in ko objects, but impact is tiny: ~13K less BTF ids
      out of ~2M.
      
      Changelog:
       v3 -> v4
       Changes suggested by Andrii:
       - hashmap interface rework to allow use of integer and pointer keys
         an values w/o casts as suggested in [1].
       v2 -> v3
       Changes suggested by Andrii:
       - perf's util/hashtable.{c,h} are synchronized with libbpf
         implementation, perf's source code updated accordingly;
       - changes to libbpf, bpf selftests and perf are combined in a single
         patch to simplify bisecting;
       - hashtable interface updated to be long -> long instead of
         uintptr_t -> uintptr_t;
       - btf_dedup_resolve_fwds updated to correctly use IS_ERR / PTR_ERR
         macro;
       - test cases for btf_dedup_resolve_fwds are updated for better
         clarity.
       v1 -> v2
       - Style fixes in btf_dedup_resolve_fwd and btf_dedup_resolve_fwds as
         suggested by Alan.
      
      [v1] https://lore.kernel.org/bpf/20221102110905.2433622-1-eddyz87@gmail.com/
      [v2] https://lore.kernel.org/bpf/20221103033430.2611623-1-eddyz87@gmail.com/
      [v3] https://lore.kernel.org/bpf/20221106202910.4193104-1-eddyz87@gmail.com/
      
      [1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
      ====================
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      15157d2e
    • Eduard Zingerman's avatar
      selftests/bpf: Tests for btf_dedup_resolve_fwds · 99e18fad
      Eduard Zingerman authored
      Tests to verify the following behavior of `btf_dedup_resolve_fwds`:
      - remapping for struct forward declarations;
      - remapping for union forward declarations;
      - no remapping if forward declaration kind does not match similarly
        named struct or union declaration;
      - no remapping if forward declaration name is ambiguous;
      - base ids are considered for fwd resolution in split btf scenario.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Reviewed-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20221109142611.879983-4-eddyz87@gmail.com
      99e18fad
    • Eduard Zingerman's avatar
      libbpf: Resolve unambigous forward declarations · 082108fd
      Eduard Zingerman authored
      Resolve forward declarations that don't take part in type graphs
      comparisons if declaration name is unambiguous. Example:
      
      CU #1:
      
      struct foo;              // standalone forward declaration
      struct foo *some_global;
      
      CU #2:
      
      struct foo { int x; };
      struct foo *another_global;
      
      The `struct foo` from CU #1 is not a part of any definition that is
      compared against another definition while `btf_dedup_struct_types`
      processes structural types. The the BTF after `btf_dedup_struct_types`
      the BTF looks as follows:
      
      [1] STRUCT 'foo' size=4 vlen=1 ...
      [2] INT 'int' size=4 ...
      [3] PTR '(anon)' type_id=1
      [4] FWD 'foo' fwd_kind=struct
      [5] PTR '(anon)' type_id=4
      
      This commit adds a new pass `btf_dedup_resolve_fwds`, that maps such
      forward declarations to structs or unions with identical name in case
      if the name is not ambiguous.
      
      The pass is positioned before `btf_dedup_ref_types` so that types
      [3] and [5] could be merged as a same type after [1] and [4] are merged.
      The final result for the example above looks as follows:
      
      [1] STRUCT 'foo' size=4 vlen=1
      	'x' type_id=2 bits_offset=0
      [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      [3] PTR '(anon)' type_id=1
      
      For defconfig kernel with BTF enabled this removes 63 forward
      declarations. Examples of removed declarations: `pt_regs`, `in6_addr`.
      The running time of `btf__dedup` function is increased by about 3%.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Reviewed-by: default avatarAlan Maguire <alan.maguire@oracle.com>
      Acked-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221109142611.879983-3-eddyz87@gmail.com
      082108fd
    • Eduard Zingerman's avatar
      libbpf: Hashmap interface update to allow both long and void* keys/values · c302378b
      Eduard Zingerman authored
      An update for libbpf's hashmap interface from void* -> void* to a
      polymorphic one, allowing both long and void* keys and values.
      
      This simplifies many use cases in libbpf as hashmaps there are mostly
      integer to integer.
      
      Perf copies hashmap implementation from libbpf and has to be
      updated as well.
      
      Changes to libbpf, selftests/bpf and perf are packed as a single
      commit to avoid compilation issues with any future bisect.
      
      Polymorphic interface is acheived by hiding hashmap interface
      functions behind auxiliary macros that take care of necessary
      type casts, for example:
      
          #define hashmap_cast_ptr(p)						\
      	({								\
      		_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
      			       #p " pointee should be a long-sized integer or a pointer"); \
      		(long *)(p);						\
      	})
      
          bool hashmap_find(const struct hashmap *map, long key, long *value);
      
          #define hashmap__find(map, key, value) \
      		hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
      
      - hashmap__find macro casts key and value parameters to long
        and long* respectively
      - hashmap_cast_ptr ensures that value pointer points to a memory
        of appropriate size.
      
      This hack was suggested by Andrii Nakryiko in [1].
      This is a follow up for [2].
      
      [1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
      [2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607dSigned-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
      c302378b
  3. 08 Nov, 2022 2 commits
  4. 04 Nov, 2022 24 commits
    • Bagas Sanjaya's avatar
      Documentation: bpf: Escape underscore in BPF type name prefix · 25906092
      Bagas Sanjaya authored
      Sphinx reported unknown target warning:
      
      Documentation/bpf/bpf_design_QA.rst:329: WARNING: Unknown target name: "bpf".
      
      The warning is caused by BPF type name prefix ("bpf_") which is written
      without escaping the trailing underscore.
      
      Escape the underscore to fix the warning. While at it, wrap the
      containing paragraph in less than 80 characters.
      
      Fixes: 9805af8d ("bpf: Document UAPI details for special BPF types")
      Signed-off-by: default avatarBagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarKP Singh <kpsingh@kernel.org>
      Acked-by: default avatarDavid Vernet <void@manifault.com>
      Link: https://lore.kernel.org/bpf/20221104123913.50610-1-bagasdotme@gmail.com
      25906092
    • Artem Savkov's avatar
      selftests/bpf: Use consistent build-id type for liburandom_read.so · 61fc5e66
      Artem Savkov authored
      lld produces "fast" style build-ids by default, which is inconsistent
      with ld's "sha1" style. Explicitly specify build-id style to be "sha1"
      when linking liburandom_read.so the same way it is already done for
      urandom_read.
      Signed-off-by: default avatarArtem Savkov <asavkov@redhat.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Acked-by: default avatarKP Singh <kpsingh@kernel.org>
      Link: https://lore.kernel.org/bpf/20221104094016.102049-1-asavkov@redhat.com
      61fc5e66
    • Rong Tao's avatar
      selftests/bpf: cgroup_helpers.c: Fix strncpy() fortify warning · b3c09fdc
      Rong Tao authored
      Copy libbpf_strlcpy() from libbpf_internal.h to bpf_util.h, and rename it
      to bpf_strlcpy(), then replace selftests strncpy()/libbpf_strlcpy() with
      bpf_strlcpy(), fix compile warning.
      
      The libbpf_internal.h header cannot be used directly here, because
      references to cgroup_helpers.c in samples/bpf will generate compilation
      errors. We also can't add libbpf_strlcpy() directly to bpf_util.h,
      because the definition of libbpf_strlcpy() in libbpf_internal.h is
      duplicated. In order not to modify the libbpf code, add a new function
      bpf_strlcpy() to selftests bpf_util.h.
      
      How to reproduce this compilation warning:
      
      $ make -C samples/bpf
      cgroup_helpers.c: In function ‘__enable_controllers’:
      cgroup_helpers.c:80:17: warning: ‘strncpy’ specified bound 4097 equals destination size [-Wstringop-truncation]
         80 |                 strncpy(enable, controllers, sizeof(enable));
            |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: default avatarRong Tao <rongtao@cestc.cn>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/tencent_469D8AF32BD56816A29981BED06E96D22506@qq.com
      b3c09fdc
    • Rong Tao's avatar
      samples/bpf: Fix tracex2 error: No such file or directory · 1baa7e38
      Rong Tao authored
      since commit c504e5c2("net: skb: introduce kfree_skb_reason()")
      kfree_skb() is replaced by kfree_skb_reason() and kfree_skb() is set to
      the inline function. So, we replace kprobe/kfree_skb with
      kprobe/kfree_skb_reason to solve the tracex2 error.
      
       $ cd samples/bpf
       $ sudo ./tracex2
       libbpf: prog 'bpf_prog2': failed to create kprobe 'kfree_skb+0x0' perf event: No such file or directory
       ERROR: bpf_program__attach failed
      Signed-off-by: default avatarRong Tao <rongtao@cestc.cn>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/tencent_0F0DAE84C0B3C42E0B550E5E9F47A9114D09@qq.com
      1baa7e38
    • Eduard Zingerman's avatar
      selftests/bpf: Tests for enum fwd resolved as full enum64 · 2e20f50f
      Eduard Zingerman authored
      A set of test cases to verify enum fwd resolution logic:
      - verify that enum fwd can be resolved as full enum64;
      - verify that enum64 fwd can be resolved as full enum;
      - verify that enum size is considered when enums are compared for
        equivalence.
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221101235413.1824260-2-eddyz87@gmail.com
      2e20f50f
    • Eduard Zingerman's avatar
      libbpf: Resolve enum fwd as full enum64 and vice versa · de048b6e
      Eduard Zingerman authored
      Changes de-duplication logic for enums in the following way:
      - update btf_hash_enum to ignore size and kind fields to get
        ENUM and ENUM64 types in a same hash bucket;
      - update btf_compat_enum to consider enum fwd to be compatible with
        full enum64 (and vice versa);
      
      This allows BTF de-duplication in the following case:
      
          // CU #1
          enum foo;
      
          struct s {
            enum foo *a;
          } *x;
      
          // CU #2
          enum foo {
            x = 0xfffffffff // big enough to force enum64
          };
      
          struct s {
            enum foo *a;
          } *y;
      
      De-duplicated BTF prior to this commit:
      
          [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
          	'x' val=68719476735ULL
          [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
              encoding=(none)
          [3] STRUCT 's' size=8 vlen=1
          	'a' type_id=4 bits_offset=0
          [4] PTR '(anon)' type_id=1
          [5] PTR '(anon)' type_id=3
          [6] STRUCT 's' size=8 vlen=1
          	'a' type_id=8 bits_offset=0
          [7] ENUM 'foo' encoding=UNSIGNED size=4 vlen=0
          [8] PTR '(anon)' type_id=7
          [9] PTR '(anon)' type_id=6
      
      De-duplicated BTF after this commit:
      
          [1] ENUM64 'foo' encoding=UNSIGNED size=8 vlen=1
          	'x' val=68719476735ULL
          [2] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
              encoding=(none)
          [3] STRUCT 's' size=8 vlen=1
          	'a' type_id=4 bits_offset=0
          [4] PTR '(anon)' type_id=1
          [5] PTR '(anon)' type_id=3
      
      Enum forward declarations in C do not provide information about
      enumeration values range. Thus the `btf_type->size` field is
      meaningless for forward enum declarations. In fact, GCC does not
      encode size in DWARF for forward enum declarations
      (but dwarves sets enumeration size to a default value of `sizeof(int) * 8`
      when size is not specified see dwarf_loader.c:die__create_new_enumeration).
      Signed-off-by: default avatarEduard Zingerman <eddyz87@gmail.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20221101235413.1824260-1-eddyz87@gmail.com
      de048b6e
    • Alexei Starovoitov's avatar
      Merge branch 'BPF verifier precision tracking improvements' · 07d90c72
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      This patch set fixes and improves BPF verifier's precision tracking logic for
      SCALAR registers.
      
      Patches #1 and #2 are bug fixes discovered while working on these changes.
      
      Patch #3 enables precision tracking for BPF programs that contain subprograms.
      This was disabled before and prevent any modern BPF programs that use
      subprograms from enjoying the benefits of SCALAR (im)precise logic.
      
      Patch #4 is few lines of code changes and many lines of explaining why those
      changes are correct. We establish why ignoring precise markings in current
      state is OK.
      
      Patch #5 build on explanation in patch #4 and pushes it to the limit by
      forcefully forgetting inherited precise markins. Patch #4 by itself doesn't
      prevent current state from having precise=true SCALARs, so patch #5 is
      necessary to prevent such stray precise=true registers from creeping in.
      
      Patch #6 adjusts test_align selftests to work around BPF verifier log's
      limitations when it comes to interactions between state output and precision
      backtracking output.
      
      Overall, the goal of this patch set is to make BPF verifier's state tracking
      a bit more efficient by trying to preserve as much generality in checkpointed
      states as possible.
      
      v1->v2:
      - adjusted patch #1 commit message to make it clear we are fixing forward
        step, not precision backtracking (Alexei);
      - moved last_idx/first_idx verbose logging up to make it clear when global
        func reaches the first empty state (Alexei).
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      07d90c72
    • Andrii Nakryiko's avatar
      selftests/bpf: make test_align selftest more robust · 4f999b76
      Andrii Nakryiko authored
      test_align selftest relies on BPF verifier log emitting register states
      for specific instructions in expected format. Unfortunately, BPF
      verifier precision backtracking log interferes with such expectations.
      And instruction on which precision propagation happens sometimes don't
      output full expected register states. This does indeed look like
      something to be improved in BPF verifier, but is beyond the scope of
      this patch set.
      
      So to make test_align a bit more robust, inject few dummy R4 = R5
      instructions which capture desired state of R5 and won't have precision
      tracking logs on them. This fixes tests until we can improve BPF
      verifier output in the presence of precision tracking.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-7-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      4f999b76
    • Andrii Nakryiko's avatar
      bpf: aggressively forget precise markings during state checkpointing · 7a830b53
      Andrii Nakryiko authored
      Exploit the property of about-to-be-checkpointed state to be able to
      forget all precise markings up to that point even more aggressively. We
      now clear all potentially inherited precise markings right before
      checkpointing and branching off into child state. If any of children
      states require precise knowledge of any SCALAR register, those will be
      propagated backwards later on before this state is finalized, preserving
      correctness.
      
      There is a single selftests BPF program change, but tremendous one: 25x
      reduction in number of verified instructions and states in
      trace_virtqueue_add_sgs.
      
      Cilium results are more modest, but happen across wider range of programs.
      
      SELFTESTS RESULTS
      =================
      
      $ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results.csv ~/imprecise-aggressive-results.csv | grep -v '+0'
      File                 Program                  Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      -------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      loop6.bpf.linked1.o  trace_virtqueue_add_sgs           398057            15114   -382943 (-96.20%)              8717               336      -8381 (-96.15%)
      -------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      
      CILIUM RESULTS
      ==============
      
      $ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results-cilium.csv ~/imprecise-aggressive-results-cilium.csv | grep -v '+0'
      File           Program                           Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      -------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      bpf_host.o     tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
      bpf_host.o     tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
      bpf_host.o     tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
      bpf_host.o     tail_nodeport_nat_ipv6_egress                3446             3406        -40 (-1.16%)               203               198          -5 (-2.46%)
      bpf_lxc.o      tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
      bpf_lxc.o      tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
      bpf_lxc.o      tail_ipv4_ct_egress                          5074             4897       -177 (-3.49%)               255               248          -7 (-2.75%)
      bpf_lxc.o      tail_ipv4_ct_ingress                         5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
      bpf_lxc.o      tail_ipv4_ct_ingress_policy_only             5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
      bpf_lxc.o      tail_ipv6_ct_egress                          4558             4536        -22 (-0.48%)               188               187          -1 (-0.53%)
      bpf_lxc.o      tail_ipv6_ct_ingress                         4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
      bpf_lxc.o      tail_ipv6_ct_ingress_policy_only             4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
      bpf_lxc.o      tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
      bpf_overlay.o  tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
      bpf_overlay.o  tail_nodeport_nat_ipv6_egress                3482             3442        -40 (-1.15%)               204               201          -3 (-1.47%)
      bpf_xdp.o      tail_nodeport_nat_egress_ipv4               17200            15619      -1581 (-9.19%)              1111              1010        -101 (-9.09%)
      -------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-6-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      7a830b53
    • Andrii Nakryiko's avatar
      bpf: stop setting precise in current state · f63181b6
      Andrii Nakryiko authored
      Setting reg->precise to true in current state is not necessary from
      correctness standpoint, but it does pessimise the whole precision (or
      rather "imprecision", because that's what we want to keep as much as
      possible) tracking. Why is somewhat subtle and my best attempt to
      explain this is recorded in an extensive comment for __mark_chain_precise()
      function. Some more careful thinking and code reading is probably required
      still to grok this completely, unfortunately. Whiteboarding and a bunch
      of extra handwaiving in person would be even more helpful, but is deemed
      impractical in Git commit.
      
      Next patch pushes this imprecision property even further, building on top of
      the insights described in this patch.
      
      End results are pretty nice, we get reduction in number of total instructions
      and states verified due to a better states reuse, as some of the states are now
      more generic and permissive due to less unnecessary precise=true requirements.
      
      SELFTESTS RESULTS
      =================
      
      $ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results.csv ~/imprecise-early-results.csv | grep -v '+0'
      File                                     Program                 Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      ---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      bpf_iter_ksym.bpf.linked1.o              dump_ksym                           347              285       -62 (-17.87%)                20                19          -1 (-5.00%)
      pyperf600_bpf_loop.bpf.linked1.o         on_event                           3678             3736        +58 (+1.58%)               276               285          +9 (+3.26%)
      setget_sockopt.bpf.linked1.o             skops_sockopt                      4038             3947        -91 (-2.25%)               347               343          -4 (-1.15%)
      test_l4lb.bpf.linked1.o                  balancer_ingress                   4559             2611     -1948 (-42.73%)               118               105        -13 (-11.02%)
      test_l4lb_noinline.bpf.linked1.o         balancer_ingress                   6279             6268        -11 (-0.18%)               237               236          -1 (-0.42%)
      test_misc_tcp_hdr_options.bpf.linked1.o  misc_estab                         1307             1303         -4 (-0.31%)               100                99          -1 (-1.00%)
      test_sk_lookup.bpf.linked1.o             ctx_narrow_access                   456              447         -9 (-1.97%)                39                38          -1 (-2.56%)
      test_sysctl_loop1.bpf.linked1.o          sysctl_tcp_mem                     1389             1384         -5 (-0.36%)                26                25          -1 (-3.85%)
      test_tc_dtime.bpf.linked1.o              egress_fwdns_prio101                518              485        -33 (-6.37%)                51                46          -5 (-9.80%)
      test_tc_dtime.bpf.linked1.o              egress_host                         519              468        -51 (-9.83%)                50                44         -6 (-12.00%)
      test_tc_dtime.bpf.linked1.o              ingress_fwdns_prio101               842             1000      +158 (+18.76%)                73                88        +15 (+20.55%)
      xdp_synproxy_kern.bpf.linked1.o          syncookie_tc                     405757           373173     -32584 (-8.03%)             25735             22882      -2853 (-11.09%)
      xdp_synproxy_kern.bpf.linked1.o          syncookie_xdp                    479055           371590   -107465 (-22.43%)             29145             22207      -6938 (-23.81%)
      ---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      
      Slight regression in test_tc_dtime.bpf.linked1.o/ingress_fwdns_prio101
      is left for a follow up, there might be some more precision-related bugs
      in existing BPF verifier logic.
      
      CILIUM RESULTS
      ==============
      
      $ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results-cilium.csv ~/imprecise-early-results-cilium.csv | grep -v '+0'
      File           Program                         Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      bpf_host.o     cil_from_host                               762              556      -206 (-27.03%)                43                37         -6 (-13.95%)
      bpf_host.o     tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
      bpf_host.o     tail_nodeport_nat_egress_ipv4             33592            33566        -26 (-0.08%)              2163              2161          -2 (-0.09%)
      bpf_lxc.o      tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
      bpf_overlay.o  tail_nodeport_nat_egress_ipv4             33581            33543        -38 (-0.11%)              2160              2157          -3 (-0.14%)
      bpf_xdp.o      tail_handle_nat_fwd_ipv4                  21659            20920       -739 (-3.41%)              1440              1376         -64 (-4.44%)
      bpf_xdp.o      tail_handle_nat_fwd_ipv6                  17084            17039        -45 (-0.26%)               907               905          -2 (-0.22%)
      bpf_xdp.o      tail_lb_ipv4                              73442            73430        -12 (-0.02%)              4370              4369          -1 (-0.02%)
      bpf_xdp.o      tail_lb_ipv6                             152114           151895       -219 (-0.14%)              6493              6479         -14 (-0.22%)
      bpf_xdp.o      tail_nodeport_nat_egress_ipv4             17377            17200       -177 (-1.02%)              1125              1111         -14 (-1.24%)
      bpf_xdp.o      tail_nodeport_nat_ingress_ipv6             6405             6397         -8 (-0.12%)               309               308          -1 (-0.32%)
      bpf_xdp.o      tail_rev_nodeport_lb4                      7126             6934       -192 (-2.69%)               414               402         -12 (-2.90%)
      bpf_xdp.o      tail_rev_nodeport_lb6                     18059            17905       -154 (-0.85%)              1105              1096          -9 (-0.81%)
      -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-5-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f63181b6
    • Andrii Nakryiko's avatar
      bpf: allow precision tracking for programs with subprogs · be2ef816
      Andrii Nakryiko authored
      Stop forcing precise=true for SCALAR registers when BPF program has any
      subprograms. Current restriction means that any BPF program, as soon as
      it uses subprograms, will end up not getting any of the precision
      tracking benefits in reduction of number of verified states.
      
      This patch keeps the fallback mark_all_scalars_precise() behavior if
      precise marking has to cross function frames. E.g., if subprogram
      requires R1 (first input arg) to be marked precise, ideally we'd need to
      backtrack to the parent function and keep marking R1 and its
      dependencies as precise. But right now we give up and force all the
      SCALARs in any of the current and parent states to be forced to
      precise=true. We can lift that restriction in the future.
      
      But this patch fixes two issues identified when trying to enable
      precision tracking for subprogs.
      
      First, prevent "escaping" from top-most state in a global subprog. While
      with entry-level BPF program we never end up requesting precision for
      R1-R5 registers, because R2-R5 are not initialized (and so not readable
      in correct BPF program), and R1 is PTR_TO_CTX, not SCALAR, and so is
      implicitly precise. With global subprogs, though, it's different, as
      global subprog a) can have up to 5 SCALAR input arguments, which might
      get marked as precise=true and b) it is validated in isolation from its
      main entry BPF program. b) means that we can end up exhausting parent
      state chain and still not mark all registers in reg_mask as precise,
      which would lead to verifier bug warning.
      
      To handle that, we need to consider two cases. First, if the very first
      state is not immediately "checkpointed" (i.e., stored in state lookup
      hashtable), it will get correct first_insn_idx and last_insn_idx
      instruction set during state checkpointing. As such, this case is
      already handled and __mark_chain_precision() already handles that by
      just doing nothing when we reach to the very first parent state.
      st->parent will be NULL and we'll just stop. Perhaps some extra check
      for reg_mask and stack_mask is due here, but this patch doesn't address
      that issue.
      
      More problematic second case is when global function's initial state is
      immediately checkpointed before we manage to process the very first
      instruction. This is happening because when there is a call to global
      subprog from the main program the very first subprog's instruction is
      marked as pruning point, so before we manage to process first
      instruction we have to check and checkpoint state. This patch adds
      a special handling for such "empty" state, which is identified by having
      st->last_insn_idx set to -1. In such case, we check that we are indeed
      validating global subprog, and with some sanity checking we mark input
      args as precise if requested.
      
      Note that we also initialize state->first_insn_idx with correct start
      insn_idx offset. For main program zero is correct value, but for any
      subprog it's quite confusing to not have first_insn_idx set. This
      doesn't have any functional impact, but helps with debugging and state
      printing. We also explicitly initialize state->last_insns_idx instead of
      relying on is_state_visited() to do this with env->prev_insns_idx, which
      will be -1 on the very first instruction. This concludes necessary
      changes to handle specifically global subprog's precision tracking.
      
      Second identified problem was missed handling of BPF helper functions
      that call into subprogs (e.g., bpf_loop and few others). From precision
      tracking and backtracking logic's standpoint those are effectively calls
      into subprogs and should be called as BPF_PSEUDO_CALL calls.
      
      This patch takes the least intrusive way and just checks against a short
      list of current BPF helpers that do call subprogs, encapsulated in
      is_callback_calling_function() function. But to prevent accidentally
      forgetting to add new BPF helpers to this "list", we also do a sanity
      check in __check_func_call, which has to be called for each such special
      BPF helper, to validate that BPF helper is indeed recognized as
      callback-calling one. This should catch any missed checks in the future.
      Adding some special flags to be added in function proto definitions
      seemed like an overkill in this case.
      
      With the above changes, it's possible to remove forceful setting of
      reg->precise to true in __mark_reg_unknown, which turns on precision
      tracking both inside subprogs and entry progs that have subprogs. No
      warnings or errors were detected across all the selftests, but also when
      validating with veristat against internal Meta BPF objects and Cilium
      objects. Further, in some BPF programs there are noticeable reduction in
      number of states and instructions validated due to more effective
      precision tracking, especially benefiting syncookie test.
      
      $ ./veristat -C -e file,prog,insns,states ~/baseline-results.csv ~/subprog-precise-results.csv  | grep -v '+0'
      File                                      Program                     Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      ----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      pyperf600_bpf_loop.bpf.linked1.o          on_event                               3966             3678       -288 (-7.26%)               306               276         -30 (-9.80%)
      pyperf_global.bpf.linked1.o               on_event                               7563             7530        -33 (-0.44%)               520               517          -3 (-0.58%)
      pyperf_subprogs.bpf.linked1.o             on_event                              36358            36934       +576 (+1.58%)              2499              2531         +32 (+1.28%)
      setget_sockopt.bpf.linked1.o              skops_sockopt                          3965             4038        +73 (+1.84%)               343               347          +4 (+1.17%)
      test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect                          64965            64901        -64 (-0.10%)              4619              4612          -7 (-0.15%)
      test_misc_tcp_hdr_options.bpf.linked1.o   misc_estab                             1491             1307      -184 (-12.34%)               110               100         -10 (-9.09%)
      test_pkt_access.bpf.linked1.o             test_pkt_access                         354              349         -5 (-1.41%)                25                24          -1 (-4.00%)
      test_sock_fields.bpf.linked1.o            egress_read_sock_fields                 435              375       -60 (-13.79%)                22                20          -2 (-9.09%)
      test_sysctl_loop2.bpf.linked1.o           sysctl_tcp_mem                         1508             1501         -7 (-0.46%)                29                28          -1 (-3.45%)
      test_tc_dtime.bpf.linked1.o               egress_fwdns_prio100                    468              435        -33 (-7.05%)                45                41          -4 (-8.89%)
      test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio100                   398              408        +10 (+2.51%)                42                39          -3 (-7.14%)
      test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio101                  1096              842      -254 (-23.18%)                97                73        -24 (-24.74%)
      test_tcp_hdr_options.bpf.linked1.o        estab                                  2758             2408      -350 (-12.69%)               208               181        -27 (-12.98%)
      test_urandom_usdt.bpf.linked1.o           urand_read_with_sema                    466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
      test_urandom_usdt.bpf.linked1.o           urand_read_without_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
      test_urandom_usdt.bpf.linked1.o           urandlib_read_with_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
      test_urandom_usdt.bpf.linked1.o           urandlib_read_without_sema              466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
      test_xdp_noinline.bpf.linked1.o           balancer_ingress_v6                    4302             4294         -8 (-0.19%)               257               256          -1 (-0.39%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_tc                         583722           405757   -177965 (-30.49%)             35846             25735     -10111 (-28.21%)
      xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp                        609123           479055   -130068 (-21.35%)             35452             29145      -6307 (-17.79%)
      ----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-4-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      be2ef816
    • Andrii Nakryiko's avatar
      bpf: propagate precision across all frames, not just the last one · 529409ea
      Andrii Nakryiko authored
      When equivalent completed state is found and it has additional precision
      restrictions, BPF verifier propagates precision to
      currently-being-verified state chain (i.e., including parent states) so
      that if some of the states in the chain are not yet completed, necessary
      precision restrictions are enforced.
      
      Unfortunately, right now this happens only for the last frame (deepest
      active subprogram's frame), not all the frames. This can lead to
      incorrect matching of states due to missing precision marker. Currently
      this doesn't seem possible as BPF verifier forces everything to precise
      when validated BPF program has any subprograms. But with the next patch
      lifting this restriction, this becomes problematic.
      
      In fact, without this fix, we'll start getting failure in one of the
      existing test_verifier test cases:
      
        #906/p precise: cross frame pruning FAIL
        Unexpected success to load!
        verification time 48 usec
        stack depth 0+0
        processed 26 insns (limit 1000000) max_states_per_insn 3 total_states 17 peak_states 17 mark_read 8
      
      This patch adds precision propagation across all frames.
      
      Fixes: a3ce685d ("bpf: fix precision tracking")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-3-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      529409ea
    • Andrii Nakryiko's avatar
      bpf: propagate precision in ALU/ALU64 operations · a3b666bf
      Andrii Nakryiko authored
      When processing ALU/ALU64 operations (apart from BPF_MOV, which is
      handled correctly already; and BPF_NEG and BPF_END are special and don't
      have source register), if destination register is already marked
      precise, this causes problem with potentially missing precision tracking
      for the source register. E.g., when we have r1 >>= r5 and r1 is marked
      precise, but r5 isn't, this will lead to r5 staying as imprecise. This
      is due to the precision backtracking logic stopping early when it sees
      r1 is already marked precise. If r1 wasn't precise, we'd keep
      backtracking and would add r5 to the set of registers that need to be
      marked precise. So there is a discrepancy here which can lead to invalid
      and incompatible states matched due to lack of precision marking on r5.
      If r1 wasn't precise, precision backtracking would correctly mark both
      r1 and r5 as precise.
      
      This is simple to fix, though. During the forward instruction simulation
      pass, for arithmetic operations of `scalar <op>= scalar` form (where
      <op> is ALU or ALU64 operations), if destination register is already
      precise, mark source register as precise. This applies only when both
      involved registers are SCALARs. `ptr += scalar` and `scalar += ptr`
      cases are already handled correctly.
      
      This does have (negative) effect on some selftest programs and few
      Cilium programs.  ~/baseline-tmp-results.csv are veristat results with
      this patch, while ~/baseline-results.csv is without it. See post
      scriptum for instructions on how to make Cilium programs testable with
      veristat. Correctness has a price.
      
      $ ./veristat -C -e file,prog,insns,states ~/baseline-results.csv ~/baseline-tmp-results.csv | grep -v '+0'
      File                     Program               Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      -----------------------  --------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      bpf_cubic.bpf.linked1.o  bpf_cubic_cong_avoid              997             1700      +703 (+70.51%)                62                90        +28 (+45.16%)
      test_l4lb.bpf.linked1.o  balancer_ingress                 4559             5469      +910 (+19.96%)               118               126          +8 (+6.78%)
      -----------------------  --------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      
      $ ./veristat -C -e file,prog,verdict,insns,states ~/baseline-results-cilium.csv ~/baseline-tmp-results-cilium.csv | grep -v '+0'
      File           Program                         Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
      -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      bpf_host.o     tail_nodeport_nat_ingress_ipv6             4448             5261      +813 (+18.28%)               234               247         +13 (+5.56%)
      bpf_host.o     tail_nodeport_nat_ipv6_egress              3396             3446        +50 (+1.47%)               201               203          +2 (+1.00%)
      bpf_lxc.o      tail_nodeport_nat_ingress_ipv6             4448             5261      +813 (+18.28%)               234               247         +13 (+5.56%)
      bpf_overlay.o  tail_nodeport_nat_ingress_ipv6             4448             5261      +813 (+18.28%)               234               247         +13 (+5.56%)
      bpf_xdp.o      tail_lb_ipv4                              71736            73442      +1706 (+2.38%)              4295              4370         +75 (+1.75%)
      -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
      
      P.S. To make Cilium ([0]) programs libbpf-compatible and thus
      veristat-loadable, apply changes from topmost commit in [1], which does
      minimal changes to Cilium source code, mostly around SEC() annotations
      and BPF map definitions.
      
        [0] https://github.com/cilium/cilium/
        [1] https://github.com/anakryiko/cilium/commits/libbpf-friendliness
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221104163649.121784-2-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a3b666bf
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Refactor map->off_arr handling · f71b2f64
      Kumar Kartikeya Dwivedi authored
      Refactor map->off_arr handling into generic functions that can work on
      their own without hardcoding map specific code. The btf_fields_offs
      structure is now returned from btf_parse_field_offs, which can be reused
      later for types in program BTF.
      
      All functions like copy_map_value, zero_map_value call generic
      underlying functions so that they can also be reused later for copying
      to values allocated in programs which encode specific fields.
      
      Later, some helper functions will also require access to this
      btf_field_offs structure to be able to skip over special fields at
      runtime.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221103191013.1236066-9-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f71b2f64
    • Kumar Kartikeya Dwivedi's avatar
      bpf: Consolidate spin_lock, timer management into btf_record · db559117
      Kumar Kartikeya Dwivedi authored
      Now that kptr_off_tab has been refactored into btf_record, and can hold
      more than one specific field type, accomodate bpf_spin_lock and
      bpf_timer as well.
      
      While they don't require any more metadata than offset, having all
      special fields in one place allows us to share the same code for
      allocated user defined types and handle both map values and these
      allocated objects in a similar fashion.
      
      As an optimization, we still keep spin_lock_off and timer_off offsets in
      the btf_record structure, just to avoid having to find the btf_field
      struct each time their offset is needed. This is mostly needed to
      manipulate such objects in a map value at runtime. It's ok to hardcode
      just one offset as more than one field is disallowed.
      Signed-off-by: default avatarKumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20221103191013.1236066-8-memxor@gmail.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      db559117
    • Alexei Starovoitov's avatar
      Merge branch 'veristat: replay, filtering, sorting' · af085f55
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      
      This patch set adds a bunch of new featurs and improvements that were sorely
      missing during recent active use of veristat to develop BPF verifier precision
      changes. Individual patches provide justification, explanation and often
      examples showing how new capabilities can be used.
      ====================
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      af085f55
    • Andrii Nakryiko's avatar
      selftests/bpf: support stat filtering in comparison mode in veristat · d5ce4b89
      Andrii Nakryiko authored
      Finally add support for filtering stats values, similar to
      non-comparison mode filtering. For comparison mode 4 variants of stats
      are important for filtering, as they allow to filter either A or B side,
      but even more importantly they allow to filter based on value
      difference, and for verdict stat value difference is MATCH/MISMATCH
      classification. So with these changes it's finally possible to easily
      check if there were any mismatches between failure/success outcomes on
      two separate data sets. Like in an example below:
      
        $ ./veristat -e file,prog,verdict,insns -C ~/baseline-results.csv ~/shortest-results.csv -f verdict_diff=mismatch
        File                                   Program                Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns        (DIFF)
        -------------------------------------  ---------------------  -----------  -----------  --------------  ---------  ---------  -------------------
        dynptr_success.bpf.linked1.o           test_data_slice        success      failure      MISMATCH               85          0       -85 (-100.00%)
        dynptr_success.bpf.linked1.o           test_read_write        success      failure      MISMATCH             1992          0     -1992 (-100.00%)
        dynptr_success.bpf.linked1.o           test_ringbuf           success      failure      MISMATCH               74          0       -74 (-100.00%)
        kprobe_multi.bpf.linked1.o             test_kprobe            failure      success      MISMATCH                0        246      +246 (+100.00%)
        kprobe_multi.bpf.linked1.o             test_kprobe_manual     failure      success      MISMATCH                0        246      +246 (+100.00%)
        kprobe_multi.bpf.linked1.o             test_kretprobe         failure      success      MISMATCH                0        248      +248 (+100.00%)
        kprobe_multi.bpf.linked1.o             test_kretprobe_manual  failure      success      MISMATCH                0        248      +248 (+100.00%)
        kprobe_multi.bpf.linked1.o             trigger                failure      success      MISMATCH                0          2        +2 (+100.00%)
        netcnt_prog.bpf.linked1.o              bpf_nextcnt            failure      success      MISMATCH                0         56       +56 (+100.00%)
        pyperf600_nounroll.bpf.linked1.o       on_event               success      failure      MISMATCH           568128    1000001    +431873 (+76.02%)
        ringbuf_bench.bpf.linked1.o            bench_ringbuf          success      failure      MISMATCH                8          0        -8 (-100.00%)
        strobemeta.bpf.linked1.o               on_event               success      failure      MISMATCH           557149    1000001    +442852 (+79.49%)
        strobemeta_nounroll1.bpf.linked1.o     on_event               success      failure      MISMATCH            57240    1000001  +942761 (+1647.03%)
        strobemeta_nounroll2.bpf.linked1.o     on_event               success      failure      MISMATCH           501725    1000001    +498276 (+99.31%)
        strobemeta_subprogs.bpf.linked1.o      on_event               success      failure      MISMATCH            65420    1000001  +934581 (+1428.59%)
        test_map_in_map_invalid.bpf.linked1.o  xdp_noop0              success      failure      MISMATCH                2          0        -2 (-100.00%)
        test_mmap.bpf.linked1.o                test_mmap              success      failure      MISMATCH               46          0       -46 (-100.00%)
        test_verif_scale3.bpf.linked1.o        balancer_ingress       success      failure      MISMATCH           845499    1000001    +154502 (+18.27%)
        -------------------------------------  ---------------------  -----------  -----------  --------------  ---------  ---------  -------------------
      
      Note that by filtering on verdict_diff=mismatch, it's now extremely easy and
      fast to see any changes in verdict. Example above showcases both failure ->
      success transitions (which are generally surprising) and success -> failure
      transitions (which are expected if bugs are present).
      
      Given veristat allows to query relative percent difference values, internal
      logic for comparison mode is based on floating point numbers, so requires a bit
      of epsilon precision logic, deviating from typical integer simple handling
      rules.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-11-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d5ce4b89
    • Andrii Nakryiko's avatar
      selftests/bpf: support stats ordering in comparison mode in veristat · fa9bb590
      Andrii Nakryiko authored
      Introduce the concept of "stat variant", by which it's possible to
      specify whether to use the value from A (baseline) side, B (comparison
      or control) side, the absolute difference value or relative (percentage)
      difference value.
      
      To support specifying this, veristat recognizes `_a`, `_b`, `_diff`,
      `_pct` suffixes, which can be appended to stat name(s). In
      non-comparison mode variants are ignored (there is only `_a` variant
      effectively), if no variant suffix is provided, `_b` is assumed, as
      control group is of primary interest in comparison mode.
      
      These stat variants can be flexibly combined with asc/desc orders.
      
      Here's an example of ordering results first by verdict match/mismatch (or n/a
      if one of the sides is missing; n/a is always considered to be the lowest
      value), and within each match/mismatch/n/a group further sort by number of
      instructions in B side. In this case we don't have MISMATCH cases, but N/A are
      split from MATCH, demonstrating this custom ordering.
      
        $ ./veristat -e file,prog,verdict,insns -s verdict_diff,insns_b_ -C ~/base.csv ~/comp.csv
        File                Program                         Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns   (DIFF)
        ------------------  ------------------------------  -----------  -----------  --------------  ---------  ---------  --------------
        bpf_xdp.o           tail_lb_ipv6                    N/A          success      N/A                   N/A     151895             N/A
        bpf_xdp.o           tail_nodeport_nat_egress_ipv4   N/A          success      N/A                   N/A      15619             N/A
        bpf_xdp.o           tail_nodeport_ipv6_dsr          N/A          success      N/A                   N/A       1206             N/A
        bpf_xdp.o           tail_nodeport_ipv4_dsr          N/A          success      N/A                   N/A       1162             N/A
        bpf_alignchecker.o  tail_icmp6_send_echo_reply      N/A          failure      N/A                   N/A         74             N/A
        bpf_alignchecker.o  __send_drop_notify              success      N/A          N/A                    53        N/A             N/A
        bpf_host.o          __send_drop_notify              success      N/A          N/A                    53        N/A             N/A
        bpf_host.o          cil_from_host                   success      N/A          N/A                   762        N/A             N/A
        bpf_xdp.o           tail_lb_ipv4                    success      success      MATCH               71736      73430  +1694 (+2.36%)
        bpf_xdp.o           tail_handle_nat_fwd_ipv4        success      success      MATCH               21547      20920   -627 (-2.91%)
        bpf_xdp.o           tail_rev_nodeport_lb6           success      success      MATCH               17954      17905    -49 (-0.27%)
        bpf_xdp.o           tail_handle_nat_fwd_ipv6        success      success      MATCH               16974      17039    +65 (+0.38%)
        bpf_xdp.o           tail_nodeport_nat_ingress_ipv4  success      success      MATCH                7658       7713    +55 (+0.72%)
        bpf_xdp.o           tail_rev_nodeport_lb4           success      success      MATCH                7126       6934   -192 (-2.69%)
        bpf_xdp.o           tail_nodeport_nat_ingress_ipv6  success      success      MATCH                6405       6397     -8 (-0.12%)
        bpf_xdp.o           tail_nodeport_nat_ipv6_egress   failure      failure      MATCH                 752        752     +0 (+0.00%)
        bpf_xdp.o           cil_xdp_entry                   success      success      MATCH                 423        423     +0 (+0.00%)
        bpf_xdp.o           __send_drop_notify              success      success      MATCH                 151        151     +0 (+0.00%)
        bpf_alignchecker.o  tail_icmp6_handle_ns            failure      failure      MATCH                  33         33     +0 (+0.00%)
        ------------------  ------------------------------  -----------  -----------  --------------  ---------  ---------  --------------
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-10-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      fa9bb590
    • Andrii Nakryiko's avatar
      selftests/bpf: handle missing records in comparison mode better in veristat · a5710848
      Andrii Nakryiko authored
      When comparing two datasets, if either side is missing corresponding
      record with the same file and prog name, currently veristat emits
      misleading zeros/failures, and even tried to calculate a difference,
      even though there is no data to compare against.
      
      This patch improves internal logic of handling such situations. Now
      we'll emit "N/A" in places where data is missing and comparison is
      non-sensical.
      
      As an example, in an artificially truncated and mismatched Cilium
      results, the output looks like below:
      
        $ ./veristat -e file,prog,verdict,insns -C ~/base.csv ~/comp.csv
        File                Program                         Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns   (DIFF)
        ------------------  ------------------------------  -----------  -----------  --------------  ---------  ---------  --------------
        bpf_alignchecker.o  __send_drop_notify              success      N/A          N/A                    53        N/A             N/A
        bpf_alignchecker.o  tail_icmp6_handle_ns            failure      failure      MATCH                  33         33     +0 (+0.00%)
        bpf_alignchecker.o  tail_icmp6_send_echo_reply      N/A          failure      N/A                   N/A         74             N/A
        bpf_host.o          __send_drop_notify              success      N/A          N/A                    53        N/A             N/A
        bpf_host.o          cil_from_host                   success      N/A          N/A                   762        N/A             N/A
        bpf_xdp.o           __send_drop_notify              success      success      MATCH                 151        151     +0 (+0.00%)
        bpf_xdp.o           cil_xdp_entry                   success      success      MATCH                 423        423     +0 (+0.00%)
        bpf_xdp.o           tail_handle_nat_fwd_ipv4        success      success      MATCH               21547      20920   -627 (-2.91%)
        bpf_xdp.o           tail_handle_nat_fwd_ipv6        success      success      MATCH               16974      17039    +65 (+0.38%)
        bpf_xdp.o           tail_lb_ipv4                    success      success      MATCH               71736      73430  +1694 (+2.36%)
        bpf_xdp.o           tail_lb_ipv6                    N/A          success      N/A                   N/A     151895             N/A
        bpf_xdp.o           tail_nodeport_ipv4_dsr          N/A          success      N/A                   N/A       1162             N/A
        bpf_xdp.o           tail_nodeport_ipv6_dsr          N/A          success      N/A                   N/A       1206             N/A
        bpf_xdp.o           tail_nodeport_nat_egress_ipv4   N/A          success      N/A                   N/A      15619             N/A
        bpf_xdp.o           tail_nodeport_nat_ingress_ipv4  success      success      MATCH                7658       7713    +55 (+0.72%)
        bpf_xdp.o           tail_nodeport_nat_ingress_ipv6  success      success      MATCH                6405       6397     -8 (-0.12%)
        bpf_xdp.o           tail_nodeport_nat_ipv6_egress   failure      failure      MATCH                 752        752     +0 (+0.00%)
        bpf_xdp.o           tail_rev_nodeport_lb4           success      success      MATCH                7126       6934   -192 (-2.69%)
        bpf_xdp.o           tail_rev_nodeport_lb6           success      success      MATCH               17954      17905    -49 (-0.27%)
        ------------------  ------------------------------  -----------  -----------  --------------  ---------  ---------  --------------
      
      Internally veristat now separates joining two datasets and remembering the
      join, and actually emitting a comparison view. This will come handy when we add
      support for filtering and custom ordering in comparison mode.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-9-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      a5710848
    • Andrii Nakryiko's avatar
      selftests/bpf: make veristat emit all stats in CSV mode by default · 77534401
      Andrii Nakryiko authored
      Make veristat distinguish between table and CSV output formats and use
      different default set of stats (columns) that are emitted. While for
      human-readable table output it doesn't make sense to output all known
      stats, it is very useful for CSV mode to record all possible data, so
      that it can later be queried and filtered in replay or comparison mode.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-8-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      77534401
    • Andrii Nakryiko's avatar
      selftests/bpf: support simple filtering of stats in veristat · 1bb4ec81
      Andrii Nakryiko authored
      Define simple expressions to filter not just by file and program name,
      but also by resulting values of collected stats. Support usual
      equality and inequality operators. Verdict, which is a boolean-like
      field can be also filtered either as 0/1, failure/success (with f/s,
      fail/succ, and failure/success aliases) symbols, or as false/true (f/t).
      Aliases are case insensitive.
      
      Currently this filtering is honored only in verification and replay
      modes. Comparison mode support will be added in next patch.
      
      Here's an example of verifying a bunch of BPF object files and emitting
      only results for successfully validated programs that have more than 100
      total instructions processed by BPF verifier, sorted by number of
      instructions in ascending order:
      
        $ sudo ./veristat *.bpf.o -s insns^ -f 'insns>100'
      
      There can be many filters (both allow and deny flavors), all of them are
      combined.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-7-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      1bb4ec81
    • Andrii Nakryiko's avatar
      selftests/bpf: allow to define asc/desc ordering for sort specs in veristat · d68c07e2
      Andrii Nakryiko authored
      Allow to specify '^' at the end of stat name to designate that it should
      be sorted in ascending order. Similarly, allow any of 'v', 'V', '.',
      '!', or '_' suffix "symbols" to designate descending order. It's such
      a zoo for descending order because there is no single intuitive symbol
      that could be used (using 'v' looks pretty weird in practice), so few
      symbols that are "downwards leaning or pointing" were chosen. Either
      way, it shouldn't cause any troubles in practice.
      
      This new feature allows to customize sortering order to match user's
      needs.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-6-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      d68c07e2
    • Andrii Nakryiko's avatar
      selftests/bpf: ensure we always have non-ambiguous sorting in veristat · b9670b90
      Andrii Nakryiko authored
      Always fall back to unique file/prog comparison if user's custom order
      specs are ambiguous. This ensures stable output no matter what.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-5-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      b9670b90
    • Andrii Nakryiko's avatar
      selftests/bpf: consolidate and improve file/prog filtering in veristat · 10b1b3f3
      Andrii Nakryiko authored
      Slightly change rules of specifying file/prog glob filters. In practice
      it's quite often inconvenient to do `*/<prog-glob>` if that program glob
      is unique enough and won't accidentally match any file names.
      
      This patch changes the rules so that `-f <glob>` will apply specified
      glob to both file and program names. User still has all the control by
      doing '*/<prog-only-glob>' or '<file-only-glob/*'. We also now allow
      '/<prog-glob>' and '<file-glob/' (all matching wildcard is assumed if
      missing).
      
      Also, internally unify file-only and file+prog checks
      (should_process_file and should_process_prog are now
      should_process_file_prog that can handle prog name as optional). This
      makes maintaining and extending this code easier.
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20221103055304.2904589-4-andrii@kernel.orgSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      10b1b3f3