1. 15 Dec, 2023 7 commits
    • bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state() · 2cd07b0e
      Daniel Xu authored
      This commit extends test_tunnel selftest to test the new XDP xfrm state
      lookup kfunc.
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/e704e9a4332e3eac7b458e4bfdec8fcc6984cdb6.1702593901.git.dxu@dxuuu.xyz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: selftests: Move xfrm tunnel test to test_progs · e7adc829
      Daniel Xu authored
      test_progs is better than a shell script because C is a bit easier to
      maintain than shell. It is also easier to use new infrastructure, like
      memory-mapped global variables, from C via a BPF skeleton.
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/a350db9e08520c64544562d88ec005a039124d9b.1702593901.git.dxu@dxuuu.xyz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
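      For context on the "memory-mapped global variables" point above: with a
      generated BPF skeleton, user space can poke globals directly instead of
      going through bpf_map_update_elem(). A minimal sketch, assuming a
      skeleton named test_tunnel and an illustrative expected_spi global
      (neither is necessarily the actual selftest symbol):

        #include "test_tunnel.skel.h"   /* generated by bpftool gen skeleton */

        static int run_tunnel_once(void)
        {
            struct test_tunnel *skel;
            int err;

            skel = test_tunnel__open_and_load();
            if (!skel)
                return -1;

            /* memory-mapped global: written directly from user space, no
             * bpf_map_update_elem() round trip needed */
            skel->bss->expected_spi = 0xdeadbeef;

            err = test_tunnel__attach(skel);
            test_tunnel__destroy(skel);
            return err;
        }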
    • bpf: selftests: test_tunnel: Use vmlinux.h declarations · 02b4e126
      Daniel Xu authored
      vmlinux.h declarations are more ergonomic, especially when working with
      kfuncs. The uapi headers are often incomplete for kfunc definitions.
      
      This commit also switches bitfield accesses to use CO-RE helpers.
      Switching to vmlinux.h definitions makes the verifier very
      unhappy with raw bitfield accesses. The error is:
      
          ; md.u.md2.dir = direction;
          33: (69) r1 = *(u16 *)(r2 +11)
          misaligned stack access off (0x0; 0x0)+-64+11 size 2
      
      Fix by using CO-RE-aware bitfield reads and writes.
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/884bde1d9a351d126a3923886b945ea6b1b0776b.1702593901.git.dxu@dxuuu.xyz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
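      For reference, the CO-RE-aware bitfield access pattern looks roughly
      like the sketch below. The erspan md2 'dir' field mirrors the verifier
      log above, but the surrounding program is illustrative only, and
      BPF_CORE_WRITE_BITFIELD() is assumed to be available from
      bpf_core_read.h:

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_core_read.h>

        SEC("tc")
        int set_md2_dir(struct __sk_buff *skb)
        {
            struct erspan_metadata md = {};
            __u8 direction = 1;

            /* raw "md.u.md2.dir = direction;" trips the verifier with
             * vmlinux.h types; the CO-RE macros emit relocatable bitfield
             * accesses instead */
            BPF_CORE_WRITE_BITFIELD(&md.u.md2, dir, direction);

            if (BPF_CORE_READ_BITFIELD(&md.u.md2, dir) != direction)
                return 1;
            return 0;
        }

        char _license[] SEC("license") = "GPL";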
    • bpf: selftests: test_tunnel: Setup fresh topology for each subtest · 77a7a822
      Daniel Xu authored
      This helps with determinism because individual setup/teardown prevents
      leaking state between different subtests.
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/0fb59fa16fb58cca7def5239df606005a3e8dd0e.1702593901.git.dxu@dxuuu.xyz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
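      The per-subtest setup/teardown shape in test_progs looks roughly like
      the sketch below; the subtest and helper names (vxlan, config_topology,
      cleanup_topology) are placeholders, not the actual selftest symbols:

        #include "test_progs.h"   /* selftests/bpf test harness */

        /* stand-ins for the real per-subtest netns/veth helpers */
        static int config_topology(void) { return 0; }
        static void cleanup_topology(void) { }

        static void subtest_vxlan(void)
        {
            if (config_topology())      /* fresh topology for this subtest only */
                goto done;

            /* ... exercise the tunnel and ASSERT_*() on the results ... */

        done:
            cleanup_topology();         /* tear it all down again */
        }

        void test_tunnel(void)
        {
            /* each subtest sets up and tears down its own topology, so no
             * state can leak from one subtest into the next */
            if (test__start_subtest("vxlan"))
                subtest_vxlan();
            /* further subtests follow the same pattern */
        }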
    • bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc · 8f0ec8c6
      Daniel Xu authored
      This commit adds an unstable kfunc helper to access internal xfrm_state
      associated with an SA. This is intended to be used for the upcoming
      IPsec pcpu work to assign special pcpu SAs to a particular CPU. In other
      words: for custom software RSS.
      
      That being said, the function that this kfunc wraps is fairly generic
      and used for a lot of xfrm tasks. I'm sure people will find uses
      elsewhere over time.
      
      This commit also adds a corresponding bpf_xdp_xfrm_state_release() kfunc
      to release the refcnt acquired by bpf_xdp_get_xfrm_state(). The verifier
      will require that all acquired xfrm_state's are released.
      Co-developed-by: Antony Antony <antony.antony@secunet.com>
      Signed-off-by: Antony Antony <antony.antony@secunet.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
      Link: https://lore.kernel.org/r/a29699c42f5fad456b875c98dd11c6afc3ffb707.1702593901.git.dxu@dxuuu.xyz
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
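      A sketch of how an XDP program might use this pair of kfuncs. The
      bpf_xfrm_state_opts field names (netns_id, spi, proto, family, daddr)
      follow the series' description and should be checked against the
      in-tree header; the SPI and address values are made up:

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_endian.h>

        /* kfunc prototypes as described in the patch; verify against the
         * in-tree declarations before relying on them */
        extern struct xfrm_state *
        bpf_xdp_get_xfrm_state(struct xdp_md *ctx,
                               struct bpf_xfrm_state_opts *opts,
                               u32 opts__sz) __ksym;
        extern void bpf_xdp_xfrm_state_release(struct xfrm_state *x) __ksym;

        SEC("xdp")
        int xdp_sa_lookup(struct xdp_md *ctx)
        {
            struct bpf_xfrm_state_opts opts = {
                .netns_id = -1,             /* BPF_F_CURRENT_NETNS */
                .spi = bpf_htonl(0x1),      /* example SPI */
                .proto = 50,                /* IPPROTO_ESP */
                .family = 2,                /* AF_INET */
            };
            struct xfrm_state *x;

            opts.daddr.a4 = bpf_htonl(0xc0a80001);  /* 192.168.0.1, example */

            x = bpf_xdp_get_xfrm_state(ctx, &opts, sizeof(opts));
            if (!x)
                return XDP_PASS;

            /* ... steer/mark the packet based on the pcpu SA here ... */

            /* the verifier insists every acquired xfrm_state is released */
            bpf_xdp_xfrm_state_release(x);
            return XDP_PASS;
        }

        char _license[] SEC("license") = "GPL";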
    • selftests/bpf: Remove flaky test_btf_id test · 56925f38
      Yonghong Song authored
      With the previous patch, one of the subtests in test_btf_id becomes
      flaky and may fail. The following is a failing example:
      
        Error: #26 btf
        Error: #26/174 btf/BTF ID
          Error: #26/174 btf/BTF ID
          btf_raw_create:PASS:check 0 nsec
          btf_raw_create:PASS:check 0 nsec
          test_btf_id:PASS:check 0 nsec
          ...
          test_btf_id:PASS:check 0 nsec
          test_btf_id:FAIL:check BTF lingersdo_test_get_info:FAIL:check failed: -1
      
      The test tries to prove that a btf_id is not available after the map is
      closed. But the btf_id is now freed only after a workqueue callback and
      an RCU grace period, whereas previously it was freed just after an RCU
      grace period. Depending on system workload, the workqueue could take
      quite some time to execute bpf_map_free_deferred(), which may cause the
      test failure. Instead of adding arbitrary delays, let us remove the
      logic that checks btf_id availability after the map is closed.
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20231214203820.1469402-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Fix a race condition between btf_put() and map_free() · 59e5791f
      Yonghong Song authored
      When running `./test_progs -j` in my local VM with the latest kernel,
      I once hit a KASAN error like the one below:
      
        [ 1887.184724] BUG: KASAN: slab-use-after-free in bpf_rb_root_free+0x1f8/0x2b0
        [ 1887.185599] Read of size 4 at addr ffff888106806910 by task kworker/u12:2/2830
        [ 1887.186498]
        [ 1887.186712] CPU: 3 PID: 2830 Comm: kworker/u12:2 Tainted: G           OEL     6.7.0-rc3-00699-g90679706-dirty #494
        [ 1887.188034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        [ 1887.189618] Workqueue: events_unbound bpf_map_free_deferred
        [ 1887.190341] Call Trace:
        [ 1887.190666]  <TASK>
        [ 1887.190949]  dump_stack_lvl+0xac/0xe0
        [ 1887.191423]  ? nf_tcp_handle_invalid+0x1b0/0x1b0
        [ 1887.192019]  ? panic+0x3c0/0x3c0
        [ 1887.192449]  print_report+0x14f/0x720
        [ 1887.192930]  ? preempt_count_sub+0x1c/0xd0
        [ 1887.193459]  ? __virt_addr_valid+0xac/0x120
        [ 1887.194004]  ? bpf_rb_root_free+0x1f8/0x2b0
        [ 1887.194572]  kasan_report+0xc3/0x100
        [ 1887.195085]  ? bpf_rb_root_free+0x1f8/0x2b0
        [ 1887.195668]  bpf_rb_root_free+0x1f8/0x2b0
        [ 1887.196183]  ? __bpf_obj_drop_impl+0xb0/0xb0
        [ 1887.196736]  ? preempt_count_sub+0x1c/0xd0
        [ 1887.197270]  ? preempt_count_sub+0x1c/0xd0
        [ 1887.197802]  ? _raw_spin_unlock+0x1f/0x40
        [ 1887.198319]  bpf_obj_free_fields+0x1d4/0x260
        [ 1887.198883]  array_map_free+0x1a3/0x260
        [ 1887.199380]  bpf_map_free_deferred+0x7b/0xe0
        [ 1887.199943]  process_scheduled_works+0x3a2/0x6c0
        [ 1887.200549]  worker_thread+0x633/0x890
        [ 1887.201047]  ? __kthread_parkme+0xd7/0xf0
        [ 1887.201574]  ? kthread+0x102/0x1d0
        [ 1887.202020]  kthread+0x1ab/0x1d0
        [ 1887.202447]  ? pr_cont_work+0x270/0x270
        [ 1887.202954]  ? kthread_blkcg+0x50/0x50
        [ 1887.203444]  ret_from_fork+0x34/0x50
        [ 1887.203914]  ? kthread_blkcg+0x50/0x50
        [ 1887.204397]  ret_from_fork_asm+0x11/0x20
        [ 1887.204913]  </TASK>
        [ 1887.205209]
        [ 1887.205416] Allocated by task 2197:
        [ 1887.205881]  kasan_set_track+0x3f/0x60
        [ 1887.206366]  __kasan_kmalloc+0x6e/0x80
        [ 1887.206856]  __kmalloc+0xac/0x1a0
        [ 1887.207293]  btf_parse_fields+0xa15/0x1480
        [ 1887.207836]  btf_parse_struct_metas+0x566/0x670
        [ 1887.208387]  btf_new_fd+0x294/0x4d0
        [ 1887.208851]  __sys_bpf+0x4ba/0x600
        [ 1887.209292]  __x64_sys_bpf+0x41/0x50
        [ 1887.209762]  do_syscall_64+0x4c/0xf0
        [ 1887.210222]  entry_SYSCALL_64_after_hwframe+0x63/0x6b
        [ 1887.210868]
        [ 1887.211074] Freed by task 36:
        [ 1887.211460]  kasan_set_track+0x3f/0x60
        [ 1887.211951]  kasan_save_free_info+0x28/0x40
        [ 1887.212485]  ____kasan_slab_free+0x101/0x180
        [ 1887.213027]  __kmem_cache_free+0xe4/0x210
        [ 1887.213514]  btf_free+0x5b/0x130
        [ 1887.213918]  rcu_core+0x638/0xcc0
        [ 1887.214347]  __do_softirq+0x114/0x37e
      
      The error happens at bpf_rb_root_free+0x1f8/0x2b0:
      
        00000000000034c0 <bpf_rb_root_free>:
        ; {
          34c0: f3 0f 1e fa                   endbr64
          34c4: e8 00 00 00 00                callq   0x34c9 <bpf_rb_root_free+0x9>
          34c9: 55                            pushq   %rbp
          34ca: 48 89 e5                      movq    %rsp, %rbp
        ...
        ;       if (rec && rec->refcount_off >= 0 &&
          36aa: 4d 85 ed                      testq   %r13, %r13
          36ad: 74 a9                         je      0x3658 <bpf_rb_root_free+0x198>
          36af: 49 8d 7d 10                   leaq    0x10(%r13), %rdi
          36b3: e8 00 00 00 00                callq   0x36b8 <bpf_rb_root_free+0x1f8>
                                              <==== kasan function
          36b8: 45 8b 7d 10                   movl    0x10(%r13), %r15d
                                              <==== use-after-free load
          36bc: 45 85 ff                      testl   %r15d, %r15d
          36bf: 78 8c                         js      0x364d <bpf_rb_root_free+0x18d>
      
      So the problem is at rec->refcount_off in the above.
      
      I did some source code analysis and found the reason:
                                        CPU A                        CPU B
        bpf_map_put:
          ...
          btf_put with rcu callback
          ...
          bpf_map_free_deferred
            with system_unbound_wq
          ...                          ...                           ...
          ...                          btf_free_rcu:                 ...
          ...                          ...                           bpf_map_free_deferred:
          ...                          ...
          ...         --------->       btf_struct_metas_free()
          ...         | race condition ...
          ...         --------->                                     map->ops->map_free()
          ...
          ...                          btf->struct_meta_tab = NULL
      
      In the above, map_free() corresponds to array_map_free(), which
      eventually calls bpf_rb_root_free(), which in turn calls:
        ...
        __bpf_obj_drop_impl(obj, field->graph_root.value_rec, false);
        ...
      
      Here, 'value_rec' is assigned in btf_check_and_fixup_fields() with the following code:
      
        meta = btf_find_struct_meta(btf, btf_id);
        if (!meta)
          return -EFAULT;
        rec->fields[i].graph_root.value_rec = meta->record;
      
      So basically, 'value_rec' is a pointer to the record in struct_metas_tab.
      And it is possible that this particular record has been freed by
      btf_struct_metas_free(), hence the KASAN error here.
      
      Actually it is very hard to reproduce the failure with the current
      bpf/bpf-next code; I only got the above error once. To make it easier
      to reproduce, I added a delay in bpf_map_free_deferred() before
      map->ops->map_free(), which significantly increased reproducibility.
      
        diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
        index 5e43ddd1b83f..aae5b5213e93 100644
        --- a/kernel/bpf/syscall.c
        +++ b/kernel/bpf/syscall.c
        @@ -695,6 +695,7 @@ static void bpf_map_free_deferred(struct work_struct *work)
              struct bpf_map *map = container_of(work, struct bpf_map, work);
              struct btf_record *rec = map->record;
      
        +     mdelay(100);
              security_bpf_map_free(map);
              bpf_map_release_memcg(map);
              /* implementation dependent freeing */
      
      Hou also provided test cases ([1]) for easily reproducing the above issue.
      
      There are two ways to fix the issue: v1 of the patch ([2]) moves
      btf_put() after the map_free callback, and v5 of the patch ([3]) uses
      a kptr-style fix which tries to get a btf reference during
      map_check_btf(). Each approach has its pros and cons. The first
      approach delays freeing the btf, while the second approach needs to
      acquire the reference depending on context, which makes the logic less
      elegant and may complicate things with future new data structures.
      Alexei suggested in [4] going back to v1, which is what this patch
      does.
      
      Reran './test_progs -j' with the above mdelay() hack a couple of times
      and didn't observe the error for the above rb_root test cases. Running
      Hou's test ([1]) was also successful.
      
        [1] https://lore.kernel.org/bpf/20231207141500.917136-1-houtao@huaweicloud.com/
        [2] v1: https://lore.kernel.org/bpf/20231204173946.3066377-1-yonghong.song@linux.dev/
        [3] v5: https://lore.kernel.org/bpf/20231208041621.2968241-1-yonghong.song@linux.dev/
        [4] v4: https://lore.kernel.org/bpf/CAADnVQJ3FiXUhZJwX_81sjZvSYYKCFB3BT6P8D59RS2Gu+0Z7g@mail.gmail.com/
      
      Cc: Hou Tao <houtao@huaweicloud.com>
      Fixes: 958cf2e2 ("bpf: Introduce bpf_obj_new")
      Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/r/20231214203815.1469107-1-yonghong.song@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
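      The chosen (v1-style) fix boils down to an ordering change in the
      deferred free path: keep the map's btf reference alive until after
      map->ops->map_free() has run. A rough sketch of the idea, not the exact
      upstream diff (the real function also handles security hooks, memcg
      release, etc.):

        static void bpf_map_free_deferred(struct work_struct *work)
        {
            struct bpf_map *map = container_of(work, struct bpf_map, work);
            struct btf_record *rec = map->record;
            struct btf *btf = map->btf;

            /* implementation-dependent freeing; may walk rec->fields, whose
             * value_rec pointers alias records owned by map->btf */
            map->ops->map_free(map);
            btf_record_free(rec);

            /* drop the btf reference only now, so struct_meta_tab (and the
             * records that value_rec points into) outlives map_free() */
            btf_put(btf);
        }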
  2. 14 Dec, 2023 26 commits
  3. 13 Dec, 2023 7 commits
    • Merge branch 'bpf-token-support-in-libbpf-s-bpf-object' · 73376328
      Alexei Starovoitov authored
      Andrii Nakryiko says:
      
      ====================
      BPF token support in libbpf's BPF object
      
      Add fuller support for BPF token in high-level BPF object APIs. This is the
      most frequently used way to work with BPF using libbpf, so supporting BPF
      token there is critical.
      
      Patch #1 improves kernel-side BPF_TOKEN_CREATE behavior by refusing to
      create an "empty" BPF token with no delegation. This seems like saner
      behavior, and it also makes libbpf's caching better overall. If we ever
      want to create a BPF token with no delegate_xxx options set on the BPF
      FS, we can use a new flag to enable that.
      
      Patches #2-#5 refactor libbpf internals, mostly feature detection code,
      to prepare it for handling a BPF token FD.
      
      Patch #6 adds options to pass a BPF token into BPF object open options.
      It also adds implicit BPF token creation logic to the BPF object load
      step, even without any explicit involvement of the user. If the
      environment is set up properly, a BPF token will be created
      transparently and used implicitly. This allows all existing
      applications to gain BPF token support by just linking with the latest
      version of the libbpf library; no source code modifications are
      required. All of that under the assumption that a privileged container
      management agent has properly set up a default BPF FS instance at
      /sys/fs/bpf to allow BPF token creation.
      
      Patches #7-#8 add more selftests, validating that BPF object APIs work
      as expected under unprivileged, user-namespaced conditions in the
      presence of a BPF token.
      
      Patch #9 extends libbpf with LIBBPF_BPF_TOKEN_PATH envvar knowledge,
      which can be used to override the BPF FS location used for the implicit
      BPF token creation logic without needing to adjust application code.
      This allows admins or container managers to mount a BPF token-enabled
      BPF FS at a non-standard location without the need to coordinate with
      applications. LIBBPF_BPF_TOKEN_PATH can also be used to disable
      implicit BPF token creation by setting it to an empty value. Patch #10
      tests this new envvar functionality.
      
      v2->v3:
        - move some stray feature cache refactorings into patch #4 (Alexei);
        - add LIBBPF_BPF_TOKEN_PATH envvar support (Alexei);
      v1->v2:
        - remove minor code redundancies (Eduard, John);
        - add acks and rebase.
      ====================
      
      Link: https://lore.kernel.org/r/20231213190842.3844987-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add tests for LIBBPF_BPF_TOKEN_PATH envvar · 322122bf
      Andrii Nakryiko authored
      Add a new subtest validating LIBBPF_BPF_TOKEN_PATH envvar semantics.
      Extend the existing test to validate that LIBBPF_BPF_TOKEN_PATH allows
      disabling implicit BPF token creation by setting the envvar to an empty
      string.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-11-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar · ed54124b
      Andrii Nakryiko authored
      To allow an external admin authority to override the default BPF FS
      location (/sys/fs/bpf) for implicit BPF token creation, teach libbpf to
      recognize the LIBBPF_BPF_TOKEN_PATH envvar. If it is specified and the
      user application didn't explicitly specify either the bpf_token_path or
      the bpf_token_fd option, it will be treated exactly like the
      bpf_token_path option, overriding the default /sys/fs/bpf location and
      making the BPF token mandatory.
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-10-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
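      For illustration, this override can be exercised without touching the
      application's BPF setup code at all. In the sketch below, setenv()
      merely simulates an admin exporting the variable in the environment,
      and the prog.bpf.o object name is made up:

        #include <stdio.h>
        #include <stdlib.h>
        #include <bpf/libbpf.h>

        int main(void)
        {
            struct bpf_object *obj;
            int err;

            /* an admin would normally export this in the environment */
            setenv("LIBBPF_BPF_TOKEN_PATH", "/run/bpffs", 1);

            obj = bpf_object__open_file("prog.bpf.o", NULL);
            if (!obj)
                return 1;

            /* with no explicit bpf_token_path/bpf_token_fd option set,
             * libbpf treats LIBBPF_BPF_TOKEN_PATH like bpf_token_path, so
             * token creation from /run/bpffs becomes mandatory for this load */
            err = bpf_object__load(obj);
            if (err)
                fprintf(stderr, "load failed: %d\n", err);

            bpf_object__close(obj);
            return err ? 1 : 0;
        }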
    • selftests/bpf: add tests for BPF object load with implicit token · 18678cf0
      Andrii Nakryiko authored
      Add a test to validate libbpf's implicit BPF token creation from the
      default BPF FS location (/sys/fs/bpf). Also validate that disabling
      this implicit BPF token creation works.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-9-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: add BPF object loading tests with explicit token passing · 98e0eaa3
      Andrii Nakryiko authored
      Add a few tests that attempt to load a BPF object containing a
      privileged map, a privileged program, and one requiring mandatory BTF
      upload into the kernel (to validate token FD propagation to the
      BPF_BTF_LOAD command).
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-8-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: wire up BPF token support at BPF object level · 1d0dd6ea
      Andrii Nakryiko authored
      Add BPF token support to BPF object-level functionality.
      
      A BPF token is supported by the BPF object logic either as an
      explicitly provided BPF token from outside (through a BPF FS path or an
      explicit BPF token FD), or implicitly (unless prevented through
      bpf_object_open_opts).
      
      Implicit mode is assumed to be the most common one for user-namespaced
      unprivileged workloads. The assumption is that a privileged container
      manager sets up a default BPF FS mount point at /sys/fs/bpf with BPF
      token delegation options (delegate_{cmds,maps,progs,attachs} mount
      options). During loading, the BPF object will attempt to create a BPF
      token from the /sys/fs/bpf location and pass it to all relevant
      operations (currently map creation, BTF load, and program load).
      
      In this implicit mode, if BPF token creation fails for whatever reason
      (BPF FS is not mounted, the kernel doesn't support BPF token, etc.),
      this is not considered an error; the BPF object loading sequence will
      simply proceed with no BPF token.
      
      In explicit BPF token mode, the user either explicitly provides a
      custom BPF FS mount point path or creates a BPF token on their own and
      passes the token FD directly. In that case, the BPF object will either
      dup() the token FD (so the caller doesn't have to hold onto it for the
      entire lifetime of the BPF object) or will attempt to create a BPF
      token from the provided BPF FS location. If BPF token creation fails,
      that is considered a critical error and the BPF object load fails with
      an error.
      
      Libbpf provides a way to disable implicit BPF token creation in case it
      causes any trouble (BPF token is designed to be completely optional and
      shouldn't cause any problems even if provided, but in the world of BPF
      LSM, custom security logic can be installed that might change the
      outcome depending on the presence of a BPF token). To disable libbpf's
      default BPF token creation behavior, the user should provide either an
      invalid (negative) BPF token FD or an empty bpf_token_path option.
      
      BPF token presence can influence libbpf's feature probing, so if a BPF
      object has an associated BPF token, feature probing is instructed to
      use the BPF object-specific feature detection cache and token FD.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-7-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
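      A usage sketch of the explicit mode described above; the object path
      and BPF FS mount point are examples, and bpf_token_path is the open
      option added by this series:

        #include <stdio.h>
        #include <bpf/libbpf.h>

        int main(void)
        {
            /* point libbpf at a delegation-enabled BPF FS; in explicit mode
             * a token creation failure is a hard error for the load below */
            LIBBPF_OPTS(bpf_object_open_opts, opts,
                .bpf_token_path = "/sys/fs/bpf",
            );
            struct bpf_object *obj;
            int err;

            obj = bpf_object__open_file("prog.bpf.o", &opts);  /* example object */
            if (!obj)
                return 1;

            /* map creation, BTF load, and program load all carry the token
             * FD created from the given BPF FS */
            err = bpf_object__load(obj);
            if (err)
                fprintf(stderr, "load failed: %d\n", err);

            bpf_object__close(obj);
            return err ? 1 : 0;
        }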
    • libbpf: wire up token_fd into feature probing logic · a75bb6a1
      Andrii Nakryiko authored
      Adjust feature probing callbacks to take into account an optional
      token_fd. In unprivileged contexts, some feature detectors would fail
      to detect kernel support just because a BPF program, BPF map, or BTF
      object can't be loaded due to the privileged nature of those
      operations. So when a BPF object is loaded with a BPF token, this token
      should be used for feature probing.
      
      This patch sets up support for this scenario, but we don't yet pass a
      non-zero token FD. That will be added in the next patch.
      
      We also switched the BPF cookie detector from using a kprobe program to
      a tracepoint one, as a tracepoint is a somewhat less dangerous BPF
      program type and has a higher likelihood of being allowed through a BPF
      token in the future. This change has no effect on detection behavior.
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20231213190842.3844987-6-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>