1. 30 Jun, 2022 2 commits
  2. 29 Jun, 2022 13 commits
    • bpftool: Probe for memcg-based accounting before bumping rlimit · f0cf642c
      Quentin Monnet authored
      Bpftool used to bump the memlock rlimit to make sure it could load
      BPF objects. After the kernel switched to memcg-based memory
      accounting [0] in 5.11, bpftool relied on libbpf to probe the system
      for memcg-based accounting support and to raise the rlimit if
      necessary [1]. This was later reverted, because the probe would
      sometimes fail, leaving bpftool unable to load all required
      objects [2].
      
      Here we add a more efficient probe, in bpftool itself. We first lower
      the rlimit to 0, then we attempt to load a BPF object (and finally reset
      the rlimit): if the load succeeds, then memcg-based memory accounting is
      supported.
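
      A minimal C sketch of the probe just described follows. It is not
      bpftool's actual code: the function name needs_memlock_rlimit() is
      hypothetical, a trivial "r0 = 0; exit" socket filter is assumed to be
      a good enough test object, and the caller is assumed to have the
      privileges to load it (an unprivileged caller would also get EPERM,
      which this sketch cannot tell apart from a memlock failure). Only the
      soft limit is lowered, because an unprivileged process could not raise
      a lowered hard limit back.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/* Hypothetical helper: returns true if loading a trivial program fails
 * with EPERM while RLIMIT_MEMLOCK is 0, i.e. the kernel still seems to
 * charge BPF memory against the rlimit. Assumes the caller is privileged
 * enough to load BPF programs; otherwise EPERM is ambiguous. */
static bool needs_memlock_rlimit(void)
{
	/* Smallest valid program: r0 = 0; exit. */
	struct bpf_insn insns[] = {
		{ .code = BPF_ALU64 | BPF_MOV | BPF_K, .dst_reg = BPF_REG_0, .imm = 0 },
		{ .code = BPF_JMP | BPF_EXIT },
	};
	struct rlimit rlim_init, rlim_zero;
	union bpf_attr attr;
	int prog_fd, err;

	memset(&attr, 0, sizeof(attr));
	attr.prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
	attr.insns = (__u64)(unsigned long)insns;
	attr.insn_cnt = sizeof(insns) / sizeof(insns[0]);
	attr.license = (__u64)(unsigned long)"GPL";

	if (getrlimit(RLIMIT_MEMLOCK, &rlim_init))
		return false;

	/* Lower only the soft limit; keep the hard limit unchanged. */
	rlim_zero.rlim_cur = 0;
	rlim_zero.rlim_max = rlim_init.rlim_max;
	if (setrlimit(RLIMIT_MEMLOCK, &rlim_zero))
		return false;

	/* Raw syscall, so no library code bumps the rlimit behind our back. */
	prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
	err = errno;

	/* Always restore the original soft limit. */
	setrlimit(RLIMIT_MEMLOCK, &rlim_init);

	if (prog_fd < 0)
		return err == EPERM; /* rejected with a zero soft rlimit */

	close(prog_fd);
	return false; /* loaded with a zero soft rlimit: memcg accounting */
}
```

      A caller would then bump RLIMIT_MEMLOCK only when this returns true.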
      
      This approach was proposed earlier for the probe in libbpf itself [3],
      but given that the library may be used in multithreaded applications,
      the probe could have undesirable consequences if one thread attempted to
      lock kernel memory while the memlock rlimit was at 0. Since bpftool is
      single-threaded and the rlimit is per-process, this is fine to do in
      bpftool itself.
      
      This probe was inspired by the similar one from the cilium/ebpf Go
      library [4].
      
        [0] commit 97306be4 ("Merge branch 'switch to memcg-based memory accounting'")
        [1] commit a777e18f ("bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
        [2] commit 6b4384ff ("Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"")
        [3] https://lore.kernel.org/bpf/20220609143614.97837-1-quentin@isovalent.com/t/#u
        [4] https://github.com/cilium/ebpf/blob/v0.9.0/rlimit/rlimit.go#L39
      
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Yafang Shao <laoar.shao@gmail.com>
      Link: https://lore.kernel.org/bpf/20220629111351.47699-1-quentin@isovalent.com
    • Merge branch 'bpf: cgroup_sock lsm flavor' · d17b557e
      Alexei Starovoitov authored
      Stanislav Fomichev says:
      
      ====================
      
      This series implements a new lsm flavor for attaching per-cgroup programs
      to existing lsm hooks. The cgroup is taken from 'current', unless
      the first argument of the hook is 'struct socket', in which case
      the cgroup association is taken from the socket. The attachment
      looks like a regular per-cgroup attachment: we add a new BPF_LSM_CGROUP
      attach type which, together with attach_btf_id, signals per-cgroup lsm.
      Behind the scenes, we allocate a trampoline shim program and attach
      it to the lsm hook. This program looks up the cgroup from current/socket
      and runs the cgroup's effective prog array. The rest of the per-cgroup
      BPF machinery stays the same: hierarchy, local storage, retval conventions
      (return 1 == success).
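
      The "return 1 == success" convention can be modelled in plain C. The
      sketch below is a simplified user-space model, not kernel code: the
      typedef and function names are hypothetical, bpf_set_retval() is left
      out, and it assumes the usual cgroup BPF semantics where every program
      in the effective array runs and any program returning 0 makes the
      hook fail with -EPERM.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an attached program: 1 = allow, 0 = reject. */
typedef int (*lsm_cgroup_prog_fn)(void *ctx);

/* Run every program in the (modelled) effective array. A program that
 * returns 0 turns the result into -EPERM unless an error is already
 * recorded. Returns 0 when all programs allowed the operation. */
static int run_effective_array(lsm_cgroup_prog_fn *progs, size_t n, void *ctx)
{
	int retval = 0;

	for (size_t i = 0; i < n; i++) {
		if (!progs[i](ctx) && retval >= 0)
			retval = -EPERM;
	}
	return retval;
}

/* Two trivial example programs. */
static int allow_all(void *ctx) { (void)ctx; return 1; }
static int deny_all(void *ctx) { (void)ctx; return 0; }
```

      An empty array allows the operation, matching a cgroup with nothing
      attached.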
      
      Current limitations:
      * haven't considered sleepable bpf; can be extended later on
      * not sure the verifier does the right thing with null checks;
        see latest selftest for details
      * total of 10 (global) per-cgroup LSM attach points
      
      v11:
      - Martin: address selftest memory & fd leaks
      - Martin: address moving into root (instead have another temp leaf cgroup)
      - Martin: move tools/include/uapi/linux/bpf.h change from libbpf patch
        into 'sync tools' patch
      
      v10:
      - Martin: reword commit message, drop outdated items
      - Martin: remove rcu_read_lock from __cgroup_bpf_run_lsm_current
      - Martin: remove CONFIG_BPF_LSM from cgroup_bpf_release
      - Martin: fix leaking shim reference in bpf_cgroup_link_release
      - Martin: WARN_ON_ONCE for bpf_trampoline_lookup in bpf_trampoline_unlink_cgroup_shim
      - Martin: sync tools/include/linux/btf_ids.h
      - Martin: move progs/flags closer to the places where they are used in __cgroup_bpf_query
      - Martin: remove sk_clone_security & sctp_bind_connect from bpf_lsm_locked_sockopt_hooks
      - Martin: try to determine vmlinux btf_id in bpftool
      - Martin: update tools header in a separate commit
      - Quentin: do libbpf_find_kernel_btf from the ops that need it
      - lkp@intel.com: another build failure
      
      v9:
      Major change since last version is the switch to bpf_setsockopt to
      change the socket state instead of letting the progs poke socket directly.
      This, in turn, highlights a challenge: we need to care about whether
      the socket is locked or not when we call bpf_setsockopt (with my original
      example selftest, the hooks run early enough in the init phase for this
      not to matter).
      
      For now, I've added two btf id lists:
      * hooks where we know the socket is locked and it's safe to call bpf_setsockopt
      * hooks where we know the socket is _not_ locked, but the hook works on
        a socket that's not yet exposed to userspace, so it should be safe
        (for this mode, a special new set of bpf_{s,g}etsockopt helpers
         is added; they don't have the sock_owned_by_me check)
      
      Going forward, for the rest of the hooks, this might be a good motivation
      to expand lsm cgroup to support sleeping bpf and allow the callers to
      lock/unlock sockets or have a new bpf_setsockopt variant that does the
      locking.
      
      - ifdef around cleanup in cgroup_bpf_release
      - Andrii: a few nits in libbpf patches
      - Martin: remove unused btf_id_set_index
      - Martin: bring back refcnt for cgroup_atype
      - Martin: make __cgroup_bpf_query a bit more readable
      - Martin: expose dst_prog->aux->attach_btf as attach_btf_obj_id as well
      - Martin: reorg check_return_code path for BPF_LSM_CGROUP
      - Martin: return directly from check_helper_call (instead of goto err)
      - Martin: add note to new warning in check_return_code, print only for void hooks
      - Martin: remove confusing shim reuse
      - Martin: use bpf_{s,g}etsockopt instead of poking into socket data
      - Martin: use CONFIG_CGROUP_BPF in bpf_prog_alloc_no_stats/bpf_prog_free_deferred
      
      v8:
      - CI: fix compile issue
      - CI: fix broken bpf_cookie
      - Yonghong: remove __bpf_trampoline_unlink_prog comment
      - Yonghong: move cgroup_atype around to fill the gap
      - Yonghong: make bpf_lsm_find_cgroup_shim void
      - Yonghong: rename regs to args
      - Yonghong: remove if(current) check
      - Martin: move refcnt into bpf_link
      - Martin: move shim management to bpf_link ops
      - Martin: use cgroup_atype for shim only
      - Martin: go back to arrays for managing cgroup_atype(s)
      - Martin: export bpf_obj_id(aux->attach_btf)
      - Andrii: reorder SEC_DEF("lsm_cgroup+")
      - Andrii: OPTS_SET instead of OPTS_HAS
      - Andrii: rename attach_btf_func_id
      - Andrii: move into 1.0 map
      
      v7:
      - there were a lot of comments last time, hope I didn't forget anything,
        some of the bigger ones:
        - Martin: use/extend BTF_SOCK_TYPE_SOCKET
        - Martin: expose bpf_set_retval
        - Martin: reject 'return 0' at the verifier for 'void' hooks
        - Martin: prog_query returns all BPF_LSM_CGROUP, prog_info
          returns attach_btf_func_id
        - Andrii: split libbpf changes
        - Andrii: add field access test to test_progs, not test_verifier (still
          using asm though)
      - things that I haven't addressed, stating them here explicitly, let
        me know if some of these are still problematic:
        1. Andrii: exposing only link-based api: seems like the changes
           to support non-link-based ones are minimal, a couple of lines,
           so it seems worth having them?
        2. Alexei: applying cgroup_atype for all cgroup hooks, not only
           cgroup lsm: looks a bit harder to apply everywhere than I
           originally thought; with lsm cgroup, we have a shim_prog pointer where
           we store cgroup_atype; for non-lsm programs, we don't have a
           trace program where to store it, so we still need some kind
           of global table to map from "static" hook to "dynamic" slot.
           So I'm dropping this "can be easily extended" clause from the
           description for now. I have converted this whole machinery
           to an RCU-managed list to remove synchronize_rcu().
      - also note that I had to introduce new bpf_shim_tramp_link and
        moved refcnt there; we need something to manage new bpf_tramp_link
      
      v6:
      - remove active count & stats for shim program (Martin KaFai Lau)
      - remove NULL/error check for btf_vmlinux (Martin)
      - don't check cgroup_atype in bpf_cgroup_lsm_shim_release (Martin)
      - use old_prog (instead of passed one) in __cgroup_bpf_detach (Martin)
      - make sure attach_btf_id is the same in __cgroup_bpf_replace (Martin)
      - enable cgroup local storage and test it (Martin)
      - properly implement prog query and add bpftool & tests (Martin)
      - prohibit non-shared cgroup storage mode for BPF_LSM_CGROUP (Martin)
      
      v5:
      - __cgroup_bpf_run_lsm_socket remove NULL sock/sk checks (Martin KaFai Lau)
      - __cgroup_bpf_run_lsm_{socket,current} s/prog/shim_prog/ (Martin)
      - make sure bpf_lsm_find_cgroup_shim works for hooks without args (Martin)
      - __cgroup_bpf_attach make sure attach_btf_id is the same when replacing (Martin)
      - call bpf_cgroup_lsm_shim_release only for LSM_CGROUP (Martin)
      - drop BPF_LSM_CGROUP from bpf_attach_type_to_tramp (Martin)
      - drop jited check from cgroup_shim_find (Martin)
      - new patch to convert cgroup_bpf to hlist_node (Jakub Sitnicki)
      - new shim flavor for 'struct sock' + list of exceptions (Martin)
      
      v4:
      - fix build when jit is on but syscall is off
      
      v3:
      - add BPF_LSM_CGROUP to bpftool
      - use simple int instead of refcnt_t (to avoid use-after-free
        false positive)
      
      v2:
      - addressed build bot failures
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: lsm_cgroup functional test · dca85aac
      Stanislav Fomichev authored
      Functional test that exercises the following:
      
      1. apply default sk_priority policy
      2. permit TX-only AF_PACKET socket
      3. cgroup attach/detach/replace
      4. reusing trampoline shim
      
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-12-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpftool: implement cgroup tree for BPF_LSM_CGROUP · 596f5fb2
      Stanislav Fomichev authored
      $ bpftool --nomount prog loadall $KDIR/tools/testing/selftests/bpf/lsm_cgroup.o /sys/fs/bpf/x
      $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_alloc
      $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_bind
      $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_clone
      $ bpftool cgroup attach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
      $ bpftool cgroup tree
      CgroupPath
      ID       AttachType      AttachFlags     Name
      /sys/fs/cgroup
      6        lsm_cgroup                      socket_post_create bpf_lsm_socket_post_create
      8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
      10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
      11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
      
      $ bpftool cgroup detach /sys/fs/cgroup lsm_cgroup pinned /sys/fs/bpf/x/socket_post_create
      $ bpftool cgroup tree
      CgroupPath
      ID       AttachType      AttachFlags     Name
      /sys/fs/cgroup
      8        lsm_cgroup                      socket_bind     bpf_lsm_socket_bind
      10       lsm_cgroup                      socket_alloc    bpf_lsm_sk_alloc_security
      11       lsm_cgroup                      socket_clone    bpf_lsm_inet_csk_clone
      
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-11-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • libbpf: implement bpf_prog_query_opts · a4b2f3cf
      Stanislav Fomichev authored
      Implement bpf_prog_query_opts as a more extensible version of
      bpf_prog_query. Expose the new prog_attach_flags and attach_btf_func_id
      as well:
      
      * prog_attach_flags holds per-program attach flags; relevant only for
        lsm cgroup programs, which might have different attach_flags
        per attach_btf_id
      * attach_btf_func_id is a new field exposed for prog_query which
        specifies the real btf function id for lsm cgroup attachments
      
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-10-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bffcf348
      Stanislav Fomichev authored
    • tools/bpf: Sync btf_ids.h to tools · 3b34bcb9
      Stanislav Fomichev authored
      It has been slowly getting out of sync; let's update it.
      
      resolve_btfids usage has been updated to match the header changes.
      
      Also bring in the new parts of tools/include/uapi/linux/bpf.h.
      
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-8-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: expose bpf_{g,s}etsockopt to lsm cgroup · 9113d7e4
      Stanislav Fomichev authored
      I don't see how to make it nice without introducing btf id lists
      for the hooks where these helpers are allowed. Some LSM hooks
      work on locked sockets, some trigger early and don't grab
      any locks, so there are two lists for now:
      
      1. LSM hooks which trigger under the socket lock - a minority of the
         hooks, but the ideal case for us: we can expose the existing
         BTF-based helpers
      2. LSM hooks which trigger without the socket lock, but they trigger
         early in the socket creation path where it should be safe to
         do setsockopt without any locks
      3. The rest are prohibited. I'm thinking that this use-case might
         be a good gateway to sleeping lsm cgroup hooks in the future.
         We can either expose lock/unlock operations (and add tracking
         to the verifier) or have another set of bpf_setsockopt
         wrappers that grab the locks and might sleep.
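
      The three categories above amount to a lookup on the hook. In the
      kernel the lists are BTF ID sets consulted by the verifier; the sketch
      below uses strings instead, and every name in it is hypothetical. The
      hook placed in each list is an illustrative guess based on the
      descriptions above (inet_csk_clone presumably runs on a freshly
      cloned, locked socket; socket_post_create runs before the socket is
      exposed), so check the actual sets in the kernel source before
      relying on them.

```c
#include <stddef.h>
#include <string.h>

/* Which bpf_{g,s}etsockopt flavor a hook may use (hypothetical names). */
enum sockopt_access {
	SOCKOPT_LOCKED,   /* runs under the socket lock: existing helpers */
	SOCKOPT_UNLOCKED, /* early, not yet exposed to userspace: no-lock variant */
	SOCKOPT_DENIED,   /* everything else: helpers rejected */
};

/* Illustrative list contents only, not the kernel's actual sets. */
static const char *const locked_hooks[] = { "bpf_lsm_inet_csk_clone" };
static const char *const unlocked_hooks[] = { "bpf_lsm_socket_post_create" };

static int in_list(const char *hook, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(hook, list[i]) == 0)
			return 1;
	return 0;
}

static enum sockopt_access sockopt_access_for(const char *hook)
{
	if (in_list(hook, locked_hooks,
		    sizeof(locked_hooks) / sizeof(locked_hooks[0])))
		return SOCKOPT_LOCKED;
	if (in_list(hook, unlocked_hooks,
		    sizeof(unlocked_hooks) / sizeof(unlocked_hooks[0])))
		return SOCKOPT_UNLOCKED;
	return SOCKOPT_DENIED; /* category 3: prohibited */
}
```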
      
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-7-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP · b79c9fc9
      Stanislav Fomichev authored
      We have two options:
      1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id
      2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point
      
      I was doing (2) in the original patch, but switching to (1) here:
      
      * bpf_prog_query returns all attached BPF_LSM_CGROUP programs
        regardless of attach_btf_id
      * attach_btf_id is exported via bpf_prog_info
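
      Option (1) means the kernel does not filter by hook; a caller that
      cares about one hook filters on its own, using the attach_btf_id it
      reads back per program. The toy model below illustrates that split in
      user-space C; the struct, functions and BTF IDs are made up, and the
      attachment table stands in for the per-program bpf_prog_info lookup.

```c
#include <stddef.h>

/* Hypothetical model of one BPF_LSM_CGROUP attachment on a cgroup. */
struct lsm_cgroup_attachment {
	unsigned int prog_id;
	unsigned int attach_btf_id; /* which LSM hook the program targets */
};

/* Option (1): report every BPF_LSM_CGROUP program, ignoring the hook.
 * Returns the number of IDs written (at most cap). */
static size_t query_all(const struct lsm_cgroup_attachment *att, size_t n,
			unsigned int *ids, size_t cap)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n && cnt < cap; i++)
		ids[cnt++] = att[i].prog_id;
	return cnt;
}

/* Caller-side filtering: look up each returned program's hook (the
 * table plays the role of bpf_prog_info) and keep the matching ones. */
static size_t filter_by_hook(const struct lsm_cgroup_attachment *att, size_t n,
			     const unsigned int *ids, size_t cnt,
			     unsigned int btf_id, unsigned int *out)
{
	size_t m = 0;

	for (size_t i = 0; i < cnt; i++)
		for (size_t j = 0; j < n; j++)
			if (att[j].prog_id == ids[i] &&
			    att[j].attach_btf_id == btf_id)
				out[m++] = ids[i];
	return m;
}
```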
      
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-6-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: minimize number of allocated lsm slots per program · c0e19f2c
      Stanislav Fomichev authored
      The previous patch adds a 1:1 mapping between all 211 LSM hooks
      and the bpf_cgroup program array. Instead of reserving a slot per
      possible hook, reserve 10 slots per cgroup for lsm programs.
      Those slots are dynamically allocated on demand and reclaimed.
      
      struct cgroup_bpf {
      	struct bpf_prog_array *    effective[33];        /*     0   264 */
      	/* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */
      	struct hlist_head          progs[33];            /*   264   264 */
      	/* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */
      	u8                         flags[33];            /*   528    33 */
      
      	/* XXX 7 bytes hole, try to pack */
      
      	struct list_head           storages;             /*   568    16 */
      	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
      	struct bpf_prog_array *    inactive;             /*   584     8 */
      	struct percpu_ref          refcnt;               /*   592    16 */
      	struct work_struct         release_work;         /*   608    72 */
      
      	/* size: 680, cachelines: 11, members: 7 */
      	/* sum members: 673, holes: 1, sum holes: 7 */
      	/* last cacheline: 40 bytes */
      };
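
      One way to picture "dynamically allocated on demand and reclaimed" is a
      small refcounted table keyed by attach_btf_id. The sketch below is a
      hypothetical user-space model (the names are not the kernel's); it uses
      a single global table, in line with the "total of 10 (global)" limit in
      the cover letter, and assumes attach_btf_id 0 never names a real hook so
      that it can mark a free slot.

```c
#include <stddef.h>

#define LSM_CGROUP_SLOTS 10 /* the 10 slots described above */

/* Hypothetical slot: attach_btf_id 0 marks the slot as free. */
struct lsm_slot {
	unsigned int attach_btf_id;
	int refcnt;
};

static struct lsm_slot slots[LSM_CGROUP_SLOTS];

/* Return the slot already bound to btf_id (taking a reference), or bind
 * a free one. Returns -1 when every slot is used by other hooks. */
static int lsm_slot_get(unsigned int btf_id)
{
	int free_slot = -1;

	for (int i = 0; i < LSM_CGROUP_SLOTS; i++) {
		if (slots[i].attach_btf_id == btf_id) {
			slots[i].refcnt++;
			return i;
		}
		if (free_slot < 0 && slots[i].attach_btf_id == 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;
	slots[free_slot].attach_btf_id = btf_id;
	slots[free_slot].refcnt = 1;
	return free_slot;
}

/* Drop a reference; the slot is reclaimed when the last user goes away. */
static void lsm_slot_put(int slot)
{
	if (--slots[slot].refcnt == 0)
		slots[slot].attach_btf_id = 0;
}
```

      Programs attached to the same hook share one slot, so the eleventh
      distinct hook is refused until some other hook releases its slot.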
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-5-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: per-cgroup lsm flavor · 69fd337a
      Stanislav Fomichev authored
      Allow attaching to lsm hooks in the cgroup context.
      
      Attaching to per-cgroup LSM works exactly like attaching
      to other per-cgroup hooks. New BPF_LSM_CGROUP is added
      to trigger new mode; the actual lsm hook we attach to is
      signaled via existing attach_btf_id.
      
      For the hooks that have 'struct socket' or 'struct sock' as their first
      argument, we use the cgroup associated with that socket. For the rest,
      we use the 'current' cgroup (this is all on the default hierarchy ==
      v2 only). Note that for some hooks that work on 'struct sock' we still
      take the cgroup from 'current' because some of them work on a socket
      that hasn't been properly initialized yet.
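
      The cgroup-selection rule above can be written as a small decision
      function. This is a hypothetical sketch: the enums and the function
      name are made up, and whether a hook is one of the 'struct sock'
      exceptions is passed in rather than looked up. In the kernel, the
      equivalent choice selects which shim runs (the cover letter mentions
      __cgroup_bpf_run_lsm_socket and __cgroup_bpf_run_lsm_current).

```c
#include <stdbool.h>

/* Hypothetical classification of an LSM hook's first argument. */
enum first_arg { ARG_OTHER, ARG_SOCKET, ARG_SOCK };

/* Where the cgroup used to pick the effective prog array comes from. */
enum cgroup_source { FROM_CURRENT, FROM_SOCKET };

/* 'struct socket' hooks use the socket's cgroup; 'struct sock' hooks do
 * too, unless flagged as working on a not-yet-initialized sock; every
 * other hook falls back to the cgroup of 'current'. */
static enum cgroup_source cgroup_source_for(enum first_arg arg,
					    bool sock_exception)
{
	if (arg == ARG_SOCKET)
		return FROM_SOCKET;
	if (arg == ARG_SOCK && !sock_exception)
		return FROM_SOCKET;
	return FROM_CURRENT;
}
```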
      
      Behind the scenes, we allocate a shim program that is attached
      to the trampoline and runs cgroup effective BPF programs array.
      This shim has some rudimentary ref counting and can be shared
      between several programs attaching to the same lsm hook from
      different cgroups.
      
      Note that this patch bloats cgroup size because we add 211
      cgroup_bpf_attach_type(s) for simplicity's sake. This will be
      addressed in the subsequent patch.
      
      Also note that we only add the non-sleepable flavor for now. To enable
      sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu,
      shim programs have to be freed via trace rcu, cgroup_bpf.effective
      should also be trace-rcu-managed, plus maybe some other changes that
      I'm not aware of.
      
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: convert cgroup_bpf.progs to hlist · 00442143
      Stanislav Fomichev authored
      This lets us reclaim some space to be used by new cgroup lsm slots.
      
      Before:
      struct cgroup_bpf {
      	struct bpf_prog_array *    effective[23];        /*     0   184 */
      	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
      	struct list_head           progs[23];            /*   184   368 */
      	/* --- cacheline 8 boundary (512 bytes) was 40 bytes ago --- */
      	u32                        flags[23];            /*   552    92 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	/* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */
      	struct list_head           storages;             /*   648    16 */
      	struct bpf_prog_array *    inactive;             /*   664     8 */
      	struct percpu_ref          refcnt;               /*   672    16 */
      	struct work_struct         release_work;         /*   688    32 */
      
      	/* size: 720, cachelines: 12, members: 7 */
      	/* sum members: 716, holes: 1, sum holes: 4 */
      	/* last cacheline: 16 bytes */
      };
      
      After:
      struct cgroup_bpf {
      	struct bpf_prog_array *    effective[23];        /*     0   184 */
      	/* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */
      	struct hlist_head          progs[23];            /*   184   184 */
      	/* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */
      	u8                         flags[23];            /*   368    23 */
      
      	/* XXX 1 byte hole, try to pack */
      
      	/* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */
      	struct list_head           storages;             /*   392    16 */
      	struct bpf_prog_array *    inactive;             /*   408     8 */
      	struct percpu_ref          refcnt;               /*   416    16 */
      	struct work_struct         release_work;         /*   432    72 */
      
      	/* size: 504, cachelines: 8, members: 7 */
      	/* sum members: 503, holes: 1, sum holes: 1 */
      	/* last cacheline: 56 bytes */
      };
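
      The savings follow from the element sizes: a list_head holds two
      pointers and an hlist_head one, and flags shrink from u32 to u8. The
      sketch below recomputes the progs[] plus flags[] footprint with
      simplified stand-in structs (not the kernel headers); on a 64-bit
      build it gives 23 * 16 + 23 * 4 = 460 bytes before and
      23 * 8 + 23 * 1 = 207 bytes after, matching the offsets in the dumps.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: a list_head is two pointers, an hlist_head is
 * one pointer (the head of a singly linked chain). */
struct list_head { struct list_head *next, *prev; };
struct hlist_node;
struct hlist_head { struct hlist_node *first; };

#define NR_ATTACH_TYPES 23 /* array length in the dumps above */

/* Bytes taken by progs[] + flags[] with list_head and u32 flags. */
static size_t progs_flags_before(void)
{
	return NR_ATTACH_TYPES * sizeof(struct list_head) +
	       NR_ATTACH_TYPES * sizeof(uint32_t);
}

/* Bytes taken by progs[] + flags[] with hlist_head and u8 flags. */
static size_t progs_flags_after(void)
{
	return NR_ATTACH_TYPES * sizeof(struct hlist_head) +
	       NR_ATTACH_TYPES * sizeof(uint8_t);
}
```

      The trade-off is that an hlist_head has no tail pointer, so appending
      at the end of a list needs a walk; the space saved is what makes room
      for the new lsm slots.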
      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-3-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: add bpf_func_t and trampoline helpers · af3f4134
      Stanislav Fomichev authored
      I'll be adding lsm cgroup specific helpers that grab the
      trampoline mutex.
      
      No functional changes.
      
      Reviewed-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Link: https://lore.kernel.org/r/20220628174314.1216643-2-sdf@google.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 28 Jun, 2022 17 commits
  4. 24 Jun, 2022 7 commits
  5. 23 Jun, 2022 1 commit