Commit c272e259 authored by Alexei Starovoitov

Merge branch 'bpf: refine kernel.unprivileged_bpf_disabled behaviour'

Alan Maguire says:

====================

Unprivileged BPF disabled (kernel.unprivileged_bpf_disabled >= 1)
is the default in most cases now; when set, the BPF system call is
blocked for users without CAP_BPF/CAP_SYS_ADMIN.  In some cases,
however, it makes sense to split activities between capability-requiring
ones - such as program load/attach - and those that might not require
capabilities, such as reading perf/ringbuf events or reading and
updating BPF map configuration.  One example of this sort of approach is
a service that loads a BPF program paired with a user-space program that
interacts with it.
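
For example, the capability-requiring half of such a split might be a
small loader that creates the objects and pins them to bpffs for an
unprivileged consumer.  A minimal libbpf sketch of that loader follows;
the object file name, map name and pin paths are illustrative, not taken
from this series:

/* Privileged loader sketch: runs with CAP_BPF (plus CAP_PERFMON/
 * CAP_NET_ADMIN as the program type requires), creates the BPF objects
 * and pins them so an unprivileged process can reach them via bpffs.
 * All names below are hypothetical.
 */
#include <bpf/libbpf.h>

int main(void)
{
        struct bpf_object *obj;
        struct bpf_map *map;
        int err;

        obj = bpf_object__open_file("myprog.bpf.o", NULL);
        if (!obj)                       /* libbpf 1.0: NULL + errno on error */
                return 1;

        err = bpf_object__load(obj);    /* BPF_PROG_LOAD/BPF_MAP_CREATE happen here */
        if (err)
                goto out;

        map = bpf_object__find_map_by_name(obj, "shared_map");
        if (!map || bpf_map__pin(map, "/sys/fs/bpf/shared_map")) {
                err = -1;
                goto out;
        }

        err = bpf_object__pin_programs(obj, "/sys/fs/bpf/myprog");
out:
        bpf_object__close(obj);
        return err ? 1 : 0;
}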

Here - rather than blocking all BPF syscall commands - unprivileged
BPF disabled blocks only the key object-creating commands (prog load,
map create).  Discussion has alluded to this idea in the past [1],
and Alexei mentioned it was also discussed at LSF/MM/BPF this year.
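
Concretely, once the privileged side has created and pinned the objects,
an unprivileged process can still issue fd-based commands such as
BPF_OBJ_GET on a bpffs pin, BPF_MAP_LOOKUP_ELEM and BPF_MAP_UPDATE_ELEM.
A hedged sketch of that consumer side (the pin path matches the
hypothetical loader sketch above; bpffs file permissions still have to
allow the open):

/* Unprivileged consumer sketch: no CAP_BPF needed for these commands
 * once the map exists and is reachable through bpffs.  The pin path is
 * hypothetical.
 */
#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
        __u32 key = 0, val;
        int map_fd;

        map_fd = bpf_obj_get("/sys/fs/bpf/shared_map");        /* BPF_OBJ_GET */
        if (map_fd < 0) {
                perror("bpf_obj_get");
                return 1;
        }

        if (bpf_map_lookup_elem(map_fd, &key, &val)) {          /* BPF_MAP_LOOKUP_ELEM */
                perror("lookup");
                return 1;
        }
        printf("value: %u\n", val);

        val++;
        return bpf_map_update_elem(map_fd, &key, &val, 0) ? 1 : 0; /* BPF_MAP_UPDATE_ELEM */
}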

Changes since v3 [2]:
- added acks to patch 1
- CI was failing on Ubuntu; I suspect the issue was an old capability.h
  file that defined CAP_LAST_CAP as < CAP_BPF, so the logic that disables
  all caps did not actually disable CAP_BPF.  Use CAP_BPF as the basis
  for the "all caps" bitmap instead, as we explicitly define it in
  cap_helpers.h if not already found in capability.h (see the sketch
  after this list)
- made global variables arguments to subtests instead (Andrii, patch 2)
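
For reference, a sketch of that CAP_BPF-anchored "all caps" bitmap and
of dropping/restoring effective capabilities around a subtest.  The
ALL_CAPS name and the cap_disable_effective()/cap_enable_effective()
helpers are assumptions modelled on the selftests' cap_helpers.h, not
quoted from this series:

/* Assumed helper interface from the selftests' cap_helpers.h, which
 * also defines CAP_BPF itself when an old capability.h lacks it, so the
 * bitmap below does not depend on the distro's CAP_LAST_CAP value.
 */
#include <linux/capability.h>
#include "cap_helpers.h"

/* every capability bit from 0 up to and including CAP_BPF */
#define ALL_CAPS ((2ULL << CAP_BPF) - 1)

static int run_subtest_unprivileged(void)
{
        __u64 saved_caps = 0;
        int err;

        /* drop all effective capabilities for the duration of the subtest */
        err = cap_disable_effective(ALL_CAPS, &saved_caps);
        if (err)
                return err;

        /* ... exercise BPF syscall commands with no effective caps ... */

        /* restore whatever was effective before the subtest */
        return cap_enable_effective(saved_caps, NULL);
}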

Changes since v2 [3]:

- added acks from Yonghong
- fixed clang compilation issue in selftest with bpf_prog_query()
  (Alexei, patch 2)
- disable all capabilities for test (Yonghong, patch 2)
- add assertions that size of perf/ringbuf data matches expectations
  (Yonghong, patch 2)
- add map array size definition, remove unneeded whitespace (Yonghong, patch 2)

Changes since RFC [4]:

- widened scope of commands unprivileged BPF disabled allows
  (Alexei, patch 1)
- removed restrictions on map types for lookup, update, delete
  (Alexei, patch 1)
- removed kernel CONFIG parameter controlling unprivileged bpf disabled
  change (Alexei, patch 1)
- widened test scope to cover most BPF syscall commands, with positive
  and negative subtests

[1] https://lore.kernel.org/bpf/CAADnVQLTBhCTAx1a_nev7CgMZxv1Bb7ecz1AFRin8tHmjPREJA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/1652880861-27373-1-git-send-email-alan.maguire@oracle.com/T/
[3] https://lore.kernel.org/bpf/1652788780-25520-1-git-send-email-alan.maguire@oracle.com/T/#t
[4] https://lore.kernel.org/bpf/20220511163604.5kuczj6jx3ec5qv6@MBP-98dd607d3435.dhcp.thefacebook.com/T/#mae65f35a193279e718f37686da636094d69b96ee
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
parents 97949767 90a039fd
@@ -4863,9 +4863,21 @@ static int bpf_prog_bind_map(union bpf_attr *attr)
 static int __sys_bpf(int cmd, bpfptr_t uattr, unsigned int size)
 {
         union bpf_attr attr;
+        bool capable;
         int err;
 
-        if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
+        capable = bpf_capable() || !sysctl_unprivileged_bpf_disabled;
+
+        /* Intent here is for unprivileged_bpf_disabled to block key object
+         * creation commands for unprivileged users; other actions depend
+         * of fd availability and access to bpffs, so are dependent on
+         * object creation success.  Capabilities are later verified for
+         * operations such as load and map create, so even with unprivileged
+         * BPF disabled, capability checks are still carried out for these
+         * and other operations.
+         */
+        if (!capable &&
+            (cmd == BPF_MAP_CREATE || cmd == BPF_PROG_LOAD))
                 return -EPERM;
 
         err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
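
Seen from user space, the check above means an unprivileged caller now
gets -EPERM up front only for BPF_MAP_CREATE and BPF_PROG_LOAD; other
commands fall through to their own permission checks.  A raw-syscall
sketch (illustrative only, error handling trimmed):

/* With kernel.unprivileged_bpf_disabled set and no CAP_BPF, only the
 * object-creating commands are refused immediately.
 */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static long sys_bpf(enum bpf_cmd cmd, union bpf_attr *attr, unsigned int size)
{
        return syscall(__NR_bpf, cmd, attr, size);
}

int main(void)
{
        union bpf_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.map_type = BPF_MAP_TYPE_ARRAY;
        attr.key_size = sizeof(__u32);
        attr.value_size = sizeof(__u32);
        attr.max_entries = 1;

        /* expect EPERM here for an unprivileged caller when the sysctl is set */
        if (sys_bpf(BPF_MAP_CREATE, &attr, sizeof(attr)) < 0)
                printf("BPF_MAP_CREATE: %s\n", strerror(errno));

        /* fd-based commands (BPF_MAP_LOOKUP_ELEM, BPF_OBJ_GET_INFO_BY_FD, ...)
         * are no longer rejected by this check and proceed to their own
         * capability/permission checks.
         */
        return 0;
}
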
This diff is collapsed.
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2022, Oracle and/or its affiliates. */

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#include "bpf_misc.h"

__u32 perfbuf_val = 0;
__u32 ringbuf_val = 0;

int test_pid;

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} array SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} percpu_array SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} hash SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_HASH);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} percpu_hash SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __type(key, __u32);
        __type(value, __u32);
} perfbuf SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_RINGBUF);
        __uint(max_entries, 1 << 12);
} ringbuf SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 1);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
} prog_array SEC(".maps");

SEC("fentry/" SYS_PREFIX "sys_nanosleep")
int sys_nanosleep_enter(void *ctx)
{
        int cur_pid;

        cur_pid = bpf_get_current_pid_tgid() >> 32;

        if (cur_pid != test_pid)
                return 0;

        bpf_perf_event_output(ctx, &perfbuf, BPF_F_CURRENT_CPU, &perfbuf_val,
                              sizeof(perfbuf_val));
        bpf_ringbuf_output(&ringbuf, &ringbuf_val, sizeof(ringbuf_val), 0);
        return 0;
}

SEC("perf_event")
int handle_perf_event(void *ctx)
{
        return 0;
}

char _license[] SEC("license") = "GPL";
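
The companion user-space test (its diff is collapsed above) presumably
drops capabilities and then consumes these perf/ringbuf events and map
values without privileges, given the fds.  A hedged libbpf sketch of
such a ring buffer consumer; the function and callback names are
hypothetical and fd acquisition is elided:

/* Unprivileged ring buffer consumer sketch: once ringbuf_fd is in hand
 * (inherited from a privileged loader or opened from a bpffs pin), no
 * capability is needed to mmap and poll it.  Names are illustrative,
 * not taken from the collapsed selftest diff.
 */
#include <stdio.h>
#include <bpf/libbpf.h>

static int handle_sample(void *ctx, void *data, size_t size)
{
        if (size == sizeof(__u32))      /* matches ringbuf_val in the BPF program */
                printf("ringbuf_val = %u\n", *(__u32 *)data);
        return 0;
}

int consume_ringbuf(int ringbuf_fd)
{
        struct ring_buffer *rb;
        int err;

        rb = ring_buffer__new(ringbuf_fd, handle_sample, NULL, NULL);
        if (!rb)
                return -1;

        err = ring_buffer__poll(rb, 100 /* timeout, ms */);
        ring_buffer__free(rb);
        return err < 0 ? err : 0;
}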