Commit dda77040 authored by Martin KaFai Lau

Merge branch 'Update and document struct_ops'

David Vernet says:

====================
The struct bpf_struct_ops structure in BPF is a framework that allows
subsystems to extend themselves using BPF. In commit 68b04864
("bpf: Create links for BPF struct_ops maps") and commit aef56f2e
("bpf: Update the struct_ops of a bpf_link"), the structure was updated
to include new ->validate() and ->update() callbacks respectively in
support of allowing struct_ops maps to be created with BPF_F_LINK.

The intention was that struct bpf_struct_ops implementations could
support map updates through the link. Because map validation and
registration would take place in two separate steps for struct_ops
maps managed by the link (the first in map update elem, and the latter
in link create), the ->validate() callback was added, and any struct_ops
implementation that wished to use BPF_F_LINK, even just for lifetime
management, would then be required to define both it and ->update().

Not all struct_ops implementations can or will support update, however.
For example, the sched_ext struct_ops implementation proposed in [0]
will not be able to support atomic map updates because it can race with
sysrq, has to cycle tasks through various states in order to safely
transition, etc. It can, however, benefit from letting the BPF link
automatically evict the struct_ops map when the application exits (e.g.
if it crashes).

This patch set therefore:

1. Updates the struct_ops implementation to support default values for
   ->validate() and ->update() so that struct_ops implementations can
   benefit from BPF_F_LINK management even if they can't support
   updates.
2. Documents struct bpf_struct_ops so that the semantics are clear and
   well defined.
---
v2: https://lore.kernel.org/bpf/0f5ea3de-c6e7-490f-b5ec-b5c7cd288687@gmail.com/T/
Changes from v2 -> v3:
- Add patch 2/2 that documents the struct bpf_struct_ops structure.
- Add Kui-Feng's Acked-by tag to patch 1/2.

v1: https://lore.kernel.org/lkml/20230811150934.GA542801@maniforge/
Changes from v1 -> v2:
- Move the if (!st_map->st_ops->update) check outside of the critical
  section before we acquire the update_mutex.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
parents ccd9a8be bb48cf16
@@ -1550,6 +1550,53 @@ struct bpf_struct_ops_value;
struct btf_member;
#define BPF_STRUCT_OPS_MAX_NR_MEMBERS 64
/**
* struct bpf_struct_ops - A structure of callbacks allowing a subsystem to
* define a BPF_MAP_TYPE_STRUCT_OPS map type composed
* of BPF_PROG_TYPE_STRUCT_OPS progs.
* @verifier_ops: A structure of callbacks that are invoked by the verifier
* when determining whether the struct_ops progs in the
* struct_ops map are valid.
* @init: A callback that is invoked a single time, and before any other
* callback, to initialize the structure. A nonzero return value means
* the subsystem could not be initialized.
* @check_member: When defined, a callback invoked by the verifier to allow
* the subsystem to determine if an entry in the struct_ops map
* is valid. A nonzero return value means that the map is
* invalid and should be rejected by the verifier.
* @init_member: A callback that is invoked for each member of the struct_ops
* map to allow the subsystem to initialize the member. A nonzero
* value means the member could not be initialized. This callback
* is exclusive with the @type, @type_id, @value_type, and
* @value_id fields.
* @reg: A callback that is invoked when the struct_ops map has been
* initialized and is being attached to. Zero means the struct_ops map
* has been successfully registered and is live. A nonzero return value
* means the struct_ops map could not be registered.
* @unreg: A callback that is invoked when the struct_ops map should be
* unregistered.
* @update: A callback that is invoked when the live struct_ops map is being
* updated to contain new values. This callback is only invoked when
* the struct_ops map is loaded with BPF_F_LINK. If not defined, it is
* assumed that the struct_ops map cannot be updated.
* @validate: A callback that is invoked after all of the members have been
* initialized. This callback should perform static checks on the
* map, meaning that it should either fail or succeed
* deterministically. A struct_ops map that has been validated may
* not necessarily succeed in being registered if the call to @reg
* fails. For example, a valid struct_ops map may be loaded, but
* then fail to be registered due to there being another active
* struct_ops map on the system in the subsystem already. For this
* reason, if this callback is not defined, the check is skipped as
* the struct_ops map will have final verification performed in
* @reg.
* @type: BTF type.
* @value_type: Value type.
* @name: The name of the struct bpf_struct_ops object.
* @func_models: Func models
* @type_id: BTF type id.
* @value_id: BTF value id.
*/
struct bpf_struct_ops {
const struct bpf_verifier_ops *verifier_ops;
int (*init)(struct btf *btf);
......
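The kernel-doc above separates @validate (a static, deterministic check) from @reg (which can still fail because of system state). The following is a minimal userspace sketch of that distinction, not kernel code. Every name in it (`demo_kdata`, `demo_validate`, `demo_reg`, `demo_unreg`, the "required `enqueue` member" rule, and the use of `-EBUSY`) is hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback table that a made-up subsystem would receive. */
struct demo_kdata {
	void (*enqueue)(void);	/* required by this made-up subsystem */
	void (*dequeue)(void);	/* optional */
};

/* Hypothetical subsystem state: at most one active table at a time. */
static const struct demo_kdata *demo_active;

/* Static check: the result depends only on the table contents, so the
 * same input always yields the same answer. */
static int demo_validate(const void *kdata)
{
	const struct demo_kdata *ops = kdata;

	assert(ops != NULL);
	return ops->enqueue ? 0 : -EINVAL;
}

/* Registration: can fail for an otherwise valid table because another
 * one is already active. The -EBUSY value is an assumption. */
static int demo_reg(const void *kdata)
{
	assert(kdata != NULL);
	if (demo_active)
		return -EBUSY;
	demo_active = kdata;
	return 0;
}

/* Unregistration: clears the active slot if this table owns it. */
static void demo_unreg(const void *kdata)
{
	if (demo_active == kdata)
		demo_active = NULL;
}

/* Placeholder callback used to build valid tables. */
static void demo_enqueue(void)
{
}
```

In this model, two tables that both pass `demo_validate()` cannot both be registered at once: the second `demo_reg()` call fails until `demo_unreg()` releases the first. That is the situation the comment describes when it says a validated map "may not necessarily succeed in being registered".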
@@ -509,9 +509,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
}
if (st_map->map.map_flags & BPF_F_LINK) {
-	err = st_ops->validate(kdata);
-	if (err)
-		goto reset_unlock;
+	err = 0;
+	if (st_ops->validate) {
+		err = st_ops->validate(kdata);
+		if (err)
+			goto reset_unlock;
+	}
set_memory_rox((long)st_map->image, 1);
/* Let bpf_link handle registration & unregistration.
*
@@ -663,9 +666,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr)
if (attr->value_size != vt->size)
return ERR_PTR(-EINVAL);
-if (attr->map_flags & BPF_F_LINK && (!st_ops->validate || !st_ops->update))
-	return ERR_PTR(-EOPNOTSUPP);
t = st_ops->type;
st_map_size = sizeof(*st_map) +
@@ -823,6 +823,9 @@ static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map
if (!bpf_struct_ops_valid_to_reg(new_map))
return -EINVAL;
+if (!st_map->st_ops->update)
+	return -EOPNOTSUPP;
mutex_lock(&update_mutex);
old_map = rcu_dereference_protected(st_link->map, lockdep_is_held(&update_mutex));
......
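Taken together, the three hunks in bpf_struct_ops.c make ->validate() and ->update() optional for BPF_F_LINK maps. The following userspace sketch models that control flow under simplifying assumptions; it is not the kernel implementation. The `model_*` names, the reduced callback signatures, and the `patched` switch (which toggles the pre-series check for comparison) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_struct_ops; only the two
 * callbacks made optional by this series are modelled. */
struct model_ops {
	int (*validate)(void *kdata);			/* may be NULL */
	int (*update)(void *kdata, void *old_kdata);	/* may be NULL */
};

/* Models bpf_struct_ops_map_alloc(): before the series (patched ==
 * false) a BPF_F_LINK map needed both callbacks; afterwards it does not. */
static int model_map_alloc(const struct model_ops *ops, bool link, bool patched)
{
	assert(ops != NULL);
	if (!patched && link && (!ops->validate || !ops->update))
		return -EOPNOTSUPP;
	return 0;
}

/* Models the BPF_F_LINK branch of bpf_struct_ops_map_update_elem()
 * after the series: a missing ->validate() is treated as success. */
static int model_update_elem(const struct model_ops *ops, void *kdata)
{
	assert(ops != NULL);
	if (ops->validate)
		return ops->validate(kdata);
	return 0;
}

/* Models the early check added to bpf_struct_ops_map_link_update():
 * without ->update(), the request is refused before any lock is taken. */
static int model_link_update(const struct model_ops *ops, void *new_kdata,
			     void *old_kdata)
{
	assert(ops != NULL);
	if (!ops->update)
		return -EOPNOTSUPP;
	return ops->update(new_kdata, old_kdata);
}

/* Placeholder callbacks for the example tables below. */
static int reject_all(void *kdata)
{
	(void)kdata;
	return -EINVAL;
}

static int accept_update(void *kdata, void *old_kdata)
{
	(void)kdata;
	(void)old_kdata;
	return 0;
}

/* A subsystem that wants the link only for lifetime management. */
static const struct model_ops lifetime_only_ops = { NULL, NULL };
/* A subsystem that defines both callbacks. */
static const struct model_ops strict_ops = { reject_all, accept_update };
```

With `lifetime_only_ops`, allocation with the link flag is refused in the pre-series model and accepted in the post-series one, the element update succeeds without validation, and a link update returns `-EOPNOTSUPP`. With `strict_ops`, the validation error is propagated and the link update is forwarded to the callback.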