• David Vernet's avatar
    bpf: Remove KF_KPTR_GET kfunc flag · 7b4ddf39
    David Vernet authored
    We've managed to improve the UX for kptrs significantly over the last 9
    months. All of the existing use cases which previously had KF_KPTR_GET
    kfuncs (struct bpf_cpumask *, struct task_struct *, and struct cgroup *)
    have all been updated to be synchronized using RCU. In other words,
    their KF_KPTR_GET kfuncs have been removed in favor of KF_RCU |
    KF_ACQUIRE kfuncs, with the pointers themselves also being readable from
    maps in an RCU read region thanks to the types being RCU safe.
    
    While KF_KPTR_GET was a logical starting point for kptrs, it's become
    clear that they're not the correct abstraction. KF_KPTR_GET is a flag
    that essentially does nothing other than enforcing that the argument to
    a function is a pointer to a referenced kptr map value. At first glance,
    that's a useful thing to guarantee to a kfunc. It gives kfuncs the
    ability to try and acquire a reference on that kptr without requiring
    the BPF prog to do something like this:
    
    struct kptr_type *in_map, *new = NULL;
    
    in_map = bpf_kptr_xchg(&map->value, NULL);
    if (in_map) {
            new = bpf_kptr_type_acquire(in_map);
            in_map = bpf_kptr_xchg(&map->value, in_map);
            if (in_map)
                    bpf_kptr_type_release(in_map);
    }
    
    That's clearly a pretty ugly (and racy) UX, and if using KF_KPTR_GET is
    the only alternative, it's better than nothing. However, the problem
    with any KF_KPTR_GET kfunc lies in the fact that it always requires some
    kind of synchronization in order to safely do an opportunistic acquire
    of the kptr in the map. This is because a BPF program running on another
    CPU could do a bpf_kptr_xchg() on that map value, and free the kptr
    after it's been read by the KF_KPTR_GET kfunc. For example, the
    now-removed bpf_task_kptr_get() kfunc did the following:
    
    struct task_struct *bpf_task_kptr_get(struct task_struct **pp)
    {
                struct task_struct *p;
    
            rcu_read_lock();
            p = READ_ONCE(*pp);
            /* If p is non-NULL, it could still be freed by another CPU,
             * so we have to do an opportunistic refcount_inc_not_zero()
             * and return NULL if the task will be freed after the
             * current RCU read region.
             */
            |f (p && !refcount_inc_not_zero(&p->rcu_users))
                    p = NULL;
            rcu_read_unlock();
    
            return p;
    }
    
    In other words, the kfunc uses RCU to ensure that the task remains valid
    after it's been peeked from the map. However, this is completely
    redundant with just defining a KF_RCU kfunc that itself does a
    refcount_inc_not_zero(), which is exactly what bpf_task_acquire() now
    does.
    
    So, the question of whether KF_KPTR_GET is useful is actually, "Are
    there any synchronization mechanisms / safety flags that are required by
    certain kptrs, but which are not provided by the verifier to kfuncs?"
    The answer to that question today is "No", because every kptr we
    currently care about is RCU protected.
    
    Even if the answer ever became "yes", the proper way to support that
    referenced kptr type would be to add support for whatever
    synchronization mechanism it requires in the verifier, rather than
    giving kfuncs a flag that says, "Here's a pointer to a referenced kptr
    in a map, do whatever you need to do."
    
    With all that said -- so as to allow us to consolidate the kfunc API,
    and simplify the verifier a bit, this patch removes KF_KPTR_GET, and all
    relevant logic from the verifier.
    Signed-off-by: default avatarDavid Vernet <void@manifault.com>
    Link: https://lore.kernel.org/r/20230416084928.326135-3-void@manifault.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    7b4ddf39
verifier.c 554 KB