    bpf: Retire the struct_ops map kvalue->refcnt. · b671c206
    Kui-Feng Lee authored
    We have replaced kvalue->refcnt with synchronize_rcu() to wait for an
    RCU grace period.
    
    Maintaining kvalue->refcnt was complicated because we had to keep
    track of two reference counts at once: kvalue->refcnt itself and the
    reference count of bpf_map. When kvalue->refcnt reached zero, we
    also had to drop the reference count on bpf_map, yet these two steps
    were not performed atomically and required us to be vigilant
    when managing them. By eliminating kvalue->refcnt, maintenance
    becomes more straightforward, as the refcount of bpf_map is now
    the only one we manage.
    
    To prevent the trampoline image of a struct_ops from being released
    while it is still in use, we wait for an RCU grace period. The
    setsockopt(TCP_CONGESTION, "...") command allows you to change your
    socket's congestion control algorithm and can result in releasing the
    old struct_ops implementation. That by itself is fine. However, this
    function is also exposed through bpf_setsockopt(), so it may be called
    by BPF programs as well. To ensure that the trampoline image belonging
    to a struct_ops can be safely called while its method is in use, the
    trampoline safeguards the BPF program with rcu_read_lock(). Doing so
    prevents any destruction of the associated images before returning
    from a trampoline, and requires us to wait for an RCU grace period
    before releasing them.
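    
    The ordering described above can be illustrated with a small
    user-space toy model. This is only a sketch under stated assumptions,
    not the kernel implementation: every toy_* name is hypothetical, the
    model is single-threaded, and where a real synchronize_rcu() would
    block until readers finish, the toy free path simply refuses to free.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct_ops map; not a kernel type. */
struct toy_map {
	int refcnt;        /* the single remaining reference count */
	bool image_freed;  /* models the trampoline image being released */
	int image_calls;   /* how often a method ran through the image */
};

/* Number of active read-side critical sections in this toy model. */
static int toy_readers;

static void toy_rcu_read_lock(void)   { toy_readers++; }
static void toy_rcu_read_unlock(void) { toy_readers--; }

/* Drop one reference; returns true if it was the last one. */
static bool toy_map_put(struct toy_map *m)
{
	return --m->refcnt == 0;
}

/*
 * Release the image only when no reference remains and no reader is
 * inside a critical section. A real synchronize_rcu() would block
 * here; this toy version returns false so the caller can retry.
 */
static bool toy_map_free_image(struct toy_map *m)
{
	if (m->refcnt != 0 || toy_readers != 0)
		return false;
	m->image_freed = true;
	return true;
}

/* Models calling a struct_ops method through the trampoline image. */
static void toy_call_method(struct toy_map *m)
{
	assert(!m->image_freed); /* the image must still exist */
	m->image_calls++;
}

/* Returns 1 if the image outlived the reader and was freed afterwards. */
static int toy_scenario(void)
{
	struct toy_map m = { .refcnt = 1 };

	toy_rcu_read_lock();                  /* trampoline enters the program */
	bool last = toy_map_put(&m);          /* old implementation is dropped */
	bool early = toy_map_free_image(&m);  /* refused: a reader is active */
	toy_call_method(&m);                  /* still safe to run the method */
	toy_rcu_read_unlock();                /* trampoline returns */
	bool late = toy_map_free_image(&m);   /* grace period over: freed */

	return last && !early && late && m.image_calls == 1;
}
```

    Calling toy_scenario() walks through the case from the message: the
    last reference is dropped while a method is still running, and the
    image is released only after the read-side section has ended.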

    Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
    Link: https://lore.kernel.org/r/20230323032405.3735486-2-kuifeng@meta.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>