• Hou Tao's avatar
    bpf: Defer the free of inner map when necessary · 87667336
    Hou Tao authored
    When updating or deleting an inner map in map array or map htab, the map
    may still be accessed by non-sleepable program or sleepable program.
    However bpf_map_fd_put_ptr() decreases the ref-counter of the inner map
    directly through bpf_map_put(), if the ref-counter is the last one
    (which is true for most cases), the inner map will be freed by
    ops->map_free() in a kworker. But for now, most .map_free() callbacks
    don't use synchronize_rcu() or its variants to wait for the elapse of a
    RCU grace period, so after the invocation of ops->map_free completes,
    the bpf program which is accessing the inner map may incur
    use-after-free problem.
    
    Fix the free of inner map by invoking bpf_map_free_deferred() after both
    one RCU grace period and one tasks trace RCU grace period if the inner
    map has been removed from the outer map before. The deferment is
    accomplished by using call_rcu() or call_rcu_tasks_trace() when
    releasing the last ref-counter of bpf map. The newly-added rcu_head
    field in bpf_map shares the same storage space with work field to
    reduce the size of bpf_map.
    
    Fixes: bba1dc0b ("bpf: Remove redundant synchronize_rcu.")
    Fixes: 638e4b82 ("bpf: Allows per-cpu maps and map-in-map in sleepable programs")
    Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
    Link: https://lore.kernel.org/r/20231204140425.1480317-5-houtao@huaweicloud.comSigned-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    87667336
syscall.c 140 KB