• John Fastabend's avatar
    bpf, sockmap: RCU dereferenced psock may be used outside RCU block · 8025751d
    John Fastabend authored
    If an ingress verdict program specifies message sizes greater than
    skb->len and there is an ENOMEM error due to memory pressure we
    may call the rcv_msg handler outside the strp_data_ready() caller
    context. This is because on an ENOMEM error the strparser will
    retry from a workqueue. The caller currently protects the use of
    psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
    block.
    
    But, in above workqueue error case the psock is accessed outside
    the read_lock/unlock block of the caller. So instead of using
    psock directly we must do a look up against the sk again to
    ensure the psock is available.
    
    There is an an ugly piece here where we must handle
    the case where we paused the strp and removed the psock. On
    psock removal we first pause the strparser and then remove
    the psock. If the strparser is paused while an skb is
    scheduled on the workqueue the skb will be dropped on the
    flow and kfree_skb() is called. If the workqueue manages
    to get called before we pause the strparser but runs the rcvmsg
    callback after the psock is removed we will hit the unlikely
    case where we run the sockmap rcvmsg handler but do not have
    a psock. For now we will follow strparser logic and drop the
    skb on the floor with skb_kfree(). This is ugly because the
    data is dropped. To date this has not caused problems in practice
    because either the application controlling the sockmap is
    coordinating with the datapath so that skbs are "flushed"
    before removal or we simply wait for the sock to be closed before
    removing it.
    
    This patch fixes the describe RCU bug and dropping the skb doesn't
    make things worse. Future patches will improve this by allowing
    the normal case where skbs are not merged to skip the strparser
    altogether. In practice many (most?) use cases have no need to
    merge skbs so its both a code complexity hit as seen above and
    a performance issue. For example, in the Cilium case we always
    set the strparser up to return sbks 1:1 without any merging and
    have avoided above issues.
    
    Fixes: e91de6af ("bpf: Fix running sk_skb program types with ktls")
    Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
    Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
    Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
    Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
    8025751d
skmsg.c 20.2 KB