Commit 8025751d authored by John Fastabend's avatar John Fastabend Committed by Alexei Starovoitov

bpf, sockmap: RCU dereferenced psock may be used outside RCU block

If an ingress verdict program specifies message sizes greater than
skb->len and there is an ENOMEM error due to memory pressure we
may call the rcv_msg handler outside the strp_data_ready() caller
context. This is because on an ENOMEM error the strparser will
retry from a workqueue. The caller currently protects the use of
psock by calling the strp_data_ready() inside a rcu_read_lock/unlock
block.

But, in above workqueue error case the psock is accessed outside
the read_lock/unlock block of the caller. So instead of using
psock directly we must do a look up against the sk again to
ensure the psock is available.

There is an an ugly piece here where we must handle
the case where we paused the strp and removed the psock. On
psock removal we first pause the strparser and then remove
the psock. If the strparser is paused while an skb is
scheduled on the workqueue the skb will be dropped on the
flow and kfree_skb() is called. If the workqueue manages
to get called before we pause the strparser but runs the rcvmsg
callback after the psock is removed we will hit the unlikely
case where we run the sockmap rcvmsg handler but do not have
a psock. For now we will follow strparser logic and drop the
skb on the floor with skb_kfree(). This is ugly because the
data is dropped. To date this has not caused problems in practice
because either the application controlling the sockmap is
coordinating with the datapath so that skbs are "flushed"
before removal or we simply wait for the sock to be closed before
removing it.

This patch fixes the describe RCU bug and dropping the skb doesn't
make things worse. Future patches will improve this by allowing
the normal case where skbs are not merged to skip the strparser
altogether. In practice many (most?) use cases have no need to
merge skbs so its both a code complexity hit as seen above and
a performance issue. For example, in the Cilium case we always
set the strparser up to return sbks 1:1 without any merging and
have avoided above issues.

Fixes: e91de6af ("bpf: Fix running sk_skb program types with ktls")
Signed-off-by: default avatarJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
Acked-by: default avatarMartin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159312679888.18340.15248924071966273998.stgit@john-XPS-13-9370
parent 93dd5f18
...@@ -781,11 +781,18 @@ static void sk_psock_verdict_apply(struct sk_psock *psock, ...@@ -781,11 +781,18 @@ static void sk_psock_verdict_apply(struct sk_psock *psock,
static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
{ {
struct sk_psock *psock = sk_psock_from_strp(strp); struct sk_psock *psock;
struct bpf_prog *prog; struct bpf_prog *prog;
int ret = __SK_DROP; int ret = __SK_DROP;
struct sock *sk;
rcu_read_lock(); rcu_read_lock();
sk = strp->sk;
psock = sk_psock(sk);
if (unlikely(!psock)) {
kfree_skb(skb);
goto out;
}
prog = READ_ONCE(psock->progs.skb_verdict); prog = READ_ONCE(psock->progs.skb_verdict);
if (likely(prog)) { if (likely(prog)) {
skb_orphan(skb); skb_orphan(skb);
...@@ -794,6 +801,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb) ...@@ -794,6 +801,7 @@ static void sk_psock_strp_read(struct strparser *strp, struct sk_buff *skb)
ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb)); ret = sk_psock_map_verd(ret, tcp_skb_bpf_redirect_fetch(skb));
} }
sk_psock_verdict_apply(psock, skb, ret); sk_psock_verdict_apply(psock, skb, ret);
out:
rcu_read_unlock(); rcu_read_unlock();
} }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment