• Vladimir Oltean's avatar
    net: enetc: avoid buffer leaks on xdp_do_redirect() failure · 628050ec
    Vladimir Oltean authored
    Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
    BD in the RX ring between index orig_i and i can have one of 2 refcount
    values on its page.
    
    We are the owner of the current buffer that is being processed, so the
    refcount will be at least 1.
    
    If the current owner of the buffer at the diametrically opposed index
    in the RX ring (i.o.w, the other half of this page) has not yet called
    kfree(), this page's refcount could even be 2.
    
    enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
    refcount against 1, and [ if it's 2 ] does not attempt to reuse it.
    
    But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
    the page refcount can have one of 3 values. It can also be 0, if there
    is no owner of the other page half, and xdp_do_redirect() for this
    buffer ran so far that it triggered a flush of the devmap/cpumap bulk
    queue, and the consumers of those bulk queues also freed the buffer,
    all by the time xdp_do_redirect() returns the execution back to enetc.
    
    This is the reason why enetc_flip_rx_buff() is called before
    xdp_do_redirect(), but there is a big flaw with that reasoning:
    enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
    enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
    we call enetc_xdp_free(), which does not deal gracefully with that.
    
    In fact, what happens is quite special. The page refcounts start as 1.
    enetc_flip_rx_buff() figures they're reusable, transfers these
    rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
    bumps the refcount to 2. When xdp_do_redirect() later returns an error,
    we call the no-op enetc_xdp_free(), but we still haven't lost the
    reference to that page. A copy of it is still at rx_ring->next_to_alloc,
    but that has refcount 2 (and there are no concurrent owners of it in
    flight, to drop the refcount). What really kills the system is when
    we'll flip the rx_swbd->page the second time around. With an updated
    refcount of 2, the page will not be reusable and we'll really leak it.
    Then enetc_new_page() will have to allocate more pages, which will then
    eventually leak again on further errors from xdp_do_redirect().
    
    The problem, summarized, is that we zeroize rx_swbd->page before we're
    completely done with it, and this makes it impossible for the error path
    to do something with it.
    
    Since the packet is potentially multi-buffer and therefore the
    rx_swbd->page is potentially an array, manual passing of the old
    pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
    difficult.
    
    For the sake of going with a simple solution, we accept the possibility
    of racing with xdp_do_redirect(), and we move the flip procedure to
    execute only on the redirect success path. By racing, I mean that the
    page may be deemed as not reusable by enetc (having a refcount of 0),
    but there will be no leak in that case, either.
    
    Once we accept that, we have something better to do with buffers on
    XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
    we won't, either (and this way, we can avoid enetc_xdp_free()
    completely, which gives the entire page to the slab allocator).
    Instead, we'll call enetc_xdp_drop(), which will recycle this half of
    the buffer back to the RX ring.
    
    Fixes: 9d2b68cc ("net: enetc: add support for XDP_REDIRECT")
    Suggested-by: default avatarLorenzo Bianconi <lorenzo.bianconi@redhat.com>
    Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.comSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
    628050ec
enetc.c 70.1 KB