• Björn Töpel's avatar
    i40e: avoid premature Rx buffer reuse · 75aab4e1
    Björn Töpel authored
    The page recycle code, incorrectly, relied on that a page fragment
    could not be freed inside xdp_do_redirect(). This assumption leads to
    that page fragments that are used by the stack/XDP redirect can be
    reused and overwritten.
    
    To avoid this, store the page count prior invoking xdp_do_redirect().
    
    Longer explanation:
    
    Intel NICs have a recycle mechanism. The main idea is that a page is
    split into two parts. One part is owned by the driver, one part might
    be owned by someone else, such as the stack.
    
    t0: Page is allocated, and put on the Rx ring
                  +---------------
    used by NIC ->| upper buffer
    (rx_buffer)   +---------------
                  | lower buffer
                  +---------------
      page count  == USHRT_MAX
      rx_buffer->pagecnt_bias == USHRT_MAX
    
    t1: Buffer is received, and passed to the stack (e.g.)
                  +---------------
                  | upper buff (skb)
                  +---------------
    used by NIC ->| lower buffer
    (rx_buffer)   +---------------
      page count  == USHRT_MAX
      rx_buffer->pagecnt_bias == USHRT_MAX - 1
    
    t2: Buffer is received, and redirected
                  +---------------
                  | upper buff (skb)
                  +---------------
    used by NIC ->| lower buffer
    (rx_buffer)   +---------------
    
    Now, prior calling xdp_do_redirect():
      page count  == USHRT_MAX
      rx_buffer->pagecnt_bias == USHRT_MAX - 2
    
    This means that buffer *cannot* be flipped/reused, because the skb is
    still using it.
    
    The problem arises when xdp_do_redirect() actually frees the
    segment. Then we get:
      page count  == USHRT_MAX - 1
      rx_buffer->pagecnt_bias == USHRT_MAX - 2
    
    From a recycle perspective, the buffer can be flipped and reused,
    which means that the skb data area is passed to the Rx HW ring!
    
    To work around this, the page count is stored prior calling
    xdp_do_redirect().
    
    Note that this is not optimal, since the NIC could actually reuse the
    "lower buffer" again. However, then we need to track whether
    XDP_REDIRECT consumed the buffer or not.
    
    Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
    Reported-and-analyzed-by: default avatarLi RongQing <lirongqing@baidu.com>
    Signed-off-by: default avatarBjörn Töpel <bjorn.topel@intel.com>
    Tested-by: default avatarGeorge Kuruvinakunnel <george.kuruvinakunnel@intel.com>
    Signed-off-by: default avatarTony Nguyen <anthony.l.nguyen@intel.com>
    75aab4e1
i40e_txrx.c 106 KB