    xdp: transition into using xdp_frame for return API · 03993094
    Jesper Dangaard Brouer authored
    Changing the xdp_return_frame() API to take a struct xdp_frame as its
    argument seems like a natural choice. But there are some subtle
    performance details here that need extra care, and the approach taken
    is a deliberate choice.
    
    De-referencing the xdp_frame on a remote CPU during DMA-TX completion
    results in the cache-line changing to "Shared" state. Later, when the
    page is reused for RX, this xdp_frame cache-line is written, which
    changes the state to "Modified".
    
    This situation already happens (naturally) for virtio_net, tun and
    cpumap, as the xdp_frame pointer is the queued object.  In tun and
    cpumap, the ptr_ring is used for efficiently transferring cache-lines
    (with pointers) between CPUs. Thus, the only option is to
    de-reference the xdp_frame.
    
    Only the ixgbe driver had an optimization that let it avoid
    de-referencing the xdp_frame.  The driver already has a TX-ring
    queue, which (in case of remote DMA-TX completion) has to be
    transferred between CPUs anyhow.  In this data area, we stored a
    struct xdp_mem_info and a data pointer, which allowed us to avoid
    de-referencing the xdp_frame.
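
    The difference between the two bookkeeping styles can be sketched in
    plain userspace C. This is only a rough model, not the ixgbe code: the
    struct layouts are simplified, and tx_buffer_cached, tx_buffer_frame,
    frame_reads and both tx_complete_*() helpers are hypothetical names
    invented for illustration. The counter stands in for "the completion
    CPU touched the xdp_frame cache-line".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; layouts are illustrative. */
struct xdp_mem_info {
	uint32_t type;
	uint32_t id;
};

struct xdp_frame {
	void *data;
	uint16_t len;
	uint16_t headroom;
	struct xdp_mem_info mem;
};

/* Hypothetical "old" TX-ring entry: mem info and data pointer are copied
 * into the entry, so completion never has to read the xdp_frame. */
struct tx_buffer_cached {
	void *data;
	struct xdp_mem_info mem;
};

/* Hypothetical "new" TX-ring entry: only the xdp_frame pointer is kept. */
struct tx_buffer_frame {
	struct xdp_frame *xdpf;
};

/* Counts reads of xdp_frame memory done at completion time. */
static unsigned int frame_reads;

static struct xdp_mem_info tx_complete_cached(const struct tx_buffer_cached *buf)
{
	assert(buf != NULL);
	return buf->mem;		/* served from the TX-ring entry itself */
}

static struct xdp_mem_info tx_complete_frame(const struct tx_buffer_frame *buf)
{
	assert(buf != NULL && buf->xdpf != NULL);
	frame_reads++;			/* this load touches the frame's cache-line */
	return buf->xdpf->mem;
}
```

    Both helpers return the same memory info; only the second one has to
    load from the frame itself, which is the access the message describes
    as moving the cache-line to "Shared" on the remote CPU.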
    
    To compensate for losing this optimization, a prefetchw is used to
    tell the cache coherency protocol about our access pattern.  My
    benchmarks show that this prefetchw is enough to compensate in the
    ixgbe driver.
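
    A minimal userspace sketch of the prefetch-for-write idea follows. It
    makes several assumptions that the message does not state: that the
    prefetch is issued on the RX path shortly before the frame header is
    written, and that the header sits at the start of the buffer headroom.
    struct rx_frame_hdr, prefetch_for_write() and rx_reuse_buffer() are
    hypothetical names; __builtin_prefetch() with a second argument of 1 is
    the GCC/Clang builtin for a prefetch with write intent, used here as a
    stand-in for the kernel's prefetchw().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the frame header that is written
 * into the start of the buffer when a page is reused for RX. */
struct rx_frame_hdr {
	void *data;
	uint16_t len;
	uint16_t headroom;
};

/* Userspace stand-in for prefetchw(): hint that addr will be written. */
static inline void prefetch_for_write(const void *addr)
{
	__builtin_prefetch(addr, 1);
}

/* Reuse a buffer for RX: request the header's cache-line for writing
 * early, then fill in the header. */
static struct rx_frame_hdr *rx_reuse_buffer(void *hard_start,
					    uint16_t headroom, uint16_t len)
{
	struct rx_frame_hdr *hdr = hard_start;

	assert(hard_start != NULL);
	assert(headroom >= sizeof(*hdr));

	/* A hint only: it has no effect on program results. */
	prefetch_for_write(hard_start);

	/* In a real driver, other RX work would sit here so the cache-line
	 * has time to arrive before the stores below. */

	hdr->data = (unsigned char *)hard_start + headroom;
	hdr->len = len;
	hdr->headroom = (uint16_t)(headroom - sizeof(*hdr));
	return hdr;
}
```

    The prefetch is purely a performance hint, so the function behaves the
    same with or without it; any benefit would only show up in measurements
    such as the benchmarks mentioned above.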
    
    V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
    V8: Adjust for commit bd658dda ("net/mlx5e: Separate dma base address
    and offset in dma_sync call")
    Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
cpumap.c 18.7 KB