    bnx2: Remove some unnecessary smp_mb() in tx fast path. · 11848b96
    Michael Chan authored
    bnx2_tx_avail() contains an smp_mb() and is called twice in the normal
    bnx2_start_xmit() path (see illustration below).  The full memory
    barrier is only necessary during race conditions with tx completion.
    We can speed up the tx path by replacing smp_mb() in bnx2_tx_avail()
    with a compiler barrier.  The compiler barrier forces the compiler
    to reload tx_prod and tx_cons from memory.
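
As a rough illustration of the compiler-barrier variant described above, here is a minimal user-space sketch. It is not the driver's code: `struct tx_ring`, its `size` field, `compiler_barrier()` and `tx_avail()` are hypothetical stand-ins, and the macro assumes GCC or Clang inline-asm syntax in place of the kernel's `barrier()`.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel's barrier() (assumes GCC/Clang):
 * emits no instruction, but the compiler may not reuse values it
 * loaded from memory before this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical, simplified ring state; not the driver's real struct. */
struct tx_ring {
	uint16_t tx_prod;	/* next slot the xmit path will fill */
	uint16_t tx_cons;	/* next slot tx completion will reclaim */
	uint32_t size;		/* number of usable descriptors */
};

/* Number of free descriptors, computed from freshly loaded indices. */
static uint32_t tx_avail(const struct tx_ring *r)
{
	uint32_t used;

	assert(r->size > 0);

	/* Compiler-only barrier: no CPU fence is issued here. */
	compiler_barrier();

	/* 16-bit subtraction so the result stays correct across wraparound. */
	used = (uint16_t)(r->tx_prod - r->tx_cons);
	return r->size - used;
}
```

The barrier here only constrains the compiler. Ordering against another CPU still needs a full barrier at the point where the race can occur.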
    
    In the race condition between bnx2_start_xmit() and bnx2_tx_int(),
    we have the following situation:
    
    bnx2_start_xmit()                       bnx2_tx_int()
        if (!bnx2_tx_avail())
                BUG();
    
        ...
    
        if (!bnx2_tx_avail())
                netif_tx_stop_queue();          update_tx_index();
                smp_mb();                       smp_mb();
                if (bnx2_tx_avail())            if (netif_tx_queue_stopped() &&
                        netif_tx_wake_queue();      bnx2_tx_avail())
    
    With smp_mb() removed from bnx2_tx_avail(), we need to add smp_mb() to
    bnx2_start_xmit() as shown above, so that netif_tx_stop_queue() is
    strictly ordered before the ring index check in bnx2_tx_avail().
    Without that ordering, the tx queue can be stopped forever.
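
The stop/re-check pattern above can be sketched in portable C11, with `atomic_thread_fence(memory_order_seq_cst)` standing in for smp_mb(). This is a simplified model under stated assumptions, not the driver's implementation: `struct txq`, `RING_SIZE`, and the `txq_*()` helpers are hypothetical names, and the wake condition is reduced to "any descriptor free".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4u	/* hypothetical, tiny ring for illustration */

struct txq {
	_Atomic uint32_t prod;	/* advanced by the xmit path */
	_Atomic uint32_t cons;	/* advanced by tx completion */
	atomic_bool stopped;	/* models the stopped state of the queue */
};

static void txq_init(struct txq *q)
{
	atomic_init(&q->prod, 0);
	atomic_init(&q->cons, 0);
	atomic_init(&q->stopped, false);
}

/* Free slots; relaxed loads only, no fence in the fast path. */
static uint32_t txq_avail(struct txq *q)
{
	uint32_t prod = atomic_load_explicit(&q->prod, memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&q->cons, memory_order_relaxed);

	return RING_SIZE - (prod - cons);
}

/* Xmit side: queue one packet, stop when full, re-check after a fence. */
static bool txq_xmit(struct txq *q)
{
	if (txq_avail(q) == 0)
		return false;	/* the illustration treats this as a bug */

	atomic_fetch_add_explicit(&q->prod, 1, memory_order_relaxed);

	if (txq_avail(q) == 0) {
		atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
		/* Full fence: the stop must be visible before the re-check. */
		atomic_thread_fence(memory_order_seq_cst);
		if (txq_avail(q) != 0)
			atomic_store_explicit(&q->stopped, false,
					      memory_order_relaxed);
	}
	return true;
}

/* Completion side: publish the new index, fence, then test for wake. */
static void txq_complete(struct txq *q, uint32_t n)
{
	uint32_t in_flight = RING_SIZE - txq_avail(q);

	assert(n <= in_flight);
	atomic_fetch_add_explicit(&q->cons, n, memory_order_relaxed);
	/* Full fence: the index update is ordered before reading 'stopped'. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&q->stopped, memory_order_relaxed) &&
	    txq_avail(q) != 0)
		atomic_store_explicit(&q->stopped, false,
				      memory_order_relaxed);
}
```

With a fence on both sides, at least one path observes the other's write: either the xmit side sees the updated consumer index on its re-check, or the completion side sees the stopped flag and wakes the queue.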
    
    This improves performance by about 5% with 2 ports running bi-directional
    64-byte packets.
    
    Reviewed-by: Benjamin Li <benli@broadcom.com>
    Reviewed-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>