    i40e/i40evf: fix interrupt affinity bug · 96db776a
    Alan Brady authored
    There exists a bug in which a 'perfect storm' can occur and cause
    interrupts to fail to be correctly affinitized. This causes unexpected
    behavior and has a substantial impact on performance when it happens.
    
    The bug occurs if there is heavy traffic, any number of CPUs that have
    an i40e interrupt are pegged at 100%, and the interrupt affinity for
    those CPUs is changed.  Instead of moving to the new CPU, the interrupt
    continues to be polled while there is heavy traffic.
    
    The bug is most readily realized as the driver is first brought up and
    all interrupts start on CPU0. If there is heavy traffic and the
    interrupt starts polling before the interrupt is affinitized, the
    interrupt will be stuck on CPU0 until traffic stops. The bug, however,
    can also be reproduced more simply by affinitizing all the interrupts
    to a single CPU and then attempting to move any of those interrupts off
    while there is heavy traffic.
    
    This patch fixes the bug by registering for update notifications from
    the kernel when the interrupt affinity changes. When that fires, we
    cache the intended affinity mask. Then, while polling, if the CPU is
    pegged at 100% and we failed to clean the rings, we check to make sure
    we have the correct affinity and stop polling if we're firing on the
    wrong CPU.  When the kernel successfully moves the interrupt, it will
    start polling on the correct CPU. The performance impact is minimal
    since the only time this section gets executed is when performance is
    already compromised by the CPU.
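
The decision described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver's code: the names `q_vector_model`, `model_affinity_notify`, `model_poll` and the `poll_action` values are hypothetical, and a 64-bit word stands in for the kernel's cpumask. In the kernel, the notification is presumably delivered through the IRQ affinity notifier interface (`irq_set_affinity_notifier()`), and the check would run in the NAPI poll routine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a queue vector; not the driver's struct. */
struct q_vector_model {
	uint64_t affinity_mask;	/* cached intended affinity, one bit per CPU */
};

/* Stand-in for the affinity-change callback: cache the mask the kernel
 * reports so the poll path can consult it later.
 */
static void model_affinity_notify(struct q_vector_model *qv, uint64_t new_mask)
{
	qv->affinity_mask = new_mask;
}

enum poll_action {
	POLL_DONE,		/* rings cleaned, leave polling normally */
	POLL_AGAIN,		/* work remains and we are on an allowed CPU */
	POLL_EXIT_WRONG_CPU,	/* work remains but this CPU is not in the mask */
};

/* Decide what the poll loop should do next. The affinity check only runs
 * when the rings were not fully cleaned, i.e. when polling would otherwise
 * continue on the current CPU.
 */
static enum poll_action model_poll(const struct q_vector_model *qv,
				   unsigned int cpu, bool clean_complete)
{
	if (clean_complete)
		return POLL_DONE;

	/* CPUs beyond the model's 64-bit mask are treated as not allowed. */
	if (cpu >= 64 || !(qv->affinity_mask & (UINT64_C(1) << cpu)))
		return POLL_EXIT_WRONG_CPU;

	return POLL_AGAIN;
}
```

In this model, a vector whose mask is moved from CPU0 to CPU3 while CPU0 is still polling gets `POLL_EXIT_WRONG_CPU` on its next incomplete poll on CPU0, which gives the interrupt a chance to fire on the newly assigned CPU.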
    
    Change-ID: I4410a880159b9dba1f8297aa72bef36dca34e830
    Signed-off-by: Alan Brady <alan.brady@intel.com>
    Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
    Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e_txrx.c 61.4 KB