    virtio_net: Make delayed refill more reliable · 39d32157
    Herbert Xu authored
    I have seen RX stalls on a machine that experienced a suspected
    OOM.  After the stall, the RX buffer is empty on the guest side
    and there are exactly 16 entries available on the host side.  As
    the number of entries is less than that required by a maximal
    skb, the host cannot proceed.
    
    The guest did not have a refill job scheduled.
    
    My diagnosis is that an OOM had occurred, with the delayed refill
    job scheduled.  The job was able to allocate at least one skb, but
    not enough to overcome the minimum required by the host to proceed.
    
    As the refill job would only reschedule itself if it failed completely
    to allocate any skbs, this would lead to an RX stall.
    
    The following patch removes this stall possibility by always
    rescheduling the refill job until the ring is totally refilled.
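
    The difference between the two rescheduling policies can be
    illustrated with a toy model.  This is only a sketch, not the
    driver code: struct toy_ring, toy_try_fill() and the two
    *_needs_resched() helpers are hypothetical names, and the
    allocation "budget" is an assumed stand-in for how many skb
    allocations succeed under memory pressure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an RX ring; every name here is hypothetical. */
struct toy_ring {
	int capacity;	/* total buffer slots in the ring */
	int filled;	/* slots currently holding a buffer */
};

/*
 * Try to fill the ring.  `budget` models how many allocations
 * succeed before memory runs out.  Returns the number of buffers
 * actually added.
 */
static int toy_try_fill(struct toy_ring *r, int budget)
{
	int added = 0;

	while (r->filled < r->capacity && added < budget) {
		r->filled++;
		added++;
	}
	assert(r->filled <= r->capacity);
	return added;
}

/* Old policy: retry only if not a single buffer could be allocated. */
static bool old_needs_resched(const struct toy_ring *r, int added)
{
	(void)r;
	return added == 0;
}

/* New policy: retry whenever the ring is still not completely full. */
static bool new_needs_resched(const struct toy_ring *r, int added)
{
	(void)added;
	return r->filled < r->capacity;
}
```

    With an illustrative ring of 256 slots and a fill that manages
    only 16 allocations, the old policy reports that no retry is
    needed even though the ring is mostly empty, while the new
    policy keeps asking for a retry until the ring is full.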
    
    Testing has shown that the RX stall no longer occurs whereas
    previously it would occur within a day.
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Acked-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net.c 25.8 KB