1. 25 Apr, 2017 6 commits
    • virtio-net: keep tx interrupts disabled unless kick · bdb12e0d
      Willem de Bruijn authored
      Tx napi mode increases the rate of transmit interrupts. Suppress some of
      them by masking interrupts while more packets are expected. The
      interrupts are re-enabled before the last packet is sent.
      
      This optimization reduces the throughput drop with tx napi for
      unidirectional flows such as UDP_STREAM, which do not benefit from
      cleaning tx completions in the receive napi handler.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
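A toy model helps show why masking interrupts until the kicking packet reduces the interrupt count. This is a hypothetical, single-threaded sketch in C. The names struct txq_model, txq_xmit and run_bursts are illustrative only. The model also assumes that the host completes each packet as soon as it is posted. It is not the driver change itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one tx queue; the names are hypothetical, not virtio-net's. */
struct txq_model {
    bool cb_enabled;      /* may the device raise a tx completion interrupt? */
    unsigned pending;     /* packets posted but not yet completed */
    unsigned interrupts;  /* tx completion interrupts raised so far */
};

/* Device side. Assumption: the host completes every posted packet at once
 * and raises an interrupt only if the driver has interrupts enabled. */
static void device_complete_all(struct txq_model *q)
{
    if (q->pending == 0)
        return;
    q->pending = 0;
    if (q->cb_enabled)
        q->interrupts++;
}

/* Driver side: post one packet. `more` means the stack has more packets
 * queued, so no kick is needed yet. With `suppress`, interrupts stay
 * masked while more packets are expected and are re-enabled just before
 * the last packet of a burst (the one that kicks) is posted. */
static void txq_xmit(struct txq_model *q, bool more, bool suppress)
{
    bool kick = !more;
    q->cb_enabled = suppress ? kick : true;
    q->pending++;
    device_complete_all(q);
}

/* Send `bursts` bursts of `burst_len` packets; return interrupts raised. */
unsigned run_bursts(unsigned bursts, unsigned burst_len, bool suppress)
{
    struct txq_model q = { .cb_enabled = true, .pending = 0, .interrupts = 0 };
    assert(burst_len > 0);
    for (unsigned b = 0; b < bursts; b++)
        for (unsigned i = 0; i < burst_len; i++)
            txq_xmit(&q, i + 1 < burst_len, suppress);
    return q.interrupts;
}
```

Under these assumptions, run_bursts(10, 8, false) counts 80 interrupts, one per packet. With suppression, run_bursts(10, 8, true) counts 10, one per burst.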
    • virtio-net: clean tx descriptors from rx napi · 7b0411ef
      Willem de Bruijn authored
      Amortize the cost of virtual interrupts by doing both rx and tx work
      on reception of a receive interrupt if tx napi is enabled. With
      VIRTIO_F_EVENT_IDX, this suppresses most explicit tx completion
      interrupts for bidirectional workloads.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
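The amortization can be sketched as an rx poll routine that also reclaims finished tx buffers. This toy C model is hypothetical: struct toy_nic, rx_poll and run_round_trips are illustrative names. It assumes the rx interrupt is handled before the device decides whether a tx interrupt is still needed. The real suppression depends on VIRTIO_F_EVENT_IDX timing, which this model does not capture.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy device/driver state; all names are hypothetical. */
struct toy_nic {
    bool clean_tx_from_rx;   /* the optimization being illustrated */
    unsigned tx_completed;   /* tx buffers finished by the device, not yet freed */
    unsigned rx_ready;       /* received packets waiting for the driver */
    unsigned tx_freed;       /* tx buffers freed so far */
    unsigned tx_irqs;        /* explicit tx completion interrupts taken */
};

static void clean_tx(struct toy_nic *n)
{
    n->tx_freed += n->tx_completed;
    n->tx_completed = 0;
}

/* Rx napi poll: consume received packets and, if enabled, also reclaim
 * finished tx buffers while the driver is already running. */
static void rx_poll(struct toy_nic *n)
{
    n->rx_ready = 0;
    if (n->clean_tx_from_rx)
        clean_tx(n);
}

/* One request/response round trip. Assumption: the rx interrupt is
 * handled first, and the device raises a tx interrupt only for
 * completions that nobody has consumed yet. */
static void round_trip(struct toy_nic *n)
{
    n->tx_completed++;          /* our request has been sent */
    n->rx_ready++;              /* ...and the reply has arrived */
    rx_poll(n);                 /* rx interrupt -> rx napi */
    if (n->tx_completed > 0) {  /* leftovers need a tx interrupt */
        n->tx_irqs++;
        clean_tx(n);            /* tx napi poll */
    }
}

/* Run `rounds` round trips; return tx interrupts and report buffers freed. */
unsigned run_round_trips(unsigned rounds, bool clean_tx_from_rx, unsigned *freed)
{
    struct toy_nic n = { .clean_tx_from_rx = clean_tx_from_rx };
    for (unsigned i = 0; i < rounds; i++)
        round_trip(&n);
    *freed = n.tx_freed;
    return n.tx_irqs;
}
```

With the flag off, every round trip needs its own tx interrupt. With it on, the rx path frees the same buffers and no tx interrupt is counted.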
    • virtio-net: move free_old_xmit_skbs · ea7735d9
      Willem de Bruijn authored
      An upcoming patch will call free_old_xmit_skbs indirectly from
      virtnet_poll. Move the function above virtnet_poll to avoid having to
      introduce a forward declaration.
      
      This is a pure move: no code changes.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: transmit napi · b92f1e67
      Willem de Bruijn authored
      Convert virtio-net to a standard napi tx completion path. This enables
      better TCP pacing using TCP small queues and increases single stream
      throughput.
      
      The virtio-net driver currently cleans tx descriptors on transmission
      of new packets in ndo_start_xmit. Cleaning latency therefore depends
      on new traffic and is unbounded. To avoid deadlock when a socket
      reaches its send buffer limit, packets are orphaned on transmission.
      This breaks socket backpressure, including TSQ.
      
      Napi increases the number of interrupts generated compared to the
      current model, which keeps interrupts disabled as long as the ring
      has enough free descriptors. Keep tx napi optional and disabled for
      now. Follow-on patches will reduce the interrupt cost.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
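Plain counters are enough to show why orphaning breaks backpressure. In this hypothetical sketch, struct toy_sock stands in for the socket's send accounting; it does not use the kernel's actual fields. A packet's bytes are charged to the socket when queued. Orphaning drops that charge at transmit time, so the sender never reaches its limit. Without orphaning, the charge lasts until tx completion, and tx napi is what makes that completion timely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-socket send accounting; names are hypothetical. */
struct toy_sock {
    unsigned sndbuf;   /* max bytes that may be charged to the socket */
    unsigned wmem;     /* bytes currently charged (queued, not yet freed) */
};

/* Queue one packet of `len` bytes toward the tx ring. Returns false when
 * the socket is at its limit and the sender would have to wait. */
bool toy_send(struct toy_sock *sk, unsigned len, bool orphan, unsigned *ring_bytes)
{
    if (sk->wmem + len > sk->sndbuf)
        return false;          /* backpressure: sender must wait */
    sk->wmem += len;           /* charge the packet to its socket */
    *ring_bytes += len;        /* packet occupies the tx ring */
    if (orphan)
        sk->wmem -= len;       /* orphaned: charge dropped at transmit time */
    return true;
}

/* Tx completion: the device finished every queued packet. Assumes all
 * ring bytes belong to this one socket. */
void toy_complete(struct toy_sock *sk, bool orphan, unsigned *ring_bytes)
{
    if (!orphan) {
        assert(sk->wmem >= *ring_bytes);
        sk->wmem -= *ring_bytes;   /* charge released only on completion */
    }
    *ring_bytes = 0;
}

/* Try `attempts` sends with no completions in between; return the
 * number of bytes that ended up sitting in the ring. */
unsigned queued_without_completion(unsigned sndbuf, unsigned len,
                                   unsigned attempts, bool orphan)
{
    struct toy_sock sk = { .sndbuf = sndbuf, .wmem = 0 };
    unsigned ring = 0;
    for (unsigned i = 0; i < attempts; i++)
        if (!toy_send(&sk, len, orphan, &ring))
            break;
    return ring;
}
```

Take a 1000-byte limit and 100-byte packets. With orphaning, queued_without_completion() lets 5000 bytes pile up in the ring. Without orphaning, it stops at 1000 bytes until a completion releases the charge.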
    • virtio-net: napi helper functions · e4e8452a
      Willem de Bruijn authored
      Prepare virtio-net for tx napi by converting existing napi code to
      use helper functions. This also deduplicates some logic.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
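One plausible shape for such shared helpers is a pair of functions:

- one schedules polling and masks the queue's interrupts;
- the other re-arms interrupts on completion and reschedules if work slipped in meanwhile.

The sketch below is a hypothetical single-threaded model. struct toy_queue, toy_napi_schedule, toy_napi_complete and toy_poll are illustrative names and do not reproduce the kernel's napi_struct or virtqueue APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state for one queue; not the kernel's napi_struct. */
struct toy_queue {
    bool napi_scheduled;  /* poll routine pending or running */
    bool cb_enabled;      /* device may raise an interrupt for this queue */
    unsigned work;        /* buffers waiting to be processed */
    unsigned polls;       /* times the poll routine has been scheduled */
};

/* Helper 1: schedule polling once, masking further interrupts meanwhile. */
void toy_napi_schedule(struct toy_queue *q)
{
    if (q->napi_scheduled)
        return;               /* already scheduled: nothing to do */
    q->napi_scheduled = true;
    q->cb_enabled = false;    /* poll mode: no interrupts needed */
    q->polls++;
}

/* Helper 2: finish polling, re-arm interrupts, and reschedule if work
 * arrived between the last check and re-enabling interrupts. */
void toy_napi_complete(struct toy_queue *q)
{
    q->cb_enabled = true;
    q->napi_scheduled = false;
    if (q->work > 0)
        toy_napi_schedule(q);  /* close the race instead of losing work */
}

/* Poll routine: process up to `budget` buffers; complete when drained,
 * otherwise stay scheduled for another pass. */
unsigned toy_poll(struct toy_queue *q, unsigned budget)
{
    assert(q->napi_scheduled);
    unsigned done = q->work < budget ? q->work : budget;
    q->work -= done;
    if (done < budget)
        toy_napi_complete(q);
    return done;
}
```

Both rx and tx paths can share the two helpers. Only the per-queue processing inside the poll routine differs.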
    • sparc64: Improve 64-bit constant loading in eBPF JIT. · 14933dc8
      David S. Miller authored
      Doing a full 64-bit decomposition is wasteful, especially for
      simple values like 0 and -1.
      
      But if we are going to optimize this, go all the way and also try
      all 2- and 3-instruction sequences that do not require a temporary
      register.
      
      First we handle the easy cases, where the value is a zero- or
      sign-extended 32-bit number (sethi+or and sethi+xor, respectively).
      
      Then we try to find a range of set bits that we can load simply and
      then shift into place, in various ways.
      
      Then we try negating the constant and see whether a simple sequence
      for that value, followed by a final xor, will do (e.g. when the range
      of set bits can't be loaded simply, but that of the negated value can).
      
      The final optimized strategy involves 4-instruction sequences that
      do not need a temporary register.
      
      Otherwise, we sadly fall back to a full decomposition using a
      temporary register.
      
      Example, from ALU64_XOR_K: 0x0000ffffffff0000 ^ 0x0 = 0x0000ffffffff0000:
      
      0000000000000000 <foo>:
         0:   9d e3 bf 50     save  %sp, -176, %sp
         4:   01 00 00 00     nop
         8:   90 10 00 18     mov  %i0, %o0
         c:   13 3f ff ff     sethi  %hi(0xfffffc00), %o1
        10:   92 12 63 ff     or  %o1, 0x3ff, %o1     ! ffffffff <foo+0xffffffff>
        14:   93 2a 70 10     sllx  %o1, 0x10, %o1
        18:   15 3f ff ff     sethi  %hi(0xfffffc00), %o2
        1c:   94 12 a3 ff     or  %o2, 0x3ff, %o2     ! ffffffff <foo+0xffffffff>
        20:   95 2a b0 10     sllx  %o2, 0x10, %o2
        24:   92 1a 60 00     xor  %o1, 0, %o1
        28:   12 e2 40 8a     cxbe  %o1, %o2, 38 <foo+0x38>
        2c:   9a 10 20 02     mov  2, %o5
        30:   10 60 00 03     b,pn   %xcc, 3c <foo+0x3c>
        34:   01 00 00 00     nop
        38:   9a 10 20 01     mov  1, %o5     ! 1 <foo+0x1>
        3c:   81 c7 e0 08     ret
        40:   91 eb 40 00     restore  %o5, %g0, %o0
      Signed-off-by: David S. Miller <davem@davemloft.net>
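The strategies listed in this commit can be condensed into an instruction-count estimator. This is a hypothetical sketch, not the JIT's code. It assumes SPARC V9 semantics: sethi sets bits 10 to 31 and clears the rest, or/xor take a sign-extended 13-bit immediate, and sllx/srlx shift by a constant. It also treats the "negating" step as a bitwise complement undone by a final xor with -1. It returns 0 where a temporary register would be needed.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; x must be non-zero. */
static int ctz64(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 1)) { x >>= 1; n++; }
    return n;
}

/* Count leading zero bits; x must be non-zero. */
static int clz64(uint64_t x)
{
    int n = 0;
    assert(x != 0);
    while (!(x >> 63)) { x <<= 1; n++; }
    return n;
}

/* Smaller of two costs, where 0 means "not possible". */
static int min_cost(int a, int b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    return a < b ? a : b;
}

/* One- or two-instruction loads with no shift. */
static int cost_direct(uint64_t x)
{
    if (x + 4096 < 8192)                /* fits simm13: mov imm, %reg */
        return 1;
    if ((x >> 32) == 0)                 /* zero-extended 32-bit value */
        return (x & 0x3ff) ? 2 : 1;     /* sethi %hi(x) [; or %lo(x)] */
    if ((x >> 31) == 0x1ffffffffULL)    /* sign-extended 32-bit value */
        return 2;                       /* sethi %hi(~x); xor low bits */
    return 0;
}

/* Load a simpler value, then move it into place with one sllx or srlx. */
static int cost_shifted(uint64_t x)
{
    int best = 0;
    if (x == 0)
        return 0;
    int tz = ctz64(x), lz = clz64(x);
    if (tz > 0) {
        uint64_t lo = x >> tz;                               /* logical */
        uint64_t sx = (x >> 63) ? lo | ~(~0ULL >> tz) : lo;  /* arithmetic */
        int c = min_cost(cost_direct(lo), cost_direct(sx));
        if (c)
            best = min_cost(best, c + 1);                    /* + sllx */
    }
    if (lz > 0) {
        uint64_t hi = x << lz;                   /* hi >> lz == x */
        uint64_t hi1 = hi | ((1ULL << lz) - 1);  /* low ones shift out too */
        int c = min_cost(cost_direct(hi), cost_direct(hi1));
        if (c)
            best = min_cost(best, c + 1);                    /* + srlx */
    }
    return best;
}

/* Estimated instruction count without a temporary register, or 0 when
 * only a full decomposition using a temporary would work. The inverted
 * route assumes "negating" means bitwise complement. */
int const_load_cost(uint64_t x)
{
    int best = min_cost(cost_direct(x), cost_shifted(x));
    int inv = min_cost(cost_direct(~x), cost_shifted(~x));
    if (inv)
        best = min_cost(best, inv + 1);          /* + xor %reg, -1 */
    return best;
}
```

For the constant in the example, const_load_cost(0x0000ffffffff0000) returns 3. That matches the sethi/or/sllx sequence at offsets 0xc through 0x14 of the listing. By contrast, 0xffff0000ffffffff needs the 4-instruction complement path.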
  2. 24 Apr, 2017 34 commits