1. 07 Feb, 2017 15 commits
    • net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10 · f39908d3
      Andrew Lunn authored
      Unlike most ports, ports 9 and 10 of the 6390X family have configurable
      PHY modes. Set the mode as part of adjust_link().
      
      Ordering is important, because the SERDES interfaces connected to
      ports 9 and 10 can be split and assigned to other ports. The CMODE has
      to be correctly set before the SERDES interface on another port can be
      configured. Such configuration is likely to be performed in
      port_enable() and port_disable(), called on slave_open() and
      slave_close().
      
      The simple case is when ports 9 and 10 are used for 'CPU' or 'DSA'. In
      this case, the CMODE is set via a phy-mode in dsa_cpu_dsa_setup(), which
      is called early in the switch setup.
      
      When ports 9 or 10 are used as user ports and have a fixed-phy,
      dsa_slave_adjust_link() is called when the fixed-phy is attached. This
      results in the adjust_link function being called, which sets the CMODE.
      The port_enable() for other ports will be called much later.
      
      When ports 9 or 10 are used as user ports and have a real phy attached
      which does not use all the available SERDES interface, e.g. a 1Gbps
      SGMII, there is currently no mechanism in place to set the CMODE of
      the port from software. It must be hoped the strapping resistors are
      correct.
      
      At the same time, add a function to get the cmode. This will be needed
      when configuring the SERDES interfaces.
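
      A getter of this kind usually comes down to masking a field out of a
      port status register. The sketch below is only an illustration, not the
      driver's code: the function name, the 16-bit register width and the
      assumption that the mode sits in the low four bits are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position; not taken from the commit message. */
#define PORT_STATUS_CMODE_MASK 0x000f

/* Extract the CMODE field from an already-read port status word. */
static uint8_t port_get_cmode(uint16_t port_status)
{
	return (uint8_t)(port_status & PORT_STATUS_CMODE_MASK);
}
```

      Keeping the getter free of side effects lets SERDES setup code query the
      mode without touching the port configuration.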
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Add 2000base-x, 2500base-x and rxaui modes · 55601a88
      Andrew Lunn authored
      The mv88e6390 ports 9 and 10 support some additional PHY modes. Add
      these modes to the PHY core so they can be used in the binding.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'virtio_net-XDP-adjust_head' · 108d9c71
      David S. Miller authored
      John Fastabend says:
      
      ====================
      XDP adjust head support for virtio
      
      This series adds adjust head support for virtio. The following is my
      test setup. I use qemu + virtio as follows,
      
      ./x86_64-softmmu/qemu-system-x86_64 \
        -hda /var/lib/libvirt/images/Fedora-test0.img \
        -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
        -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
      
      Until LRO is supported, TSO must be turned off in the host in order to
      use XDP with virtio. The important fields in the above command line
      are the following,
      
        guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
      
      Also note it is possible to consume more queues than can be supported,
      because when XDP is enabled for retransmit, XDP attempts to use a queue
      per cpu. My standard queue count is 'queues=4'.
      
      After loading the VM I run the relevant XDP test programs in,
      
        ./samples/bpf
      
      For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
      with iperf (-d option to get bidirectional traffic), ping, and pktgen.
      I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
      the normal traffic path to the stack continues to work with XDP loaded.
      
      It would be great to automate this soon. At the moment I do it by hand
      which is starting to get tedious.
      
      v2: original series dropped trace points after merge.
      ====================
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: XDP support for adjust_head · 2de2f7f4
      John Fastabend authored
      Add support for XDP adjust head by allocating a 256B header region
      that XDP programs can grow into. This is only enabled when an XDP
      program is loaded.
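
      The headroom idea can be sketched in userspace as follows. This is a
      simplified illustration under stated assumptions, not the driver's
      implementation: struct rx_buf, rx_buf_init() and adjust_head() are
      hypothetical names, and the 2048-byte payload area is arbitrary. Only
      the 256-byte header region comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

#define XDP_HEADROOM 256	/* header region size named in the commit message */
#define PAYLOAD_MAX 2048	/* arbitrary payload capacity for this sketch */

struct rx_buf {
	unsigned char mem[XDP_HEADROOM + PAYLOAD_MAX];
	unsigned char *data;		/* current start of the packet */
	unsigned char *data_end;	/* one past the last packet byte */
};

/* Place a packet of len bytes after the reserved headroom. */
static void rx_buf_init(struct rx_buf *b, size_t len)
{
	assert(len <= PAYLOAD_MAX);
	b->data = b->mem + XDP_HEADROOM;
	b->data_end = b->data + len;
}

/*
 * Move the packet start by delta bytes. A negative delta grows the packet
 * into the headroom, a positive delta strips leading bytes.
 * Returns 0 on success, -1 if the new start would leave the buffer or
 * would leave an empty packet.
 */
static int adjust_head(struct rx_buf *b, int delta)
{
	ptrdiff_t off = (b->data - b->mem) + delta;
	ptrdiff_t end = b->data_end - b->mem;

	if (off < 0 || off >= end)
		return -1;
	b->data = b->mem + off;
	return 0;
}
```

      Working with offsets rather than raw pointer arithmetic keeps the bounds
      check well defined even when the requested delta is out of range.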
      
      In order to ensure that we do not have to unwind queue headroom, push
      queue setup below bpf_prog_add. It reads better to do a prog ref
      unwind vs another queue setup call.
      
      At the moment this code must do a full reset to ensure that old buffers
      (without headroom on program add, or with headroom on program removal)
      are not used incorrectly in the datapath. Ideally we would only
      have to disable/enable the RX queues being updated, but there is no
      API to do this at the moment in virtio, so use the big hammer. In
      practice it is likely not that big of a problem, as this will only
      happen when XDP is enabled/disabled; changing programs does not
      require the reset. There is some risk that the driver may either
      have an allocation failure or for some reason fail to correctly
      negotiate with the underlying backend, in which case the driver will
      be left uninitialized. I have not seen this ever happen on my test
      systems and, for what it's worth, this same failure case can occur
      from probe and other contexts in the virtio framework.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: refactor freeze/restore logic into virtnet reset logic · 9fe7bfce
      John Fastabend authored
      For XDP we will need to reset the queues to allow for buffer headroom
      to be configured. In order to do this we need to essentially run the
      freeze()/restore() code path. Unfortunately, the locking requirements
      of the freeze/restore and reset paths are different, so we cannot
      simply reuse the code.
      
      This patch refactors the code path and adds a reset helper routine.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: remove duplicate queue pair binding in XDP · 722d8283
      John Fastabend authored
      Factor out qp assignment.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: factor out xdp handler for readability · 0354e4d1
      John Fastabend authored
      At this point do_xdp_prog is mostly if/else branches handling
      the different modes of virtio_net. So remove it and handle running
      the program in the per-mode handlers.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: wrap rtnl_lock in test for calling with lock already held · 47315329
      John Fastabend authored
      For the XDP use case and to allow ethtool reset tests, it is useful to
      be able to use reset paths from contexts where the rtnl lock is already
      held.
      
      This requires updating virtnet_set_queues and free_receive_bufs, the
      two places where rtnl_lock is taken in virtio_net. To do this we
      use the following pattern,
      
      	_foo(...) { do stuff }
      	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock(); }
      
      This allows us to use the freeze()/restore() flow from both contexts.
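
      The locked/unlocked split can be sketched in plain C. This is a
      single-threaded illustration only: fake_lock()/fake_unlock() stand in
      for rtnl_lock()/rtnl_unlock(), and set_queues(), set_queues_locked() and
      reset_queues() are hypothetical names playing the roles of foo(), _foo()
      and a caller that already holds the lock.

```c
#include <assert.h>
#include <stdbool.h>

static bool lock_held;		/* stand-in for the rtnl mutex state */
static int queue_pairs;

/* Non-recursive lock: taking it twice trips the assertion. */
static void fake_lock(void)   { assert(!lock_held); lock_held = true; }
static void fake_unlock(void) { assert(lock_held);  lock_held = false; }

/* The "_foo" variant: does the work, caller must hold the lock. */
static int set_queues_locked(int n)
{
	assert(lock_held);
	if (n <= 0)
		return -1;
	queue_pairs = n;
	return 0;
}

/* The "foo" variant: for callers that do not hold the lock. */
static int set_queues(int n)
{
	int err;

	fake_lock();
	err = set_queues_locked(n);
	fake_unlock();
	return err;
}

/* A reset path that takes the lock once and reuses the locked helper. */
static int reset_queues(int n)
{
	int err;

	fake_lock();
	/* Calling set_queues() here would try to take the lock twice. */
	err = set_queues_locked(n);
	fake_unlock();
	return err;
}
```

      The assertion in the locked helper documents the calling contract and
      catches callers that forget to take the lock.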
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-improve-cache-utilization' · 152bff37
      David S. Miller authored
      Nikolay Aleksandrov says:
      
      ====================
      bridge: improve cache utilization
      
      This is the first set which begins to deal with the bad bridge cache
      access patterns. The first patch rearranges the bridge and port structs
      a little so the frequently (and closely) accessed members are in the same
      cache line. The second patch then moves the garbage collection to a
      workqueue trying to improve system responsiveness under load (many fdbs)
      and more importantly removes the need to check if the matched entry is
      expired in __br_fdb_get which was a major source of false-sharing.
      The third patch is a preparation for the final one, which limits writes
      to the fdb's "used" and "updated" fields to at most once per jiffy.
      If properly configured, i.e. ports bound to CPUs (thus updating "updated"
      locally) then the bridge's HitM goes from 100% to 0%, but even without
      binding we get a win because previously every lookup that iterated over
      the hash chain caused false-sharing due to the first cache line being
      used for both mac/vid and used/updated fields.
      
      Some results from tests I've run:
      (note that these were run in good conditions for the baseline, everything
       ran on a single NUMA node and there were only 3 fdbs)
      
      1. baseline
      100% Load HitM on the fdbs (between everyone who has done lookups and hit
                                  one of the 3 hash chains of the communicating
                                  src/dst fdbs)
      Overall 5.06% Load HitM for the bridge, first place in the list
      
      2. patched & ports bound to CPUs
      0% Local load HitM, bridge is not even in the c2c report list
      Also there's 3% consistent improvement in netperf tests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: fdb: write to used and updated at most once per jiffy · 83a718d6
      Nikolay Aleksandrov authored
      Writing once per jiffy is enough to limit the bridge's false sharing.
      After this change the bridge doesn't show up in the local load HitM stats.
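
      The idea of writing at most once per jiffy can be sketched as a
      compare-before-store. This is a minimal userspace illustration, not the
      bridge code: struct fdb_entry, fdb_touch() and the write_count counter
      are hypothetical, and "now" stands in for the kernel's jiffies value.

```c
#include <assert.h>

struct fdb_entry {
	unsigned long updated;	/* last-seen timestamp, in jiffies */
};

static unsigned long write_count;	/* counts stores, for illustration only */

/*
 * Store the timestamp only when it changed. Repeated packets within the
 * same jiffy then only read the cache line, so it is not dirtied and
 * bounced between CPUs.
 */
static void fdb_touch(struct fdb_entry *fdb, unsigned long now)
{
	assert(fdb);
	if (fdb->updated != now) {
		fdb->updated = now;
		write_count++;
	}
}
```

      Three calls with the same timestamp produce a single store; the next
      store happens only when the timestamp advances.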
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: move write-heavy fdb members in their own cache line · 1214628c
      Nikolay Aleksandrov authored
      Fdb's used and updated fields are written to on every packet forward and
      packet receive respectively. Thus if we are receiving packets from a
      particular fdb, they'll cause false-sharing with everyone who has looked
      it up (even if it didn't match, since mac/vid share cache line!). The
      "used" field is even worse since it is updated on every packet forward
      to that fdb, thus the standard config where X ports use a single gateway
      results in 100% fdb false-sharing. Note that this patch does not prevent
      the last scenario, but it makes it better for other bridge participants
      which are not using that fdb (and are only doing lookups over it).
      The point is that with this move we make sure that only communicating
      parties get the false-sharing; in a later patch we'll show how to avoid
      that too.
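
      The layout change can be sketched with C11 alignment. This is an
      illustration under assumptions, not the actual struct: struct fdb_sketch
      and its field types are hypothetical, and a 64-byte cache line is
      assumed (in the kernel, the ____cacheline_aligned_in_smp annotation
      serves this purpose).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumed cache line size */

struct fdb_sketch {
	/* Read-mostly lookup key, touched by every lookup on the chain. */
	unsigned char addr[6];
	unsigned short vlan_id;

	/* Write-heavy fields, pushed onto a cache line of their own. */
	alignas(CACHE_LINE) unsigned long updated;
	unsigned long used;
};

/* The key and the write-heavy fields must not share a cache line. */
static_assert(offsetof(struct fdb_sketch, updated) % CACHE_LINE == 0,
	      "write-heavy fields must start a new cache line");
static_assert(offsetof(struct fdb_sketch, vlan_id) < CACHE_LINE,
	      "lookup key must stay in the first cache line");
```

      The cost is a larger struct, which is the usual trade-off for avoiding
      false sharing between lookups and timestamp updates.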
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: move to workqueue gc · f7cdee8a
      Nikolay Aleksandrov authored
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain, allowing other tasks
      to run in the meantime. With thousands of fdbs the system is much
      more responsive. Most importantly, remove the need to check if the
      matched entry has expired in __br_fdb_get, which causes false-sharing
      and is completely unnecessary if we clean up entries; at worst we'll
      get 10ms of traffic for that entry before it gets deleted.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: modify bridge and port to have often accessed fields in one cache line · 1f90c7f3
      Nikolay Aleksandrov authored
      Move around net_bridge so the vlan fields are in the beginning since
      they're checked on every packet even if vlan filtering is disabled.
      For the port move flags & vlan group to the beginning, so they're in the
      same cache line with the port's state (both flags and state are checked
      on each packet).
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: enable verifier to add 0 to packet ptr · 63dfef75
      William Tu authored
      The patch fixes the case of adding a zero value to the packet
      pointer. The zero value could come from a BPF_K immediate or from a
      src_reg of type CONST_IMM. The patch fixes both; otherwise the verifier
      reports the following error:
        [...]
          R0=imm0,min_value=0,max_value=0
          R1=pkt(id=0,off=0,r=4)
          R2=pkt_end R3=fp-12
          R4=imm4,min_value=4,max_value=4
          R5=pkt(id=0,off=4,r=4)
        269: (bf) r2 = r0     // r2 becomes imm0
        270: (77) r2 >>= 3
        271: (bf) r4 = r1     // r4 becomes pkt ptr
        272: (0f) r4 += r2    // r4 += 0
        addition of negative constant to packet pointer is not allowed
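
      The log above shows "r4 += 0" being rejected with a message about a
      negative constant, which suggests the check treated zero like a negative
      value. The sketch below illustrates the intended behaviour with a
      hypothetical, heavily simplified helper; it is not the verifier's code,
      and check_pkt_ptr_add_const() is an invented name.

```c
#include <assert.h>
#include <errno.h>

/*
 * Simplified stand-in for the check on "packet pointer += constant".
 * Only strictly negative constants are refused; a "<= 0" comparison
 * here would also refuse the harmless addition of zero.
 */
static int check_pkt_ptr_add_const(long imm)
{
	if (imm < 0)
		return -EACCES;
	return 0;
}
```

      With this comparison, adding 0 or 4 is accepted while adding -1 is
      still refused.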
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: test for AND edge cases · 29200c19
      Josef Bacik authored
      These two tests are based on the work done for f23cc643.  The first test is
      just a basic one to make sure we don't allow AND'ing negative values, even if it
      would result in a valid index for the array.  The second is a cleaned up version
      of the original testcase provided by Jann Horn that resulted in the commit.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 06 Feb, 2017 25 commits