1. 16 May, 2014 40 commits
    • mac802154: add llsec structures and mutators · 5d637d5a
      Phoebe Buckheister authored
      This patch adds containers and mutators for the major ieee802154_llsec
      structures to mac802154. Most of the (rather simple) ieee802154_llsec
      structs are wrapped only to provide an rcu_head for orderly disposal,
      but some structs - llsec keys notably - require more complex
      bookkeeping.
      
      Since each llsec key may be referenced by a number of llsec key table
      entries (with differing key ids, but the same actual key), we want to
      save memory and not allocate crypto transforms for each entry in the
      table. Thus, the mac802154 llsec key is reference-counted instead.
      Further, each key will have four associated crypto transforms - three
      CCM transforms for the authsizes 4/8/16 and one CTR transform for
      unauthenticated encryption. If we had a CCM* transform that allowed
      authsize 0, and authsize as part of requests instead of transforms, this
      would not be necessary.
      Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
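The key sharing described in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_key`, `demo_key_entry`, `demo_key_alloc()`, `demo_key_get()` and `demo_key_put()` are hypothetical names, the `tfm` array holds placeholders rather than real crypto transforms, and the plain integer counter is not thread-safe the way kernel reference counting would be.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_KEY_SIZE 16
#define DEMO_NUM_TFMS 4 /* CCM with authsize 4/8/16, plus one CTR */

/* Hypothetical stand-in for a key that owns its crypto transforms. */
struct demo_key {
    unsigned char material[DEMO_KEY_SIZE];
    int tfm[DEMO_NUM_TFMS]; /* placeholders for real transform handles */
    int refcnt;             /* plain counter: not thread-safe */
};

/* A key table entry only points at the shared key. */
struct demo_key_entry {
    int key_id;
    struct demo_key *key;
};

/* Allocate a key with one reference held by the caller. */
struct demo_key *demo_key_alloc(const unsigned char *material)
{
    struct demo_key *key = calloc(1, sizeof(*key));

    if (!key)
        return NULL;
    memcpy(key->material, material, DEMO_KEY_SIZE);
    for (int i = 0; i < DEMO_NUM_TFMS; i++)
        key->tfm[i] = 1; /* pretend the transform was set up once */
    key->refcnt = 1;
    return key;
}

/* Take an additional reference for another table entry. */
struct demo_key *demo_key_get(struct demo_key *key)
{
    key->refcnt++;
    return key;
}

/* Drop a reference; returns 1 if this call freed the key. */
int demo_key_put(struct demo_key *key)
{
    assert(key->refcnt > 0);
    if (--key->refcnt > 0)
        return 0;
    free(key);
    return 1;
}
```

Two table entries with different key ids can then share one `demo_key`, so the four transforms are set up once per actual key rather than once per entry.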
    • mac802154: update Kconfig · 87de726c
      Phoebe Buckheister authored
      Link-layer security requires AES CCM for authenticated modes and AES CTR
      for the unauthenticated encryption mode.
      Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ieee802154: add types for link-layer security · dc20759f
      Phoebe Buckheister authored
      The added structures match 802.15.4-2011 link-layer security PIBs as
      closely as is reasonable. Some lists required by the standard were
      modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
      802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
      excessive and not particularly useful. The DeviceDescriptorHandleList
      was inverted and is here a per-device list, since operations on this
      list are likely to have both a key and a device at hand, and per-device
      lists of keys are shorter than per-key lists of devices.
      Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
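The bitmap modelling mentioned in the commit above might look roughly like the following. The struct and function names are hypothetical and the field widths are assumptions chosen for the sketch; only the idea of storing allowed frame types and command frame ids as one bit per value comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor: allowed frame types and command frame ids
 * are stored as bitmaps, one bit per value, instead of lists. */
struct demo_key_usage {
    uint8_t frame_types;        /* bit n set: frame type n is allowed */
    uint32_t command_frame_ids; /* bit n set: command frame id n is allowed */
};

void demo_allow_frame_type(struct demo_key_usage *u, unsigned int type)
{
    assert(type < 8);
    u->frame_types |= (uint8_t)(1u << type);
}

void demo_allow_command(struct demo_key_usage *u, unsigned int id)
{
    assert(id < 32);
    u->command_frame_ids |= UINT32_C(1) << id;
}

/* Membership tests are single mask operations, no list walk needed. */
int demo_frame_type_allowed(const struct demo_key_usage *u, unsigned int type)
{
    return type < 8 && (u->frame_types & (1u << type)) != 0;
}

int demo_command_allowed(const struct demo_key_usage *u, unsigned int id)
{
    return id < 32 && (u->command_frame_ids & (UINT32_C(1) << id)) != 0;
}
```

With this layout a membership check is a constant-time mask test, which is the practical argument for preferring bitmaps over lists here.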
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · e54740e6
      David S. Miller authored
      Jesse Gross says:
      
      ====================
      A set of OVS changes for net-next/3.16.
      
      The major change here is a switch from per-CPU to per-NUMA flow
      statistics. This improves scalability by reducing kernel overhead
      in flow setup and maintenance.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dt_fixed_phy' · ad2ebb3d
      David S. Miller authored
      Thomas Petazzoni says:
      
      ====================
      Add DT support for fixed PHYs
      
      Here is a fourth version of the patch set that adds a Device Tree
      binding and the related code to support fixed PHYs. I'm hoping to get
      this merged in 3.16.
      
      Changes since v3:
      
       * Rebased on top of v3.15-rc5
      
       * In patch "net: phy: decouple PHY id and PHY address in fixed PHY
         driver", changed the PHY ID of fixed PHYs from 0xdeadbeef to 0x0,
         as suggested by Grant Likely.
      
       * Fixed the !CONFIG_PHY_FIXED case in patch "net: phy: extend fixed
         driver with fixed_phy_register()". Noticed by Florian Fainelli.
      
       * Added Acked-by from Grant Likely and Florian Fainelli on patch
         "net: phy: extend fixed driver with fixed_phy_register()".
      
       * Reworked the new fixed-link DT binding to be just a sub-node of the
         Ethernet MAC node, and not a node referenced by the 'phy'
         property. This was requested by Grant Likely.
      
       * Reworked the code implementing the new DT binding to also make it
         accept the old, single property based, DT binding.
      
       * Added a patch that actually uses the new fixed link DT binding for
         the Armada XP Matrix board.
      
      Changes since v2:
      
       * Rebased on top of v3.14-rc1, and re-tested on hardware.
      
       * Removed the RFC tag, since there seems to be some real interest in
         this feature, and the code has gone through several iterations
         already.
      
       * The error handling in fixed_phy_register() has been fixed.
      
      Changes since v1:
      
       * Instead of using a 'fixed-link' property inside the Ethernet device
         DT node, with a fairly cryptic succession of integer values, we now
         use a PHY subnode under the Ethernet device DT node, with explicit
         properties to configure the duplex, speed, pause and other PHY
         properties.
      
       * The PHY address is automatically allocated by the kernel and no
         longer visible in the Device Tree binding.
      
       * The PHY device is created directly when the network driver calls
         of_phy_connect_fixed_link(), and associated to the PHY DT node,
         which allows the existing of_phy_connect() function to work,
         without the need to use the deprecated of_phy_connect_fixed_link().
      
      Posts of previous versions:
      
        RFCv1:   http://www.spinics.net/lists/netdev/msg243253.html
        RFCv2:   http://lists.infradead.org/pipermail/linux-arm-kernel/2013-September/196919.html
        PATCHv3: http://www.spinics.net/lists/netdev/msg273117.html
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ARM: mvebu: use the fixed-link PHY DT binding for the Armada XP Matrix board · 84f6e11f
      Thomas Petazzoni authored
      The Armada XP Matrix board has an Ethernet PHY that isn't configurable
      through the MDIO bus, so we use the newly introduced fixed-link PHY DT
      binding to represent the PHY of this platform and get network working.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: add support for fixed links · 83895bed
      Thomas Petazzoni authored
      Following the introduction of of_phy_register_fixed_link(), this patch
      introduces fixed link support in the mvneta driver, for Marvell Armada
      370/XP SOCs.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • of: provide a binding for fixed link PHYs · 3be2a49e
      Thomas Petazzoni authored
      Some Ethernet MACs have a "fixed link", and are not connected to a
      normal MDIO-managed PHY device. For those situations, a Device Tree
      binding allows describing a "fixed link" using a special PHY node.
      
      This patch adds:
      
       * Documentation for the fixed PHY Device Tree binding.
      
       * An of_phy_is_fixed_link() function that an Ethernet driver can call
         on its PHY phandle to find out whether it's a fixed link PHY or
         not. It should typically be used to know if
         of_phy_register_fixed_link() should be called.
      
       * An of_phy_register_fixed_link() function that instantiates the
         fixed PHY into the PHY subsystem, so that when the driver calls
         of_phy_connect(), the PHY device associated to the OF node will be
         found.
      
      These two additional functions also support the old fixed-link Device
      Tree binding used on PowerPC platforms, so that ultimately, the
      network device drivers for those platforms could be converted to use
      of_phy_is_fixed_link() and of_phy_register_fixed_link() instead of
      of_phy_connect_fixed_link(), while keeping compatibility with their
      respective Device Tree bindings.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: extend fixed driver with fixed_phy_register() · a7595121
      Thomas Petazzoni authored
      The existing fixed_phy_add() function has several drawbacks that
      prevent it from being used as is for OF-based declaration of fixed
      PHYs:
      
       * The address of the PHY on the fake bus needs to be passed, while a
         dynamic allocation is desired.
      
       * Since the phy_device instantiation is postponed until the next
         mdiobus scan, there is no way to associate the fixed PHY with its
         OF node, which later prevents of_phy_connect() from finding this
         fixed PHY from a given OF node.
      
      To solve this, this commit introduces fixed_phy_register(), which will
      allocate an available PHY address, add the PHY using fixed_phy_add()
      and instantiate the phy_device structure associated with the provided
      OF node.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Grant Likely <grant.likely@linaro.org>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
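The "allocate an available PHY address" step described in the commit above could be sketched as follows. The commit message does not say how the address is chosen, so the first-free bitmap scan, the release helper and all `demo_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX_ADDR 32 /* PHY addresses on an MDIO bus are 5 bits wide */

/* Hypothetical bookkeeping for the fake fixed MDIO bus. */
struct demo_fixed_bus {
    uint32_t used; /* bit n set: address n is taken */
};

/* Returns the first free address, or -1 if the bus is full. */
int demo_alloc_phy_addr(struct demo_fixed_bus *bus)
{
    for (int addr = 0; addr < DEMO_MAX_ADDR; addr++) {
        uint32_t bit = UINT32_C(1) << addr;

        if (!(bus->used & bit)) {
            bus->used |= bit;
            return addr;
        }
    }
    return -1;
}

/* Releases an address so that a later allocation can reuse it. */
void demo_free_phy_addr(struct demo_fixed_bus *bus, int addr)
{
    assert(addr >= 0 && addr < DEMO_MAX_ADDR);
    bus->used &= ~(UINT32_C(1) << addr);
}
```

The caller no longer passes an address in; it receives one back, which is the behavioural change the commit describes relative to fixed_phy_add().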
    • net: phy: decouple PHY id and PHY address in fixed PHY driver · 9b744942
      Thomas Petazzoni authored
      Until now, the fixed_phy_add() function was taking as argument
      'phy_id', which was used both as the PHY address on the fake fixed
      MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and
      MII_PHYSID2 registers. However, those two pieces of information are
      completely unrelated.
      
      This patch decouples them. The PHY id of fixed PHYs is hardcoded to be
      0x0. Ideally, a really reserved value would be nicer, but there
      doesn't seem to be an easy way of making sure a dummy value can be
      assigned to the Linux kernel for such usage.
      
      The PHY address remains passed by the caller of fixed_phy_add().
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-non-promisc' · 2770abcc
      David S. Miller authored
      Vlad Yasevich says:
      
      ====================
      bridge: Non-promisc bridge ports support
      
      This series adds functionality to the bridge device to enable
      operations without setting all ports to promiscuous mode.
      
      The basic concept is this.  The bridge keeps track of the ports
      that support learning and flooding packets to unknown destinations.
      We call these ports auto-discovery ports since they automatically
      discover who is behind them through learning and flooding.
      
      If flooding and learning are disabled via flags, then the port
      requires static configuration to tell it which mac addresses
      are behind it.  This is accomplished through adding of fdbs.
      These fdbs should be static as dynamic fdbs can expire and systems
      will become unreachable due to lack of flooding.
      
      If the user marks all ports as needing static configuration then
      we can safely make them non-promiscuous since we will know all the
      information about them.
      
      If the user leaves only 1 port as automatic, then we can mark
      that port as not-promiscuous as well.  One could think of
      this as an edge relay similar to what's supported by embedded switches
      in SRIOV devices.  Since we have all the information about the
      other ports, we can just program the mac addresses into the
      single automatic port to receive all necessary traffic.
      More information about this is in patch 6.
      
      In other cases, we keep all ports promiscuous as before.
      
      There are some other cases when promiscuous mode has to be turned
      back on.  One is when the bridge itself is placed in promiscuous
      mode (user sets promisc flag).  The other is if vlan filtering is
      turned off.  Since this is the default configuration, the default
      bridge operation is not changed.
      
      Changes since v2:
       - White space and spelling fixes from Michael Tsirkin
       - Squash patches 6, 7 and 8 to prevent bisect breakage.
      
      Changes since v1:
       - Address issues raised by Stephen Hemminger
       - Address initializer comments raised by Sergey Shtylyov
       - Rebased recent net-next.
      
      Changes since rfc v2:
       - Better descriptions in the commit logs
       - Leave port in promiscuous mode if IFF_UNICAST_FLT is disabled on the
         device.
       - Fix issue with flag masking
       - Rework patch ordering a bit.
      
      Changes since rfc v1:
       - Removed private list.  We now traverse the fdb hashtable itself
         to write necessary addresses to the ports (Stephen's concern)
       - Add learning flag to the mask for flags that decides if the port
         is 'auto' or not (suggested by MST and Jamal).
       - Simplified tracking of such ports at the cost of a loop over all
         ports (suggested by MST)
      
      I've played with quite a large number of ports and the current approach
      seems to work fairly well.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Automatically manage port promiscuous mode. · 2796d0c6
      Vlad Yasevich authored
      There exist configurations where the administrator or another management
      entity has the foreknowledge of all the mac addresses of end systems
      that are being bridged together.
      
      In these environments, the administrator can statically configure known
      addresses in the bridge FDB and disable flooding and learning on ports.
      This makes it possible to turn off promiscuous mode on the interfaces
      connected to the bridge.
      
      Here is why disabling flooding and learning allows us to control
      promiscuity:
       Consider port X.  All traffic coming into this port from outside the
      bridge (ingress) will be either forwarded through other ports of the
      bridge (egress) or dropped.  Forwarding (egress) is defined by FDB
      entries and by flooding in the event that no FDB entry exists.
      In the event that flooding is disabled, only FDB entries define
      the egress.  Once learning is disabled, only static FDB entries
      provided by a management entity define the egress.  If we provide
      information from these static FDBs to the ingress port X, then we'll
      be able to accept all traffic that can be successfully forwarded and
      drop all the other traffic sooner without spending CPU cycles to
      process it.
       Another way to define the above is as the following equations:
          ingress = egress + drop
       expanding egress
          ingress = static FDB + learned FDB + flooding + drop
       disabling flooding and learning we are left with
          ingress = static FDB + drop
      
      By adding addresses from the static FDB entries to the MAC address
      filter of an ingress port X, we fully define what the bridge can
      process without dropping and can thus turn off promiscuous mode,
      thus dropping packets sooner.
      
      There have been suggestions that we may want to allow learning
      and update the filters with learned addresses as well.  This
      would require mac-level authentication similar to 802.1x to
      prevent attacks against the hw filters as they are a limited
      resource.
      
      Additionally, if the user places the bridge device in promiscuous mode,
      all ports are placed in promiscuous mode regardless of the changes
      to flooding and learning.
      
      Since the above functionality depends on full static configuration,
      we also require that vlan filtering be enabled to take
      advantage of this.  The reason is that the bridge has to be
      able to receive and process VLAN-tagged frames and there
      are only 2 ways to accomplish this right now: promiscuous mode
      or vlan filtering.
      Suggested-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
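The decision rules described in the commit above can be condensed into a small userspace sketch. The `demo_` names and flag values are hypothetical stand-ins for the bridge's per-port flags. Treating a port as "auto" when either flooding or learning is enabled, and clearing promiscuity only on the single auto port when exactly one exists, is one reading of the description rather than a copy of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FLOOD    0x1u /* hypothetical stand-in for a flooding flag */
#define DEMO_LEARNING 0x2u /* hypothetical stand-in for a learning flag */

struct demo_port {
    unsigned int flags;
    bool promisc;
};

/* A port discovers its peers automatically if it floods or learns. */
static bool demo_port_is_auto(const struct demo_port *p)
{
    return (p->flags & (DEMO_FLOOD | DEMO_LEARNING)) != 0;
}

void demo_manage_promisc(struct demo_port *ports, size_t n,
                         bool bridge_promisc, bool vlan_filtering)
{
    assert(ports != NULL || n == 0);

    size_t auto_cnt = 0;
    for (size_t i = 0; i < n; i++)
        if (demo_port_is_auto(&ports[i]))
            auto_cnt++;

    /* A promiscuous bridge or disabled vlan filtering forces promisc. */
    bool set_all = bridge_promisc || !vlan_filtering;

    for (size_t i = 0; i < n; i++) {
        if (set_all)
            ports[i].promisc = true;
        else if (auto_cnt == 0 ||
                 (auto_cnt == 1 && demo_port_is_auto(&ports[i])))
            ports[i].promisc = false; /* static FDBs fully describe ingress */
        else
            ports[i].promisc = true;
    }
}
```

With all ports statically configured and vlan filtering on, every port ends up non-promiscuous; as soon as two ports are automatic, everything falls back to the old promiscuous behaviour.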
    • bridge: Add addresses from static fdbs to non-promisc ports · 145beee8
      Vlad Yasevich authored
      When a static fdb entry is created, add the mac address
      from this fdb entry to any ports that are currently running
      in non-promiscuous mode.  These ports need this data so that
      they can receive traffic destined to these addresses.
      By default ports start in promiscuous mode, so this feature
      is disabled.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Introduce BR_PROMISC flag · f3a6ddf1
      Vlad Yasevich authored
      Introduce a BR_PROMISC per-port flag that will help us track if the
      current port is supposed to be in promiscuous mode or not.  For now,
      always start in promiscuous mode.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Add functionality to sync static fdb entries to hw · 8db24af7
      Vlad Yasevich authored
      Add code that allows static fdb entries to be synced to the
      hw list for a specified port.  This will be used later to
      program ports that can function in non-promiscuous mode.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Keep track of ports capable of automatic discovery. · e028e4b8
      Vlad Yasevich authored
      By default, ports on the bridge are capable of automatic
      discovery of nodes located behind the port.  This is accomplished
      via flooding of unknown traffic (BR_FLOOD) and learning the
      mac addresses from these packets (BR_LEARNING).
      If the above functionality is disabled by turning off these
      flags, the port requires static configuration in the form
      of static FDB entries to function properly.
      
      This patch adds functionality to keep track of all ports
      capable of automatic discovery.  This will later be used
      to control promiscuity settings.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: Turn flag change macro into a function. · 63c3a622
      Vlad Yasevich authored
      Turn the flag change macro into a function to allow
      easier updates and to reduce space.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pch_gbe depends on x86_32 · 4c30b525
      Jean Delvare authored
      The pch_gbe driver is for a companion chip to the Intel Atom E600
      series processors. These are 32-bit x86 processors so the driver is
      only needed on X86_32.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_tunnel: don't add tunnel twice · ee30ef4d
      Duan Jiong authored
      When using command "ip tunnel add" to add a tunnel, the tunnel will be added twice,
      through ip_tunnel_create() and ip_tunnel_update().
      
      Because the second is unnecessary, we can just break after adding the tunnel
      through ip_tunnel_create().
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: bpf_jit_disasm: increase image buffer size · 9bb1a208
      Alexei Starovoitov authored
      JITed seccomp filters can be quite large if they check a lot of syscalls.
      Simply increase the buffer size.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tools: bpf_jit_disasm: ignore image address for disasm · ed4afd45
      Alexei Starovoitov authored
      seccomp filters use kernel JIT image addresses, so bpf_jit_enable=2 prints
      [ 20.146438] flen=3 proglen=82 pass=0 image=0000000000000000
      [ 20.146442] JIT code: 00000000: 55 48 89 e5 48 81 ec 28 02 00 00 ...
      
      ignore image address, so that seccomp filters can be disassembled
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'systemport-next' · e0a1272c
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: systemport: DMA and MAC fixes
      
      This patch series contains a critical fix in how the DMA unmapping of packet
      is done, as well as a less critical fix in how we disable the Ethernet MAC
      RX/TX functions.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: wait for packet in umac_enable_set() · 00b91c69
      Florian Fainelli authored
      When umac_enable_set() is used to disable the UniMAC receiver or
      transmitter, we need to make sure that we wait for a full-sized packet
      to be processed because the UniMAC hardware stops on a packet boundary,
      not immediately.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: fix dma_unmap_single() len · b1ff53e9
      Florian Fainelli authored
      dma_unmap_single() was called with dma_unmap_len(cb, dma_len),
      unfortunately we failed to assign this length field in
      bcm_sysport_rx_refill() or bcm_sysport_alloc_rx_bufs() using
      dma_unmap_len_set().
      
      This causes packet contents corruption because we are not invoking the
      cache invalidation routines with the proper length.  Fix this by using
      the full RX buffer size (RX_BUF_LENGTH) because the mappings for the RX
      buffers are created with that size.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/openvswitch: Use with RCU_INIT_POINTER(x, NULL) in vport-gre.c · 944df8ae
      Monam Agarwal authored
      This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)
      
      The rcu_assign_pointer() ensures that the initialization of a structure
      is carried out before storing a pointer to that structure.
      And in the case of the NULL pointer, there is no structure to initialize.
      So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL)
      Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
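The reasoning in the commit above has a rough userspace analogue in C11 atomics. This is not the kernel RCU API: the `demo_` functions are hypothetical, a release store stands in for the ordering that rcu_assign_pointer() provides, and a relaxed store stands in for RCU_INIT_POINTER().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_cfg {
    int value;
};

/* Shared slot that readers dereference; starts out as NULL. */
static _Atomic(struct demo_cfg *) demo_slot;

/* Publishing an initialized structure: the release store orders the
 * initialization before the pointer becomes visible to readers. */
void demo_publish(struct demo_cfg *cfg)
{
    assert(cfg != NULL);
    atomic_store_explicit(&demo_slot, cfg, memory_order_release);
}

/* Storing NULL publishes no structure, so there is no initialization
 * to order against and a relaxed store is sufficient. */
void demo_clear(void)
{
    atomic_store_explicit(&demo_slot, (struct demo_cfg *)NULL,
                          memory_order_relaxed);
}

/* Reader side: acquire pairs with the release store in demo_publish(). */
struct demo_cfg *demo_read(void)
{
    return atomic_load_explicit(&demo_slot, memory_order_acquire);
}
```

The only difference between the two writers is the memory ordering, which is why the NULL case can take the cheaper form without changing behaviour.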
    • openvswitch: Use TCP flags in the flow key for stats. · 88d73f6c
      Jarno Rajahalme authored
      We already extract the TCP flags for the key, might as well use that
      for stats.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • openvswitch: Fix output of SCTP mask. · d92ab135
      Jarno Rajahalme authored
      The 'output' argument of the ovs_nla_put_flow() is the one from which
      the bits are written to the netlink attributes.  For SCTP we
      accidentally used the bits from the 'swkey' instead.  This caused the
      mask attributes to include the bits from the actual flow key instead
      of the mask.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • openvswitch: Per NUMA node flow stats. · 63e7959c
      Jarno Rajahalme authored
      Keep kernel flow stats for each NUMA node rather than each (logical)
      CPU.  This avoids using the per-CPU allocator and removes most of the
      kernel-side OVS locking overhead that otherwise sits at the top of perf
      reports, and allows OVS to scale better with a higher number of threads.
      
      With 9 handlers and 4 revalidators netperf TCP_CRR test flow setup
      rate doubles on a server with two hyper-threaded physical CPUs (16
      logical cores each) compared to the current OVS master.  Tested with
      non-trivial flow table with a TCP port match rule forcing all new
      connections with unique port numbers to OVS userspace.  The IP
      addresses are still wildcarded, so the kernel flows are not considered
      as exact match 5-tuple flows.  Flows of this type can be expected to
      appear in large numbers as the result of more effective wildcarding
      made possible by improvements in OVS userspace flow classifier.
      
      Perf results for this test (master):
      
      Events: 305K cycles
      +   8.43%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   5.64%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   4.75%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   3.32%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   2.61%     ovs-vswitchd  [kernel.kallsyms]   [k] pcpu_alloc_area
      +   2.19%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.03%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.84%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   1.64%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.58%     ovs-vswitchd  libc-2.15.so        [.] 0x7f4e6
      +   1.07%     ovs-vswitchd  [kernel.kallsyms]   [k] memset
      +   1.03%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   0.92%          swapper  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      And after this patch:
      
      Events: 356K cycles
      +   6.85%     ovs-vswitchd  ovs-vswitchd        [.] find_match_wc
      +   4.63%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_lock
      +   3.06%     ovs-vswitchd  [kernel.kallsyms]   [k] __ticket_spin_lock
      +   2.81%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask_range
      +   2.51%     ovs-vswitchd  libpthread-2.15.so  [.] pthread_mutex_unlock
      +   2.27%     ovs-vswitchd  ovs-vswitchd        [.] classifier_lookup
      +   1.84%     ovs-vswitchd  libc-2.15.so        [.] 0x15d30f
      +   1.74%     ovs-vswitchd  [kernel.kallsyms]   [k] mutex_spin_on_owner
      +   1.47%          swapper  [kernel.kallsyms]   [k] intel_idle
      +   1.34%     ovs-vswitchd  ovs-vswitchd        [.] flow_hash_in_minimask
      +   1.33%     ovs-vswitchd  ovs-vswitchd        [.] rule_actions_unref
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] hindex_node_with_hash
      +   1.16%     ovs-vswitchd  ovs-vswitchd        [.] do_xlate_actions
      +   1.09%     ovs-vswitchd  ovs-vswitchd        [.] ofproto_rule_ref
      +   1.01%          netperf  [kernel.kallsyms]   [k] __ticket_spin_lock
      ...
      
      There is a small increase in kernel spinlock overhead due to the same
      spinlock being shared between multiple cores of the same physical CPU,
      but that is barely visible in the netperf TCP_CRR test performance
      (maybe ~1% performance drop, hard to tell exactly due to variance in
      the test results), when testing for kernel module throughput (with no
      userspace activity, handful of kernel flows).
      
      On flow setup, a single stats instance is allocated (for the NUMA node
      0).  As CPUs from multiple NUMA nodes start updating stats, new
      NUMA-node specific stats instances are allocated.  This allocation on
      the packet processing code path is made to never block or look for
      emergency memory pools, minimizing the allocation latency.  If the
      allocation fails, the existing preallocated stats instance is used.
      Also, if only CPUs from one NUMA-node are updating the preallocated
      stats instance, no additional stats instances are allocated.  This
      eliminates the need to pre-allocate stats instances that will not be
      used, also relieving the stats reader from the burden of reading stats
      that are never used.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
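The lazy allocation scheme described in the last paragraph of the commit above can be sketched in single-threaded userspace C. Everything here is illustrative: the `demo_` names, the fixed node count, the `last_writer` bookkeeping and the use of calloc() in place of a non-blocking kernel allocation are assumptions, and the per-instance locking the commit mentions is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_NODES 4
#define DEMO_NO_NODE   (-1)

struct demo_stats {
    uint64_t packets;
    uint64_t bytes;
};

struct demo_flow {
    struct demo_stats *stats[DEMO_MAX_NODES]; /* stats[0] is preallocated */
    int last_writer; /* node last seen writing to the preallocated stats */
};

int demo_flow_init(struct demo_flow *flow)
{
    for (int i = 0; i < DEMO_MAX_NODES; i++)
        flow->stats[i] = NULL;
    flow->last_writer = DEMO_NO_NODE;
    flow->stats[0] = calloc(1, sizeof(*flow->stats[0]));
    return flow->stats[0] ? 0 : -1;
}

void demo_flow_update(struct demo_flow *flow, int node, uint64_t len)
{
    assert(node >= 0 && node < DEMO_MAX_NODES);
    struct demo_stats *s = flow->stats[node];

    if (s) {
        if (node == 0)
            flow->last_writer = 0;
    } else {
        s = flow->stats[0]; /* fall back to the preallocated instance */
        if (flow->last_writer != node) {
            /* Another node already wrote here: try a node-local instance. */
            if (flow->last_writer != DEMO_NO_NODE) {
                struct demo_stats *fresh = calloc(1, sizeof(*fresh));

                if (fresh) {
                    flow->stats[node] = fresh;
                    s = fresh;
                }
            }
            if (s == flow->stats[0])
                flow->last_writer = node; /* sole writer, or alloc failed */
        }
    }
    s->packets++;
    s->bytes += len;
}

/* The reader only has to sum the instances that were ever allocated. */
struct demo_stats demo_flow_total(const struct demo_flow *flow)
{
    struct demo_stats total = { 0, 0 };

    for (int i = 0; i < DEMO_MAX_NODES; i++) {
        if (flow->stats[i]) {
            total.packets += flow->stats[i]->packets;
            total.bytes += flow->stats[i]->bytes;
        }
    }
    return total;
}

void demo_flow_destroy(struct demo_flow *flow)
{
    for (int i = 0; i < DEMO_MAX_NODES; i++) {
        free(flow->stats[i]);
        flow->stats[i] = NULL;
    }
}
```

As long as a single node does all the updates, only the preallocated instance exists; a second instance appears only once a different node starts writing.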
    • openvswitch: Remove 5-tuple optimization. · 23dabf88
      Jarno Rajahalme authored
      The 5-tuple optimization becomes unnecessary with a later per-NUMA
      node stats patch.  Remove it first to make the changes easier to
      grasp.
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
    • openvswitch: Use ether_addr_copy · 8c63ff09
      Joe Perches authored
      It's slightly smaller/faster for some architectures.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      8c63ff09
    • Joe Perches's avatar
      openvswitch: flow_netlink: Use pr_fmt to OVS_NLERR output · 2235ad1c
      Joe Perches authored
      Add "openvswitch: " prefix to OVS_NLERR output
      to match the other OVS_NLERR output of datapath.c
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      2235ad1c
    • Joe Perches's avatar
      openvswitch: Use net_ratelimit in OVS_NLERR · 1815a883
      Joe Perches authored
      Each use of pr_<level>_once has a per-site flag.
      
      Some of the OVS_NLERR messages look as if seeing them
      multiple times could be useful, so use net_ratelimit()
      instead of pr_info_once.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      1815a883
    • Daniele Di Proietto's avatar
      openvswitch: Added (unsigned long long) cast in printf · cc23ebf3
      Daniele Di Proietto authored
      This is necessary because u64 is not unsigned long long
      on all architectures: u64 could also be uint64_t.
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      cc23ebf3
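
As an illustration of the cast described above, here is a small userspace sketch using the standard `uint64_t` in place of the kernel's `u64`. The helper name `format_u64()` is hypothetical. The point is that `%llu` requires an `unsigned long long` argument, and the explicit cast guarantees that regardless of how the 64-bit type is defined on a given platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit value portably. On some platforms uint64_t is
 * "unsigned long", so passing it to %llu without a cast triggers a
 * format warning; the cast makes the argument type match the format.
 */
static int format_u64(char *buf, size_t len, uint64_t v)
{
    assert(buf != NULL);
    return snprintf(buf, len, "%llu", (unsigned long long)v);
}
```

The alternative in portable userspace code is the `PRIu64` macro from `<inttypes.h>`; the cast is the approach the commit message names.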
    • Daniele Di Proietto's avatar
      openvswitch: avoid cast-qual warning in vport_priv · 07dc0602
      Daniele Di Proietto authored
      This function must cast a const value to a non-const value.
      Adding a uintptr_t cast suppresses the warning.
      Avoiding the cast altogether (the proper solution) would require
      changing several function signatures.
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      07dc0602
    • Daniele Di Proietto's avatar
      openvswitch: avoid warnings in vport_from_priv · d0b4da13
      Daniele Di Proietto authored
      First, this change avoids declaring the formal parameter const,
      since it is treated as non-const (to avoid -Wcast-qual).
      Second, it casts the pointer from void* to u8*, since it is used
      in arithmetic (to avoid -Wpointer-arith).
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      d0b4da13
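
The two vport commits above both concern converting between an object and a private area that follows it in memory. The sketch below shows the general pattern in userspace C under stated assumptions: `struct vport` is reduced to a single hypothetical field, `PRIV_ALIGN` and `ALIGN_UP` are illustrative stand-ins, and `uint8_t` replaces the kernel's `u8`. It is not copied from the openvswitch sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PRIV_ALIGN 8 /* hypothetical alignment of the private area */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

struct vport {
    int port_no; /* placeholder field for the sketch */
};

/*
 * Object -> private data. The parameter is const, but the caller gets a
 * writable pointer. Going through uintptr_t drops the qualifier without
 * tripping -Wcast-qual, and uint8_t * makes the arithmetic well defined.
 */
static void *vport_priv(const struct vport *vport)
{
    assert(vport != NULL);
    return (uint8_t *)(uintptr_t)vport +
           ALIGN_UP(sizeof(struct vport), PRIV_ALIGN);
}

/*
 * Private data -> object. The parameter is non-const because the result
 * is non-const, and the pointer is cast to uint8_t * before subtracting
 * so no arithmetic is done on void * (-Wpointer-arith).
 */
static struct vport *vport_from_priv(void *priv)
{
    assert(priv != NULL);
    return (struct vport *)((uint8_t *)priv -
                            ALIGN_UP(sizeof(struct vport), PRIV_ALIGN));
}
```

The two helpers are inverses of each other as long as the private area is allocated directly after the aligned `struct vport`.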
    • Daniele Di Proietto's avatar
      openvswitch: use const in some local vars and casts · 7085130b
      Daniele Di Proietto authored
      In a few functions, const formal parameters are assigned or cast to
      non-const.
      These changes suppress warnings when compiling with -Wcast-qual.
      Signed-off-by: Daniele Di Proietto <daniele.di.proietto@gmail.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      7085130b
    • David S. Miller's avatar
      Merge branch 'bonding-next' · bd508065
      David S. Miller authored
      Veaceslav Falico says:
      
      ====================
      bonding: simple macro cleanup
      
      Trivial patchset that converts most of the bonding's macros into inline
      functions. It introduces only one macro, BOND_MODE(), which is just
      bond->params.mode, better to write/understand/remember.
      
      The only real change is the removal of the IFF_UP check, which always
      came paired with && netif_running() and is thus useless, as the
      interface is always IFF_UP when LINK_STATE_RUNNING.
      
      v2->v3: fix 3/9 to actually invert bond_mode_uses_arp() and add
      	bond_uses_arp() alongside bond_mode_uses_arp()
      v1->v2: use inlined functions instead of macros.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd508065
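
The macro-to-inline conversion described in the cover letter above might look roughly like the following. This is a standalone sketch: the struct layouts and the `EX_MODE_*` constants are hypothetical placeholders, and the body of `bond_mode_uses_arp()` is a simplified assumption. Only the names `BOND_MODE()`, `bond->params.mode`, `bond_mode_uses_arp()` and `bond_uses_arp()` come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mode values, for illustration only. */
enum { EX_MODE_ACTIVEBACKUP = 1, EX_MODE_8023AD = 4 };

struct bond_params {
    int mode;
};

struct bonding {
    struct bond_params params;
};

/* The one macro the series introduces: shorthand for the mode field. */
#define BOND_MODE(bond) ((bond)->params.mode)

/* Typed inline function taking a raw mode value (simplified rule). */
static inline bool bond_mode_uses_arp(int mode)
{
    return mode != EX_MODE_8023AD;
}

/* Companion helper that takes the bond itself. */
static inline bool bond_uses_arp(const struct bonding *bond)
{
    assert(bond != NULL);
    return bond_mode_uses_arp(BOND_MODE(bond));
}
```

Compared with a function-like macro, the inline versions give the compiler argument types to check and evaluate each argument exactly once.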