1. 26 Aug, 2016 17 commits
    • Merge branch 'bcm_sf2-utilize-b53_common' · a29ca894
      David S. Miller authored
      Florian Fainelli says:
      
      ====================
      net: dsa: Make bcm_sf2 utilize b53_common
      
      This patch series makes the bcm_sf2 driver utilize a large number of the core
      functions offered by the b53_common driver since the SWITCH_CORE registers are
      mostly register compatible with the switches driven by b53_common.
      
      In order to accomplish that, we just override the dsa_driver_ops callbacks that
      we need to. There is still integration-specific logic from bcm_sf2 that we
      cannot absorb into b53_common because it is just not there, mostly in the area
      of link management and power management, but most of the features are within
      b53_common now: VLAN, FDB, bridge
      
      Along the process, we also improve support for the BCM58xx SoCs, since those
      also have the same version of the switching IP that 7445 has (for which bcm_sf2
      was developed).
      
      Changes in v3:
      
      - rebase against 145dd5f9 ("net: flush the
        softnet backlog in process context")
      
      Changes in v2:
      
      - rebased against "net: dsa: rename switch operations structure"
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Remove duplicate code · de0b9d3b
      Florian Fainelli authored
      Now that we are using b53_common for most VLAN, FDB and bridge
      operations, delete all the redundant code that we had in bcm_sf2.c to
      keep only the integration specific logic that we have to deal with:
      power management, link management and the external interfaces (RGMII,
      MDIO).
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Utilize core B53 driver when possible · f458995b
      Florian Fainelli authored
      The Broadcom Starfighter2 is almost entirely register compatible with
      B53, yet for historical reasons came up first in the tree and is now
      being updated to utilize b53_common.c to the fullest extent possible. A
      few things need to be adjusted to allow that:
      
      - the switch "core" registers currently operate on a 32-bit address,
        whereas b53 passes a page + reg pair to offset from, so we need to
        convert that; thankfully, there is a generic formula to do that
      
      - the link management is not self-contained within the B53/CORE register
        set, but instead is in the SWITCH_REG block which is part of the
        integration glue logic, so we keep that entirely custom here because
        this really is part of the existing bcm_sf2 implementation
      
      - there are additional power management constraints on the port's
        memories that make us keep the port_enable/disable callbacks custom
        for now, also, we support tagging whereas b53_common does not support
        that yet
      
      All the VLAN and bridge code is entirely identical though, so avoid
      duplicating it. Other things will be migrated in the future like EEE and
      possibly Wake-on-LAN.
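
      The page + reg conversion mentioned in the first bullet can be sketched
      roughly as below. This is only an illustration: the shift amounts are an
      assumption (one 32-bit slot per register, 256 slots per page), and
      sf2_page_reg_to_addr() is a hypothetical name, not necessarily the
      helper the driver uses.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a b53-style (page, reg) pair to a byte offset
 * inside a 32-bit-wide register window.
 *
 * Assumptions made for this sketch only:
 *   - every register occupies one 32-bit slot      -> reg  << 2
 *   - every page spans 256 such slots (1 KiB)      -> page << 10
 */
static inline uint32_t sf2_page_reg_to_addr(uint8_t page, uint8_t reg)
{
	return ((uint32_t)page << 10) | ((uint32_t)reg << 2);
}
```

      With a mapping of this shape, the common code can keep passing
      page + reg pairs while the glue driver turns them into offsets for
      its 32-bit accessors.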
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Add JOIN_ALL_VLAN support · 48aea33a
      Florian Fainelli authored
      In order to migrate the bcm_sf2 driver over to the b53 driver for most
      VLAN/FDB/bridge operations, we need to add support for the "join all
      VLANs" register and behavior which allows us to make a given port join
      all VLANs and avoid setting specific VLAN entries when it is leaving the
      bridge.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Define SF2 MIB layout · bde5d132
      Florian Fainelli authored
      The 58xx and 7445 chips use the Starfighter2 code; define its MIB layout
      and introduce a helper function, is58xx(), which checks for both of these
      IDs for now.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Prepare to support 7445 switch · 130401d9
      Florian Fainelli authored
      Allocate a device entry for the Broadcom BCM7445 integrated switch
      currently backed by bcm_sf2.c. Since this is the latest generation, it
      has 4 ARL entries, 4K VLANs and uses Port 8 for the CPU/IMP port.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: b53: Initialize ds->ops in b53_switch_alloc · 485ebd61
      Florian Fainelli authored
      In order to allow drivers to override specific dsa_switch_driver
      callbacks, initialize ds->ops to b53_switch_ops earlier, which avoids
      having to expose this structure to glue drivers.
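
      The override pattern described here can be sketched as follows. All
      type and function names are hypothetical, simplified stand-ins; in
      particular, the sketch copies the default callbacks into the switch so
      that one entry can be replaced, which is an assumption of the example
      and not a statement about how the DSA core stores its ops.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a switch callback table. */
struct switch_ops {
	int (*port_enable)(int port);
	int (*vlan_add)(int port, int vid);
};

/* Hypothetical switch object holding its own copy of the callbacks. */
struct switch_sketch {
	struct switch_ops ops;
};

static int common_port_enable(int port) { (void)port; return 0; }
static int common_vlan_add(int port, int vid) { (void)port; (void)vid; return 0; }

/* Defaults owned by the common code; never exposed to glue drivers. */
static const struct switch_ops common_ops = {
	.port_enable = common_port_enable,
	.vlan_add    = common_vlan_add,
};

/* The common allocator installs the defaults early. */
static void common_switch_alloc(struct switch_sketch *sw)
{
	sw->ops = common_ops;
}

/* Glue-specific callback; returns a distinct value so it can be told apart. */
static int glue_port_enable(int port) { return port + 100; }

/* The glue driver overrides only the callbacks it needs to. */
static void glue_setup(struct switch_sketch *sw)
{
	common_switch_alloc(sw);
	sw->ops.port_enable = glue_port_enable;
}
```

      Because the defaults are installed at allocation time, the glue driver
      never has to reference the common ops table by name.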
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fw-mark-offload' · ed35ca99
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Introduce support for offload forward mark
      
      Ido says:
      This patchset enables the forwarding of certain control packets by the
      device instead of relying on the CPU to do the forwarding.
      
      The first two patches simplify the current switchdev offload forward
      infrastructure and make it usable for stacked devices. This is done by
      moving the packet and port marking to the bridge driver instead of the
      switch driver.
      
      Patches 3-5 add the mlxsw specific bits to support the forward mark.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Mirror certain packets to CPU · 1c6c6d22
      Ido Schimmel authored
      Instead of trapping certain packets to the CPU and then relying on it to
      flood them we can instead make the device mirror them.
      
      The following packet types are mirrored:
      
      * DHCP: Broadcast packets that should be flooded by the device, but also
      trapped in case CPU is running the DHCP server.
      
      * IGMP query: Multicast packets that need to be forwarded to other
      bridge ports, but also trapped so that receiving netdev will be marked
      as a router port by the bridge driver.
      
      * ARP request: Broadcast packets that should be forwarded to other
      bridge ports, but also trapped in case requested IP is of the local
      machine.
      
      * ARP response: Unicast packets that should be forwarded by the bridge
      but also trapped in case response is directed at us.
      
      Set the trap action of such packets to mirror and mark them using
      'offload_fwd_mark' to prevent the bridge driver from forwarding them
      itself.
      
      Note that OSPF packets are also marked despite their action being trap.
      The reason for this is that the device traps such packets in the
      pipeline after they were already flooded.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Allow different traps to have different actions · 63a81141
      Ido Schimmel authored
      Up until now we only trapped packets to CPU, but we are going to allow
      some packets to be mirrored (trap & forward) to CPU.
      
      Extend the Rx listener with 'action' member.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Simplify traps definition · 93393b33
      Ido Schimmel authored
      Instead of copying & pasting the same struct initialization for every
      Rx listener, just use a macro.
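
      As a rough illustration of the idea, a table of listeners can be built
      from a single initializer macro. The struct layout, trap IDs and macro
      name below are hypothetical and chosen for the example; they are not
      the driver's actual definitions.

```c
#include <stdbool.h>

/* Hypothetical, simplified Rx listener descriptor. */
struct rx_listener {
	int trap_id;
	bool is_ctrl;
	void (*func)(void *skb);
};

/* Hypothetical trap identifiers. */
enum { TRAP_ID_STP = 1, TRAP_ID_LLDP, TRAP_ID_IGMP_QUERY };

static void rx_handler(void *skb) { (void)skb; }

/* One macro replaces the repeated struct initialization. */
#define RX_LISTENER(_id, _is_ctrl)		\
	{					\
		.trap_id = TRAP_ID_##_id,	\
		.is_ctrl = (_is_ctrl),		\
		.func = rx_handler,		\
	}

static const struct rx_listener rx_listeners[] = {
	RX_LISTENER(STP, true),
	RX_LISTENER(LLDP, true),
	RX_LISTENER(IGMP_QUERY, false),
};
```

      Adding a field to every listener then only requires touching the macro
      and its call sites, not a block of copied initializers.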
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: switchdev: Add forward mark support for stacked devices · 6bc506b4
      Ido Schimmel authored
      switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
      port netdevs so that packets being flooded by the device won't be
      flooded twice.
      
      It works by assigning a unique identifier (the ifindex of the first
      bridge port) to bridge ports sharing the same parent ID. This prevents
      packets from being flooded twice by the same switch, but will flood
      packets through bridge ports belonging to a different switch.
      
      This method is problematic when stacked devices are taken into account,
      such as VLANs. In such cases, a physical port netdev can have upper
      devices being members in two different bridges, thus requiring two
      different 'offload_fwd_mark's to be configured on the port netdev, which
      is impossible.
      
      The main problem is that packet and netdev marking is performed at the
      physical netdev level, whereas flooding occurs between bridge ports,
      which are not necessarily port netdevs.
      
      Instead, packet and netdev marking should really be done in the bridge
      driver with the switch driver only telling it which packets it already
      forwarded. The bridge driver will mark such packets using the mark
      assigned to the ingress bridge port and will prevent the packet from
      being forwarded through any bridge port sharing the same mark (i.e.
      having the same parent ID).
      
      Remove the current switchdev 'offload_fwd_mark' implementation and
      instead implement the proposed method. In addition, make rocker - the
      sole user of the mark - use the proposed method.
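
      The proposed method can be sketched as a small model. The types and
      function names are hypothetical, and the sketch assumes that marks are
      non-zero integers shared by bridge ports with the same parent ID; it
      only models the forwarding decision, not the bridge driver itself.

```c
#include <stdbool.h>

/* Hypothetical bridge port: ports with the same parent ID share a mark. */
struct bridge_port {
	int offload_fwd_mark;	/* assumed non-zero in this sketch */
};

/* Hypothetical packet: the switch driver only reports hw_forwarded. */
struct packet {
	bool hw_forwarded;	/* set by the switch driver */
	int offload_fwd_mark;	/* set by the bridge; 0 means unmarked */
};

/* Ingress: the bridge marks the packet with the ingress port's mark. */
static void bridge_ingress(struct packet *pkt, const struct bridge_port *in)
{
	pkt->offload_fwd_mark = pkt->hw_forwarded ? in->offload_fwd_mark : 0;
}

/*
 * Egress: skip ports sharing the packet's mark, since the device already
 * forwarded the packet to them; ports of other switches still get it.
 */
static bool bridge_may_egress(const struct packet *pkt,
			      const struct bridge_port *out)
{
	return pkt->offload_fwd_mark == 0 ||
	       pkt->offload_fwd_mark != out->offload_fwd_mark;
}
```

      Because the mark lives on the bridge port and not on the physical
      netdev, two VLAN uppers of the same physical port can sit in different
      bridges without conflicting.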
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • switchdev: Support parent ID comparison for stacked devices · 5c326ab4
      Ido Schimmel authored
      switchdev_port_same_parent_id() currently expects port netdevs, but we
      need it to support stacked devices in the next patch, so drop the
      NO_RECURSE flag.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: remove unused priv_size · 2a313cdf
      Ivan Vecera authored
      Remove unused and useless priv_size member from struct devlink_ops.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: flush the softnet backlog in process context · 145dd5f9
      Paolo Abeni authored
      Currently in process_backlog(), the process_queue dequeuing is
      performed with local IRQ disabled, to protect against
      flush_backlog(), which runs in hard IRQ context.
      
      This patch moves the flush operation to a work queue and runs the
      callback with bottom half disabled to protect the process_queue
      against dequeuing.
      Since process_queue is now always manipulated in bottom half context,
      the irq disable/enable pair around the dequeue operation is removed.
      
      To keep the flush time as low as possible, the flush
      works are scheduled on all online CPUs simultaneously, using the
      high priority work-queue and statically allocated, per cpu,
      work structs.
      
      Overall, this change increases the time required to destroy a device
      in exchange for slightly better packet reinjection performance.
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: export also pvid flag in the xstats flags · 72f4af4e
      net: bridge: export also pvid flag in the xstats flags · 72f4af4e
      Nikolay Aleksandrov authored
      When I added support to export the vlan entry flags via xstats I forgot to
      add support for the pvid since it is manually matched, so check if the
      entry matches the vlan_group's pvid and set the flag appropriately.
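
      A minimal sketch of that check, assuming the PVID is stored on the
      VLAN group and not in the per-entry flags. The flag values and the
      function name are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Hypothetical flag bits for the exported per-VLAN flags. */
enum {
	VLAN_FLAG_PVID     = 1 << 1,
	VLAN_FLAG_UNTAGGED = 1 << 2,
};

/*
 * Build the flags to export for one VLAN entry. The PVID bit is not kept
 * in entry_flags, so it is derived by comparing the entry's VID against
 * the group's PVID.
 */
static uint16_t vlan_export_flags(uint16_t entry_flags, uint16_t vid,
				  uint16_t group_pvid)
{
	uint16_t flags = entry_flags;

	if (vid == group_pvid)
		flags |= VLAN_FLAG_PVID;

	return flags;
}
```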
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • veth: sctp: add NETIF_F_SCTP_CRC to device features · c80fafbb
      Xin Long authored
      Commit b17c7069 ("loopback: sctp: add NETIF_F_SCTP_CSUM to device
      features") added NETIF_F_SCTP_CRC to device features for lo device to
      improve the performance of sctp over lo.
      
      This patch is to add NETIF_F_SCTP_CRC to device features for veth to
      improve the performance of sctp over veth.
      
      Before this patch:
        ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
        212992 212992  10240    10.00    1117.16
      
      After this patch:
        ip netns exec cs_client netperf -H 10.167.12.2 -t SCTP_STREAM -- -m 10K
        Recv   Send    Send
        Socket Socket  Message  Elapsed
        Size   Size    Size     Time     Throughput
        bytes  bytes   bytes    secs.    10^6bits/sec
      
        212992 212992  10240    10.20    1415.22
      Tested-by: Li Shuang <tjlishuang@yeah.net>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 25 Aug, 2016 10 commits
  3. 24 Aug, 2016 13 commits
    • bnx2x: Don't flush multicast MACs · c7b7b483
      Yuval Mintz authored
      When ndo_set_rx_mode() is called for bnx2x, as part of the process of
      configuring the new MAC address filters [both unicast & multicast], the
      driver begins by flushing the existing configuration, then iterates over
      the network device's list of addresses and configures those instead.
      
      This has the side-effect of creating a short gap where traffic wouldn't
      be properly classified, as no filters are configured in HW.
      While for unicasts this is rather insignificant [as unicast MACs don't
      frequently change while interface is actually running],
      for multicast traffic it does pose an issue as there are multicast-based
      networks where new multicast groups would constantly be removed and
      added.
      
      This patch tries to remedy this [at least for the newer adapters] -
      Instead of flushing & reconfiguring all existing multicast filters,
      the driver would instead create the approximate hash match that would
      result from the required filters. It would then compare it against the
      currently configured approximate hash match, and only add and remove the
      delta between those.
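
      The delta computation can be sketched with plain bitmaps. The bin
      count, struct and function names are assumptions made for the example;
      the sketch shows only the set arithmetic, not how the driver programs
      the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the approximate-match table for this sketch. */
#define MC_HASH_BINS	256
#define MC_HASH_WORDS	(MC_HASH_BINS / 64)

/* Bins to set and bins to clear, one bit per hash bin. */
struct mc_delta {
	uint64_t add[MC_HASH_WORDS];
	uint64_t del[MC_HASH_WORDS];
};

/*
 * Compare the currently configured bins against the bins required by the
 * new address list and keep only the difference, so that bins present in
 * both are never touched.
 */
static void mc_hash_delta(const uint64_t *cur, const uint64_t *req,
			  struct mc_delta *d)
{
	size_t i;

	for (i = 0; i < MC_HASH_WORDS; i++) {
		d->add[i] = req[i] & ~cur[i];	/* required but not configured */
		d->del[i] = cur[i] & ~req[i];	/* configured but not required */
	}
}
```

      Bins that appear in both bitmaps stay configured throughout, which is
      what avoids the classification gap described above.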
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-rewrite-20160824-2' of... · 6546c78e
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20160824-2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Add better client conn management strategy
      
      These two patches add a better client connection management strategy.  They
      need to be applied on top of the just-posted fixes.
      
       (1) Duplicate the connection list and separate out procfs iteration from
           garbage collection.  This is necessary for the next patch as with that
           client connections no longer appear on a single list and may not
           appear on a list at all - and really don't want to be exposed to the
           old garbage collector.
      
           (Note that client conns aren't left dangling, they're also in a tree
           rooted in the local endpoint so that they can be found by a user
           wanting to make a new client call.  Service conns do not appear in
           this tree.)
      
       (2) Implement a better lifetime management and garbage collection strategy
           for client connections.
      
           In this, a client connection can be in one of five cache states
           (inactive, waiting, active, culled and idle).  Limits are set on the
           number of client conns that may be active at any one time, and users
           are made to wait if they want to start a new call when there isn't
           capacity available.
      
           To make capacity available, active and idle connections can be culled,
           after a short delay (to allow for retransmission).  The delay is
           reduced if the capacity exceeds a tunable threshold.
      
           If there is spare capacity, client conns are permitted to hang around
           a fair bit longer (tunable) so as to allow reuse of negotiated
           security contexts.
      
           After this patch, the client conn strategy is separate from that of
           service conns (which continues to use the old code for the moment).
      
           This difference in strategy is because the client side retains control
           over when it allows a connection to become active, whereas the service
           side has no control over when it sees a new connection or a new call
           on an old connection.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-rewrite-20160824-1' of... · d3c10db1
      David S. Miller authored
      Merge tag 'rxrpc-rewrite-20160824-1' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: More fixes
      
      Here are a couple of fix patches:
      
       (1) Fix the conn-based retransmission patch posted yesterday.  This breaks
           if it actually has to retransmit.  However, it seems the likelihood of
           this happening is really low, despite the server I'm testing against
           being located >3000 miles away, and some of the time it's handled
           in the call background processor before we manage to disconnect the
           call - hence why I didn't spot it.
      
       (2) /proc/net/rxrpc_calls can cause a crash if accessed whilst a call is
           being torn down.  The window of opportunity is pretty small, however,
           as calls don't stay in this state for long.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-fdb-learning-offload' · d14c800b
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Offload FDB learning configuration
      
      Ido says:
      This patchset addresses two long standing issues in the mlxsw driver
      concerning FDB learning.
      
      Patch 1 limits the number of FDB records processed by the driver in a
      single session. This is useful in situations in which many new records
      need to be processed, thereby causing the RTNL mutex to be held for
      long periods of time.
      
      Patches 2-6 offload the learning configuration (on / off) of bridge
      ports to the device instead of having the driver decide whether a
      record needs to be learned or not.
      
      The last patch is fallout and removes configuration no longer necessary
      after the first patches are applied.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Don't set learning when creating vPorts · 0f7a4d8a
      Ido Schimmel authored
      Before commit 99724c18 ("mlxsw: spectrum: Introduce support for
      router interfaces") we used to assign vFIDs to the created vPorts. Since
      these vPorts were used for slow path traffic we had to disable learning
      for them, as it doesn't make sense to have it enabled.
      
      This is no longer the case and now vPorts are either used for router
      interfaces (for which learning is disabled by the firmware) or bridge
      ports (for which learning is explicitly enabled by the driver).
      
      Therefore, we can remove the learning configuration upon vPort creation.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Remove unnecessary check in FDB processing · 81f77bc0
      Ido Schimmel authored
      We now offload the learning configuration to the device and don't rely
      on the driver to decide whether to learn the FDB record, so remove the
      check.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Offload learning to the switch ASIC · 89b548f0
      Ido Schimmel authored
      Up until now we simply stored the learning configuration of a bridge
      port in the driver and decided whether to learn a new FDB record based
      on this value.
      
      However, this is sub-optimal in cases where learning is disabled on the
      bridge port, as the device repeatedly generates learning notifications
      for the same record.
      
      Instead, offload the learning configuration to the device, thereby
      preventing it from generating notifications when learning is disabled.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Configure learning for VLAN-aware bridge port · 584d73df
      Ido Schimmel authored
      We are going to prevent the device from generating learning
      notifications for a port that was configured with learning disabled.
      
      Since learning configuration is done per {Port, VID} we need to apply
      the port's learning configuration for any VID that is added to the
      bridge port's VLAN filter list.
      
      When a VID is added to the VLAN filter list of a VLAN-aware bridge port,
      configure the {Port, VID} learning status according to the port's
      configuration. When the VID is removed, disable learning for the {Port,
      VID}.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Don't abort on first error when removing VLANs · 640be7b7
      Ido Schimmel authored
      When removing VLANs from the VLAN-aware bridge we shouldn't abort on the
      first error, as we'll otherwise have resources that will never be freed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Make VLAN deletion function symmetric · f7a8f6ce
      Ido Schimmel authored
      Commit 05978481 ("mlxsw: spectrum: Create PVID vPort before
      registering netdevice") removed __mlxsw_sp_port_vlans_del() from the
      init sequence of the driver, which forced it to be non-symmetric with
      regards to __mlxsw_sp_port_vlans_add().
      
      Make both functions symmetric as the constraint no longer exists.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Limit number of FDB records per learning session · 1803e0fb
      Ido Schimmel authored
      Up until now a learning session ended whenever the number of queried
      records was zero. This turned out to be problematic in situations where
      a large number of MACs (48K) had to be processed by the switch driver,
      as RTNL mutex is held during the learning session.
      
      Instead, limit the number of FDB records that can be processed in a
      session to 64. This means that every time the device is queried for
      learning notifications (currently, every 100ms), up to 64 records will
      be processed by the switch driver.
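
      The effect of the limit can be modelled with a simple budgeted loop.
      The function name and the pending-counter model are hypothetical; only
      the limit of 64 records per session comes from the description above.

```c
#include <stddef.h>

/* Per-session limit taken from the commit description. */
#define FDB_RECORDS_PER_SESSION	64

/*
 * Model of one learning session: process queued records until either the
 * queue is empty or the per-session budget is used up. 'pending' stands in
 * for the records the device still has queued. Returns the number of
 * records processed in this session.
 */
static size_t fdb_learning_session(size_t *pending)
{
	size_t processed = 0;

	while (*pending > 0 && processed < FDB_RECORDS_PER_SESSION) {
		(*pending)--;
		processed++;
	}

	return processed;
}
```

      Records left over simply wait for the next poll, which bounds how long
      the lock is held in any single session.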
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'shared-for-4.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma · fff84d2a
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      Mellanox mlx5 core driver updates 2016-08-24
      
      This series contains some low level and API updates for mlx5 core
      driver interface and mlx5_ifc.h, plus mlx5 LAG core driver support,
      to be shared as base code for net-next and rdma mlx5 4.9 submissions.
      
      From Alex and Artemy, Update mlx5_ifc for modify RQ and XRC bits.
      
      From Noa, Expose mlx5 link modes so they can be used in RDMA tree for rdma tools.
      
      From Aviv, LAG support needed for RDMA.
          - Add needed hardware structures, layouts and interface
          - mlx5 core driver LAG implementation
          - Introduce mlx5 core driver LAG API for mlx5_ib
      
      From Maor, add two low level patches for mlx5 hardware sniffer QP
      infrastructure bits and capabilities, plus added the namespace for sniffer
      steering tables.  Needed for RDMA subtree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Improve management and caching of client connection objects · 45025bce
      David Howells authored
      Improve the management and caching of client rxrpc connection objects.
      From this point, client connections will be managed separately from service
      connections because AF_RXRPC controls the creation and re-use of client
      connections but doesn't have that luxury with service connections.
      
      Further, there will be limits on the numbers of client connections that may
      be live on a machine.  No direct restriction will be placed on the number
      of client calls, excepting that each client connection can support a
      maximum of four concurrent calls.
      
      Note that, for a number of reasons, we don't want to simply discard a
      client connection as soon as the last call is apparently finished:
      
       (1) Security is negotiated per-connection and the context is then shared
           between all calls on that connection.  The context can be negotiated
           again if the connection lapses, but that involves holding up calls
           whilst at least two packets are exchanged and various crypto bits are
           performed - so we'd ideally like to cache it for a little while at
           least.
      
       (2) If a packet goes astray, we will need to retransmit a final ACK or
           ABORT packet.  To make this work, we need to keep around the
           connection details for a little while.
      
       (3) The locally held structures represent some amount of setup time, to be
           weighed against their occupation of memory when idle.
      
      
      To this end, the client connection cache is managed by a state machine on
      each connection.  There are five states:
      
       (1) INACTIVE - The connection is not held in any list and may not have
           been exposed to the world.  If it has been previously exposed, it was
           discarded from the idle list after expiring.
      
       (2) WAITING - The connection is waiting for the number of client conns to
           drop below the maximum capacity.  Calls may be in progress upon it
           from when it was active and got culled.
      
           The connection is on the rxrpc_waiting_client_conns list which is kept
           in to-be-granted order.  Culled conns with waiters go to the back of
           the queue just like new conns.
      
       (3) ACTIVE - The connection has at least one call in progress upon it, it
           may freely grant available channels to new calls and calls may be
           waiting on it for channels to become available.
      
           The connection is on the rxrpc_active_client_conns list which is kept
           in activation order for culling purposes.
      
       (4) CULLED - The connection got summarily culled to try and free up
           capacity.  Calls currently in progress on the connection are allowed
           to continue, but new calls will have to wait.  There can be no waiters
           in this state - the conn would have to go to the WAITING state
           instead.
      
       (5) IDLE - The connection has no calls in progress upon it and must have
           been exposed to the world (ie. the EXPOSED flag must be set).  When it
           expires, the EXPOSED flag is cleared and the connection transitions to
           the INACTIVE state.
      
           The connection is on the rxrpc_idle_client_conns list which is kept in
           order of how soon they'll expire.
      
      A connection in the ACTIVE or CULLED state must have at least one active
      call upon it; if in the WAITING state it may have active calls upon it;
      other states may not have active calls.
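
      The rule relating cache states to active calls can be written down as
      a small consistency check. The enum and function names are
      hypothetical; the five states and the limit of four concurrent calls
      per connection are taken from the description above.

```c
#include <stdbool.h>

/* The five cache states described above (names are illustrative). */
enum conn_cache_state {
	CONN_INACTIVE,
	CONN_WAITING,
	CONN_ACTIVE,
	CONN_CULLED,
	CONN_IDLE,
};

/* Each client connection supports at most four concurrent calls. */
#define MAX_CALLS_PER_CONN	4

/*
 * Check whether a number of active calls is consistent with a state:
 * ACTIVE and CULLED need at least one call, WAITING may have any number,
 * and INACTIVE and IDLE must have none.
 */
static bool conn_state_consistent(enum conn_cache_state state,
				  unsigned int active_calls)
{
	if (active_calls > MAX_CALLS_PER_CONN)
		return false;

	switch (state) {
	case CONN_ACTIVE:
	case CONN_CULLED:
		return active_calls >= 1;
	case CONN_WAITING:
		return true;
	case CONN_INACTIVE:
	case CONN_IDLE:
		return active_calls == 0;
	}

	return false;
}
```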
      
      As long as a connection remains active and doesn't get culled, it may
      continue to process calls - even if there are connections on the wait
      queue.  This simplifies things a bit and reduces the amount of checking we
      need to do.
      
      
      There are a couple of flags of relevance to the cache:
      
       (1) EXPOSED - The connection ID got exposed to the world.  If this flag is
           set, an extra ref is added to the connection preventing it from being
           reaped when it has no calls outstanding.  This flag is cleared and the
           ref dropped when a conn is discarded from the idle list.
      
       (2) DONT_REUSE - The connection should be discarded as soon as possible and
           should not be reused.
      
      
      This commit also provides a number of new settings:
      
       (*) /proc/net/rxrpc/max_client_conns
      
           The maximum number of live client connections.  Above this number, new
           connections get added to the wait list and must wait for an active
           conn to be culled.  Culled connections can be reused, but they will go
           to the back of the wait list and have to wait.
      
       (*) /proc/net/rxrpc/reap_client_conns
      
           If the number of desired connections exceeds the maximum above, the
           active connection list will be culled until there are only this many
           left in it.
      
       (*) /proc/net/rxrpc/idle_conn_expiry
      
           The normal expiry time for a client connection, provided there are
           fewer than reap_client_conns of them around.
      
       (*) /proc/net/rxrpc/idle_conn_fast_expiry
      
           The expedited expiry time, used when there are more than
           reap_client_conns of them around.
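
      How the two expiry settings interact can be sketched as below. The
      function name and parameter types are hypothetical. The text does not
      say which expiry applies when the count is exactly reap_client_conns,
      so using the normal expiry in that case is an assumption of this
      sketch.

```c
/*
 * Pick the expiry time for an idle client connection. The parameters
 * mirror the settings named above; the units are whatever the settings
 * use and do not matter for the selection.
 */
static unsigned int idle_conn_expiry_for(unsigned int nr_client_conns,
					 unsigned int reap_client_conns,
					 unsigned int idle_conn_expiry,
					 unsigned int idle_conn_fast_expiry)
{
	/* More conns around than the reap threshold: expire them faster. */
	if (nr_client_conns > reap_client_conns)
		return idle_conn_fast_expiry;

	/* Otherwise keep conns longer so security contexts can be reused. */
	return idle_conn_expiry;
}
```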
      
      
      Note that I combined the Tx wait queue with the channel grant wait queue to
      save space as only one of these should be in use at once.
      
      Note also that, for the moment, the service connection cache still uses the
      old connection management code.
      Signed-off-by: David Howells <dhowells@redhat.com>