1. 28 Sep, 2017 22 commits
  2. 27 Sep, 2017 18 commits
    • Merge branch 'mlxsw-Add-support-for-offloading-IPv4-multicast-routes' · a2e4a219
      David S. Miller authored
      Jiri Pirko says:
      
      ====================
      mlxsw: Add support for offloading IPv4 multicast routes
      
      Yotam says:
      
      This patch-set introduces offloading of the kernel IPv4 multicast router
      logic in the Spectrum driver.
      
      The first patch makes the Spectrum driver ignore FIB notifications that are
      not of address family IPv4 or IPv6. This is needed in order to prevent
      crashes while the next patches introduce the RTNL_FAMILY_IPMR FIB
      notifications.
      
      Patches 2-5 update ipmr to use the FIB notification chain for both MFC and
      VIF notifications, and patches 8-12 update the Spectrum driver to register
      to these notifications and offload the routes.
      
      Similarly to IPv4 and IPv6, any failure will trigger the abort mechanism
      which is updated in this patch-set to eject multicast route tables too.
      
      At this stage, the following limitations apply:
       - A multicast MFC route will be offloaded by the driver if all the output
         interfaces are Spectrum router interfaces (RIFs). In any other case
         (which includes pimreg device, tunnel devices and management ports) the
         route will be trapped to the CPU and the packets will be forwarded by
         software.
       - ipmr proxy routes are not supported and will trigger the abort
         mechanism.
       - The MFC TTL values are currently treated as boolean: if the value is
         255 the traffic is not forwarded to the interface, and for any other
         value it is forwarded. Dropping packets based on their TTL isn't
         currently supported.
      
      To allow users to have visibility on which of the routes are offloaded and
      which are not, patch 6 introduces a per-route offload indication similar to
      IPv4 and IPv6 routes which is sent to the user via the RTNetlink interface.
      
      The Spectrum driver multicast router offloading support, which is
      introduced in patches 8 and 9, is divided into two parts:
       - The hardware logic which abstracts the Spectrum hardware and provides a
         simple API for the upper levels.
       - The offloading logic which gets the MFC and VIF notifications from the
         kernel and updates the hardware using the hardware logic part.
      
      Finally, the last patch makes the Spectrum router logic not ignore the
      multicast FIB notifications and call the corresponding functions in the
      multicast router offloading logic.
      
      ---
      v2->v3:
       - Move the ipmr_rule_default function definition to be inside the already
         existing CONFIG_IP_MROUTE_MULTIPLE_TABLES ifdef block (patch 6)
       - Remove double =0 initialization in spectrum_mr.c (patch 7)
       - Fix route4 allocation size (patch 7)
      v1->v2:
       - Add comments for struct fields in mroute.h
       - Take the mrt_lock while dumping VIFs in the fib_notifier dump callback
       - Update the MFC lastuse field too
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
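The offload limitations listed in this cover letter can be illustrated with a minimal sketch. It is a Python model rather than driver code, and the `Vif` type, its `is_rif` flag and the function names are hypothetical. It only shows the two stated rules: a TTL of 255 means "do not forward", and a route is offloaded only when every egress interface is a Spectrum RIF.

```python
from dataclasses import dataclass

TTL_NO_FORWARD = 255  # per the text: a TTL value of 255 means "not forwarded"


@dataclass(frozen=True)
class Vif:
    name: str
    is_rif: bool  # hypothetical flag: backed by a Spectrum router interface


def egress_vifs(oif_ttls):
    """Treat per-VIF TTL values as booleans: forward unless the value is 255."""
    return [vif for vif, ttl in oif_ttls.items() if ttl != TTL_NO_FORWARD]


def route_action(oif_ttls):
    """Return "offload" only if every egress VIF is a RIF, otherwise "trap"."""
    targets = egress_vifs(oif_ttls)
    return "offload" if all(vif.is_rif for vif in targets) else "trap"
```

In this model a route whose only non-RIF output (for example pimreg) carries TTL 255 is still offloaded, because that interface is not an egress target.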
    • mlxsw: spectrum: router: Don't ignore IPMR notifications · 664375e9
      Yotam Gigi authored
      Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB
      notifications.
      
      Past commits added the IPMR VIF and MFC add/del notifications via the
      fib_notifier chain. In addition, code for handling these notifications in
      the Spectrum router logic was added. Make the Spectrum router logic not
      ignore these notifications and forward the requests to the Spectrum
      multicast router offloading logic.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Notify multicast router on RIF MTU changes · fd890fe9
      Yotam Gigi authored
      Because multicast routes hold the minimum MTU of all the egress RIFs and
      trap packets that don't meet it, notify the multicast router code on RIF
      MTU changes.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
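Why an MTU change must reach the multicast router can be sketched as follows. This is an illustrative Python model, not the driver's data structure; the class and method names are assumptions. A route caches the minimum MTU over its egress RIFs, so the cached value goes stale unless it is recomputed when a RIF's MTU changes.

```python
class MulticastRoute:
    """Hypothetical model of a route that caches the minimum egress MTU."""

    def __init__(self, egress_mtus):
        self.egress_mtus = dict(egress_mtus)  # RIF name -> MTU
        self.min_mtu = min(self.egress_mtus.values())

    def on_rif_mtu_change(self, rif, mtu):
        """Recompute the cached minimum when one of our egress RIFs changes."""
        if rif in self.egress_mtus:
            self.egress_mtus[rif] = mtu
            self.min_mtu = min(self.egress_mtus.values())

    def must_trap(self, packet_len):
        """Packets larger than the minimum egress MTU are trapped."""
        return packet_len > self.min_mtu
```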
    • mlxsw: spectrum_router: Add multicast routes notification handling functionality · d42b0965
      Yotam Gigi authored
      Add functionality for calling the multicast routing offloading logic upon
      MFC and VIF add and delete notifications. In addition, call the multicast
      routing upon RIF addition and deletion events.
      
      As the multicast routing offload logic may sleep, the actual calls are done
      in a deferred work. To ensure the MFC object is not freed in that interval,
      a reference is held to it. In case of a failure, the abort mechanism is
      used, which ejects all the routes from the hardware and triggers the
      traffic to flow through the kernel.
      
      Note: At this stage, the FIB notifications are still ignored; they will be
      enabled in a later patch.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
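The "hold a reference across deferred work" pattern described in this commit can be sketched in Python. This is a simplified single-threaded model; the `Mfc` class, the queue and the function names are hypothetical and stand in for the kernel's reference counting and work items.

```python
from collections import deque


class Mfc:
    """Hypothetical MFC entry with a reference count."""

    def __init__(self, key):
        self.key = key
        self.refcount = 1  # reference owned by the multicast routing table
        self.freed = False

    def hold(self):
        self.refcount += 1

    def put(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # stands in for freeing the entry


work_queue = deque()


def notify_mfc_add(mfc, offloaded):
    """Notifier context: must not sleep, so only take a reference and defer."""
    mfc.hold()
    work_queue.append((mfc, offloaded))


def run_deferred_work():
    """Work context: may sleep, programs the hardware, then drops the reference."""
    while work_queue:
        mfc, offloaded = work_queue.popleft()
        offloaded.add(mfc.key)
        mfc.put()
```

Even if the routing table drops its own reference between the notification and the deferred work, the entry stays alive until the work item has finished with it.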
    • mlxsw: spectrum: router: Squash the default route table to main · 7e50d435
      Yotam Gigi authored
      Currently, the mlxsw Spectrum driver offloads only the RT_TABLE_MAIN FIB
      table and the VRF tables, so the RT_TABLE_LOCAL table is squashed to the
      RT_TABLE_MAIN table to allow local routes to be offloaded too.
      
      By default, multicast MFC routes which are not assigned to any user
      requested table are put in the RT_TABLE_DEFAULT table.
      
      Because support for offloading multicast MFC routes in the Spectrum
      router logic is about to be introduced, squash the default table to
      MAIN too.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
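The table squashing described above amounts to a small mapping. The sketch below is a Python illustration, not the driver's C code, and the function name is hypothetical. The numeric table IDs are the values defined in the kernel's rtnetlink UAPI header.

```python
# Reserved routing table IDs from the kernel's rtnetlink UAPI header.
RT_TABLE_DEFAULT = 253
RT_TABLE_MAIN = 254
RT_TABLE_LOCAL = 255


def squash_table_id(tb_id):
    """Map the LOCAL and DEFAULT tables onto MAIN; leave other tables alone."""
    if tb_id in (RT_TABLE_LOCAL, RT_TABLE_DEFAULT):
        return RT_TABLE_MAIN
    return tb_id
```

A VRF table ID such as 10 passes through unchanged, so only the three reserved tables share one hardware table.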
    • mlxsw: spectrum: Add the multicast routing hardware logic · 0e14c777
      Yotam Gigi authored
      Implement the multicast routing hardware API introduced in the previous
      patch for the specific Spectrum hardware.
      
      The spectrum hardware multicast routes are written using the RMFT2 register
      and point to an ACL flexible action set. The actions used for multicast
      routes are:
       - Counter action, which allows counting bytes and packets on multicast
         routes.
       - Multicast route action, which provides the RPF check and does the
         actual packet duplication to a list of RIFs.
       - Trap action, in the case the route action specified by the caller is
         trap.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Add the multicast routing offloading logic · c011ec1b
      Yotam Gigi authored
      Add the multicast router offloading logic, which is in charge of handling
      the VIF and MFC notifications and translating them to the hardware logic API.
      
      The offloading logic has to overcome several obstacles in order to safely
      comply with the kernel multicast router user API:
       - It must keep track of the mapping between VIFs and netdevices. The user
         can add an MFC cache entry pointing to a VIF, delete the VIF and
         re-add it with a different netdevice. The offloading logic has to handle
         this in order to be compatible with the kernel logic.
       - It must keep track of the mapping between netdevices and Spectrum RIFs,
         as the current hardware implementation assumes having a RIF for every
         port in a multicast router.
       - It must handle routes pointing to pimreg device to be trapped to the
         kernel, as the packet should be delivered to userspace.
       - It must handle routes pointing to tunnel VIFs. The current implementation
         does not support multicast forwarding to tunnels, thus routes that point
         to a tunnel should be trapped to the kernel.
       - It must be aware of proxy multicast routes, which include both (*,*)
         routes and duplicate routes. Currently proxy routes are not offloaded
         and trigger the abort mechanism: removal of all routes from hardware and
         triggering the traffic to go through the kernel.
      
      The multicast routing offloading logic also updates the counters of the
      offloaded MFC routes in a periodic work.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
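The first obstacle above, a VIF being deleted and re-added with a different netdevice while routes still point at its index, can be modelled with a short Python sketch. The `McRouter` class and its methods are hypothetical. The point is only that routes store VIF indexes and resolve them to netdevices at lookup time, so a re-added VIF takes effect without touching the route.

```python
class McRouter:
    """Hypothetical model: routes refer to VIF indexes, not to netdevices."""

    def __init__(self):
        self.vif_dev = {}  # VIF index -> netdevice name (absent while deleted)
        self.routes = {}   # route key -> list of egress VIF indexes

    def vif_add(self, index, dev):
        self.vif_dev[index] = dev

    def vif_del(self, index):
        self.vif_dev.pop(index, None)

    def route_add(self, key, oifs):
        self.routes[key] = list(oifs)

    def egress_devs(self, key):
        """Resolve a route's VIF indexes to the netdevices they name right now."""
        return [self.vif_dev[i] for i in self.routes[key] if i in self.vif_dev]
```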
    • net: mroute: Check if rule is a default rule · 478e4c2f
      Yotam Gigi authored
      When the ipmr starts, it adds one default FIB rule that matches all packets
      and sends them to the DEFAULT (multicast) FIB table. A more complex rule
      can be added by the user to specify that, for a specific interface, a
      packet should be looked up either in an arbitrary table or according to
      the l3mdev of the interface.
      
      For drivers that want to offload the ipmr logic into hardware but do not
      want to offload all of the FIB rules functionality, provide a function
      that indicates whether a FIB rule is the default multicast rule, in which
      case only one routing table is needed.
      
      This way, a driver can register to the FIB notification chain, get
      notifications about added FIB rules and trigger some kind of internal
      abort mechanism when a non-default rule is added by the user.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
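How a driver might use such a check can be sketched as below. This is a Python model under stated assumptions: the `Rule` fields and the exact criteria in `is_default_rule` are guesses made for illustration, not the kernel's definition, and the `Driver` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RT_TABLE_DEFAULT = 253  # table ID from the kernel's rtnetlink UAPI header


@dataclass
class Rule:
    table: int
    iifname: Optional[str] = None  # set when the rule matches one interface
    l3mdev: bool = False           # set when the table comes from the l3mdev


def is_default_rule(rule):
    """Assumed criteria: matches all packets and points at the DEFAULT table."""
    return (rule.table == RT_TABLE_DEFAULT
            and rule.iifname is None
            and not rule.l3mdev)


class Driver:
    """Hypothetical driver that supports only the single default table."""

    def __init__(self):
        self.aborted = False

    def on_rule_add(self, rule):
        if not is_default_rule(rule):
            self.aborted = True  # stop offloading, let the kernel forward
```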
    • net: ipmr: Add MFC offload indication · c7c0bbea
      Yotam Gigi authored
      Allow drivers registered to the FIB notification chain to indicate whether
      a multicast MFC route is offloaded or not, similarly to unicast routes. The
      indication of whether a route is offloaded is done using the mfc_flags
      field on an mfc_cache struct, and the information is sent to userspace
      via the RTNetlink interface only.
      
      Currently, MFC routes are either offloaded or not, thus there is no need to
      add per-VIF offload indication.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: Send FIB notifications on MFC and VIF entries · b362053a
      Yotam Gigi authored
      Use the newly introduced notification chain to send events upon VIF and MFC
      addition and deletion. The MFC notifications are sent only on resolved MFC
      entries, as unresolved cannot be offloaded.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: Add FIB notification access functions · 4d65b948
      Yotam Gigi authored
      Make the ipmr module register as a FIB notifier. To do that, implement both
      the ipmr_seq_read and ipmr_dump ops.
      
      The ipmr_seq_read op returns a sequence counter that is incremented on
      every notification related operation done by the ipmr. To implement that,
      add a sequence counter in the netns_ipv4 struct and increment it whenever
      an MFC route or a VIF is added or deleted. The sequence operations are
      protected by the RTNL lock.
      
      The ipmr_dump iterates the list of MFC routes and the list of VIF entries
      and sends notifications about them. The entries dump is done under RCU
      where the VIF dump uses the mrt_lock too, as the vif->dev field can change
      under RCU.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
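One way a listener could combine the two ops is to read the sequence counter before and after a dump and retry if it moved. The commit message does not spell out how the counter is consumed, so the Python sketch below is an assumption; the `Ipmr` class, `register_listener` and the `on_dump` test hook are all hypothetical.

```python
class Ipmr:
    """Hypothetical model of the notifier side: entries plus a change counter."""

    def __init__(self):
        self.seq = 0
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)
        self.seq += 1  # every add or delete bumps the counter

    def delete(self, entry):
        self.entries.remove(entry)
        self.seq += 1

    def seq_read(self):
        return self.seq

    def dump(self):
        return list(self.entries)


def register_listener(ipmr, max_retries=3, on_dump=None):
    """Return a snapshot that is known not to have raced with a change."""
    for _ in range(max_retries):
        before = ipmr.seq_read()
        snapshot = ipmr.dump()
        if on_dump is not None:
            on_dump()  # test hook: simulates a change racing with the dump
        if ipmr.seq_read() == before:
            return snapshot
    raise RuntimeError("table kept changing during dump")
```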
    • ipmr: Add reference count to MFC entries · 310ebbba
      Yotam Gigi authored
      The next commits will introduce MFC notifications through the atomic
      fib_notification chain, thus allowing modules to be aware of MFC entries.
      
      Because modules may need to hold a reference to an MFC entry, add a
      reference count to MFC entries to prevent them from being freed while
      these modules use them.
      
      The reference counting is done only on resolved MFC entries currently.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib: notifier: Add VIF add and delete event types · 85e48228
      Yotam Gigi authored
      In order for an interface to forward packets according to the kernel
      multicast routing table, it must be configured with a VIF index according
      to the mroute user API. The VIF index is then used to refer to that
      interface in the mroute user API, for example, to set the iif and oifs of
      an MFC entry.
      
      In order to allow drivers to be aware of and offload multicast routes,
      they have to be aware of the VIF add and delete notifications.
      
      Due to the fact that a specific VIF can be deleted and re-added pointing to
      another netdevice, and the MFC routes that point to it will forward the
      matching packets to the new netdevice, a driver willing to offload MFC
      cache entries must be aware of the VIF add and delete events in addition to
      MFC routes notifications.
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-flower-vxlan-tunnel-offload' · 1ca94d79
      David S. Miller authored
      Simon Horman says:
      
      ====================
      nfp: flower vxlan tunnel offload
      
      John says:
      
      This patch set allows offloading of TC flower match and set tunnel fields
      to the NFP. The initial focus is on VXLAN traffic. Due to the current
      state of the NFP firmware, only VXLAN traffic on well known port 4789 is
      handled. The match and action fields must explicitly set this value to be
      supported. Tunnel end point information is also offloaded to the NFP for
      both encapsulation and decapsulation. The NFP expects 3 separate data sets
      to be supplied.
      
      For decapsulation, 2 separate lists exist; a list of MAC addresses
      referenced by an index comprised of the port number, and a list of IP
      addresses. These IP addresses are not connected to a MAC or port. The MAC
      addresses can be written as a block or one at a time (because they have an
      index, previous values can be overwritten) while the IP addresses are
      always written as a list of all the available IPs. Because the MAC address
      used as a tunnel end point may be associated with a physical port or may
      be a virtual netdev like an OVS bridge, we do not know which addresses
      should be offloaded. For this reason, all MAC addresses of active netdevs
      are offloaded to the NFP. A notifier checks for changes to any currently
      offloaded MACs or any new netdevs that may occur. For IP addresses, the
      tunnel end point used in the rules is known, because the destination IP
      address must be specified in the flower classifier rule. When a new IP address
      appears in a rule, the IP address is offloaded. The IP is removed from the
      offloaded list when all rules matching on that IP are deleted.
      
      For encapsulation, a next hop table is updated on the NFP that contains
      the source/dest IPs, MACs and egress port. These are written individually
      when requested. If the NFP tries to encapsulate a packet but does not know
      the next hop, then it sends a request to the host. The host carries out a
      route lookup and populates the given entry on the NFP table. A notifier
      also exists to check for any links changing or going down in the kernel
      next hop table. If an offloaded next hop entry is removed from the kernel
      then it is also removed on the NFP.
      
      The NFP periodically sends a message to the host telling it which tunnel
      ports have packets egressing the system. The host uses this information to
      update the used value in the neighbour entry. This means that, rather than
      expire when it times out, the kernel will send an ARP to check if the link
      is still live. From an NFP perspective, this means that valid entries will
      not be removed from its next hop table.
      ====================
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower vxlan neighbour keep-alive · 856f5b13
      John Hurley authored
      Periodically receive messages containing the destination IPs of tunnels
      that have recently forwarded traffic. Update the 'used' value of the
      neighbour entries for the next hops of these IPs.
      
      This prevents the neighbour entry from expiring on timeout but rather
      signals an ARP to verify the connection. From an NFP perspective, packets
      will not fall back mid-flow unless the link is verified to be down.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: flower vxlan neighbour offload · 8e6a9046
      John Hurley authored
      Receive a request when the NFP does not know the next hop for a packet
      that is to be encapsulated in a VXLAN tunnel. Do a route lookup, determine
      the next hop entry and update neighbour table on NFP. Monitor the kernel
      neighbour table for link changes and update NFP with relevant information.
      Overwrite routes with zero values on the NFP when they expire.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: offload vxlan IPv4 endpoints of flower rules · 2d9ad71a
      John Hurley authored
      Maintain a list of IPv4 addresses used as the tunnel destination IP match
      fields in currently active flower rules. Offload the entire list of
      NFP_FL_IPV4_ADDRS_MAX (even if some are unused) when new IPs are added or
      removed. The NFP should only be aware of tunnel end points that are
      currently used by rules on the device.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
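The bookkeeping described here (one list entry per IP regardless of how many rules use it, and the whole fixed-size list resent on every change) can be sketched in Python. The class name, the `send` callback and the zero-address padding are assumptions; `max_addrs` stands in for NFP_FL_IPV4_ADDRS_MAX, whose value is not given in the text.

```python
from collections import Counter


class TunnelIpList:
    """Hypothetical model of the offloaded tunnel destination IP list."""

    def __init__(self, max_addrs, send):
        self.max_addrs = max_addrs  # stands in for NFP_FL_IPV4_ADDRS_MAX
        self.refs = Counter()       # IP -> number of rules matching on it
        self.send = send            # callback that writes the list to the NFP

    def _offload(self):
        addrs = sorted(self.refs)
        # Always send the full-size list; unused slots are padded (assumption).
        self.send(addrs + ["0.0.0.0"] * (self.max_addrs - len(addrs)))

    def rule_add(self, ip):
        if ip not in self.refs and len(self.refs) >= self.max_addrs:
            raise ValueError("no free tunnel IP slots")
        self.refs[ip] += 1
        if self.refs[ip] == 1:      # first rule using this IP
            self._offload()

    def rule_del(self, ip):
        self.refs[ip] -= 1
        if self.refs[ip] == 0:      # last rule using this IP is gone
            del self.refs[ip]
            self._offload()
```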
    • nfp: offload flower vxlan endpoint MAC addresses · fd0dd1ab
      John Hurley authored
      Generate a list of MAC addresses of netdevs that could be used as VXLAN
      tunnel end points. Give offloaded MACs an index for storage on the NFP in
      the ranges:
      0x100-0x1ff physical port representors
      0x200-0x2ff VF port representors
      0x300-0x3ff other offloads (e.g. vxlan netdevs, ovs bridges)
      
      Assign phys and vf indexes based on unique 8 bit values in the port num.
      Maintain a list of other netdevs to ensure the same netdev is not
      offloaded twice and each gets a unique ID without exhausting the entries.
      Because the IDs are unique but constant for a netdev, any changes are
      implemented by overwriting the index on NFP.
      Signed-off-by: John Hurley <john.hurley@netronome.com>
      Signed-off-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
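The index scheme above can be illustrated with a short Python sketch. The three base values come from the ranges listed in the commit message. Which 8 bits of the port number are used is not stated, so taking the low 8 bits is an assumption, and the allocator class is a hypothetical stand-in for the driver's list of other netdevs.

```python
PHYS_BASE = 0x100   # physical port representors: 0x100-0x1ff
VF_BASE = 0x200     # VF port representors:       0x200-0x2ff
OTHER_BASE = 0x300  # other offloads:             0x300-0x3ff


def port_index(base, port_num):
    """Combine a range base with 8 bits of the port number (low bits assumed)."""
    return base | (port_num & 0xFF)


class OtherIndexAllocator:
    """Gives each non-port netdev a stable, unique index in the 0x300 range."""

    def __init__(self):
        self.ids = {}  # netdev name -> assigned index

    def get(self, netdev):
        if netdev not in self.ids:
            if len(self.ids) > 0xFF:
                raise ValueError("no free MAC index entries")
            self.ids[netdev] = OTHER_BASE + len(self.ids)
        return self.ids[netdev]
```

Because a netdev always maps to the same index, a MAC address change is just a rewrite of that slot.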