1. 28 May, 2020 19 commits
    • Merge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 1eba1110
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2020-05-26
      
      Updates highlights:
      
      1) From Vu Pham (8): Support VM traffic failover with bonded VF
      representors and e-switch egress/ingress ACLs

      This series introduces support for a Virtual Machine running I/O
      traffic over the direct/fast VF path and failing over to the slower
      paravirtualized path using the following features:
      
           __________________________________
          |  VM      _________________        |
          |          |FAILOVER device |       |
          |          |________________|       |
          |                  |                |
          |              ____|_____           |
          |              |         |          |
          |       ______ |___  ____|_______   |
          |       |  VF PT  |  |VIRTIO-NET |  |
          |       | device  |  | device    |  |
          |       |_________|  |___________|  |
          |___________|______________|________|
                      |              |
                      | HYPERVISOR   |
                      |          ____|______
                      |         |  macvtap  |
                      |         |virtio BE  |
                      |         |___________|
                      |               |
                      |           ____|_____
                      |           |host VF  |
                      |           |_________|
                      |               |
                 _____|______    _____|_____
                 |  PT VF    |  |  host VF  |
                 |representor|  |representor|
                 |___________|  |___________|
                      \               /
                       \             /
                        \           /
                         \         /                     _________________
                          \_______/                     |                |
                       _______|________                 |    V-SWITCH    |
                      |VF representors |________________|      (OVS)     |
                      |      bond      |                |________________|
                      |________________|                        |
                                                        ________|________
                                                       |    Uplink       |
                                                       |  representor    |
                                                       |_________________|
      
      Summary:
      --------
      Problem statement:
      ------------------
      Currently, in the above topology, when a netfailover device is configured
      using VFs and eswitch VF representors, and traffic fails over to the
      stand-by VF which is exposed to the guest VM through a macvtap device, the
      eswitch fails to switch the traffic to the stand-by VF representor. This
      occurs because the eswitch level has no knowledge of the stand-by
      representor device.
      
      Solution:
      ---------
      Using the standard bonding driver, a bond netdevice is created over the
      VF representor devices and is used for offloading tc rules.
      Two VF representors are bonded together, one for the passthrough VF
      device and another one for the stand-by VF device.
      With this solution, the mlx5 driver listens to the failover events
      occurring at the bond device level and fails traffic over to whichever
      VF representor of the bond is active.
      
      a. VM with a netfailover device composed of a VF pass-thru (PT) device
         and a virtio-net paravirtualized device with the same MAC address,
         to handle failover traffic at the VM level.

      b. Host bond in active-standby mode, with the lower devices being the VM
         VF PT representor and the representor of the 2nd VF, to handle
         failover traffic at the Hypervisor/V-Switch OVS level.
         - During the steady state (fast datapath): set the bond active
           device to be the VM PT VF representor.
         - During failover: apply bond failover to the second VF representor
           device which connects to the VM non-accelerated path.

      c. E-Switch ingress/egress ACL tables to support failover traffic at
         the E-Switch level
         I. E-Switch egress ACL with forward-to-vport rule:
           - By default, the eswitch vport egress acl forwards packets to its
             counterpart NIC vport.
           - During port failover, the egress acl forward-to-vport rule is
             added to the e-switch vport of the passive/inactive slave VF
             representor to forward packets to the other e-switch vport, i.e.
             the active slave representor's e-switch vport, to handle egress
             "failover" traffic.
           - Use the lower change netdev event to detect that a representor
             is a lower dev (slave) of a bond and becomes active, then add the
             egress acl forward-to-vport rule of all other slave netdevs to
             forward to this representor's vport.
           - Use the upper change netdev event to detect a representor
             unslaving from the bond device and delete its vport's egress acl
             forward-to-vport rule.
      
         II. E-Switch ingress ACL metadata reg_c for match
           - Bonded representors' vports sharing a tc block have the same
             root ingress acl table and a unique metadata for match.
           - Traffic from both representors' vports is tagged with the same
             unique metadata reg_c.
           - Use the upper change netdev event to detect a representor
             enslaving/unslaving from the bond device to set up the shared
             root ingress acl and unique metadata.
      
      2) From Alex Vesker (2): Split RX and TX lock for parallel rule insertion in
      software steering

      3) From Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype by
      using the HW ip_version register rather than parsing eth frames for the
      ethertype.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: ipv6: support RFC 6069 (TCP-LD) · d2924569
      Eric Dumazet authored
      Make the tcp_ld_RTO_revert() helper available to IPv6, and
      implement RFC 6069.

      Quoting the RFC:
      
      3. Connectivity Disruption Indication
      
         For Internet Protocol version 6 (IPv6) [RFC2460], the counterpart of
         the ICMP destination unreachable message of code 0 (net unreachable)
         and of code 1 (host unreachable) is the ICMPv6 destination
         unreachable message of code 0 (no route to destination) [RFC4443].
         As with IPv4, a router should generate an ICMPv6 destination
         unreachable message of code 0 in response to a packet that cannot be
         delivered to its destination address because it lacks a matching
         entry in its routing table.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
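
The RFC passage quoted above amounts to a small classification rule for which ICMP errors count as a connectivity disruption indication. The following is a minimal Python sketch of that rule only, not the kernel implementation; the function name and the `AF_INET`/`AF_INET6` tags are illustrative assumptions, while the ICMP type and code numbers are the standard ones (ICMPv4 destination unreachable is type 3, ICMPv6 destination unreachable is type 1).

```python
# Illustrative family tags (assumptions for this sketch, not socket constants).
AF_INET, AF_INET6 = "ipv4", "ipv6"

ICMP_DEST_UNREACH = 3     # ICMPv4 "destination unreachable" type
ICMPV6_DEST_UNREACH = 1   # ICMPv6 "destination unreachable" type


def is_connectivity_disruption(family, icmp_type, code):
    """Return True if the ICMP error is a disruption indication per the quoted text."""
    if family == AF_INET:
        # IPv4: code 0 (net unreachable) or code 1 (host unreachable).
        return icmp_type == ICMP_DEST_UNREACH and code in (0, 1)
    if family == AF_INET6:
        # IPv6: only code 0 (no route to destination) is the counterpart.
        return icmp_type == ICMPV6_DEST_UNREACH and code == 0
    return False


print(is_connectivity_disruption(AF_INET6, ICMPV6_DEST_UNREACH, 0))
```

Other ICMPv6 destination-unreachable codes are deliberately rejected here, since the quoted section names only code 0 as the IPv6 counterpart.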
    • net: dsa: sja1105: offload the Credit-Based Shaper qdisc · 4d752508
      Vladimir Oltean authored
      The SJA1105 switches, being AVB/TSN devices, provide hardware assist for
      the Credit-Based Shaper as described in the IEEE 802.1Q-2018 document.

      The first generation has 10 shapers, freely assignable to any of the 4
      external ports and 8 traffic classes, and the second generation has 16
      shapers.
      
      The Credit-Based Shaper tables are accessed through the dynamic
      reconfiguration interface, so we have to restore them manually after a
      switch reset. The tables are backed up by the static config only on
      P/Q/R/S, and we don't want to add custom code only for that family,
      since the procedure that is in place now works for both.
      
      Tested with the following commands:
      
      data_rate_kbps=67000
      port_transmit_rate_kbps=1000000
      idleslope=$data_rate_kbps
      sendslope=$(($idleslope - $port_transmit_rate_kbps))
      locredit=$((-0x80000000))
      hicredit=$((0x7fffffff))
      tc qdisc add dev swp2 root handle 1: mqprio hw 0 num_tc 8 \
              map 0 1 2 3 4 5 6 7 \
              queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
      tc qdisc replace dev swp2 parent 1:1 cbs \
              idleslope $idleslope \
              sendslope $sendslope \
              hicredit $hicredit \
              locredit $locredit \
              offload 1
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
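
The arithmetic behind the test commands above can be checked without a switch. This Python sketch recomputes the same parameters and assembles the `tc qdisc replace` argument list; the helper names `cbs_params` and `cbs_command` are hypothetical, and the credit limits are simply the extreme 32-bit values used in the commit's test, not values derived from frame sizes.

```python
def cbs_params(idleslope_kbps, port_rate_kbps):
    """Mirror the shell arithmetic from the commit's test commands."""
    return {
        "idleslope": idleslope_kbps,
        # sendslope is idleslope minus the port transmit rate, so it is negative.
        "sendslope": idleslope_kbps - port_rate_kbps,
        # The test uses the widest signed 32-bit credit range.
        "hicredit": 0x7FFFFFFF,
        "locredit": -0x80000000,
    }


def cbs_command(dev, parent, params):
    """Build the tc argument list; nothing is executed here."""
    args = ["tc", "qdisc", "replace", "dev", dev, "parent", parent, "cbs"]
    for key in ("idleslope", "sendslope", "hicredit", "locredit"):
        args += [key, str(params[key])]
    return args + ["offload", "1"]


params = cbs_params(67000, 1000000)
print(" ".join(cbs_command("swp2", "1:1", params)))
```

Building the command as a list rather than running it keeps the sketch usable on machines without the hardware or root access.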
    • selftests: Add torture tests to nexthop tests · 7c741868
      David Ahern authored
      Add Nik's torture tests as a new set to stress the replace and cleanup
      paths.

      The torture test was created by Nikolay Aleksandrov; I adapted it to
      the selftest and added an IPv6 version.
      Signed-off-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: DR, Split RX and TX lock for parallel insertion · ed03a418
      Alex Vesker authored
      Change the locking flow to support separate RX and TX locks. Splitting
      the single lock in two allows inserting rules in parallel
      for the RX and TX parts of the FDB.

      Locking the dr_domain is done by taking both the RX domain
      and the TX domain locks; this is mostly used for control operations
      on the dr_domain. When inserting rules for RX or TX, only the
      nic_domain RX or TX lock is used. Splitting the lock is safe since
      RX and TX domains are logically separated from each other, and shared
      objects such as the send-ring and memory pool are protected by locks.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
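
The locking scheme described above can be modelled in a few lines. This is a Python sketch under stated assumptions, not the driver code: the `Domain` class and its method names are hypothetical, and `threading.Lock` stands in for the kernel locks. Rule insertion takes only the lock of its own side, while a control operation takes both locks in a fixed order.

```python
import threading


class Domain:
    """Toy model of a steering domain with separate RX and TX locks."""

    def __init__(self):
        self.rx_lock = threading.Lock()
        self.tx_lock = threading.Lock()
        self.rules = {"rx": [], "tx": []}

    def insert_rule(self, side, rule):
        # Only the lock of the affected side is taken, so RX and TX
        # insertions do not serialize against each other.
        lock = self.rx_lock if side == "rx" else self.tx_lock
        with lock:
            self.rules[side].append(rule)

    def control_op(self, fn):
        # Domain-wide operations take both locks, always RX first, so two
        # control operations cannot deadlock against each other.
        with self.rx_lock:
            with self.tx_lock:
                return fn(self)


domain = Domain()
workers = [
    threading.Thread(target=lambda s=side: [domain.insert_rule(s, i) for i in range(1000)])
    for side in ("rx", "tx")
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(domain.control_op(lambda d: (len(d.rules["rx"]), len(d.rules["tx"]))))
```

The fixed RX-then-TX acquisition order in `control_op` is a design choice of this sketch; the commit message only states that both locks are taken.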
    • net/mlx5: DR, Add a spinlock to protect the send ring · cedb2819
      Alex Vesker authored
      Adding this lock will allow writing steering entries without
      locking the dr_domain and allow parallel insertion.
      Signed-off-by: Alex Vesker <valex@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Optimize performance for IPv4/IPv6 ethertype · fca53304
      Eli Britstein authored
      The HW is optimized for IPv4/IPv6. For such cases, if the capability is
      available, avoid matching on ethertype and use the ip_version field instead.
      Signed-off-by: Eli Britstein <elibr@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
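
The selection logic described above can be sketched as follows. This is a Python model with assumed names: `build_l3_match` and the `ip_version_cap` flag are hypothetical and only illustrate the decision, while the EtherType values 0x0800 (IPv4) and 0x86DD (IPv6) are the standard ones.

```python
ETH_P_IP = 0x0800    # IPv4 EtherType
ETH_P_IPV6 = 0x86DD  # IPv6 EtherType


def build_l3_match(ethertype, ip_version_cap):
    """Pick the match field for a rule, preferring ip_version when possible."""
    if ip_version_cap and ethertype == ETH_P_IP:
        return {"ip_version": 4}
    if ip_version_cap and ethertype == ETH_P_IPV6:
        return {"ip_version": 6}
    # Any other EtherType, or a device without the capability, keeps the
    # plain ethertype match.
    return {"ethertype": ethertype}


print(build_l3_match(ETH_P_IPV6, ip_version_cap=True))
```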
    • net/mlx5e: Helper function to set ethertype · 4a5d5d73
      Eli Britstein authored
      Set ethertype match in a helper function as a pre-step towards
      optimizing it.
      Signed-off-by: Eli Britstein <elibr@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Add missing mutex destroy · 810cbb25
      Parav Pandit authored
      Add mutex destroy calls to balance with mutex_init() done in the init
      path.
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Use change upper event to setup representors' bond_metadata · 9728366f
      Vu Pham authored
      Use the change upper event to detect a slave representor
      enslaving/unslaving to/from a lag device.

      On an enslaving event, call the mlx5_enslave_rep() API to create a
      shadow entry for this slave representor, add it to the slaves list of
      the bond_metadata structure representing the master lag device, and
      use its metadata to set up the ingress acl metadata header.

      On an unslaving event, reset the vport of the unslaved representor
      to use its default ingress/egress acls and rx rules with its
      default_metadata.

      The last slave will free the shared bond_metadata and its
      unique metadata.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Slave representors sharing unique metadata for match · 88e96e53
      Vu Pham authored
      Bonded slave representors' vports must share a unique metadata
      for match.
      
      On enslaving event of slave representor to lag device, allocate
      new unique "bond_metadata" for match if this is the first slave.
      The subsequent enslaved representors will share the same unique
      "bond_metadata".
      
      On unslaving event of slave representor, reset the slave
      representor's vport to use its own default metadata.
      
      Replace ingress acl and rx rules of the slave representors' vports
      using new vport->bond_metadata.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-Switch, Alloc and free unique metadata for match · 133dcfc5
      Vu Pham authored
      Introduce infrastructure to create unique metadata for match
      for vport without depending on vport_num. Vport uses its
      default metadata for match in standalone configuration but
      will share a different unique "bond_metadata" for match with
      other vports in bond configuration.
      
      Use ida to generate unique metadata for match for vports
      in default and bond configurations.

      Introduce APIs to generate and free metadata for match.
      Introduce APIs to set a vport's bond_metadata and replace its
      ingress acl rules with bond_metadata.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
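
The default versus bond metadata split described above can be illustrated with a small Python model. Everything here is an assumption made for the sketch: `IdAllocator` imitates an ida-style allocator that hands out the smallest free ID, IDs start at 1, and 0 is used as the "no bond metadata" sentinel; none of these names come from the driver.

```python
class IdAllocator:
    """Smallest-free-ID allocator standing in for the kernel's ida."""

    def __init__(self, max_id):
        self.max_id = max_id
        self.used = set()

    def alloc(self):
        for candidate in range(1, self.max_id + 1):
            if candidate not in self.used:
                self.used.add(candidate)
                return candidate
        raise RuntimeError("no free metadata id")

    def free(self, value):
        self.used.discard(value)


class Vport:
    def __init__(self, allocator):
        # Default metadata is allocated per vport, independent of vport_num.
        self.default_metadata = allocator.alloc()
        self.bond_metadata = 0  # 0 means "not bonded" in this sketch

    def set_bond_metadata(self, value):
        self.bond_metadata = value

    def match_metadata(self):
        # Bonded vports match on the shared value, standalone ones on their own.
        return self.bond_metadata or self.default_metadata


ida = IdAllocator(max_id=15)
vf0, vf1 = Vport(ida), Vport(ida)
shared = ida.alloc()
vf0.set_bond_metadata(shared)
vf1.set_bond_metadata(shared)
print(vf0.match_metadata() == vf1.match_metadata())
```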
    • net/mlx5e: Add bond_metadata and its slave entries · d97555e1
      Vu Pham authored
      Add bond_metadata and its slave entries to represent a lag device
      and its slave VF representors. The bond_metadata structure includes a
      unique metadata shared by the slave VF representors, and a list of
      slave entries for those representors.
      
      On enslaving event, create a bond_metadata structure representing
      the upper lag device of this slave representor if it has not been
      created yet. Create and add entry for the slave representor to the
      slaves list.
      
      On unslaving event, free the slave entry of the slave representor.
      On the last unslave event, free the bond_metadata structure and its
      resources.
      
      Introduce APIs to create and remove bond_metadata and its resources,
      enslave and unslave VF representor slave entries.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
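
The create-on-first-enslave and free-on-last-unslave lifecycle described above can be sketched as a small registry. This is a hypothetical Python model: `BondRegistry`, `BondMetadata` and the simple counter used for metadata values are assumptions for illustration, not the driver's structures.

```python
class BondMetadata:
    """One entry per upper lag device, shared by its slave representors."""

    def __init__(self, upper, metadata):
        self.upper = upper
        self.metadata = metadata
        self.slaves = []


class BondRegistry:
    def __init__(self):
        self.bonds = {}          # upper device name -> BondMetadata
        self.next_metadata = 1   # stand-in for a real unique-ID allocator

    def enslave(self, upper, rep):
        bond = self.bonds.get(upper)
        if bond is None:
            # First slave of this lag device: create the shared structure.
            bond = BondMetadata(upper, self.next_metadata)
            self.next_metadata += 1
            self.bonds[upper] = bond
        bond.slaves.append(rep)
        return bond.metadata

    def unslave(self, upper, rep):
        bond = self.bonds[upper]
        bond.slaves.remove(rep)
        if not bond.slaves:
            # Last slave gone: release the structure and its metadata.
            del self.bonds[upper]


registry = BondRegistry()
print(registry.enslave("bond0", "rep0") == registry.enslave("bond0", "rep1"))
```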
    • net/mlx5e: Offload flow rules to active lower representor · d34eb2fc
      Or Gerlitz authored
      When a bond device is created over one or more non uplink representors,
      and when a flow rule is offloaded to such bond device, offload a rule
      to the active lower device.
      
      Assuming that this is active-backup lag, the rules should be offloaded
      to the active lower device which is the representor of the direct
      path (not the failover).
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Support tc block sharing for representors · 553f9328
      Vu Pham authored
      Currently offloading a rule over a tc block shared by multiple
      representors fails because an e-switch global hashtable to keep
      the mapping from tc cookies to mlx5e flow instances is used, and
      tc block sharing offloads the same rule/cookie multiple times,
      each time for different representor sharing the tc block.
      
      Change the implementation and behavior to acknowledge and return
      success if the same rule/cookie is offloaded again to another slave
      representor sharing the tc block. This is done by recording the netdev
      that added the rule first and comparing against it.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
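
The cookie handling described above can be modelled with a dictionary keyed by tc cookie. This Python sketch is an illustration only: the `FlowTable` class and its return strings are hypothetical, and rejecting a repeat from the same netdev with an exception is an assumption of the sketch rather than something the commit message states.

```python
class FlowTable:
    """Global cookie table that remembers which netdev offloaded a rule first."""

    def __init__(self):
        self.flows = {}  # tc cookie -> netdev that added the rule first

    def add(self, cookie, netdev):
        owner = self.flows.get(cookie)
        if owner is None:
            self.flows[cookie] = netdev
            return "offloaded"
        if owner != netdev:
            # Same rule replayed for another representor sharing the tc
            # block: acknowledge it without offloading a second copy.
            return "already-offloaded"
        # Assumption: a repeat from the owning netdev is a genuine duplicate.
        raise ValueError("duplicate rule from the same netdev")


table = FlowTable()
print(table.add(0xABC, "rep0"), table.add(0xABC, "rep1"))
```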
    • net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule · 7e51891a
      Or Gerlitz authored
      Register a notifier block to handle netdev events for a bond device
      of non-uplink representors to support eswitch vports bonding.

      When a non-uplink representor is a lower dev (slave) of a bond and
      becomes active, add the egress acl forward-to-vport rule of all slave
      netdevs (active + standby) to forward to this representor's vport. Use
      the change lower netdev event to do this.

      Use the change upper event to detect a slave representor unslaved from
      the lag device and delete its vport egress acl forward rule, if any.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
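
The two event handlers described above can be traced with a toy state model. This Python sketch uses assumed names throughout: the `Bond` class, its handler methods and the vport numbers are hypothetical, and the forwarding table is a plain dictionary rather than an egress acl.

```python
class Bond:
    """Tracks slave representors and their egress forward-to-vport rules."""

    def __init__(self):
        self.slaves = {}     # representor name -> its e-switch vport number
        self.fwd_rules = {}  # vport -> vport its egress acl forwards to

    def enslave(self, rep, vport):
        self.slaves[rep] = vport

    def on_change_lower_active(self, rep):
        # A slave became active: every slave vport (active + standby) gets a
        # rule forwarding to the newly active representor's vport.
        active_vport = self.slaves[rep]
        for vport in self.slaves.values():
            self.fwd_rules[vport] = active_vport

    def on_change_upper_unslave(self, rep):
        # A slave left the bond: drop its forward rule, if it had one.
        vport = self.slaves.pop(rep)
        self.fwd_rules.pop(vport, None)


bond = Bond()
bond.enslave("pt_vf_rep", 1)
bond.enslave("host_vf_rep", 2)
bond.on_change_lower_active("pt_vf_rep")    # steady state: fast path active
bond.on_change_lower_active("host_vf_rep")  # failover to the standby path
print(bond.fwd_rules)
```

The model does not re-elect an active slave when the active one is removed; it only mirrors the two events named in the commit message.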
    • net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule · bf773dc0
      Vu Pham authored
      By default, an e-switch vport's egress acl just forwards packets to its
      counterpart NIC vport using the existing egress acl table.

      During port failover in a bonding scenario where two VF representors
      are bonded, the egress acl forward-to-vport rule is added to
      the existing egress acl table of the e-switch vport of the
      passive/inactive slave representor to forward packets to the other NIC
      vport, i.e. the active slave representor's NIC vport, to handle egress
      "failover" traffic.

      Enable egress acl and add APIs to create and destroy the egress acl
      forward-to-vport rule and group.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Reviewed-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: E-Switch, Refactor eswitch ingress acl codes · 07bab950
      Vu Pham authored
      Restructure the eswitch ingress acl codes into eswitch directory
      and different files:
      . Acl ingress helper functions to acl_helper.c/h
      . Acl ingress functions used in offloads mode to acl_ingress_ofld.c
      . Acl ingress functions used in legacy mode to acl_ingress_lgy.c
      
      This patch does not change any functionality.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
    • net/mlx5: E-Switch, Refactor eswitch egress acl codes · ea651a86
      Vu Pham authored
      Refactor the egress acl codes so that offloads and legacy modes
      can configure specifically their own needs of egress acl table,
      groups and rules. While at it, restructure the eswitch egress
      acl codes into eswitch directory and different files:
      . Acl egress helper functions to acl_helper.c/h
      . Acl egress functions used in offloads mode to acl_egress_ofld.c
      . Acl egress functions used in legacy mode to acl_egress_lgy.c
      
      This patch does not change any functionality.
      Signed-off-by: Vu Pham <vuhuong@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  2. 27 May, 2020 21 commits