1. 20 Aug, 2021 33 commits
    • net: dpaa2-switch: phylink_disconnect_phy needs rtnl_lock · d52ef12f
      Vladimir Oltean authored
      There is an ASSERT_RTNL in phylink_disconnect_phy which triggers
      whenever dpaa2_switch_port_disconnect_mac is called.
      
      To follow the pattern established by dpaa2_eth_disconnect_mac, take the
      rtnl_mutex every time we call dpaa2_switch_port_disconnect_mac.
      
      Fixes: 84cba729 ("dpaa2-switch: integrate the MAC endpoint support")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'gmii2rgmii-loopback' · 6985157c
      David S. Miller authored
      Gerhard Engleder says:
      
      ====================
      Add Xilinx GMII2RGMII loopback support
      
      The Xilinx GMII2RGMII driver overrides PHY driver functions in order to
      configure the device according to the link speed of the PHY attached to
      it. This is implemented for a normal link but not for loopback.
      
      Andrew told me to use phy_loopback, and these changes make phy_loopback
      work in combination with Xilinx GMII2RGMII.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: gmii2rgmii: Support PHY loopback · ceaeaafc
      Gerhard Engleder authored
      Configure speed if loopback is used. read_status is not called for
      loopback.
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Uniform PHY driver access · 3ac8eed6
      Gerhard Engleder authored
      struct phy_device contains a pointer to the PHY driver, and nearly
      everywhere this pointer is used to access the PHY driver. Only
      mdio_bus_phy_may_suspend() is still using to_phy_driver() instead of the
      PHY driver pointer. Unify PHY driver access by eliminating the
      to_phy_driver() use in mdio_bus_phy_may_suspend().
      
      After this change, only phy_bus_match() and phy_probe() still use
      to_phy_driver(), because the PHY driver pointer is not available there.
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Support set_loopback override · 4ed311b0
      Gerhard Engleder authored
      phy_read_status and various other PHY functions support PHY specific
      overriding of driver functions by using a PHY specific pointer to the
      PHY driver. Add support of PHY specific override to phy_loopback too.
      Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sparx5-dma' · 600003a3
      David S. Miller authored
      Steen Hegelund says:
      
      ====================
      Adding Frame DMA functionality to Sparx5
      
      v2:
          Removed an unused variable (proc_ctrl) from sparx5_fdma_start.
      
      This adds frame DMA functionality to the Sparx5 platform.
      
      Until now the Sparx5 SwitchDev driver has been using register based
      injection and extraction when sending frames to/from the host CPU.
      
      With this series the Frame DMA functionality is now added.
      
      The Frame DMA is only used if the Frame DMA interrupt is configured in the
      device tree; otherwise the existing register based injection and extraction
      is used.
      
      The Sparx5 has two ports that can be used for sending and receiving frames,
      but there are 8 channels that can be configured: 6 for injection and 2 for
      extraction.
      
      The additional channels can be used for more advanced scenarios, e.g. where
      virtual cores are used, but currently the driver only uses port 0, with
      channel 0 for injection and channel 6 for extraction.
      
      DCB (data control block) structures are passed to the Frame DMA with
      suitable information about frame start/end etc, as well as pointers to DB
      (data blocks) buffers.
      
      The Frame DMA engine can use interrupts to signal back when the frames have
      been injected or extracted.
      
      There is a limitation on the DB alignment also for injection: blocks must
      start on 16-byte boundaries, and this is why the driver currently copies
      the data into separate buffers.
      
      The Sparx5 switch core needs an IFH (Internal Frame Header) to pass
      information from the port to the switch core, and this header is added
      before injection and stripped after extraction.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arm64: dts: sparx5: Add the Sparx5 switch frame DMA support · 920c293a
      Steen Hegelund authored
      This adds the interrupt for the Sparx5 Frame DMA.
      
      If this configuration is present the Sparx5 SwitchDev driver will use the
      Frame DMA feature, and if not it will use register based injection and
      extraction for sending and receiving frames to the CPU.
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sparx5: switchdev: adding frame DMA functionality · 10615907
      Steen Hegelund authored
      This adds frame DMA functionality to the Sparx5 platform.
      
      Ethernet frames can be extracted or injected autonomously to or from the
      device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
      structures in memory are used for injecting or extracting Ethernet frames.
      The FDMA generates interrupts when frame extraction or injection is done
      and when the linked lists need updating.
      
      The FDMA implements two extraction channels, one per switch core port
      towards the VCore CPU system and a total of six injection channels.
      Extraction channels are mapped one-to-one to the CPU ports, while injection
      channels can be individually assigned to any CPU port.
      
      - FDMA channels 0 through 5 correspond to the CPU port 0 injection direction
        when FDMA_CH_CFG[channel].CH_INJ_PORT is set to 0.
      - FDMA channels 0 through 5 correspond to the CPU port 1 injection direction
        when FDMA_CH_CFG[channel].CH_INJ_PORT is set to 1.
      - FDMA channel 6 corresponds to CPU port 0 extraction direction.
      - FDMA channel 7 corresponds to CPU port 1 extraction direction.
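      The channel list above can be read as a small lookup. The following is
      only an illustrative Python model, not driver code: the names
      channel_role and ChannelRole are hypothetical, and ch_inj_port merely
      stands in for the FDMA_CH_CFG[channel].CH_INJ_PORT field.

```python
from typing import NamedTuple

# Channel numbers taken from the list above.
INJECTION_CHANNELS = range(0, 6)            # FDMA channels 0..5
EXTRACTION_PORT_BY_CHANNEL = {6: 0, 7: 1}   # channel -> CPU port


class ChannelRole(NamedTuple):
    direction: str   # "injection" or "extraction"
    cpu_port: int    # 0 or 1


def channel_role(channel: int, ch_inj_port: int = 0) -> ChannelRole:
    """Return the direction and CPU port served by an FDMA channel.

    ch_inj_port models FDMA_CH_CFG[channel].CH_INJ_PORT and is only
    consulted for injection channels (hypothetical parameter name).
    """
    if channel in INJECTION_CHANNELS:
        if ch_inj_port not in (0, 1):
            raise ValueError("CH_INJ_PORT must be 0 or 1")
        return ChannelRole("injection", ch_inj_port)
    if channel in EXTRACTION_PORT_BY_CHANNEL:
        return ChannelRole("extraction", EXTRACTION_PORT_BY_CHANNEL[channel])
    raise ValueError(f"no such FDMA channel: {channel}")
```

      For instance, channel_role(3, ch_inj_port=1) describes an injection
      channel assigned to CPU port 1, while channel_role(7) is always the
      extraction channel of CPU port 1.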
      
      The FDMA implements a strict priority scheme among channels. Extraction
      channels are prioritized over injection channels and secondarily channels
      with higher channel number are prioritized over channels with lower number.
      On the other hand, ports are being served on an equal-bandwidth principle
      both on injection and extraction directions.  The equal-bandwidth principle
      will not force an equal bandwidth. Instead, it ensures that the ports
      perform at their best considering the operating conditions.
      
      When more than one injection channel is enabled for injection on the same
      CPU port, priority determines which channel can inject data. Ownership
      is re-arbitrated on frame boundaries.
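      A minimal sketch of the strict priority rule described above, assuming
      channels 6 and 7 are the extraction channels. The helper names are
      hypothetical, and the equal-bandwidth handling between ports is
      deliberately not modeled.

```python
from typing import Iterable, Optional, Tuple

EXTRACTION_CHANNELS = frozenset({6, 7})


def priority_key(channel: int) -> Tuple[bool, int]:
    """Sort key: extraction beats injection, then higher number wins."""
    return (channel in EXTRACTION_CHANNELS, channel)


def next_channel(ready: Iterable[int]) -> Optional[int]:
    """Pick the channel that would win arbitration among the ready ones.

    Returns None when no channel is ready.
    """
    return max(ready, key=priority_key, default=None)
```

      With injection channels 0, 3 and 5 enabled on the same CPU port,
      next_channel([0, 3, 5]) selects channel 5. In the real hardware this
      decision is only revisited on frame boundaries.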
      
      The FDMA processes linked lists of DMA Control Block Structures (DCBs). The
      DCBs have the same basic structure for both injection and extraction. A DCB
      must be placed on a 64-bit word-aligned address in memory. Each DCB has a
      per-channel configurable amount of associated data blocks in memory, where
      the frame data is stored.
      
      The data blocks that are used by extraction channels must be placed on
      64-bit word aligned addresses in memory, and their length must be a
      multiple of 128 bytes.
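      The placement rules above amount to simple modulo checks. This is a
      sketch with hypothetical helper names; addresses and lengths are plain
      integers in bytes.

```python
DCB_ALIGN_BYTES = 8                   # DCB: 64-bit word aligned
EXTRACTION_DB_ALIGN_BYTES = 8         # extraction data block: 64-bit word aligned
EXTRACTION_DB_LENGTH_MULTIPLE = 128   # extraction data block length granularity


def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def dcb_address_ok(address: int) -> bool:
    """Check the DCB placement rule."""
    return is_aligned(address, DCB_ALIGN_BYTES)


def extraction_block_ok(address: int, length: int) -> bool:
    """Check both the address and the length rule for an extraction block."""
    return (
        is_aligned(address, EXTRACTION_DB_ALIGN_BYTES)
        and length > 0
        and length % EXTRACTION_DB_LENGTH_MULTIPLE == 0
    )
```

      A 2048-byte block at 0x1000 passes extraction_block_ok, while a
      1500-byte block does not, because 1500 is not a multiple of 128.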
      
      A DCB carries the pointer to the next DCB of the linked list, the INFO word
      which holds information for the DCB, and a pair of status word and memory
      pointer for every data block that it is associated with.
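      The DCB layout described above can be modeled roughly as follows. The
      class and field names are hypothetical, and no field widths or bit
      layouts are implied, since the text does not give them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataBlockRef:
    """One (status word, memory pointer) pair for a data block."""
    status: int = 0
    data_ptr: int = 0


@dataclass
class Dcb:
    """Model of a DCB: INFO word, data block pairs, link to the next DCB."""
    info: int = 0
    blocks: List[DataBlockRef] = field(default_factory=list)
    next: Optional["Dcb"] = None


def build_chain(dcb_count: int, blocks_per_dcb: int) -> Optional[Dcb]:
    """Build a linked list of DCBs, each with the same number of blocks."""
    head: Optional[Dcb] = None
    for _ in range(dcb_count):
        blocks = [DataBlockRef() for _ in range(blocks_per_dcb)]
        head = Dcb(blocks=blocks, next=head)
    return head


def chain_length(head: Optional[Dcb]) -> int:
    """Count the DCBs reachable from head."""
    count = 0
    while head is not None:
        count += 1
        head = head.next
    return count
```

      The per-channel configurable number of data blocks corresponds to the
      blocks_per_dcb argument in this model.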
      Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-next-pullrequest-20210820' of git://git.open-mesh.org/linux-merge · f402303b
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      This (updated) cleanup patchset includes the following patches:
      
       - bump version strings, by Simon Wunderlich
      
       - update docs about moving the IRC channel away from freenode,
         by Sven Eckelmann (updated, added missing sign-off)
      
       - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
      
       - Update NULL checks, by Sven Eckelmann (2 patches)
      
       - remove remaining skb-copy calls for broadcast packets,
         by Linus Lüssing
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f96b48c6
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2021-08-19
      
      This series introduces the support for two new mlx5 features:
      
      1) Sample offload for tunneled traffic
      2) devlink rate objects support
      
      1) From Chris Mi: Sample offload for tunneled traffic
      =====================================================
      
      Background and solution
      -----------------------
      
      Currently the sample offload actions send the encapsulated packet
      to software. This series decapsulates the packet before performing
      the sampling and sets the tunnel properties on the skb metadata
      fields to make the behavior consistent with OVS sFlow.
      
      When decapsulating first, we can't use the same match as before in
      the default table. So instantiate a post action instance to continue
      processing the action list. If HW can preserve reg_c, also use the
      post action instance.
      
      Post action infrastructure
      --------------------------
      
      Some tc actions are modeled in hardware using multiple tables
      causing a tc action list split. For example, CT action is modeled
      by jumping to a ct table which is controlled by nf flow table.
      sFlow jumps in hardware to a sample table, which continues to a
      "default table" where it should continue processing the action list.
      
      Multi table actions are modeled in hardware using a unique fte_id.
      The fte_id is set before jumping to a table. Split actions continue
      to a post-action table where the matched fte_id value continues the
      execution of the tc action list.
      
      This series also introduces post action infrastructure. Both ct and
      sample use it.
      
      Sample for tunnel in TC SW
      --------------------------
      
      tc filter add dev vxlan1 protocol ip parent ffff: prio 3		\
      	flower src_mac 24:25:d0:e1:00:00 dst_mac 02:25:d0:13:01:02	\
      	enc_src_ip 192.168.1.14 enc_dst_ip 192.168.1.13			\
      	enc_dst_port 4789 enc_key_id 4					\
      	action sample rate 1 group 6					\
      	action tunnel_key unset						\
      	action mirred egress redirect dev enp4s0f0_1
      
      MLX5 sample HW offload
      ----------------------
      
      For the following typical flow table:
      
      +-------------------------------+
      +       original flow table     +
      +-------------------------------+
      +         original match        +
      +-------------------------------+
      + sample action + other actions +
      +-------------------------------+
      
      We translate the tc filter with sample action to the following HW model:
      
              +---------------------+
              + original flow table +
              +---------------------+
              +   original match    +
              +---------------------+
                    | set fte_id (if reg_c preserve cap)
                    | do decap
                    v
      +------------------------------------------------+
      +                Flow Sampler Object             +
      +------------------------------------------------+
      +                    sample ratio                +
      +------------------------------------------------+
      +    sample table id    |    default table id    +
      +------------------------------------------------+
                 |                            |
                 v                            v
      +-----------------------------+  +-------------------+
      +        sample table         +  +   default table   +
      +-----------------------------+  +-------------------+
      + forward to management vport +             |
      +-----------------------------+             |
                                          +-------+------+
                                          |              |reg_c preserve cap
                                          |              |or decap action
                                          v              v
                             +-----------------+   +-------------+
                             + per vport table +   + post action +
                             +-----------------+   +-------------+
                             + original match  +
                             +-----------------+
                             + other actions   +
                             +-----------------+
      
      2) From Dmytro Linkin: devlink rate object support for mlx5_core driver
      =======================================================================
      
      HIGH-LEVEL OVERVIEW
      
      Devlink leaf rate objects are created per vport (VF/SF, and PF on
      BlueField) in switchdev mode on devlink port registration.
      Implement devlink ops callbacks to create/destroy rate groups, set TX
      rate values of the vport/group, and assign a vport to a group.
      Driver accepts TX rate values as fraction of 1Mbps.
      
      Refactor existing eswitch QoS infrastructure to be accessible by legacy
      NDO rate API and new devlink rate API. NDO rate API is not
      removed/disabled in switchdev mode to not break existing users. Rate
      values configured with NDO rate API are not visible for devlink
      infrastructure, therefore APIs should not be used simultaneously.
      
      IMPLEMENTATION DETAILS
      
      The driver provides a two-level rate hierarchy to manage bandwidth: group
      level and vport level. Initially each vport is added to an internal
      unlimited group created by default. Each rate element (vport or group)
      receives bandwidth relative to its parent element (for groups the parent
      is the physical link itself) in a Round Robin manner, where each element
      gets a bandwidth value according to its weight. Example:
      
      Created four rate groups with tx_share limits:
      
      $ devlink port function rate add \
          pci/0000:06:00.0/group_1 tx_share 30gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_2 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_3 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_4 tx_share 10gbit
      
      Weights created in HW for each group are relative to the biggest tx_share
      value, which is 30gbit:
      
      <group_1> 1.0
      <group_2> 0.67
      <group_3> 0.67
      <group_4> 0.33
      
      Assuming link speed is 50 Gbit/sec and each group can sustain that
      amount of traffic, maximum bandwidth is 50 / (1.0 + 0.67 + 0.67 + 0.33)
       = ~18.75 Gbit/sec. Normalized bandwidth values for groups:
      
      <group_1> 18.75 * 1.0  = 18.75 Gbit/sec
      <group_2> 18.75 * 0.67 = 12.5 Gbit/sec
      <group_3> 18.75 * 0.67 = 12.5 Gbit/sec
      <group_4> 18.75 * 0.33 = 6.25 Gbit/sec
      
      If in example above group_1 doesn't produce any traffic, then maximum
      bandwidth becomes 50 / (0.67 + 0.67 + 0.33) = ~30.0 Gbit/sec. Normalized
      values:
      
      <group_2> 30.0 * 0.67 = 20.0 Gbit/sec
      <group_3> 30.0 * 0.67 = 20.0 Gbit/sec
      <group_4> 30.0 * 0.33 = 10.0 Gbit/sec
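      The arithmetic of the example above can be reproduced with a short
      Python sketch. It uses exact fractions (2/3 and 1/3 instead of the
      rounded 0.67 and 0.33), and the function names are hypothetical. It
      only models the math in this description, not the driver or firmware.

```python
from fractions import Fraction
from typing import Dict, Iterable, Optional

# tx_share values from the example, in Gbit/sec.
TX_SHARE_GBIT = {"group_1": 30, "group_2": 20, "group_3": 20, "group_4": 10}


def group_weights(tx_share: Dict[str, int]) -> Dict[str, Fraction]:
    """Weight of each group relative to the biggest tx_share value."""
    biggest = max(tx_share.values())
    return {name: Fraction(share, biggest) for name, share in tx_share.items()}


def normalized_bandwidth(
    tx_share: Dict[str, int],
    link_speed: int,
    active: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Split link_speed among the active groups according to their weights."""
    weights = group_weights(tx_share)
    active_set = set(tx_share) if active is None else set(active)
    names = [name for name in tx_share if name in active_set]
    if not names:
        raise ValueError("at least one active group is required")
    unit = Fraction(link_speed) / sum(weights[name] for name in names)
    return {name: float(unit * weights[name]) for name in names}
```

      Calling normalized_bandwidth(TX_SHARE_GBIT, 50) yields 18.75, 12.5,
      12.5 and 6.25 Gbit/sec. Passing active=["group_2", "group_3",
      "group_4"] models the case where group_1 produces no traffic and
      yields 20.0, 20.0 and 10.0 Gbit/sec.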
      
      Same normalization applied to each vport in the group.
      
      Normalized values are internal, therefore the driver provides QoS
      tracepoints for the following events:
      
      * vport rate element creation/deletion;
      * vport rate element configuration;
      * group rate element creation/deletion;
      * group rate element configuration.
      
      PATCHES OVERVIEW
      
      1 - Moving and isolation of eswitch QoS logic in separate file;
      
      2 - Implement devlink leaf rate object support for vports;
      
      3 - Implement rate groups creation/deletion;
      
      4 - Implement TX rate management for the groups;
      
      5 - Implement parent set for vports;
      
      6 - Eswitch QoS tracepoints.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'for-net-next-2021-08-19' of... · e61fbee7
      David S. Miller authored
      Merge tag 'for-net-next-2021-08-19' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Luiz Augusto von Dentz says:
      
      ====================
      bluetooth-next pull request for net-next:
      
       - Add support for Foxconn Mediatek Chip
       - Add support for LG LGSBWAC92/TWCM-K505D
       - hci_h5 flow control fixes and suspend support
       - Switch to use lock_sock for SCO and RFCOMM
       - Various fixes for extended advertising
       - Reword Intel's setup on btusb unifying the supported generations
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'batadv-next-pullrequest-20210819' of git://git.open-mesh.org/linux-merge · 815cc21d
      David S. Miller authored
      Simon Wunderlich says:
      
      ====================
      This cleanup patchset includes the following patches:
      
       - bump version strings, by Simon Wunderlich
      
       - update docs about moving the IRC channel away from freenode,
         by Sven Eckelmann
      
       - Switch to kstrtox.h for kstrtou64, by Sven Eckelmann
      
       - Update NULL checks, by Sven Eckelmann (2 patches)
      
       - remove remaining skb-copy calls for broadcast packets,
         by Linus Lüssing
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • batman-adv: bcast: remove remaining skb-copy calls · a006aa51
      Linus Lüssing authored
      We currently have two code paths for broadcast packets:
      
      A) self-generated, via batadv_interface_tx()->
         batadv_send_bcast_packet().
      B) received/forwarded, via batadv_recv_bcast_packet()->
         batadv_forw_bcast_packet().
      
      For A), self-generated broadcast packets:
      
      The only modification to the skb data is the ethernet header which is
      added/pushed to the skb in
      batadv_send_broadcast_skb()->batadv_send_skb_packet(). However before
      doing so, batadv_skb_head_push() is called which calls skb_cow_head() to
      unshare the space for the to be pushed ethernet header. So for this
      case, it is safe to use skb clones.
      
      For B), received/forwarded packets:
      
      The same applies as in A) for the to be forwarded packets. Only the
      ethernet header is added. However after (queueing for) forwarding the
      packet in batadv_recv_bcast_packet()->batadv_forw_bcast_packet(), a
      packet is additionally decapsulated and is sent up the stack through
      batadv_recv_bcast_packet()->batadv_interface_rx().
      
      Protocols higher up the stack are already required to check if the
      packet is shared and create a copy for further modifications. When the
      next (protocol) layer works correctly, it cannot happen that it tries to
      operate on the data behind the skb clone which is still queued up for
      forwarding.
      Co-authored-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Drop NULL check before dropping references · a2b7b148
      Sven Eckelmann authored
      The check whether a batman-adv related object is NULL is now done directly
      in the batadv_*_put functions. It is no longer necessary to perform this
      check outside these functions.
      
      The changes were generated using a coccinelle semantic patch:
      
        @@
        expression E;
        @@
        - if (likely(E != NULL))
        (
        batadv_backbone_gw_put
        |
        batadv_claim_put
        |
        batadv_dat_entry_put
        |
        batadv_gw_node_put
        |
        batadv_hardif_neigh_put
        |
        batadv_hardif_put
        |
        batadv_nc_node_put
        |
        batadv_nc_path_put
        |
        batadv_neigh_ifinfo_put
        |
        batadv_neigh_node_put
        |
        batadv_orig_ifinfo_put
        |
        batadv_orig_node_put
        |
        batadv_orig_node_vlan_put
        |
        batadv_softif_vlan_put
        |
        batadv_tp_vars_put
        |
        batadv_tt_global_entry_put
        |
        batadv_tt_local_entry_put
        |
        batadv_tt_orig_list_entry_put
        |
        batadv_tt_req_node_put
        |
        batadv_tvlv_container_put
        |
        batadv_tvlv_handler_put
        )(E);
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Check ptr for NULL before reducing its refcnt · e78783da
      Sven Eckelmann authored
      The commit b37a4668 ("netdevice: add the case if dev is NULL") changed
      how the NULL check for a net_device has to be handled when trying
      to reduce its reference counter. Before this commit, it was the
      responsibility of the caller to check whether the object is NULL or not.
      But it was changed to behave more like kfree. Now the callee has to handle
      the NULL case.
      
      The batman-adv code was scanned via coccinelle for similar places. These
      were changed to use the paradigm
      
        @@
        identifier E, T, R, C;
        identifier put;
        @@
         void put(struct T *E)
         {
        +	if (!E)
        +		return;
        	kref_put(&E->C, R);
         }
      
      Functions which were used in other sources files were moved to the header
      to allow the compiler to inline the NULL check and the kref_put call.
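      As a language-neutral illustration of the paradigm, here is a small
      Python model of a kfree-like put helper that tolerates a missing
      object. All names are hypothetical; it only mirrors the control flow
      of the semantic patch, with None standing in for NULL.

```python
from typing import Callable, Optional


class RefCounted:
    """Toy reference-counted object; starts with one reference."""

    def __init__(self, release: Callable[["RefCounted"], None]) -> None:
        self.refcount = 1
        self._release = release

    def get(self) -> "RefCounted":
        """Take an additional reference."""
        self.refcount += 1
        return self


def put(obj: Optional[RefCounted]) -> None:
    """Drop one reference; the callee handles the None case itself."""
    if obj is None:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        obj._release(obj)
```

      Callers can then write put(obj) unconditionally instead of wrapping
      every call in their own check.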
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Switch to kstrtox.h for kstrtou64 · 55207227
      Sven Eckelmann authored
      The commit 4c527293 ("kernel.h: split out kstrtox() and simple_strtox()
      to a separate header") moved the kstrtou64 function to a new header called
      linux/kstrtox.h.
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Move IRC channel to hackint.org · 3baa9f52
      Sven Eckelmann authored
      Due to recent developments around the Freenode.org IRC network, the
      opinions about the usage of this service shifted dramatically. The majority
      of the still active users of the #batman channel prefers a move to the
      hackint.org network.
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • net/mlx5: E-switch, Add QoS tracepoints · 3202ea65
      Dmytro Linkin authored
      Add tracepoints to log QoS enabling/disabling/configuration for vports
      and rate groups.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Allow to add vports to rate groups · 0fe132ea
      Dmytro Linkin authored
      Implement eswitch API that allows updating rate groups. If group
      pointer is NULL, then move the vport to internal unlimited group zero.
      
      Implement devlink_ops->rate_parent_node_set() callback in terms of
      the new eswitch group update API.
      
      Enable QoS for all group's elements if a group has allocated BW share.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Allow setting share/max tx rate limits of rate groups · f47e04eb
      Dmytro Linkin authored
      Provide eswitch API to allow controlling group rate limits. Use it to
      implement devlink_ops->mlx5_devlink_rate_node_tx_{share|max}_set().
      
      The share rate will create relative bandwidth share on the groups level
      while within the group the user can set shared rate on the member vports
      of that group and this rate will be relative to the group's share rate.
      The group with the highest shared rate will get a BW share of 100 and
      the rest of the groups will get a value that reflects the ratio between
      their share rate and the maximum share rate.
      
      Example:
      Created four rate groups with tx_share limits:
      
      $ devlink port function rate add \
          pci/0000:06:00.0/group_1 tx_share 30gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_2 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_3 tx_share 20gbit
      $ devlink port function rate add \
          pci/0000:06:00.0/group_4 tx_share 10gbit
      
      Assuming link speed is 50 Gbit/sec, the ratio divider will be
      50 / (30+20+20+10) = 0.625. Normalized rate values for the groups:
      
      <group_1> 30 * 0.625 = 18.75 Gbit/sec
      <group_2> 20 * 0.625 = 12.5 Gbit/sec
      <group_3> 20 * 0.625 = 12.5 Gbit/sec
      <group_4> 10 * 0.625 = 6.25 Gbit/sec
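      The same numbers can be checked with a short Python sketch. The
      function names are hypothetical, relative_bw_share only illustrates
      the "BW share of 100 for the highest share rate" rule as a plain
      ratio, and no driver rounding behavior is modeled here.

```python
from fractions import Fraction
from typing import Dict

# tx_share values from the example, in Gbit/sec.
GROUP_TX_SHARE = {"group_1": 30, "group_2": 20, "group_3": 20, "group_4": 10}


def ratio_divider(tx_share: Dict[str, int], link_speed: int) -> float:
    """Link speed divided by the sum of all tx_share values."""
    return link_speed / sum(tx_share.values())


def normalized_rates(tx_share: Dict[str, int], link_speed: int) -> Dict[str, float]:
    """Scale every tx_share value by the ratio divider."""
    divider = ratio_divider(tx_share, link_speed)
    return {name: share * divider for name, share in tx_share.items()}


def relative_bw_share(tx_share: Dict[str, int]) -> Dict[str, Fraction]:
    """BW share on a 0..100 scale, where the highest share rate gets 100."""
    highest = max(tx_share.values())
    return {name: Fraction(share * 100, highest) for name, share in tx_share.items()}
```

      With a link speed of 50, ratio_divider returns 0.625 and
      normalized_rates returns 18.75, 12.5, 12.5 and 6.25, matching the
      table above.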
      
      A rate group with unlimited tx_share rate will receive the minimum BW
      value (1Mbit/sec) if any group with a tx_share rate limit is present. This
      makes it possible to not drop all packets in case of heavy traffic.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Introduce rate limiting groups API · 1ae258f8
      Dmytro Linkin authored
      Extend eswitch API with rate limiting groups:
      
      - Define new struct mlx5_esw_rate_group that is used to hold all
        internal group data.
      
      - Implement functions that allow creation, destruction and cleanup of
        groups.
      
      - Assign all vports to internal unlimited zero group by default.
      
      This commit lays the groundwork for group rate limiting by implementing
      devlink_ops->rate_node_{new|del}() callbacks to support creating and
      deleting groups through devlink rate node objects. APIs that allow
      setting rates and adding/removing members are implemented in the following
      patches.
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Enable devlink port tx_{share|max} rate control · ad34f02f
      Dmytro Linkin authored
      Register devlink rate leaf object for every eswitch vport.
      Implement devlink ops that enable setting shared and max tx rates
      through devlink API.
      Extract common eswitch code from the existing tx rate set function that is
      accessed through NDO so it can be reused for devlink. Values configured
      with the NDO API are not visible to the devlink API, therefore the two
      shouldn't be used simultaneously.
      
      When normalizing the BW share value, dividing the desired minimum rate
      by the common divider results in losing information since the quotient
      is rounded down. This has a significant effect on configurations of low
      rate where the round down eliminates a large percentage of the total
      rate. To improve the formula, round up the division result to make sure
      that the BW share is at least the value it was supposed to be and won't
      lose a significant amount of the expected value.
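      To see why rounding direction matters at low rates, consider this
      small Python sketch. The rate and divider values used in the note
      below are made up for illustration and the helper names are
      hypothetical; only the floor versus ceiling behavior is the point.

```python
def div_round_down(rate: int, divider: int) -> int:
    """Integer division that truncates the quotient."""
    return rate // divider


def div_round_up(rate: int, divider: int) -> int:
    """Integer division that rounds the quotient up."""
    return (rate + divider - 1) // divider


def lost_fraction(rate: int, divider: int) -> float:
    """Fraction of the requested rate lost when the quotient is rounded down."""
    represented = div_round_down(rate, divider) * divider
    return 1 - represented / rate
```

      With a divider of 10, a requested rate of 19 rounds down to a share of
      1 and loses almost half of the requested value, whereas a rate of 1009
      loses less than one percent. Rounding up gives a share of 2 for the
      low rate, so the result is never below the requested value.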
      Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: E-switch, Move QoS related code to dedicated file · 2d116e3e
      Dmytro Linkin authored
      Move eswitch QoS related code into a dedicated file. Provide an eswitch API
      to access this code, meaning it is isolated and restricted to be used
      only by eswitch.c. The exception is the legacy NDO vf set rate, which is
      moved to esw/legacy.c.
      Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
      Reviewed-by: Huy Nguyen <huyn@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Support sample offload action for tunneled traffic · 2741f223
      Chris Mi authored
      Currently the sample offload actions send the encapsulated packet
      to software. This commit decapsulates the packet before performing
      the sampling and sets the tunnel properties on the skb metadata
      fields to make the behavior consistent with OVS sFlow.
      
      If decapsulating first, we can't use the same match as before in
      the default table. So instantiate a post action instance to continue
      processing the action list. If HW can preserve reg_c, also use the
      post action instance.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      2741f223
    • net/mlx5e: TC, Restore tunnel info for sample offload · ee950e5d
      Chris Mi authored
      Currently the sample offload actions send the encapsulated packet
      to software. sFlow expects tunneled packets to be decapsulated while
      having the tunnel properties on the skb metadata fields.
      
      Reuse the functions used by connection tracking to map the outer
      header properties to a unique id. The next patch will use that id
      to restore the tunnel information of decapsulated packets onto the
      skb.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      ee950e5d
    • net/mlx5e: TC, Remove CONFIG_NET_TC_SKB_EXT dependency when restoring tunnel · d12e20ac
      Chris Mi authored
      CONFIG_NET_TC_SKB_EXT controls the SKB extension support for
      restoring chain ids. SKB extension is not required for tunnel
      restoration.
      
      Remove the CONFIG_NET_TC_SKB_EXT dependency as a pre-step for
      using the tunnel restore methods for sample offload use cases.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      d12e20ac
    • net/mlx5e: Refactor ct to use post action infrastructure · f0da4daa
      Chris Mi authored
      Move post action table management to common library providing
      add/del/get API. Refactor the ct action offload to use the common
      API.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      f0da4daa
    • net/mlx5e: Introduce post action infrastructure · 6f0b692a
      Chris Mi authored
      Some tc actions are modeled in hardware using multiple tables
      causing a tc action list split. For example, CT action is modeled
      by jumping to a ct table which is controlled by nf flowtable.
      sFlow jumps in hardware to a sample table, which continues to a
      "default table" where it should continue processing the action list.
      
      Multi table actions are modeled in hardware using a unique fte_id.
      The fte_id is set before jumping to a table. Split actions continue
      to a post-action table where the matched fte_id value continues the
      execution of the tc action list.
      
      Currently the post-action design is implemented only by the ct
      action. Introduce post action infrastructure as a pre-step for
      reusing it with the sFlow offload feature. Init and destroy the
      common post action table. Refactor the ct offload to use the
      common post table infrastructure in the next patch.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      6f0b692a
    • net/mlx5e: CT, Use xarray to manage fte ids · 27997978
      Chris Mi authored
      IDR is deprecated. Use xarray instead.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      27997978
    • net/mlx5e: Move sample attribute to flow attribute · bcd6740c
      Chris Mi authored
      Currently it is in eswitch attribute. Move it to flow attribute to
      reflect the change in previous patch.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      bcd6740c
    • net/mlx5e: Move esw/sample to en/tc/sample · 0027d70c
      Chris Mi authored
      Module sample belongs to en/tc instead of esw. Move it and rename
      accordingly.
      Signed-off-by: Chris Mi <cmi@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      0027d70c
    • net/mlx5e: Remove mlx5e dependency from E-Switch sample · 5024fa95
      Saeed Mahameed authored
      mlx5/esw/sample.c doesn't really need mlx5e_priv object, we can remove
      this redundant dependency by passing the eswitch object directly to
      the sample object constructor.
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      5024fa95
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f444fea7
      Jakub Kicinski authored
      drivers/ptp/Kconfig:
        55c8fca1 ("ptp_pch: Restore dependency on PCI")
        e5f31552 ("ethernet: fix PTP_1588_CLOCK dependencies")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f444fea7
  2. 19 Aug, 2021 7 commits
    • Merge tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · f87d6431
      Linus Torvalds authored
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes, including fixes from bpf, wireless and mac80211
        trees.
      
        Current release - regressions:
      
         - tipc: call tipc_wait_for_connect only when dlen is not 0
      
         - mac80211: fix locking in ieee80211_restart_work()
      
        Current release - new code bugs:
      
         - bpf: add rcu_read_lock in bpf_get_current_[ancestor_]cgroup_id()
      
         - ethernet: ice: fix perout start time rounding
      
         - wwan: iosm: prevent underflow in ipc_chnl_cfg_get()
      
        Previous releases - regressions:
      
         - bpf: clear zext_dst of dead insns
      
         - sch_cake: fix srchost/dsthost hashing mode
      
         - vrf: reset skb conntrack connection on VRF rcv
      
         - net/rds: dma_map_sg is entitled to merge entries
      
        Previous releases - always broken:
      
         - ethernet: bnxt: fix Tx path locking and races, add Rx path
           barriers"
      
      * tag 'net-5.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
        net: dpaa2-switch: disable the control interface on error path
        Revert "flow_offload: action should not be NULL when it is referenced"
        iavf: Fix ping is lost after untrusted VF had tried to change MAC
        i40e: Fix ATR queue selection
        r8152: fix the maximum number of PLA bp for RTL8153C
        r8152: fix writing USB_BP2_EN
        mptcp: full fully established support after ADD_ADDR
        mptcp: fix memory leak on address flush
        net/rds: dma_map_sg is entitled to merge entries
        net: mscc: ocelot: allow forwarding from bridge ports to the tag_8021q CPU port
        net: asix: fix uninit value bugs
        ovs: clear skb->tstamp in forwarding path
        net: mdio-mux: Handle -EPROBE_DEFER correctly
        net: mdio-mux: Don't ignore memory allocation errors
        net: mdio-mux: Delete unnecessary devm_kfree
        net: dsa: sja1105: fix use-after-free after calling of_find_compatible_node, or worse
        sch_cake: fix srchost/dsthost hashing mode
        ixgbe, xsk: clean up the resources in ixgbe_xsk_pool_enable error path
        net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
        mac80211: fix locking in ieee80211_restart_work()
        ...
      f87d6431
    • Merge tag 'platform-drivers-x86-v5.14-4' of... · e649e4c8
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
      
      Pull x86 platform driver fixes from Hans de Goede:
      
       - Enable SW_TABLET_MODE support for the TP200s
      
       - Enable WMI on two more Gigabyte motherboards
      
      * tag 'platform-drivers-x86-v5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
        platform/x86: gigabyte-wmi: add support for B450M S2H V2
        platform/x86: gigabyte-wmi: add support for X570 GAMING X
        platform/x86: asus-nb-wmi: Add tablet_mode_sw=lid-flip quirk for the TP200s
        platform/x86: asus-nb-wmi: Allow configuring SW_TABLET_MODE method with a module option
      e649e4c8
    • octeontx2-af: remove redudant second error check on variable err · 9e5f10fe
      Colin Ian King authored
      A recent change added error checking messages and failed to remove one
      of the previous error checks. There are now two checks on variable err
      so the second one is redundant dead code and can be removed.
      
      Addresses-Coverity: ("Logically dead code")
      Fixes: a83bdada ("octeontx2-af: Add debug messages for failures")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Link: https://lore.kernel.org/r/20210818130927.33895-1-colin.king@canonical.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9e5f10fe
    • Merge tag 'linux-can-next-for-5.15-20210819' of... · 185f690f
      Jakub Kicinski authored
      Merge tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
      
      Marc Kleine-Budde says:
      
      ====================
      linux-can-next-for-5.15-20210819
      
      The first patch is by me, for the mailmap file and maps the email
      address of two former ESD employees to a newly created role account.
      
      The next 3 patches are by Oleksij Rempel and add support for GPIO
      based switchable CAN bus termination.
      
      The next 3 patches are by Vincent Mailhol. The first one changes the
      CAN netlink interface to not bail out if the user switched off
      unsupported features. The next one adds Vincent as the maintainer of
      the etas_es58x driver and the last one cleans up the documentation of
      struct es58x_fd_tx_conf_msg.
      
      The next patch is by me, for the mcp251xfd driver and marks some
      instances of struct mcp251xfd_priv as const. Lad Prabhakar contributes
      2 patches for the rcar_canfd driver, that add support for RZ/G2L
      family.
      
      The next 5 patches target the m_can/tcan4x5x driver. 2 are by me and
      fix trivial checkpatch warnings. The remaining 3 patches are by Matt
      Kline and improve the performance on the SPI based tcan4x5x chip by
      batching FIFO reads and writes.
      
      The last 7 patches are for the c_can driver. Dario Binacchi's patch
      converts the DT bindings to yaml, 2 patches by me fix a typo and
      rename a macro to properly represent the usage. The last 4 patches are
      again by Dario Binacchi and provide a performance improvement for the
      TX path by operating the TX mailboxes as a true FIFO.
      
      * tag 'linux-can-next-for-5.15-20210819' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (22 commits)
        can: c_can: cache frames to operate as a true FIFO
        can: c_can: support tx ring algorithm
        can: c_can: exit c_can_do_tx() early if no frames have been sent
        can: c_can: remove struct c_can_priv::priv field
        can: c_can: rename IF_RX -> IF_NAPI
        can: c_can: c_can_do_tx(): fix typo in comment
        dt-bindings: net: can: c_can: convert to json-schema
        can: m_can: Batch FIFO writes during CAN transmit
        can: m_can: Batch FIFO reads during CAN receive
        can: m_can: Disable IRQs on FIFO bus errors
        can: m_can: fix block comment style
        can: tcan4x5x: cdev_to_priv(): remove stray empty line
        can: rcar_canfd: Add support for RZ/G2L family
        dt-bindings: net: can: renesas,rcar-canfd: Document RZ/G2L SoC
        can: mcp251xfd: mark some instances of struct mcp251xfd_priv as const
        can: etas_es58x: clean-up documentation of struct es58x_fd_tx_conf_msg
        MAINTAINERS: add Vincent MAILHOL as maintainer for the ETAS ES58X CAN/USB driver
        can: netlink: allow user to turn off unsupported features
        can: dev: provide optional GPIO based termination support
        dt-bindings: can: fsl,flexcan: enable termination-* bindings
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20210819133913.657715-1-mkl@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      185f690f
    • net: dpaa2-switch: disable the control interface on error path · cd0a719f
      Vladimir Oltean authored
      Currently dpaa2_switch_takedown has a funny name and does not do the
      opposite of dpaa2_switch_init, which makes probing fail when we need to
      handle an -EPROBE_DEFER.
      
      A sketch of what dpaa2_switch_init does:
      
      	dpsw_open
      
      	dpaa2_switch_detect_features
      
      	dpsw_reset
      
      	for (i = 0; i < ethsw->sw_attr.num_ifs; i++) {
      		dpsw_if_disable
      
      		dpsw_if_set_stp
      
      		dpsw_vlan_remove_if_untagged
      
      		dpsw_if_set_tci
      
      		dpsw_vlan_remove_if
      	}
      
      	dpsw_vlan_remove
      
      	alloc_ordered_workqueue
      
      	dpsw_fdb_remove
      
      	dpaa2_switch_ctrl_if_setup
      
      When dpaa2_switch_takedown is called from the error path of
      dpaa2_switch_probe(), the control interface, enabled by
      dpaa2_switch_ctrl_if_setup from dpaa2_switch_init, remains enabled,
      because dpaa2_switch_takedown does not call
      dpaa2_switch_ctrl_if_teardown.
      
      Since dpaa2_switch_probe might fail due to EPROBE_DEFER of a PHY, this
      means that a second probe of the driver will happen with the control
      interface directly enabled.
      
      This will trigger a second error:
      
      [   93.273528] fsl_dpaa2_switch dpsw.0: dpsw_ctrl_if_set_pools() failed
      [   93.281966] fsl_dpaa2_switch dpsw.0: fsl_mc_driver_probe failed: -13
      [   93.288323] fsl_dpaa2_switch: probe of dpsw.0 failed with error -13
      
      If we investigate the /dev/dpaa2_mc_console log, we find that this is
      caused by:
      
      [E, ctrl_if_set_pools:2211, DPMNG]  ctrl_if must be disabled
      
      So make dpaa2_switch_takedown do the opposite of dpaa2_switch_init (in
      reasonable limits, no reason to change STP state, re-add VLANs etc), and
      rename it to something more conventional, like dpaa2_switch_teardown.
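      The failure mode described above can be modelled with a small user-space
      sketch. All names are hypothetical stand-ins rather than driver
      functions; the only assumption carried over from the message is that
      setting up the control interface fails with -13 (-EACCES) when the
      interface is already enabled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the switch object: only the state that matters here. */
struct sketch_switch {
	bool ctrl_if_enabled;
};

/* Setup refuses to run while the control interface is still enabled. */
int sketch_ctrl_if_setup(struct sketch_switch *sw)
{
	if (sw->ctrl_if_enabled)
		return -EACCES;	/* models "ctrl_if must be disabled" */
	sw->ctrl_if_enabled = true;
	return 0;
}

void sketch_ctrl_if_teardown(struct sketch_switch *sw)
{
	sw->ctrl_if_enabled = false;
}

/* Init ends by enabling the control interface, as in the sketch of init above. */
int sketch_init(struct sketch_switch *sw)
{
	return sketch_ctrl_if_setup(sw);
}

/* Old cleanup: leaves the control interface enabled. */
void sketch_takedown(struct sketch_switch *sw)
{
	(void)sw;
}

/* New cleanup: undoes what init did, so a deferred probe can retry. */
void sketch_teardown(struct sketch_switch *sw)
{
	sketch_ctrl_if_teardown(sw);
}
```

      In this model, init followed by sketch_takedown and a second init
      returns -EACCES, while init followed by sketch_teardown and a second
      init succeeds.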
      
      Fixes: 613c0a58 ("staging: dpaa2-switch: enable the control interface")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20210819141755.1931423-1-vladimir.oltean@nxp.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      cd0a719f
    • Revert "flow_offload: action should not be NULL when it is referenced" · fa05bdb8
      Ido Schimmel authored
      This reverts commit 9ea3e52c.
      
      Cited commit added a check to make sure 'action' is not NULL, but
      'action' is already dereferenced before the check, when calling
      flow_offload_has_one_action().
      
      Therefore, the check does not make any sense and results in a smatch
      warning:
      
      include/net/flow_offload.h:322 flow_action_mixed_hw_stats_check() warn:
      variable dereferenced before check 'action' (see line 319)
      
      Fix by reverting this commit.
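      The pattern smatch complains about can be sketched as follows. The
      struct and functions are simplified, hypothetical stand-ins and not the
      code in include/net/flow_offload.h, and the final return value is an
      arbitrary placeholder. They only show why a NULL check placed after a
      dereference cannot protect anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real action structure. */
struct sketch_action {
	unsigned int num_entries;
};

/* Dereferences 'action' unconditionally, like the helper named above. */
bool sketch_has_one_action(const struct sketch_action *action)
{
	return action->num_entries == 1;
}

/* Shape of the flagged code: the NULL check comes after the dereference. */
bool sketch_check_flagged(const struct sketch_action *action)
{
	if (sketch_has_one_action(action))	/* 'action' is dereferenced here */
		return true;

	if (!action)	/* too late: a NULL pointer would already have faulted */
		return true;

	return action->num_entries == 0;	/* placeholder result */
}

/* Shape after the revert: callers are assumed to pass a valid pointer. */
bool sketch_check_reverted(const struct sketch_action *action)
{
	if (sketch_has_one_action(action))
		return true;

	return action->num_entries == 0;	/* placeholder result */
}
```

      For every valid pointer the two variants return the same result, which
      is why dropping the late check does not change behaviour.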
      
      Cc: gushengxian <gushengxian@yulong.com>
      Fixes: 9ea3e52c ("flow_offload: action should not be NULL when it is referenced")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Link: https://lore.kernel.org/r/20210819105842.1315705-1-idosch@idosch.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      fa05bdb8
    • Merge branch 'intel-wired-lan-driver-updates-2021-08-18' · d584566c
      Jakub Kicinski authored
      Tony Nguyen says:
      
      ====================
      Intel Wired LAN Driver Updates 2021-08-18
      
      This series contains updates to i40e and iavf drivers.
      
      Arkadiusz fixes Flow Director not using the correct queue due to calling
      the wrong pick Tx function for i40e.
      
      Sylwester resolves traffic loss for iavf when it attempts to change its
      MAC address when it does not have permissions to do so.
      ====================
      
      Link: https://lore.kernel.org/r/20210818174217.4138922-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d584566c