1. 01 Jun, 2020 40 commits
    • tun: correct header offsets in napi frags mode · 96aa1b22
      Willem de Bruijn authored
      Tun in IFF_NAPI_FRAGS mode calls napi_gro_frags. Unlike netif_rx and
      netif_gro_receive, this expects skb->data to point to the mac layer.
      
      But skb_probe_transport_header, __skb_get_hash_symmetric, and
      xdp_do_generic in tun_get_user need skb->data to point to the network
      header. Flow dissection also needs skb->protocol set, so
      eth_type_trans has to be called.
      
      Ensure the link layer header lies in linear as eth_type_trans pulls
      ETH_HLEN. Then take the same code paths for frags as for not frags.
      Push the link layer header back just before calling napi_gro_frags.
      
      By pulling up to ETH_HLEN from frag0 into linear, this disables the
      frag0 optimization in the special case when IFF_NAPI_FRAGS is used
      with zero length iov[0] (and thus empty skb->linear).
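The pull-then-push sequence described above can be sketched with a toy model. This is not kernel code: the `Skb` class, its `data` offset, and `receive_napi_frags()` are hypothetical stand-ins that only mimic how skb->data moves past the link layer header and back again.

```python
ETH_HLEN = 14  # length of an Ethernet header in bytes


class Skb:
    """Toy stand-in for struct sk_buff: a buffer plus a 'data' offset."""

    def __init__(self, frame: bytes):
        self.buf = frame
        self.data = 0         # offset 0 models skb->data at the mac header
        self.protocol = None  # filled in by eth_type_trans()

    def pull(self, n: int) -> None:
        """Advance data past n header bytes (models skb_pull)."""
        if n > len(self.buf) - self.data:
            raise ValueError("not enough linear data to pull")
        self.data += n

    def push(self, n: int) -> None:
        """Move data back by n bytes (models skb_push)."""
        if n > self.data:
            raise ValueError("no headroom to push")
        self.data -= n


def eth_type_trans(skb: Skb) -> None:
    """Record the EtherType and pull the link layer header."""
    skb.protocol = int.from_bytes(skb.buf[12:14], "big")
    skb.pull(ETH_HLEN)


def receive_napi_frags(frame: bytes) -> Skb:
    skb = Skb(frame)
    eth_type_trans(skb)           # data now points at the network header
    assert skb.data == ETH_HLEN   # flow dissection, hashing and XDP would run here
    skb.push(ETH_HLEN)            # restore the mac header before napi_gro_frags
    return skb


frame = bytes(12) + b"\x08\x00" + b"payload"
skb = receive_napi_frags(frame)
```

After the call, the model's data offset is back at the mac header while the protocol field stays set, which is the state the commit message says napi_gro_frags expects.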
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cls_flower: remove mpls_opts_policy · 4e4f4ce6
      Guillaume Nault authored
      Compiling with W=1 gives the following warning:
      net/sched/cls_flower.c:731:1: warning: ‘mpls_opts_policy’ defined but not used [-Wunused-const-variable=]
      
      The TCA_FLOWER_KEY_MPLS_OPTS contains a list of
      TCA_FLOWER_KEY_MPLS_OPTS_LSE. Therefore, the attributes all have the
      same type and we can't parse the list with nla_parse*() and have the
      attributes validated automatically using an nla_policy.
      
      fl_set_key_mpls_opts() properly verifies that all attributes in the
      list are TCA_FLOWER_KEY_MPLS_OPTS_LSE. Then fl_set_key_mpls_lse()
      uses nla_parse_nested() on all these attributes, thus verifying that
      they have the NLA_F_NESTED flag. So we can safely drop the
      mpls_opts_policy.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mrp-Add-support-for-MRA-role' · 2a67ab99
      David S. Miller authored
      Horatiu Vultur says:
      
      ====================
      bridge: mrp: Add support for MRA role
      
      This patch series extends the MRP with the MRA role.
      A node that has the MRA role can behave as an MRM or as an MRC. In case there are
      multiple nodes in the topology that have the MRA role, only one node can
      behave as MRM and all the others need to behave as MRC. The node that has the
      higher priority (lower value) will behave as MRM.
      A node that has the MRA role and behaves as MRC just needs to forward the
      MRP_Test frames between the ring ports, but it also needs to detect when it
      stops receiving MRP_Test frames. In that case it would try to behave as MRM.
      
      v2:
       - add new patch that fixes sparse warnings
       - fix parsing of prio attribute
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mrp: Add support for role MRA · c6676e7d
      Horatiu Vultur authored
      A node that has the MRA role can behave as MRM or MRC.

      Initially it starts as MRM and sends MRP_Test frames on both ring ports.
      If it detects MRP_Test frames sent by another MRM, then it
      checks if these frames have a lower priority than itself. In this case
      it would send MRP_Nack frames to notify the other node that it needs to
      stop sending MRP_Test frames.
      If it receives an MRP_Nack frame then it stops sending MRP_Test frames
      and starts to behave as an MRC, but it would continue to monitor the
      MRP_Test frames sent by the MRM. If at some point the MRM stops sending
      MRP_Test frames, it would take over the MRM role and start to send MRP_Test
      frames.
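The role negotiation described above can be sketched as a small state machine. This is a toy model, not the bridge code: the `MraNode` class and its method names are hypothetical, and the behaviour for two nodes with equal priority is left out because the commit message does not describe it.

```python
from dataclasses import dataclass


@dataclass
class MraNode:
    name: str
    prio: int           # lower value means higher priority
    state: str = "MRM"  # an MRA node starts out behaving as MRM

    def on_test_frame(self, sender_prio: int) -> bool:
        """Handle an MRP_Test from another MRM; return True if an MRP_Nack is sent."""
        return self.state == "MRM" and sender_prio > self.prio

    def on_nack(self) -> None:
        """A higher-priority manager asked this node to stop sending MRP_Test."""
        self.state = "MRC"

    def on_test_timeout(self) -> None:
        """MRP_Test frames stopped arriving: take over the MRM role."""
        if self.state == "MRC":
            self.state = "MRM"


a = MraNode("a", prio=0x1000)
b = MraNode("b", prio=0x8000)

# Both nodes start as MRM and see each other's MRP_Test frames.
if a.on_test_frame(b.prio):
    b.on_nack()
if b.on_test_frame(a.prio):
    a.on_nack()
```

In this run node `a` keeps the MRM role and node `b` falls back to MRC; calling `b.on_test_timeout()` afterwards models `b` taking over once the manager goes silent.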
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mrp: Set the priority of MRP instance · 4b3a61b0
      Horatiu Vultur authored
      Each MRP instance has a priority; a lower value means a higher priority.
      The priority of the MRP instance is stored in the MRP_Test frame. In this way
      all the MRP nodes in the ring can see the other nodes' priorities.
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: mrp: Update MRP frame type · 7e89ed8a
      Horatiu Vultur authored
      Replace u16/u32 with be16/be32 in the MRP frame types.
      This fixes sparse warnings like:
      warning: cast to restricted __be16
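The be16/be32 annotations mark fields that are stored in big-endian (network) byte order, so that sparse can flag code that reads them as host-order integers. As a rough illustration outside the kernel, the sketch below uses Python's `struct` module to show what goes wrong when a big-endian 16-bit field is read with the wrong byte order; the value `0x8000` is just an example.

```python
import struct

prio = 0x8000

# Encode the value as a big-endian 16-bit field, as it would sit in a frame.
wire = struct.pack(">H", prio)

# Decoding with the matching byte order recovers the value.
decoded = struct.unpack(">H", wire)[0]

# Decoding as little-endian silently yields a different number, which is
# the class of mistake the typed annotations are meant to catch.
misread = struct.unpack("<H", wire)[0]
```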
      Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: vmxnet3: fix possible buffer overflow caused by bad DMA value in vmxnet3_get_rss() · 3e1c6846
      Jia-Ju Bai authored
      The value adapter->rss_conf is stored in DMA memory, and it is assigned
      to rssConf, so rssConf->indTableSize can be modified at any time by
      malicious hardware. Because rssConf->indTableSize is assigned to n,
      buffer overflow may occur when the code "rssConf->indTable[n]" is
      executed.
      
      To fix this possible bug, n is checked after it is read and before it
      is used as an index.
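The general pattern is to read a device-controlled size once into a local variable, validate it, and only then use it as an index. The sketch below is a toy model under stated assumptions: `MAX_IND_TABLE_SIZE` and `get_rss_indir()` are hypothetical names, and the limit of 128 is an arbitrary example rather than the driver's real bound.

```python
MAX_IND_TABLE_SIZE = 128  # hypothetical capacity of the indirection table


def get_rss_indir(ind_table_size: int, ind_table: list) -> list:
    """Copy n entries out of a table whose size field is untrusted."""
    n = ind_table_size  # read the untrusted value once
    if n < 0 or n > MAX_IND_TABLE_SIZE or n > len(ind_table):
        raise ValueError("indirection table size out of range")
    return [ind_table[i] for i in range(n)]


table = get_rss_indir(4, list(range(128)))
```

A bogus size such as 4096 is rejected before any table entry is touched.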
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: work around stack frame size warning · 0af413bd
      Arnd Bergmann authored
      The fl_flow_key structure is around 500 bytes, so having two of them
      on the stack in one function now exceeds the warning limit after an
      otherwise correct change:
      
      net/sched/cls_flower.c:298:12: error: stack frame size of 1056 bytes in function 'fl_classify' [-Werror,-Wframe-larger-than=]
      
      I suspect the fl_classify function could be reworked to only have one
      of them on the stack and modify it in place, but I could not work out
      how to do that.
      
      As a somewhat hacky workaround, move one of them into an out-of-line
      function to reduce its scope. This does not necessarily reduce the stack
      usage of the outer function, but at least the second copy is removed
      from the stack during most of it and does not add up to whatever is
      called from there.
      
      I now see 552 bytes of stack usage for fl_classify(), plus 528 bytes
      for fl_mask_lookup().
      
      Fixes: 58cff782 ("flow_dissector: Parse multiple MPLS Label Stack Entries")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan743x: Added fixed link and RGMII support · 6f197fb6
      Roelof Berg authored
      Microchip lan7431 is frequently connected to a phy. However, it
      can also be directly connected to a MII remote peer without
      any phy in between. For supporting such a phyless hardware setup
      in Linux we utilized phylib, which supports a fixed-link
      configuration via the device tree. And we added support for
      defining the connection type R/GMII in the device tree.
      
      New behavior:
      -------------
      . The automatic speed and duplex detection of the lan743x silicon
        between mac and phy is disabled. Instead phylib is used like in
        other typical Linux drivers. The usage of phylib allows to
        specify fixed-link parameters in the device tree.
      
      . The device tree entry phy-connection-type is supported now with
        the modes RGMII or (G)MII (default).
      
      Development state:
      ------------------
      . Tested with fixed-phy configurations. Not yet tested in normal
        configurations with phy. Microchip kindly offered testing
        as soon as the Corona measures allow this.
      
      . All review findings of Andrew Lunn are included
      
      Example:
      --------
      &pcie {
      	status = "okay";
      
      	host@0 {
      		reg = <0 0 0 0 0>;
      
      		#address-cells = <3>;
      		#size-cells = <2>;
      
      		ethernet@0 {
      			compatible = "weyland-yutani,noscom1", "microchip,lan743x";
      			status = "okay";
      			reg = <0 0 0 0 0>;
      			phy-connection-type = "rgmii";
      
      			fixed-link {
      				speed = <100>;
      				full-duplex;
      			};
      		};
      	};
      };
      Signed-off-by: Roelof Berg <rberg@berg-solutions.de>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'devlink-Add-support-for-control-packet-traps' · ff0f6383
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      devlink: Add support for control packet traps
      
      So far device drivers were only able to register drop and exception
      packet traps with devlink. These traps are used for packets that were
      either dropped by the underlying device or encountered an exception
      (e.g., missing neighbour entry) during forwarding.
      
      However, in the steady state, the majority of the packets being trapped
      to the CPU are packets that are required for the correct functioning of
      the control plane. For example, ARP request and IGMP query packets.
      
      This patch set allows device drivers to register such control traps with
      devlink and expose their default control plane policy to user space.
      User space can then tune the packet trap policer settings according to
      its needs, as with existing packet traps.
      
      In a similar fashion to exception traps, the action associated with such
      traps cannot be changed as it can easily break the control plane. Unlike
      drop and exception traps, packets trapped via control traps are not
      reported to the kernel's drop monitor as they are not indicative of any
      problem.
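The policy differences between the three trap types, as stated in this cover letter, can be summarised in a small model. The enum and helper names below are hypothetical and are not the devlink API; they only encode the two rules given above.

```python
from enum import Enum


class TrapType(Enum):
    DROP = "drop"
    EXCEPTION = "exception"
    CONTROL = "control"


def action_can_change(trap_type: TrapType) -> bool:
    """Only drop traps allow user space to change the trap action."""
    return trap_type is TrapType.DROP


def reported_to_drop_monitor(trap_type: TrapType) -> bool:
    """Control traps are not indicative of a problem, so they are not reported."""
    return trap_type is not TrapType.CONTROL


summary = {
    t.value: (action_can_change(t), reported_to_drop_monitor(t))
    for t in TrapType
}
```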
      
      Patch set overview:
      
      Patches #1-#3 break out layer 3 exceptions to a different group to
      provide better granularity. A future patch set will make this completely
      configurable.
      
      Patch #4 adds a new trap action ('mirror') that is used for packets that
      are forwarded by the device and sent to the CPU. Such packets are marked
      by device drivers with 'skb->offload_fwd_mark = 1' in order to prevent
      the kernel from forwarding them again.
      
      Patch #5 adds the new trap type, 'control'.
      
      Patches #6-#8 gradually add various control traps to devlink with proper
      documentation.
      
      Patch #9 adds a few control traps to netdevsim, which are automatically
      exercised by existing devlink-trap selftest.
      
      Patch #10 performs a small refactoring in mlxsw.
      
      Patches #11-#13 change mlxsw to register its existing control traps with
      devlink.
      
      Patch #14 adds a selftest over mlxsw that exercises all the registered
      control traps.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: Add test for control packets · 9959b389
      Ido Schimmel authored
      Generate packets matching the various control traps and check that the
      traps' stats increase accordingly.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Register ACL control traps · 88e27749
      Ido Schimmel authored
      In a similar fashion to other control traps, register ACL control traps
      with devlink.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Register layer 3 control traps · 8110668e
      Ido Schimmel authored
      In a similar fashion to layer 2 control traps, register layer 3 control
      traps with devlink.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Register layer 2 control traps · 39c10350
      Ido Schimmel authored
      In a similar fashion to other traps, register layer 2 control traps with
      devlink.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Factor out common Rx listener function · 45b1c873
      Ido Schimmel authored
      We currently have an Rx listener function for exception traps that marks
      received skbs with 'offload_fwd_mark' and injects them to the kernel's
      Rx path. The marking is done because all these exceptions occur during
      L3 forwarding, after the packets were potentially flooded at L2.
      
      A subsequent patch will add support for control traps. Packets received
      via some of these control traps need different handling:
      
      1. Packets might not need to be marked with 'offload_fwd_mark'. For
         example, if the packet was trapped before L2 forwarding
      
      2. Packets might not need to be injected to the kernel's Rx path. For
         example, sampled packets are reported to user space via the psample
         module
      
      Factor out a common Rx listener function that only reports trapped
      packets to devlink. Call it from mlxsw_sp_rx_no_mark_listener() and
      mlxsw_sp_rx_mark_listener() that will inject the packets to the kernel's
      Rx path, without and with the marking, respectively.
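The shape of this refactoring can be sketched as one shared helper plus two thin wrappers. This is a toy model, not the mlxsw code: packets are plain dictionaries, and the `devlink_log` and `rx_path` lists are hypothetical stand-ins for reporting to devlink and injecting into the kernel's Rx path.

```python
def rx_listener_common(skb: dict, devlink_log: list) -> None:
    """Shared part: only report the trapped packet to devlink."""
    devlink_log.append(skb["trap"])


def rx_no_mark_listener(skb: dict, devlink_log: list, rx_path: list) -> None:
    """Report, then inject into the Rx path without marking."""
    rx_listener_common(skb, devlink_log)
    rx_path.append(skb)


def rx_mark_listener(skb: dict, devlink_log: list, rx_path: list) -> None:
    """Report, mark as already forwarded by hardware, then inject."""
    rx_listener_common(skb, devlink_log)
    skb["offload_fwd_mark"] = 1
    rx_path.append(skb)


devlink_log, rx_path = [], []
rx_no_mark_listener({"trap": "stp", "offload_fwd_mark": 0}, devlink_log, rx_path)
rx_mark_listener({"trap": "igmp_query", "offload_fwd_mark": 0}, devlink_log, rx_path)
```

A listener that must not inject at all, such as one for sampled packets, could call only the shared helper.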
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Register control traps · 18979367
      Ido Schimmel authored
      Register two control traps with devlink. The existing selftest at
      tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
      over all registered traps and checks that the action of non-drop traps
      cannot be changed. Up until now only exception traps were tested, now
      control traps will be tested as well.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add ACL control packet traps · 5eb18a2b
      Ido Schimmel authored
      Add packet traps for packets that are sampled / trapped by ACLs, so that
      capable drivers could register them with devlink. Add documentation for
      every added packet trap and packet trap group.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add layer 3 control packet traps · d77cfd16
      Ido Schimmel authored
      Add layer 3 control packet traps such as ARP and DHCP, so that capable
      device drivers could register them with devlink. Add documentation for
      every added packet trap and packet trap group.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add layer 2 control packet traps · 515eac67
      Ido Schimmel authored
      Add layer 2 control packet traps such as STP and IGMP query, so that
      capable device drivers could register them with devlink. Add
      documentation for every added packet trap and packet trap group.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add 'control' trap type · 30a4e9a2
      Ido Schimmel authored
      This type is used for traps that trap control packets such as ARP
      request and IGMP query to the CPU.
      
      Do not report such packets to the kernel's drop monitor as they were not
      dropped by the device nor encountered an exception during forwarding.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add 'mirror' trap action · 9eefeabe
      Ido Schimmel authored
      The action is used by control traps such as IGMP query. The packet is
      flooded by the device, but also trapped to the CPU in order for the
      software bridge to mark the receiving port as a multicast router port.
      Such packets are marked with 'skb->offload_fwd_mark = 1' in order to
      prevent the software bridge from flooding them again.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Move layer 3 exceptions to exceptions trap group · 85176f19
      Ido Schimmel authored
      The layer 3 exceptions are still subject to the same trap policer, so
      nothing changes, but user space can choose to assign a different one.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum_trap: Move layer 3 exceptions to exceptions trap group · 1e292f5c
      Ido Schimmel authored
      The layer 3 exceptions are still subject to the same trap policer, so
      nothing changes, but user space can choose to assign a different one.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Create dedicated trap group for layer 3 exceptions · 678eb199
      Ido Schimmel authored
      Packets that hit exceptions during layer 3 forwarding must be trapped to
      the CPU for the control plane to function properly. Create a dedicated
      group for them, so that user space could choose to assign a different
      policer for them.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · af0a2482
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for net-next
      to extend ctnetlink and the flowtable infrastructure:
      
      1) Extend ctnetlink kernel side netlink dump filtering capabilities,
         from Romain Bellan.
      
      2) Generalise the flowtable hook parser to take a hook list.
      
      3) Pass a hook list to the flowtable hook registration/unregistration.
      
      4) Add a helper function to release the flowtable hook list.
      
      5) Update the flowtable event notifier to pass a flowtable hook list.
      
      6) Allow users to add new devices to an existing flowtable.

      7) Allow users to remove devices from an existing flowtable.
      
      8) Allow for registering a flowtable with no initial devices.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fec: disable correct clk in the err path of fec_enet_clk_enable · a74d19ba
      Liu Xiang authored
      When enabling clk_ref fails, clk_ptp should be disabled rather than
      clk_ref itself.
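The rule behind this fix is that an error path should undo the steps that already succeeded, not the step that just failed. The sketch below is a toy model: the `Clock` class and `clk_enable_all()` are hypothetical, and the ordering (clk_ptp enabled before clk_ref) is inferred from the commit message rather than taken from the driver source.

```python
class Clock:
    """Toy clock that can be told to fail when enabled."""

    def __init__(self, name: str, fail: bool = False):
        self.name = name
        self.fail = fail
        self.enabled = False

    def enable(self) -> None:
        if self.fail:
            raise RuntimeError(f"cannot enable {self.name}")
        self.enabled = True

    def disable(self) -> None:
        self.enabled = False


def clk_enable_all(clk_ptp: Clock, clk_ref: Clock) -> None:
    clk_ptp.enable()
    try:
        clk_ref.enable()
    except RuntimeError:
        # clk_ref never came up, so the clock to undo is clk_ptp.
        clk_ptp.disable()
        raise


ptp = Clock("clk_ptp")
ref = Clock("clk_ref", fail=True)
try:
    clk_enable_all(ptp, ref)
    failed = False
except RuntimeError:
    failed = True
```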
      Signed-off-by: Liu Xiang <liuxiang_1999@126.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: octeon: mgmt: Repair filling of RX ring · 0c34bb59
      Alexander Sverdlin authored
      The removal of mips_swiotlb_ops exposed a problem in octeon_mgmt Ethernet
      driver. mips_swiotlb_ops had an mb() after most of the operations and the
      removal of the ops had broken the receive functionality of the driver.
      My code inspection has shown no other places except
      octeon_mgmt_rx_fill_ring() where an explicit barrier would be obviously
      missing. The latter function however has to make sure that "ringing the
      bell" doesn't happen before RX ring entry is really written.
      
      The patch has been successfully tested on Octeon II.
      
      Fixes: a999933d ("MIPS: remove mips_swiotlb_ops")
      Cc: stable@vger.kernel.org
      Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fix-indirect-flow_block-infrastructure' · 2aec17f1
      David S. Miller authored
      Pablo Neira Ayuso says:
      
      ====================
      the indirect flow_block infrastructure, revisited
      
      This series fixes b5140a36 ("netfilter: flowtable: add indr block
      setup support") that adds support for the indirect block for the
      flowtable. This patch crashes the kernel with the TC CT action.
      
      [  630.908086] BUG: kernel NULL pointer dereference, address: 00000000000000f0
      [  630.908233] #PF: error_code(0x0000) - not-present page
      [  630.908304] PGD 800000104addd067 P4D 800000104addd067 PUD 104311d067 PMD 0
      [  630.908380] Oops: 0000 [#1] SMP PTI
      [  630.908615] RIP: 0010:nf_flow_table_indr_block_cb+0xc0/0x190 [nf_flow_table]
      [  630.908690] Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 4c 89 75 a0 4c 89 65 a8 4d 89 ee 49 89 dd 4c 89 fe 48 c7 c7 b7 64 36 a0 31 c0 e8 ce ed d8 e0 <49> 8b b7 f0 00 00 00 48 c7 c7 c8 64 36 a0 31 c0 e8 b9 ed d8 e0 49
      [  630.908790] RSP: 0018:ffffc9000895f8c0 EFLAGS: 00010246
      [...]
      [  630.910774] Call Trace:
      [  630.911192]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
      [  630.911621]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
      [  630.912040]  ? mlx5e_rep_indr_setup_block+0x270/0x270 [mlx5_core]
      [  630.912443]  flow_block_cmd+0x51/0x80
      [  630.912844]  __flow_indr_block_cb_register+0x26c/0x510
      [  630.913265]  mlx5e_nic_rep_netdevice_event+0x9e/0x110 [mlx5_core]
      [  630.913665]  notifier_call_chain+0x53/0xa0
      [  630.914063]  raw_notifier_call_chain+0x16/0x20
      [  630.914466]  call_netdevice_notifiers_info+0x39/0x90
      [  630.914859]  register_netdevice+0x484/0x550
      [  630.915256]  __ip_tunnel_create+0x12b/0x1f0 [ip_tunnel]
      [  630.915661]  ip_tunnel_init_net+0x116/0x180 [ip_tunnel]
      [  630.916062]  ipgre_tap_init_net+0x22/0x30 [ip_gre]
      [  630.916458]  ops_init+0x44/0x110
      [  630.916851]  register_pernet_operations+0x112/0x200
      
      A workaround patch to cure this crash has been proposed. However, there
      is another problem: The indirect flow_block still does not work for the
      new TC CT action. The problem is that the existing flow_indr_block_entry
      callback assumes you can look up the flowtable from the netdevice to
      get the flow_block. This flow_block allows you to offload the flows via
      TC_SETUP_CLSFLOWER. Unfortunately, it is not possible to get the
      flow_block from the TC CT flowtables because they are _not_ bound to any
      specific netdevice.
      
      = What is the indirect flow_block infrastructure?
      
      The indirect flow_block infrastructure allows drivers to offload
      tc/netfilter rules that belong to software tunnel netdevices, e.g.
      vxlan.
      
      This indirect flow_block infrastructure relates tunnel netdevices with
      drivers because there is no obvious way to relate these two things
      from the control plane.
      
      = How does the indirect flow_block work before this patchset?
      
      Front-ends register the indirect block callback through
      flow_indr_add_block_cb() if they support offloading tunnel
      netdevices.
      
      == Setting up an indirect block
      
      1) Drivers track tunnel netdevices via NETDEV_{REGISTER,UNREGISTER} events.
         If there is a new tunnel netdevice that the driver can offload, then the
         driver invokes __flow_indr_block_cb_register() with the new tunnel
         netdevice and the driver callback. The __flow_indr_block_cb_register()
         call iterates over the list of the front-end callbacks.
      
      2) The front-end callback sets up the flow_block_offload structure and it
         invokes the driver callback to set up the flow_block.
      
      3) The driver callback now registers the flow_block structure and it
         returns the flow_block back to the front-end.
      
      4) The front-end gets the flow_block object and it is now ready to
         offload rules for this tunnel netdevice.
      
      A simplified callgraph is represented below.
      
              Front-end                      Driver
      
                                         NETDEV_REGISTER
                                               |
                           __flow_indr_block_cb_register(netdev, cb_priv, driver_cb)
                                               | [1]
                  .--------------frontend_indr_block_cb(cb_priv, driver_cb)
                  |
                  .
         setup_flow_block_offload(bo)
                  | [2]
             driver_cb(bo, cb_priv) -----------.
                                               |
                                               \/
                                        set up flow_blocks [3]
                                               |
            add rules to flow_block <----------
            TC_SETUP_CLSFLOWER [4]
      
      == Releasing the indirect flow_block
      
      There are two possibilities, either tunnel netdevice is removed or
      a netdevice (port representor) is removed.
      
      === Tunnel netdevice is removed
      
      Driver waits for the NETDEV_UNREGISTER event that announces the tunnel
      netdevice removal. Then, it calls __flow_indr_block_cb_unregister() to
      remove the flow_block and rules.  Callgraph is very similar to the one
      described above.
      
      === Netdevice is removed (port representor)
      
      Driver calls __flow_indr_block_cb_unregister() to remove the existing
      netfilter/tc rules that belong to the tunnel netdevice.
      
      = How does the indirect flow_block work after this patchset?
      
      Drivers register the indirect flow_block setup callback through
      flow_indr_dev_register() if they support offloading tunnel
      netdevices.
      
      == Setting up an indirect flow_block
      
      1) Frontends check if dev->netdev_ops->ndo_setup_tc is unset. If so,
         frontends call flow_indr_dev_setup_offload(). This call invokes
         the drivers' indirect flow_block setup callback.
      
      2) The indirect flow_block setup callback sets up a flow_block structure
         which relates the tunnel netdevice and the driver.
      
      3) The front-end uses the flow_block and offloads the rules.

      Note that the operation to set up a (non-indirect) flow_block is very
      similar.
      
      == Releasing the indirect flow_block
      
      === Tunnel netdevice is removed
      
      This calls flow_indr_dev_setup_offload() to tear down the flow_block and
      remove the offloaded rules. This alternate path is exercised if
      dev->netdev_ops->ndo_setup_tc is unset.
      
      === Netdevice is removed (port representor)
      
      If a netdevice is removed, then it might need to clean up the
      offloaded tc/netfilter rules that belong to the tunnel netdevice:
      
      1) The driver invokes flow_indr_dev_unregister() when a netdevice is
         removed.
      
      2) This call iterates over the existing indirect flow_blocks
         and it invokes the cleanup callback to let the front-end remove the
         tc/netfilter rules. The cleanup callback already provides the
         flow_block that the front-end needs to clean up.
      
              Front-end                      Driver
      
                                               |
                                  flow_indr_dev_unregister(...)
                                               |
                               iterate over list of indirect flow_block
                                     and invoke cleanup callback
                                               |
                  .-----------------------------
                  |
                  .
         frontend_flow_block_cleanup(flow_block)
                  .
                  |
                 \/
         remove rules to flow_block
            TC_SETUP_CLSFLOWER
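The setup and cleanup flow after this patchset can be sketched with a toy registry. This is not the kernel API: `IndirectRegistry`, `FlowBlock`, and the method names are hypothetical Python stand-ins that only echo flow_indr_dev_register(), flow_indr_dev_setup_offload() and flow_indr_dev_unregister(), and the real signatures differ.

```python
class FlowBlock:
    """Relates a tunnel netdevice to a driver and carries a cleanup callback."""

    def __init__(self, tunnel_dev: str, driver: str, cleanup):
        self.tunnel_dev = tunnel_dev
        self.driver = driver
        self.cleanup = cleanup
        self.rules = []


class IndirectRegistry:
    def __init__(self):
        self.driver_cbs = {}  # driver name -> indirect setup callback
        self.blocks = []      # existing indirect flow_blocks

    def dev_register(self, driver: str, setup_cb) -> None:
        """Driver side: offer tunnel offload to the front-ends."""
        self.driver_cbs[driver] = setup_cb

    def dev_setup_offload(self, tunnel_dev: str, cleanup) -> list:
        """Front-end side: used when the device has no ndo_setup_tc."""
        created = []
        for driver, setup_cb in self.driver_cbs.items():
            if setup_cb(tunnel_dev):  # driver decides whether it can offload
                block = FlowBlock(tunnel_dev, driver, cleanup)
                self.blocks.append(block)
                created.append(block)
        return created

    def dev_unregister(self, driver: str) -> None:
        """Driver side: a netdevice goes away, let front-ends clean up."""
        self.driver_cbs.pop(driver, None)
        for block in [b for b in self.blocks if b.driver == driver]:
            block.cleanup(block)  # the callback receives the flow_block itself
            self.blocks.remove(block)


removed = []


def frontend_flow_block_cleanup(block: FlowBlock) -> None:
    block.rules.clear()
    removed.append(block.tunnel_dev)


reg = IndirectRegistry()
reg.dev_register("drv0", lambda dev: dev.startswith("vxlan"))
blocks = reg.dev_setup_offload("vxlan0", frontend_flow_block_cleanup)
blocks[0].rules.append("flower rule")
reg.dev_unregister("drv0")
```

The point of the model is that the cleanup callback is handed the flow_block directly, so the front-end does not need to look anything up from the netdevice.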
      
      = About this patchset
      
      This patchset aims to address the existing TC CT problem while
      simplifying the indirect flow_block infrastructure, saving 300 LoC in
      the flow_offload core and the drivers. The operation gets aligned with
      the (non-indirect) flow_blocks logic. Patchset is composed of:
      
      Patch #1 adds nf_flow_table_gc_cleanup() which is required by the
               netfilter's flowtable new indirect flow_block approach.

      Patch #2 adds the flow_block_indr object which is actually part
               of the flow_block object. This stores the indirect flow_block
               metadata such as the tunnel netdevice owner and the cleanup
               callback (in case the tunnel netdevice goes away).
      
               This patch adds flow_indr_dev_{un}register() to allow drivers
               to offer netdevice tunnel hardware offload to the front-ends.
               Then, front-ends call flow_indr_dev_setup_offload() to invoke
               the drivers to set up the (indirect) flow_block.
      
      Patch #3 adds the tcf_block_offload_init() helper function. This is
               a preparation patch to adapt the tc front-end to use this
               new indirect flow_block infrastructure.
      
      Patch #4 updates the tc and netfilter front-ends to use the new
               indirect flow_block infrastructure.
      
      Patch #5 updates the mlx5 driver to use the new indirect flow_block
               infrastructure.
      
      Patch #6 updates the nfp driver to use the new indirect flow_block
               infrastructure.
      
      Patch #7 updates the bnxt driver to use the new indirect flow_block
               infrastructure.
      
      Patch #8 removes the indirect flow_block infrastructure version 1,
               now that frontends and drivers have been translated to
               version 2 (coming in this patchset).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2aec17f1
    • Pablo Neira Ayuso's avatar
      net: remove indirect block netdev event registration · 709ffbe1
      Pablo Neira Ayuso authored
      Drivers do not register to netdev events to set up indirect blocks
      anymore. Remove __flow_indr_block_cb_register() and
      __flow_indr_block_cb_unregister().
      
      The frontends set up the callbacks through flow_indr_dev_setup_offload().
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      709ffbe1
    • Pablo Neira Ayuso's avatar
      bnxt_tc: update indirect block support · e445e30c
      Pablo Neira Ayuso authored
      Register ndo callback via flow_indr_dev_register() and
      flow_indr_dev_unregister().
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e445e30c
    • Pablo Neira Ayuso's avatar
      nfp: update indirect block support · 50c1b1c9
      Pablo Neira Ayuso authored
      Register ndo callback via flow_indr_dev_register() and
      flow_indr_dev_unregister().
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50c1b1c9
    • Pablo Neira Ayuso's avatar
      mlx5: update indirect block support · 9eabd188
      Pablo Neira Ayuso authored
      Register ndo callback via flow_indr_dev_register() and
      flow_indr_dev_unregister().
      
      No need for mlx5e_rep_indr_clean_block_privs() since flow_block_cb_free()
      already releases the internal mapping via ->release callback, which in
      this case is mlx5e_rep_indr_tc_block_unbind().
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9eabd188
    • Pablo Neira Ayuso's avatar
      net: use flow_indr_dev_setup_offload() · 0fdcf78d
      Pablo Neira Ayuso authored
      Update existing frontends to use flow_indr_dev_setup_offload().
      
      This new function must be called if ->ndo_setup_tc is unset to deal
      with tunnel devices.
      
      If there is no driver that is subscribed to new tunnel device
      flow_block bindings, then this function bails out with EOPNOTSUPP.
      
      If the driver module is removed, the ->cleanup() callback removes the
      entries that belong to this tunnel device. This cleanup procedure is
      triggered when the device unregisters the tunnel device offload handler.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0fdcf78d
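      The fallback described in the commit above can be sketched in plain
      userspace C. This is a minimal illustration under stated assumptions:
      all sketch_* names are hypothetical, the handler table is a fixed
      array, and "at least one driver callback succeeded" is used here as
      the success condition, which may not match how the kernel decides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device: setup_tc stays NULL for tunnel devices. */
struct sketch_netdev {
	int (*setup_tc)(struct sketch_netdev *dev);
};

typedef int (*sketch_indr_cb_t)(struct sketch_netdev *dev, void *cb_priv);

#define SKETCH_MAX_HANDLERS 4

static struct {
	sketch_indr_cb_t cb;
	void *cb_priv;
} sketch_handlers[SKETCH_MAX_HANDLERS];
static int sketch_nr_handlers;

/* Driver side: subscribe to tunnel device flow_block bindings. */
static int sketch_indr_dev_register(sketch_indr_cb_t cb, void *cb_priv)
{
	if (sketch_nr_handlers == SKETCH_MAX_HANDLERS)
		return -ENOSPC;
	sketch_handlers[sketch_nr_handlers].cb = cb;
	sketch_handlers[sketch_nr_handlers].cb_priv = cb_priv;
	sketch_nr_handlers++;
	return 0;
}

/* Front-end side: offer the tunnel device to every subscribed driver. */
static int sketch_indr_dev_setup_offload(struct sketch_netdev *dev)
{
	int i, bound = 0;

	for (i = 0; i < sketch_nr_handlers; i++) {
		if (sketch_handlers[i].cb(dev, sketch_handlers[i].cb_priv) == 0)
			bound++;
	}
	/* Nobody subscribed, or nobody took the device. */
	return bound ? 0 : -EOPNOTSUPP;
}

/* Use the direct hook when it exists, else the indirect path. */
static int sketch_block_offload_bind(struct sketch_netdev *dev)
{
	if (dev->setup_tc)
		return dev->setup_tc(dev);
	return sketch_indr_dev_setup_offload(dev);
}

/* Example driver callback: accept the device and count the binding. */
static int sketch_driver_cb(struct sketch_netdev *dev, void *cb_priv)
{
	(void)dev;
	*(int *)cb_priv += 1;
	return 0;
}

int sketch_setup_offload_example(void)
{
	static int bindings;
	struct sketch_netdev tunnel = { .setup_tc = NULL };

	sketch_nr_handlers = 0;
	bindings = 0;

	/* No driver subscribed yet: the bind attempt is rejected. */
	assert(sketch_block_offload_bind(&tunnel) == -EOPNOTSUPP);

	assert(sketch_indr_dev_register(sketch_driver_cb, &bindings) == 0);
	assert(sketch_block_offload_bind(&tunnel) == 0);
	return bindings;
}
```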
    • Pablo Neira Ayuso's avatar
      net: cls_api: add tcf_block_offload_init() · 324a823b
      Pablo Neira Ayuso authored
      Add a helper function to initialize the flow_block_offload structure.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      324a823b
    • Pablo Neira Ayuso's avatar
      net: flow_offload: consolidate indirect flow_block infrastructure · 1fac52da
      Pablo Neira Ayuso authored
      Tunnel devices provide no dev->netdev_ops->ndo_setup_tc(...) interface.
      The tunnel device and route control plane does not provide an obvious
      way to relate tunnel and physical devices.
      
      This patch allows drivers to register a tunnel device offload handler
      for the tc and netfilter frontends through flow_indr_dev_register() and
      flow_indr_dev_unregister().
      
      The frontend calls flow_indr_dev_setup_offload() that iterates over the
      list of drivers that are offering tunnel device hardware offload
      support and it sets up the flow block for this tunnel device.
      
      If the driver module is removed, the indirect flow_block ends up with a
      stale callback reference. The module removal path triggers the
      dev_shutdown() path to remove the qdisc and the flow_blocks for the
      physical devices. However, this is not useful for tunnel devices, where
      the relation between the physical and the tunnel device is not explicit.
      
      This patch introduces a cleanup callback that is invoked when the driver
      module is removed to clean up the tunnel device flow_block. This patch
      defines struct flow_block_indr and it uses it from flow_block_cb to
      store the information that the front-end requires to perform the
      flow_block_cb cleanup on module removal.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fac52da
    • Pablo Neira Ayuso's avatar
      netfilter: nf_flowtable: expose nf_flow_table_gc_cleanup() · a8284c68
      Pablo Neira Ayuso authored
      This function schedules the flow teardown state and it forces a gc run.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8284c68
    • Davide Caratti's avatar
      net/sched: fix a couple of splats in the error path of tcf_gate_init() · a01c2454
      Davide Caratti authored
      When trying to configure TC 'act_gate' rules with invalid control
      actions, the following splat can be observed:
      
       general protection fault, probably for non-canonical address 0xdffffc0000000002: 0000 [#1] SMP KASAN NOPTI
       KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
       CPU: 1 PID: 2143 Comm: tc Not tainted 5.7.0-rc6+ #168
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:hrtimer_active+0x56/0x290
       [...]
        Call Trace:
        hrtimer_try_to_cancel+0x6d/0x330
        hrtimer_cancel+0x11/0x20
        tcf_gate_cleanup+0x15/0x30 [act_gate]
        tcf_action_cleanup+0x58/0x170
        __tcf_action_put+0xb0/0xe0
        __tcf_idr_release+0x68/0x90
        tcf_gate_init+0x7c7/0x19a0 [act_gate]
        tcf_action_init_1+0x60f/0x960
        tcf_action_init+0x157/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x121/0x350
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      This is caused by hrtimer_cancel() running before hrtimer_init(). Fix it
      by calling hrtimer_cancel() only if clockid is valid and the timer has
      been initialized. After fixing this splat, the same error path causes
      another problem:
      
       general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
       KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
       CPU: 1 PID: 980 Comm: tc Not tainted 5.7.0-rc6+ #168
       Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
       RIP: 0010:release_entry_list+0x4a/0x240 [act_gate]
       [...]
       Call Trace:
        tcf_action_cleanup+0x58/0x170
        __tcf_action_put+0xb0/0xe0
        __tcf_idr_release+0x68/0x90
        tcf_gate_init+0x7ab/0x19a0 [act_gate]
        tcf_action_init_1+0x60f/0x960
        tcf_action_init+0x157/0x2a0
        tcf_action_add+0xd9/0x2f0
        tc_ctl_action+0x2a3/0x39d
        rtnetlink_rcv_msg+0x5f3/0x920
        netlink_rcv_skb+0x121/0x350
        netlink_unicast+0x439/0x630
        netlink_sendmsg+0x714/0xbf0
        sock_sendmsg+0xe2/0x110
        ____sys_sendmsg+0x5b4/0x890
        ___sys_sendmsg+0xe9/0x160
        __sys_sendmsg+0xd3/0x170
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The problem is similar: tcf_action_cleanup() was trying to release a list
      without initializing it first. Ensure that INIT_LIST_HEAD() is called for
      every newly created 'act_gate' action, same as what was done to 'act_ife'
      with commit 44c23d71 ("net/sched: act_ife: initalize ife->metalist
      earlier").
      
      Fixes: a51c328d ("net: qos: introduce a gate control flow action")
      CC: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a01c2454
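      The two error-path fixes described in the commit above follow a
      common pattern, sketched here in plain userspace C. This is not the
      act_gate code: all sketch_* names are hypothetical, a counter stands
      in for the timer cancel, and using -1 as the "clockid not set yet"
      marker is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

static void sketch_init_list_head(struct sketch_list_head *h)
{
	h->next = h;
	h->prev = h;
}

#define SKETCH_CLOCKID_UNSET (-1)	/* assumed sentinel */

struct sketch_gate {
	struct sketch_list_head entries;
	int clockid;
	int timer_cancelled;	/* counts stand-in timer cancels */
};

static void sketch_gate_cleanup(struct sketch_gate *g)
{
	struct sketch_list_head *pos;

	/* Only touch the timer once it has really been set up. */
	if (g->clockid != SKETCH_CLOCKID_UNSET)
		g->timer_cancelled++;

	/*
	 * Unlink every entry. On a zeroed, never-initialized head this
	 * loop would dereference NULL, which is the second splat.
	 */
	while ((pos = g->entries.next) != &g->entries) {
		g->entries.next = pos->next;
		pos->next->prev = &g->entries;
	}
}

static int sketch_gate_init(struct sketch_gate *g, int control_action_valid,
			    int clockid)
{
	/* Done first, so that any later failure can run cleanup safely. */
	sketch_init_list_head(&g->entries);
	g->clockid = SKETCH_CLOCKID_UNSET;
	g->timer_cancelled = 0;

	if (!control_action_valid) {
		sketch_gate_cleanup(g);	/* error path */
		return -EINVAL;
	}

	g->clockid = clockid;	/* the timer would be initialized here */
	return 0;
}

int sketch_gate_example(void)
{
	struct sketch_gate g;

	memset(&g, 0, sizeof(g));	/* mimics a zeroed allocation */

	/* Invalid control action: cleanup runs, nothing is cancelled. */
	assert(sketch_gate_init(&g, 0, 1) == -EINVAL);
	assert(g.timer_cancelled == 0);
	assert(g.entries.next == &g.entries);

	/* Valid setup followed by a regular teardown. */
	assert(sketch_gate_init(&g, 1, 1) == 0);
	sketch_gate_cleanup(&g);
	return g.timer_cancelled;
}
```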
    • David S. Miller's avatar
      Merge branch 'regmap-simple-bit-helpers' · e8509361
      David S. Miller authored
      Bartosz Golaszewski says:
      
      ====================
      regmap: provide simple bitops and use them in a driver
      
      I noticed that oftentimes I use regmap_update_bits() for simple bit
      setting or clearing. In this case the fourth argument is superfluous as
      it's always 0 or equal to the mask argument.
      
      This series proposes to add simple bit operations for setting, clearing
      and testing specific bits with regmap.
      
      The second patch uses all three in a driver that got recently picked into
      the net-next tree.
      
      The patches obviously target different trees so - if you're ok with
      the change itself - I propose you pick the first one into your regmap
      tree for v5.8 and then I'll resend the second patch to add the first
      user for these macros for v5.9.
      
      v1 -> v2:
      - convert the new macros to static inline functions
      
      v2 -> v3:
      - drop unneeded ternary operator
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8509361
    • Bartosz Golaszewski's avatar
      net: ethernet: mtk-star-emac: use regmap bitops · 240f1ae4
      Bartosz Golaszewski authored
      Shrink the code visually by replacing regmap_update_bits() with
      appropriate regmap bit operations where applicable.
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      240f1ae4
    • Bartosz Golaszewski's avatar
      regmap: provide helpers for simple bit operations · bfad9781
      Bartosz Golaszewski authored
      In many instances regmap_update_bits() is used for simple bit setting
      and clearing. In these cases the last argument is redundant and we can
      hide it with a static inline function.
      
      This adds three new helpers for simple bit operations: set_bits,
      clear_bits and test_bits (the last one defined as a regular function).
      Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfad9781
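      The idea behind the helpers in the commit above can be sketched
      against a mock register map in plain userspace C. This is not the
      regmap implementation: all sketch_* names are hypothetical, the map
      is just an array, and the test helper reporting 1 only when all
      requested bits are set is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NR_REGS 8

/* Mock register map: a small array of registers. */
struct sketch_regmap {
	unsigned int regs[SKETCH_NR_REGS];
};

/* General read-modify-write: replace the masked bits with val. */
static int sketch_update_bits(struct sketch_regmap *map, unsigned int reg,
			      unsigned int mask, unsigned int val)
{
	if (reg >= SKETCH_NR_REGS)
		return -EINVAL;
	map->regs[reg] = (map->regs[reg] & ~mask) | (val & mask);
	return 0;
}

/* Setting bits: the value argument is always equal to the mask. */
static inline int sketch_set_bits(struct sketch_regmap *map, unsigned int reg,
				  unsigned int bits)
{
	return sketch_update_bits(map, reg, bits, bits);
}

/* Clearing bits: the value argument is always 0. */
static inline int sketch_clear_bits(struct sketch_regmap *map,
				    unsigned int reg, unsigned int bits)
{
	return sketch_update_bits(map, reg, bits, 0);
}

/* Returns 1 if all requested bits are set, 0 if not, negative on error. */
static int sketch_test_bits(struct sketch_regmap *map, unsigned int reg,
			    unsigned int bits)
{
	if (reg >= SKETCH_NR_REGS)
		return -EINVAL;
	return (map->regs[reg] & bits) == bits;
}

int sketch_regmap_example(void)
{
	struct sketch_regmap map = { { 0 } };

	assert(sketch_set_bits(&map, 2, 0x5) == 0);
	assert(sketch_test_bits(&map, 2, 0x5) == 1);

	assert(sketch_clear_bits(&map, 2, 0x4) == 0);
	assert(sketch_test_bits(&map, 2, 0x5) == 0);

	return (int)map.regs[2];	/* only bit 0 is left set */
}
```

      With the wrappers, call sites no longer repeat the mask as the value
      argument, which is the redundancy the commit message points out.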