1. 21 Dec, 2022 14 commits
  2. 14 Dec, 2022 15 commits
  3. 13 Dec, 2022 1 commit
  4. 12 Dec, 2022 10 commits
    • Merge branch 'net-ipa-enable-ipa-v4-7-support' · c4b7a297
      Jakub Kicinski authored
      Alex Elder says:
      
      ====================
      net: ipa: enable IPA v4.7 support
      
      The first patch in this series adds "qcom,sm6350-ipa" as a possible
      IPA compatible string, for the Qualcomm SM6350 SoC.  That SoC uses
      IPA v4.7.
      
      The second patch in this series adds code that enables support for
      IPA v4.7.  DTS updates that make use of these will be merged later.
      ====================
      
      Link: https://lore.kernel.org/r/20221208211529.757669-1-elder@linaro.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ipa: add IPA v4.7 support · b310de78
      Alex Elder authored
      Add the necessary register and data definitions needed for IPA v4.7,
      which is found on the SM6350 SoC.
      Co-developed-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • dt-bindings: net: qcom,ipa: Add SM6350 compatible · 5071429f
      Luca Weiss authored
      Add support for SM6350, which uses IPA v4.7.
      Signed-off-by: Luca Weiss <luca.weiss@fairphone.com>
      Signed-off-by: Alex Elder <elder@linaro.org>
      Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt: Use generic HBH removal helper in tx path · b6488b16
      Coco Li authored
      Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes
      for IPv6 traffic. See patch series:
      'commit 89527be8 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")'
      
      This reduces the number of packets traversing the networking stack and
      should usually improve performance. However, it also inserts a
      temporary Hop-by-hop IPv6 extension header.
      
      Using the HBH header removal method in the previous patch, the extra header
      can be removed in the bnxt driver to allow it to send Big TCP packets (bigger
      TSO packets) as well.
      
      Tested:
      Compiled locally
      
      To further test functional correctness, update the GSO/GRO limit on the
      physical NIC:
      
      ip link set eth0 gso_max_size 181000
      ip link set eth0 gro_max_size 181000
      
      Note that if there are bonding or ipvlan devices on top of the physical
      NIC, their GSO sizes need to be updated as well.
      
      Then, IPv6/TCP packets with sizes larger than 64k can be observed.
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Reviewed-by: Michael Chan <michael.chan@broadcom.com>
      Tested-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://lore.kernel.org/r/20221210041646.3587757-2-lixiaoyan@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver · 89300468
      Coco Li authored
      IPv6/TCP and GRO stacks can build big TCP packets with an added
      temporary Hop By Hop header.
      
      If GSO is not involved, then the temporary header needs to be removed in
      the driver. This patch provides a generic helper for drivers that need
      to modify their headers in place.
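
      The in-place removal can be illustrated with a small user-space
      model. This is only a sketch and not the kernel helper: the function
      name strip_jumbo_hbh() is hypothetical, the frame is assumed to be
      untagged Ethernet + IPv6 + an 8-byte Hop-by-Hop header carrying a
      jumbo option + TCP, and the constants follow the IPv6 header layout.

```python
ETH_HLEN = 14          # untagged Ethernet header (assumption: no VLAN tag)
IPV6_HLEN = 40         # fixed IPv6 header
HBH_LEN = 8            # temporary Hop-by-Hop header with one jumbo option
NEXTHDR_HOP = 0        # IPv6 "next header" value for Hop-by-Hop options
IPPROTO_TCP = 6
IPV6_TLV_JUMBO = 0xC2  # jumbo payload option type


def strip_jumbo_hbh(frame: bytes) -> bytes:
    """Return the frame without the temporary HBH/jumbo header.

    Frames that do not match the expected layout are returned unchanged.
    """
    buf = bytearray(frame)
    nexthdr_off = ETH_HLEN + 6   # offset of the IPv6 next-header field
    hbh_off = ETH_HLEN + IPV6_HLEN

    if len(buf) < hbh_off + HBH_LEN or buf[nexthdr_off] != NEXTHDR_HOP:
        return bytes(buf)
    # Only handle the exact temporary header: next header TCP,
    # extension length 0 (8 bytes total), jumbo option first.
    if (buf[hbh_off] != IPPROTO_TCP or buf[hbh_off + 1] != 0
            or buf[hbh_off + 2] != IPV6_TLV_JUMBO):
        return bytes(buf)

    # Slide the Ethernet + IPv6 headers forward over the HBH header,
    # then drop the now-stale first 8 bytes.
    buf[HBH_LEN:hbh_off + HBH_LEN] = buf[0:hbh_off]
    del buf[:HBH_LEN]
    # The IPv6 header now directly precedes TCP.
    buf[nexthdr_off] = IPPROTO_TCP
    return bytes(buf)
```

      Moving the headers rather than the payload keeps the copy small,
      since only the 54 bytes in front of the extension header are touched.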
      
      Tested:
      Compiled and ran with ethtool -K eth1 tso off
      Could send Big TCP packets
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Link: https://lore.kernel.org/r/20221210041646.3587757-1-lixiaoyan@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Merge branch 'bridge-mcast-extensions-for-evpn' · 8150f0cf
      Jakub Kicinski authored
      Ido Schimmel says:
      
      ====================
      bridge: mcast: Extensions for EVPN
      
      tl;dr
      =====
      
      This patchset creates feature parity between user space and the kernel
      and allows the former to install and replace MDB port group entries with
      a source list and associated filter mode. This is required for EVPN use
      cases where multicast state is not derived from snooped IGMP/MLD
      packets, but instead derived from EVPN routes exchanged by the control
      plane in user space.
      
      Background
      ==========
      
      IGMPv3 [1] and MLDv2 [2] differ from earlier versions of the protocols
      in that they add support for source-specific multicast. That is, hosts
      can advertise interest in listening to a particular multicast address
      only from specific source addresses or from all sources except for
      specific source addresses.
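
      The two filter modes can be summarized with a short sketch. It is an
      illustration of the INCLUDE/EXCLUDE semantics described above, not
      bridge code; the function name wants_traffic() is hypothetical.

```python
from typing import Iterable


def wants_traffic(filter_mode: str, source_list: Iterable[str], src: str) -> bool:
    """Return True if a listener with this filter state wants traffic from src.

    include: only the listed sources are wanted.
    exclude: every source except the listed ones is wanted.
    """
    sources = set(source_list)
    if filter_mode == "include":
        return src in sources
    if filter_mode == "exclude":
        return src not in sources
    raise ValueError(f"unknown filter mode: {filter_mode!r}")
```

      Note that EXCLUDE with an empty source list accepts every source,
      which is the only behavior IGMPv2/MLDv1 can express.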
      
      In kernel 5.10 [3][4], the bridge driver gained the ability to snoop
      IGMPv3/MLDv2 packets and install corresponding MDB port group entries.
      For example, a snooped IGMPv3 Membership Report that contains a single
      MODE_IS_EXCLUDE record for group 239.10.10.10 with sources 192.0.2.1,
      192.0.2.2, 192.0.2.20 and 192.0.2.21 would trigger the creation of these
      entries:
      
       # bridge -d mdb show
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.21 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.20 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.2 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 src 192.0.2.1 temp filter_mode include proto kernel  blocked
       dev br0 port veth1 grp 239.10.10.10 temp filter_mode exclude source_list 192.0.2.21/0.00,192.0.2.20/0.00,192.0.2.2/0.00,192.0.2.1/0.00 proto kernel
      
      While the kernel can install and replace entries with a filter mode and
      source list, user space cannot. It can only add EXCLUDE entries with an
      empty source list, which is sufficient for IGMPv2/MLDv1, but not for
      IGMPv3/MLDv2.
      
      Use cases where the multicast state is not derived from snooped packets,
      but instead derived from routes exchanged by the user space control
      plane require feature parity between user space and the kernel in terms
      of MDB configuration. Such a use case is detailed in the next section.
      
      Motivation
      ==========
      
      RFC 7432 [5] defines a "MAC/IP Advertisement route" (type 2) [6] that
      allows NVE switches in the EVPN network to advertise and learn
      reachability information for unicast MAC addresses. Traffic destined to
      a unicast MAC address can therefore be selectively forwarded to a single
      NVE switch behind which the MAC is located.
      
      The same is not true for IP multicast traffic. Such traffic is simply
      flooded as BUM to all NVE switches in the broadcast domain (BD),
      regardless of whether a switch has interested receivers for the multicast stream
      or not. This is especially problematic for overlay networks that make
      heavy use of multicast.
      
      The issue is addressed by RFC 9251 [7] that defines a "Selective
      Multicast Ethernet Tag Route" (type 6) [8] which allows NVE switches in
      the EVPN network to advertise multicast streams that they are interested
      in. This is done by having each switch suppress IGMP/MLD packets from
      being transmitted to the NVE network and instead communicate the
      information over BGP to other switches.
      
      As far as the bridge driver is concerned, the above means that the
      multicast state (i.e., {multicast address, group timer, filter-mode,
      (source records)}) for the VXLAN bridge port is not populated by the
      kernel from snooped IGMP/MLD packets (they are suppressed), but instead
      by user space. Specifically, by the routing daemon that is exchanging
      EVPN routes with other NVE switches.
      
      Changes are obviously also required in the VXLAN driver, but they are
      the subject of future patchsets. See the "Future work" section.
      
      Implementation
      ==============
      
      The user interface is extended to allow user space to specify the filter
      mode of the MDB port group entry and its source list. Replace support is
      also added so that user space would not need to remove an entry and
      re-add it only to edit its source list or filter mode, as that would
      result in packet loss. Example usage:
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent \
      	source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra     0.00
      
      The netlink interface is extended with a few new attributes in the
      RTM_NEWMDB request message:
      
      [ struct nlmsghdr ]
      [ struct br_port_msg ]
      [ MDBA_SET_ENTRY ]
      	struct br_mdb_entry
      [ MDBA_SET_ENTRY_ATTRS ]
      	[ MDBE_ATTR_SOURCE ]
      		struct in_addr / struct in6_addr
      	[ MDBE_ATTR_SRC_LIST ]		// new
      		[ MDBE_SRC_LIST_ENTRY ]
      			[ MDBE_SRCATTR_ADDRESS ]
      				struct in_addr / struct in6_addr
      		[ ...]
      	[ MDBE_ATTR_GROUP_MODE ]	// new
      		u8
      	[ MDBE_ATTR_RTPROT ]		// new
      		u8
      
      No changes are required in RTM_NEWMDB responses and notifications, as
      all the information can already be dumped by the kernel today.
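
      As a rough illustration of the nesting shown above, the following
      sketch packs the MDBA_SET_ENTRY_ATTRS part of a request as netlink
      TLVs. It is not taken from the patchset: the helper names nla() and
      entry_attrs() are hypothetical, the numeric attribute values are
      assumptions that should be checked against linux/if_bridge.h, only
      IPv4 sources are handled, and the nlmsghdr, br_port_msg and
      MDBA_SET_ENTRY parts are omitted.

```python
import socket
import struct

NLA_F_NESTED = 0x8000

# Assumed values; verify against include/uapi/linux/if_bridge.h.
MDBA_SET_ENTRY_ATTRS = 2
MDBE_ATTR_SRC_LIST = 2
MDBE_ATTR_GROUP_MODE = 3
MDBE_ATTR_RTPROT = 4
MDBE_SRC_LIST_ENTRY = 1
MDBE_SRCATTR_ADDRESS = 1

MCAST_EXCLUDE = 0   # filter mode values from linux/in.h
MCAST_INCLUDE = 1
RTPROT_ZEBRA = 11   # from linux/rtnetlink.h


def nla(attr_type: int, payload: bytes) -> bytes:
    """Encode one netlink attribute: u16 length, u16 type, payload, padding."""
    length = 4 + len(payload)          # the length field excludes padding
    pad = (-length) % 4                # attributes are 4-byte aligned
    return struct.pack("=HH", length, attr_type) + payload + b"\0" * pad


def entry_attrs(sources, group_mode: int, rtprot: int) -> bytes:
    """Build the nested MDBA_SET_ENTRY_ATTRS blob for IPv4 sources."""
    src_list = b"".join(
        nla(MDBE_SRC_LIST_ENTRY | NLA_F_NESTED,
            nla(MDBE_SRCATTR_ADDRESS, socket.inet_aton(src)))
        for src in sources
    )
    body = (
        nla(MDBE_ATTR_SRC_LIST | NLA_F_NESTED, src_list)
        + nla(MDBE_ATTR_GROUP_MODE, struct.pack("=B", group_mode))
        + nla(MDBE_ATTR_RTPROT, struct.pack("=B", rtprot))
    )
    return nla(MDBA_SET_ENTRY_ATTRS | NLA_F_NESTED, body)


blob = entry_attrs(["192.0.2.1", "192.0.2.3"], MCAST_EXCLUDE, RTPROT_ZEBRA)
```

      The example values mirror the "bridge mdb replace" command shown
      earlier (two sources, exclude mode, proto zebra).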
      
      Testing
      =======
      
      Tested with existing bridge multicast selftests: bridge_igmp.sh,
      bridge_mdb_port_down.sh, bridge_mdb.sh, bridge_mld.sh,
      bridge_vlan_mcast.sh.
      
      In addition, added many new test cases for existing as well as for new
      MDB functionality.
      
      Patchset overview
      =================
      
      Patches #1-#8 are non-functional preparations for the core changes in
      later patches.
      
      Patches #9-#10 allow user space to install (*, G) entries with a source
      list and associated filter mode. Specifically, patch #9 adds the
      necessary kernel plumbing and patch #10 exposes the new functionality to
      user space via a few new attributes.
      
      Patch #11 allows user space to specify the routing protocol of new MDB
      port group entries so that a routing daemon could differentiate between
      entries installed by it and those installed by an administrator.
      
      Patch #12 allows user space to replace MDB port group entries. This is
      useful, for example, when user space wants to add a new source to a
      source list. Instead of deleting a (*, G) entry and re-adding it with an
      extended source list (which would result in packet loss), user space can
      simply replace the current entry.
      
      Patches #13-#14 add tests for existing MDB functionality as well as for
      all new functionality added in this patchset.
      
      Future work
      ===========
      
      The VXLAN driver will need to be extended with an MDB so that it could
      selectively forward IP multicast traffic to NVE switches with interested
      receivers instead of simply flooding it to all switches as BUM.
      
      The idea is to reuse the existing MDB interface for the VXLAN driver in
      a similar way to how the FDB interface is shared between the bridge and
      VXLAN drivers.
      
      From a command line perspective, configuration will look as follows:
      
       # bridge mdb add dev br0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode exclude source_list 198.50.100.1,198.50.100.2
      
       # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode include source_list 198.50.100.3,198.50.100.4 \
      	dst 192.0.2.1 dst_port 4789 src_vni 2
      
       # bridge mdb add dev vxlan0 port vxlan0 grp 239.1.1.1 permanent \
      	filter_mode exclude source_list 198.50.100.1,198.50.100.2 \
      	dst 192.0.2.2 dst_port 4789 src_vni 2
      
      Where the first command is enabled by this set, but the next two will be
      the subject of future work.
      
      From a netlink perspective, the existing PF_BRIDGE/RTM_*MDB messages will
      be extended to the VXLAN driver. This means that a few new attributes
      will be added (e.g., 'MDBE_ATTR_SRC_VNI') and that the handlers for
      these messages will need to move to net/core/rtnetlink.c. The rtnetlink
      code will call into the appropriate driver based on the ifindex
      specified in the ancillary header.
      
      iproute2 patches can be found here [9].
      
      Changelog
      =========
      
      Since v1 [10]:
      
      * Patch #12: Remove extack from br_mdb_replace_group_sg().
      * Patch #12: Change 'nlflags' to u16 and move it after 'filter_mode' to
        pack the structure.
      
      Since RFC [11]:
      
      * Patch #6: New patch.
      * Patch #9: Use an array instead of a list to store source entries.
      * Patch #10: Use an array instead of list to store source entries.
      * Patch #10: Drop br_mdb_config_attrs_fini().
      * Patch #11: Reject protocol for host entries.
      * Patch #13: New patch.
      * Patch #14: New patch.
      
      [1] https://datatracker.ietf.org/doc/html/rfc3376
      [2] https://www.rfc-editor.org/rfc/rfc3810
      [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6af52ae2ed14a6bc756d5606b29097dfd76740b8
      [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=68d4fd30c83b1b208e08c954cd45e6474b148c87
      [5] https://datatracker.ietf.org/doc/html/rfc7432
      [6] https://datatracker.ietf.org/doc/html/rfc7432#section-7.2
      [7] https://datatracker.ietf.org/doc/html/rfc9251
      [8] https://datatracker.ietf.org/doc/html/rfc9251#section-9.1
      [9] https://github.com/idosch/iproute2/commits/submit/mdb_v1
      [10] https://lore.kernel.org/netdev/20221208152839.1016350-1-idosch@nvidia.com/
      [11] https://lore.kernel.org/netdev/20221018120420.561846-1-idosch@nvidia.com/
      ====================
      
      Link: https://lore.kernel.org/r/20221210145633.1328511-1-idosch@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: Add bridge MDB test · b6d00da0
      Ido Schimmel authored
      Add a selftest that includes the following test cases:
      
      1. Configuration tests. Both valid and invalid configurations are
         tested across all entry types (e.g., L2, IPv4).
      
      2. Forwarding tests. Both host and port group entries are tested across
         all entry types.
      
      3. Interaction between user installed MDB entries and IGMP / MLD control
         packets.
      
      Example output:
      
      INFO: # Host entries configuration tests
      TEST: Common host entries configuration tests (IPv4)                [ OK ]
      TEST: Common host entries configuration tests (IPv6)                [ OK ]
      TEST: Common host entries configuration tests (L2)                  [ OK ]
      
      INFO: # Port group entries configuration tests - (*, G)
      TEST: Common port group entries configuration tests (IPv4 (*, G))   [ OK ]
      TEST: Common port group entries configuration tests (IPv6 (*, G))   [ OK ]
      TEST: IPv4 (*, G) port group entries configuration tests            [ OK ]
      TEST: IPv6 (*, G) port group entries configuration tests            [ OK ]
      
      INFO: # Port group entries configuration tests - (S, G)
      TEST: Common port group entries configuration tests (IPv4 (S, G))   [ OK ]
      TEST: Common port group entries configuration tests (IPv6 (S, G))   [ OK ]
      TEST: IPv4 (S, G) port group entries configuration tests            [ OK ]
      TEST: IPv6 (S, G) port group entries configuration tests            [ OK ]
      
      INFO: # Port group entries configuration tests - L2
      TEST: Common port group entries configuration tests (L2 (*, G))     [ OK ]
      TEST: L2 (*, G) port group entries configuration tests              [ OK ]
      
      INFO: # Forwarding tests
      TEST: IPv4 host entries forwarding tests                            [ OK ]
      TEST: IPv6 host entries forwarding tests                            [ OK ]
      TEST: L2 host entries forwarding tests                              [ OK ]
      TEST: IPv4 port group "exclude" entries forwarding tests            [ OK ]
      TEST: IPv6 port group "exclude" entries forwarding tests            [ OK ]
      TEST: IPv4 port group "include" entries forwarding tests            [ OK ]
      TEST: IPv6 port group "include" entries forwarding tests            [ OK ]
      TEST: L2 port entries forwarding tests                              [ OK ]
      
      INFO: # Control packets tests
      TEST: IGMPv3 MODE_IS_INCLUE tests                                   [ OK ]
      TEST: MLDv2 MODE_IS_INCLUDE tests                                   [ OK ]
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: forwarding: Rename bridge_mdb test · f9923a67
      Ido Schimmel authored
      The test is only concerned with host MDB entries and not with MDB
      entries as a whole. Rename the test to reflect that.
      
      Subsequent patches will add a more general test that will contain the
      test cases for host MDB entries and remove the current test.
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Support replacement of MDB port group entries · 61f21835
      Ido Schimmel authored
      Now that user space can specify additional attributes of port group
      entries such as filter mode and source list, it makes sense to allow
      user space to atomically modify these attributes by replacing entries
      instead of forcing user space to delete the entries and add them back.
      
      Replace MDB port group entries when the 'NLM_F_REPLACE' flag is
      specified in the netlink message header.
      
      When a (*, G) entry is replaced, update the following attributes: Source
      list, state, filter mode, protocol and flags. If the entry is temporary
      and in EXCLUDE mode, reset the group timer to the group membership
      interval. If the entry is temporary and in INCLUDE mode, reset the
      source timers of associated sources to the group membership interval.
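
      The replace rules above can be modeled in a few lines. This is a
      simplified sketch rather than the bridge implementation: the class
      PortGroupEntry and the function replace_entry() are hypothetical,
      timers are plain numbers of seconds, and the 260 second group
      membership interval is an assumed default.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

GROUP_MEMBERSHIP_INTERVAL = 260.0  # seconds; assumed default value


@dataclass
class PortGroupEntry:
    """Simplified (*, G) port group entry; timers are seconds remaining."""
    permanent: bool
    filter_mode: str                      # "include" or "exclude"
    proto: str = "static"
    group_timer: float = 0.0
    source_timers: Dict[str, float] = field(default_factory=dict)


def replace_entry(entry: PortGroupEntry, *, permanent: bool, filter_mode: str,
                  sources: Iterable[str], proto: str = "static") -> PortGroupEntry:
    """Apply a replace request to an existing entry in place."""
    # Source list, state, filter mode and protocol are all overwritten.
    entry.permanent = permanent
    entry.filter_mode = filter_mode
    entry.proto = proto
    entry.source_timers = {src: 0.0 for src in sources}
    entry.group_timer = 0.0

    if not permanent:
        if filter_mode == "exclude":
            # Temporary EXCLUDE entry: the group timer is re-armed.
            entry.group_timer = GROUP_MEMBERSHIP_INTERVAL
        else:
            # Temporary INCLUDE entry: each source timer is re-armed.
            for src in entry.source_timers:
                entry.source_timers[src] = GROUP_MEMBERSHIP_INTERVAL
    return entry
```

      Permanent entries keep all timers at zero, which matches the 0.00
      values printed for them in the examples that follow.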
      
      Examples:
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.2 filter_mode include
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.2 permanent filter_mode include proto static     0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto static     0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode include source_list 192.0.2.2/0.00,192.0.2.1/0.00 proto static     0.00
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 permanent source_list 192.0.2.1,192.0.2.3 filter_mode exclude proto zebra
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra  blocked    0.00
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude source_list 192.0.2.3/0.00,192.0.2.1/0.00 proto zebra     0.00
      
       # bridge mdb replace dev br0 port dummy10 grp 239.1.1.1 temp source_list 192.0.2.4,192.0.2.3 filter_mode include proto bgp
       # bridge -d -s mdb show
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.4 temp filter_mode include proto bgp     0.00
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.3 temp filter_mode include proto bgp     0.00
       dev br0 port dummy10 grp 239.1.1.1 temp filter_mode include source_list 192.0.2.4/259.44,192.0.2.3/259.44 proto bgp     0.00
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bridge: mcast: Allow user space to specify MDB entry routing protocol · 1d7b66a7
      Ido Schimmel authored
      Add the 'MDBE_ATTR_RTPROT' attribute to allow user space to specify the
      routing protocol of the MDB port group entry. Enforce a minimum value of
      'RTPROT_STATIC' to prevent user space from using protocol values that
      should only be set by the kernel (e.g., 'RTPROT_KERNEL'). Maintain
      backward compatibility by defaulting to 'RTPROT_STATIC'.
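
      The validation rule can be sketched as follows. This is an
      illustration only: the function name resolve_mdb_proto() is
      hypothetical, the RTPROT_* values are those from linux/rtnetlink.h,
      and the upper bound of 255 reflects the u8 attribute type.

```python
from typing import Optional

RTPROT_KERNEL = 2
RTPROT_STATIC = 4
RTPROT_ZEBRA = 11


def resolve_mdb_proto(requested: Optional[int] = None) -> int:
    """Return the protocol to store for a user-installed MDB entry.

    A missing attribute falls back to RTPROT_STATIC for backward
    compatibility; values below RTPROT_STATIC are rejected because they
    are reserved for the kernel.
    """
    if requested is None:
        return RTPROT_STATIC
    if not RTPROT_STATIC <= requested <= 255:
        raise ValueError("integer out of range")
    return requested
```

      With this rule, "proto kernel" is refused while "proto static",
      "proto zebra" and numeric values such as 250 are accepted.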
      
      The protocol is already visible to user space in RTM_NEWMDB responses
      and notifications via the 'MDBA_MDB_EATTR_RTPROT' attribute.
      
      The routing protocol allows a routing daemon to distinguish between
      entries configured by it and those configured by the administrator. Once
      MDB flush is supported, the protocol can be used as a criterion
      according to which the flush is performed.
      
      Examples:
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto kernel
       Error: integer out of range.
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 permanent proto static
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent proto zebra
      
       # bridge mdb add dev br0 port dummy10 grp 239.1.1.2 permanent source_list 198.51.100.1,198.51.100.2 filter_mode include proto 250
      
       # bridge -d mdb show
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.2 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 src 198.51.100.1 permanent filter_mode include proto 250
       dev br0 port dummy10 grp 239.1.1.2 permanent filter_mode include source_list 198.51.100.2/0.00,198.51.100.1/0.00 proto 250
       dev br0 port dummy10 grp 239.1.1.1 src 192.0.2.1 permanent filter_mode include proto zebra
       dev br0 port dummy10 grp 239.1.1.1 permanent filter_mode exclude proto static
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>