1. 06 Feb, 2023 40 commits
    • Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · d78f8d83
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      net: implement devlink reload in ice
      
      Michal Swiatkowski says:
      
      This is part of the changes done in patchset [0]. Resource management
      is a somewhat controversial part, so I split the work into two patchsets.
      
      This is the first one, covering the refactor and the implementation of
      the reload API call. The refactor will unblock some of the patches
      needed by SIOV or subfunction.
      
      Most of this patchset implements the driver reload mechanism. Code from
      the probe and rebuild paths is reused to avoid duplication. To allow
      this reuse, the probe and rebuild paths are split into smaller functions.
      
      Patch "ice: split ice_vsi_setup into smaller functions" changes a
      boolean variable in a function call to an integer and adds defines
      for it. Instead of being called with true/false, the function can now
      be called with the readable defines ICE_VSI_FLAG_INIT or
      ICE_VSI_FLAG_NO_INIT. This was suggested by Jacob Keller, and the
      mechanism will probably be implemented across the ice driver in a
      follow-up patchset.
      
      Previously the code was reviewed here [0].
      
      [0] https://lore.kernel.org/netdev/Y3ckRWtAtZU1BdXm@unreal/T/#m3bb8feba0a62f9b4cd54cd94917b7e2143fc2ecd
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: introduce skb_poison_list and use in kfree_skb_list · 9dde0cd3
      Jesper Dangaard Brouer authored
      The first user of skb_poison_list is kfree_skb_list_reason, to catch
      bugs like the one introduced in commit eedade12 ("net: kfree_skb_list use
      kmem_cache_free_bulk") earlier. For completeness, that bug has been fixed
      in commit f72ff8b8 ("net: fix kfree_skb_list use of skb_mark_not_on_list").
      
      With a bug like the one in the mentioned commit, we would have seen an OOPS with:
       general protection fault, probably for non-canonical address 0xdead000000000870
      and one of the registers holding the poison value, e.g. R13: dead000000000800
      
      In this case skb->len is at offset 112 bytes (0x70), which is why the fault happens at
       0x800+0x70 = 0x870
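
The address arithmetic in this OOPS can be checked with a short sketch. The two constants are taken from the message above (the poison value seen in R13 and the stated offset of skb->len); the function and variable names are illustrative only and are not kernel identifiers. The canonical-address check assumes 48-bit virtual addresses (4-level paging), which is an assumption and not something the commit states.

```python
# Values quoted in the commit message; names are illustrative, not kernel symbols.
POISON_IN_R13 = 0xDEAD000000000800   # poisoned list pointer as seen in R13
SKB_LEN_OFFSET = 112                  # offset of skb->len stated in the message (0x70)


def fault_address(poisoned_ptr: int, field_offset: int) -> int:
    """Address touched when a field is read through a poisoned pointer."""
    return poisoned_ptr + field_offset


def is_canonical_48(addr: int) -> bool:
    """True if bits 63..47 are all equal (assumes 48-bit virtual addresses)."""
    top = addr >> 47
    return top == 0 or top == (1 << 17) - 1


addr = fault_address(POISON_IN_R13, SKB_LEN_OFFSET)
print(hex(addr))              # faulting address
print(hex(addr & 0xFFFF))     # low 16 bits: 0x800 + 0x70
print(is_canonical_48(addr))  # a non-canonical address faults on dereference
```

Because the poison value keeps the resulting address non-canonical, any dereference through it faults immediately, which is what makes the bug visible early.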
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'wangxun-interrupts' · 149e8fb0
      David S. Miller authored
      Jiawen Wu says:
      
      ====================
      Wangxun interrupt and RxTx support
      
      Configure interrupts, set up the Rx/Tx rings, and add support for
      receiving and transmitting packets.
      
      change log:
      v3:
      - Use upper_32_bits() to avoid compile warning.
      - Remove useless code.
      v2:
      - Andrew Lunn: https://lore.kernel.org/netdev/Y86kDphvyHj21IxK@lunn.ch/
      - Add a check when allocating DMA memory for descriptors.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ngbe: Support Rx and Tx process path · b97f955e
      Mengyuan Lou authored
      Add the enable and disable operations used by ngbe open/close.
      Clean Rx and Tx ring interrupts, process packets in the data path.
      Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: txgbe: Support Rx and Tx process path · 0d22be52
      Jiawen Wu authored
      Clean Rx and Tx ring interrupts, process packets in the data path.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: libwx: Add tx path to process packets · 09a50880
      Mengyuan Lou authored
      Support transmitting packets without hardware features.
      Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: libwx: Support to receive packets in NAPI · 3c47e8ae
      Jiawen Wu authored
      Clean all queues associated with a q_vector, to simply receive packets
      without hardware features.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: txgbe: Setup Rx and Tx ring · 0ef7e159
      Jiawen Wu authored
      Improve the configuration of the Rx and Tx rings, set Rx flags, and
      implement the ndo_set_rx_mode op.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: libwx: Allocate Rx and Tx resources · 850b9711
      Jiawen Wu authored
      Set up Rx and Tx descriptors for specific rings.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: libwx: Configure Rx and Tx unit on hardware · 18b5b8a9
      Jiawen Wu authored
      Configure the hardware to prepare it for processing packets. This includes
      configuring the receive and transmit units of the MAC layer and setting up
      the specific rings.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: txgbe: Add interrupt support · 5d3ac705
      Jiawen Wu authored
      Determine the proper interrupt scheme to enable, and handle interrupts.
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ngbe: Add irqs request flow · e7956139
      Mengyuan Lou authored
      Add request_irq for Tx/Rx rings and miscellaneous other events.
      If the request succeeds, configure the vectors for interrupts.
      Enable some base interrupt masks in ngbe_irq_enable.
      Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: libwx: Add irq flow functions · 3f703186
      Mengyuan Lou authored
      Add irq flow functions for ngbe and txgbe.
      Allocate PCIe MSI-X IRQs for the drivers; otherwise fall back to MSI/legacy.
      Signed-off-by: Mengyuan Lou <mengyuanlou@net-swift.com>
      Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: page_pool: use in_softirq() instead · 542bcea4
      Qingfang DENG authored
      We use BH context only for synchronization, so we don't care if it's
      actually serving softirq or not.
      
      As a side note, in the case of threaded NAPI, in_serving_softirq() will
      return false because it's in process context with BH off, making
      page_pool_recycle_in_cache() unreachable.
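
The difference between the two predicates can be modelled on the softirq bits of preempt_count. This is a simplified sketch, assuming the non-PREEMPT_RT bit layout as commonly described for include/linux/preempt.h (softirq field starting at bit 8, a running handler adding one offset, local_bh_disable() adding two). The constants are assumptions for illustration, not values confirmed by this commit.

```python
# Simplified model of the softirq field in preempt_count (assumed layout).
SOFTIRQ_SHIFT = 8
SOFTIRQ_OFFSET = 1 << SOFTIRQ_SHIFT           # added while a softirq handler runs
SOFTIRQ_MASK = 0xFF << SOFTIRQ_SHIFT          # whole softirq field
SOFTIRQ_DISABLE_OFFSET = 2 * SOFTIRQ_OFFSET   # added by local_bh_disable()


def in_softirq(count: int) -> bool:
    """BH is off, either by serving a softirq or by local_bh_disable()."""
    return bool(count & SOFTIRQ_MASK)


def in_serving_softirq(count: int) -> bool:
    """A softirq handler is actually running."""
    return bool(count & SOFTIRQ_MASK & SOFTIRQ_OFFSET)


softirq_napi = SOFTIRQ_OFFSET            # NAPI poll invoked from softirq
threaded_napi = SOFTIRQ_DISABLE_OFFSET   # kthread polling with BH disabled

for name, count in (("softirq NAPI", softirq_napi), ("threaded NAPI", threaded_napi)):
    print(name, in_softirq(count), in_serving_softirq(count))
```

Under this model, both contexts satisfy in_softirq(), while only the softirq-driven poll satisfies in_serving_softirq(), which matches the behaviour described above.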
      Signed-off-by: Qingfang DENG <qingfang.deng@siflower.com.cn>
      Tested-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'mlx5-updates-2023-02-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 637bc8f0
      David S. Miller authored
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2023-02-04
      
      This series provides misc updates to mlx5 driver:
      
      1) Trivial LAG code cleanup patches from Roi
      
      2) Rahul improves mlx5's documentation structure
      Separates the documentation into multiple pages related to different
      components in the device driver. Adds Kconfig parameters, devlink
      parameters, and tracepoints that were previously introduced but not added
      to the documentation. Introduces a new page on ethtool statistics counters
      with information about counters previously implemented in the mlx5_core
      driver but not documented in the kernel tree.
      
      3) From Raed, policy/state selector support for IPSec.
      
      4) From Fragos, add support for XDR speed in IPoIB mlx5 netdev
      
      5) Few more misc cleanups and trivial changes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: Maintain reverse cleanup order · 27369c9c
      Parav Pandit authored
      To make the code easier to audit, keep the device stop()
      sequence a mirror of the device open() sequence.
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge-mdb-limit' · cb3086ce
      David S. Miller authored
      Petr Machata says:
      
      ====================
      bridge: Limit number of MDB entries per port, port-vlan
      
      The MDB maintained by the bridge is limited. When the bridge is configured
      for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
      capacity. In SW datapath, the capacity is configurable through the
      IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
      similar limit exists in the HW datapath for purposes of offloading.
      
      In order to prevent the issue of unilateral exhaustion of MDB resources,
      introduce two parameters in each of two contexts:
      
      - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
        per-port-VLAN number of MDB entries that the port is a member of.
      
      - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
        per-port-VLAN maximum permitted number of MDB entries, or 0 for
        no limit.
      
      Per-port number of entries keeps track of the total number of MDB entries
      configured on a given port. The per-port-VLAN value then keeps track of the
      subset of MDB entries configured specifically for the given VLAN, on that
      port. The number is adjusted as port_groups are created and deleted, and
      therefore under multicast lock.
      
      A maximum value, if non-zero, then places a limit on the number of entries
      that can be configured in a given context. Attempts to add entries above
      the maximum are rejected.
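
The accounting rule described above can be sketched as a toy model of one multicast context (a port or a port-VLAN). The class and method names are hypothetical and do not correspond to kernel functions. The only behaviour taken from the text is that a maximum of 0 means no limit, and that an add is rejected once the count has reached the maximum.

```python
class McastContext:
    """Toy model of per-port or per-port-VLAN MDB entry accounting."""

    def __init__(self, max_entries: int = 0):
        self.n_entries = 0
        self.max_entries = max_entries  # 0 means "no limit"

    def try_add(self) -> bool:
        """Account for one new entry; reject it if the limit is reached."""
        if self.max_entries and self.n_entries >= self.max_entries:
            return False
        self.n_entries += 1
        return True

    def remove(self) -> None:
        """Account for one deleted entry."""
        if self.n_entries > 0:
            self.n_entries -= 1


ctx = McastContext(max_entries=1)
print(ctx.try_add())   # first group fits under the limit
print(ctx.try_add())   # second group is rejected
```

Comparing with >= rather than == keeps the check correct when the maximum is later set below the current number of entries.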
      
      Rejection reason of netlink-based requests to add MDB entries is
      communicated through extack. This channel is unavailable for rejections
      triggered from the control path. To address this lack of visibility, the
      patchset adds a tracepoint, bridge:br_mdb_full:
      
      	# perf record -e bridge:br_mdb_full &
      	# [...]
      	# perf script | cut -d: -f4-
      	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
      	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
      	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
      	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
      	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10
      
      Another option to consume the tracepoint is e.g. through the bpftrace tool:
      
      	# bpftrace -e ' tracepoint:bridge:br_mdb_full /args->af != 0/ {
      			    printf("dev %s src %s grp %s vid %u\n",
      				   str(args->dev), ntop(args->src),
      				   ntop(args->grp), args->vid);
      			}
      			tracepoint:bridge:br_mdb_full /args->af == 0/ {
      			    printf("dev %s grp %s vid %u\n",
      				   str(args->dev),
      				   macaddr(args->grpmac), args->vid);
      			}'
      
      This tracepoint is triggered for mcast_hash_max exhaustions as well.
      
      The following is an example of how the feature is used. A more extensive
      example is available in patch #8:
      
      	# bridge vlan set dev v1 vid 1 mcast_max_groups 1
      	# bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
      	# bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
      	Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.
      
      The patchset progresses as follows:
      
      - In patch #1, set strict_start_type at two bridge-related policies. The
        reason is we are adding a new attribute to one of these, and want the new
        attribute to be parsed strictly. The other was adjusted for completeness'
        sake.
      
      - In patches #2 to #5, br_mdb and br_multicast code is adjusted to make the
        following additions smoother.
      
      - In patch #6, add the tracepoint.
      
      - In patch #7, the code to maintain number of MDB entries is added as
        struct net_bridge_mcast_port::mdb_n_entries. The maximum is added, too,
        as struct net_bridge_mcast_port::mdb_max_entries, however at this point
        there is no way to set the value yet, and since 0 is treated as "no
        limit", the functionality doesn't change at this point. Note however,
        that mcast_hash_max violations already do trigger at this point.
      
      - In patch #8, netlink plumbing is added: reading of number of entries, and
        reading and writing of maximum.
      
        The per-port values are passed through RTM_NEWLINK / RTM_GETLINK messages
        in IFLA_BRPORT_MCAST_N_GROUPS and _MAX_GROUPS, inside IFLA_PROTINFO nest.
      
        The per-port-vlan values are passed through RTM_GETVLAN / RTM_NEWVLAN
        messages in BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS, _MAX_GROUPS, inside
        BRIDGE_VLANDB_ENTRY.
      
      The following patches deal with the selftest:
      
      - Patches #9 and #10 clean up and move around some selftest code.
      
      - Patches #11 to #14 add helpers and generalize the existing IGMP / MLD
        support to allow generating packets with configurable group addresses and
        varying source lists for (S,G) memberships.
      
      - Patch #15 adds code to generate IGMP leave and MLD done packets.
      
      - Patch #16 finally adds the selftest itself.
      
      v3:
      - Patch #7:
          - Access mdb_max_/_n_entries through READ_/WRITE_ONCE
          - Move extack setting to br_multicast_port_ngroups_inc_one().
            Since we use NL_SET_ERR_MSG_FMT_MOD, the correct context
            (port / port-vlan) can be passed through an argument.
            This also removes the need for more READ/WRITE_ONCE's
            at the extack-setting site.
      - Patch #8:
          - Move the br_multicast_port_ctx_vlan_disabled() check
            out to the _vlan_ helpers callers. Thus these helpers
            cannot fail, which makes them very similar to the
            _port_ helpers. Have them take the MC context directly
            and unify them.
      
      v2:
      - Cover letter:
          - Add an example of a bpftrace-based probe script
      - Patch #6:
          - Report IPv4 as an IPv6-mapped address through the IPv6 buffer
            as well, to save ring buffer space.
      - Patch #7:
          - In br_multicast_port_ngroups_inc_one(), bounce
            if n>=max, not if n==max
          - Adjust extack messages to mention ngroups, now
            that the bounces appear when n>=max, not n==max
          - In __br_multicast_enable_port_ctx(), do not reset
            max to 0. Also do not count number of entries by
            going through _inc, as that would end up incorrectly
            bouncing the entries.
      - Patch #8:
          - Drop locks around accesses in
            br_multicast_{port,vlan}_ngroups_{get,set_max}(),
          - Drop bounces due to max<n in
            br_multicast_{port,vlan}_ngroups_set_max().
      - Patch #12:
          - In the comment at payload_template_calc_checksum(),
            s/%#02x/%02x/, that's the mausezahn payload format.
      - Patch #16:
          - Adjust the tests that check setting max below n and
            reset of max on VLAN snooping enablement
          - Make test naming uniform
          - Enable testing of control path (IGMP/MLD) in
            mcast_vlan_snooping bridge
          - Reorganize the code so that test instances (per bridge
            type and configuration type) always come right after
            the test, in order of {d,q,qvs}{4,6}{cfg,ctl}.
            Then groups of selftests are at the end of the file.
            Similarly adjust invocation order of the tests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: bridge_mdb_max: Add a new selftest · 3446dcd7
      Petr Machata authored
      Add a suite covering mcast_n_groups and mcast_max_groups bridge features.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: lib: Add helpers to build IGMP/MLD leave packets · 9ae85469
      Petr Machata authored
      The testsuite that checks for mcast_max_groups functionality will need to
      wipe the added groups as well. Add helpers to build IGMP or MLD packets
      announcing that a host is leaving a given group.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: lib: Allow list of IPs for IGMPv3/MLDv2 · 705d4bc7
      Petr Machata authored
      The testsuite that checks for mcast_max_groups functionality will need
      to generate IGMP and MLD packets with configurable number of (S,G)
      addresses. To that end, further extend igmpv3_is_in_get() and
      mldv2_is_in_get() to allow a list of IP addresses instead of one
      address.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: lib: Parameterize IGMPv3/MLDv2 generation · 506a1ac9
      Petr Machata authored
      In order to generate IGMPv3 and MLDv2 packets on the fly, the
      functions that generate these packets need to be able to generate
      packets for different groups and different sources. Generating MLDv2
      packets further needs the source address of the packet for purposes of
      checksum calculation. Add the necessary parameters, and generate the
      payload accordingly by dispatching to helpers added in the previous
      patches.
      
      Adjust the sole client, bridge_mdb.sh, as well.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: lib: Add helpers for checksum handling · 952e0ee3
      Petr Machata authored
      In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
      helpers to calculate the packet checksum.
      
      The approach presented in this patch revolves around payload templates
      for mausezahn. These are mausezahn-like payload strings (01:23:45:...)
      with possibly one 2-byte sequence replaced with the word PAYLOAD. The
      main function is payload_template_calc_checksum(), which calculates
      RFC 1071 checksum of the message. There are further helpers to then
      convert the checksum to the payload format, and to expand it.
      
      For IPv6, MLDv2 message checksum is computed using a pseudoheader that
      differs from the header used in the payload itself. The fact that the
      two messages are different means that the checksum needs to be
      returned as a separate quantity, instead of being expanded in-place in
      the payload itself. Furthermore, the pseudoheader includes a length of
      the message. Much like the checksum, this needs to be expanded in
      mausezahn format. And likewise for number of addresses for (S,G)
      entries. Thus we have several places where a computed quantity needs
      to be presented in the payload format. Add a helper u16_to_bytes(),
      which will be used in all these cases.
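
The template approach can be illustrated with a minimal Python sketch. The real helpers are shell functions in the selftest library; the code below is only a model of the idea. It assumes the placeholder is the literal word PAYLOAD as stated above, that it is treated as 00:00 while the checksum is computed, and that u16_to_bytes() emits two colon-separated hex bytes. Every function name other than u16_to_bytes is hypothetical.

```python
PLACEHOLDER = "PAYLOAD"  # placeholder word as described in the commit message


def u16_to_bytes(value: int) -> str:
    """Format a 16-bit quantity as two colon-separated hex bytes."""
    return f"{(value >> 8) & 0xFF:02x}:{value & 0xFF:02x}"


def rfc1071_checksum(payload: str) -> int:
    """RFC 1071 Internet checksum of a colon-separated hex byte string."""
    data = bytes(int(byte, 16) for byte in payload.split(":"))
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def template_checksum(template: str) -> int:
    """Checksum of a template, with the placeholder counted as zero."""
    return rfc1071_checksum(template.replace(PLACEHOLDER, "00:00"))


def expand_template(template: str) -> str:
    """Replace the placeholder with the computed checksum bytes."""
    return template.replace(PLACEHOLDER, u16_to_bytes(template_checksum(template)))


# An 8-byte message whose third and fourth bytes hold the checksum.
print(expand_template("11:64:PAYLOAD:00:00:00:00"))
```

A message that carries its own correct checksum sums to zero under the same function, which gives a quick self-check for an expanded template.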
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: lib: Add helpers for IP address handling · fcf49276
      Petr Machata authored
      In order to generate IGMPv3 and MLDv2 packets on the fly, we will need
      helpers to expand IPv4 and IPv6 addresses given as parameters in
      mausezahn payload notation. Add helpers that do it.
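
As an illustration of what such an expansion produces, the sketch below converts an address to colon-separated hex bytes using Python's standard ipaddress module. The actual helpers are shell functions; the name ip_to_payload is hypothetical, and the output format is assumed from the "01:23:45:..." notation used for mausezahn payloads.

```python
import ipaddress


def ip_to_payload(addr: str) -> str:
    """Expand an IPv4 or IPv6 address into colon-separated hex bytes."""
    packed = ipaddress.ip_address(addr).packed   # 4 or 16 raw bytes
    return ":".join(f"{byte:02x}" for byte in packed)


print(ip_to_payload("192.0.2.1"))
print(ip_to_payload("2001:db8:1::1"))
```

Going through the packed form also expands the "::" shorthand of IPv6 addresses to the full 16 bytes, which a payload string needs.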
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: bridge_mdb: Fix a typo · f7ccf60c
      Petr Machata authored
      Add the letter missing from the word "INCLUDE".
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: forwarding: Move IGMP- and MLD-related functions to lib · 344dd2c9
      Petr Machata authored
      These functions will be helpful for other testsuites as well. Extract them
      to a common place.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Add netlink knobs for number / maximum MDB entries · a1aee20d
      Petr Machata authored
      The previous patch added accounting for number of MDB entries per port and
      per port-VLAN, and the logic to verify that these values stay within
      configured bounds. However it didn't provide means to actually configure
      those bounds or read the occupancy. This patch does that.
      
      Two new netlink attributes are added for the MDB occupancy:
      IFLA_BRPORT_MCAST_N_GROUPS for the per-port occupancy and
      BRIDGE_VLANDB_ENTRY_MCAST_N_GROUPS for the per-port-VLAN occupancy.
      And another two for the maximum number of MDB entries:
      IFLA_BRPORT_MCAST_MAX_GROUPS for the per-port maximum, and
      BRIDGE_VLANDB_ENTRY_MCAST_MAX_GROUPS for the per-port-VLAN one.
      
      Note that the two new IFLA_BRPORT_ attributes prompt bumping of
      RTNL_SLAVE_MAX_TYPE to size the slave attribute tables large enough.
      
      The new attributes are used like this:
      
       # ip link add name br up type bridge vlan_filtering 1 mcast_snooping 1 \
                                            mcast_vlan_snooping 1 mcast_querier 1
       # ip link set dev v1 master br
       # bridge vlan add dev v1 vid 2
      
       # bridge vlan set dev v1 vid 1 mcast_max_groups 1
       # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 1
       # bridge mdb add dev br port v1 grp 230.1.2.4 temp vid 1
       Error: bridge: Port-VLAN is already in 1 groups, and mcast_max_groups=1.
      
       # bridge link set dev v1 mcast_max_groups 1
       # bridge mdb add dev br port v1 grp 230.1.2.3 temp vid 2
       Error: bridge: Port is already in 1 groups, and mcast_max_groups=1.
      
       # bridge -d link show
       5: v1@v2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br [...]
           [...] mcast_n_groups 1 mcast_max_groups 1
      
       # bridge -d vlan show
       port              vlan-id
       br                1 PVID Egress Untagged
                           state forwarding mcast_router 1
       v1                1 PVID Egress Untagged
                           [...] mcast_n_groups 1 mcast_max_groups 1
                         2
                           [...] mcast_n_groups 0 mcast_max_groups 0
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Maintain number of MDB entries in net_bridge_mcast_port · b57e8d87
      Petr Machata authored
      The MDB maintained by the bridge is limited. When the bridge is configured
      for IGMP / MLD snooping, a buggy or malicious client can easily exhaust its
      capacity. In SW datapath, the capacity is configurable through the
      IFLA_BR_MCAST_HASH_MAX parameter, but ultimately is finite. Obviously a
      similar limit exists in the HW datapath for purposes of offloading.
      
      In order to prevent the issue of unilateral exhaustion of MDB resources,
      introduce two parameters in each of two contexts:
      
      - Per-port and per-port-VLAN number of MDB entries that the port
        is a member of.
      
      - Per-port and (when BROPT_MCAST_VLAN_SNOOPING_ENABLED is enabled)
        per-port-VLAN maximum permitted number of MDB entries, or 0 for
        no limit.
      
      The per-port multicast context is used for tracking of MDB entries for the
      port as a whole. This is available for all bridges.
      
      The per-port-VLAN multicast context is then only available on
      VLAN-filtering bridges on VLANs that have multicast snooping on.
      
      With these changes in place, it will be possible to configure MDB limit for
      bridge as a whole, or any one port as a whole, or any single port-VLAN.
      
      Note that unlike the global limit, exhaustion of the per-port and
      per-port-VLAN maximums does not cause disablement of multicast snooping.
      It is also permitted to configure the local limit larger than hash_max,
      even though that is not useful.
      
      In this patch, introduce only the accounting for number of entries, and the
      max field itself, but not the means to toggle the max. The next patch
      introduces the netlink APIs to toggle and read the values.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Add a tracepoint for MDB overflows · d47230a3
      Petr Machata authored
      The following patch will add two more maximum MDB allowances to the global
      one, mcast_hash_max, that exists today. In all these cases, attempts to add
      MDB entries above the configured maximums through netlink, fail noisily and
      obviously. Such visibility is missing when adding entries through the
      control plane traffic, by IGMP or MLD packets.
      
      To improve visibility in those cases, add a trace point that reports the
      violation, including the relevant netdevice (be it a slave or the bridge
      itself), and the MDB entry parameters:
      
      	# perf record -e bridge:br_mdb_full &
      	# [...]
      	# perf script | cut -d: -f4-
      	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 0
      	 dev v2 af 10 src :: grp ff0e::112/00:00:00:00:00:00 vid 0
      	 dev v2 af 2 src ::ffff:0.0.0.0 grp ::ffff:239.1.1.112/00:00:00:00:00:00 vid 10
      	 dev v2 af 10 src 2001:db8:1::1 grp ff0e::1/00:00:00:00:00:00 vid 10
      	 dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10
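
For post-processing, the trace lines above can be decoded with a small Python sketch. It assumes the field layout shown in the sample output (alternating key/value tokens, with the group given as address/MAC) and that IPv4 addresses arrive in IPv4-mapped IPv6 form. The function name and the returned dictionary keys are hypothetical, and the address-family numbers are the Linux AF_INET and AF_INET6 values.

```python
import ipaddress

AF_NAMES = {2: "IPv4", 10: "IPv6"}   # Linux AF_INET / AF_INET6


def unmap(text: str) -> str:
    """Return the plain IPv4 form of an IPv4-mapped address, else the IPv6 form."""
    addr = ipaddress.IPv6Address(text)
    mapped = addr.ipv4_mapped
    return str(mapped if mapped is not None else addr)


def parse_mdb_full(line: str) -> dict:
    """Decode one 'dev ... af ... src ... grp ... vid ...' trace line."""
    tokens = line.split()
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    grp_ip, grp_mac = fields["grp"].split("/")
    af = int(fields["af"])
    return {
        "dev": fields["dev"],
        "family": AF_NAMES.get(af, f"af {af}"),
        "src": unmap(fields["src"]),
        "grp": unmap(grp_ip),
        "grp_mac": grp_mac,
        "vid": int(fields["vid"]),
    }


sample = " dev v2 af 2 src ::ffff:192.0.2.1 grp ::ffff:239.1.1.1/00:00:00:00:00:00 vid 10"
print(parse_mdb_full(sample))
```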
      
      CC: Steven Rostedt <rostedt@goodmis.org>
      CC: linux-trace-kernel@vger.kernel.org
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Change a cleanup in br_multicast_new_port_group() to goto · eceb3085
      Petr Machata authored
      This function is getting more to clean up in the following patches.
      Structuring the cleanups in one labeled block will allow reusing the same
      cleanup from several places.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Add br_multicast_del_port_group() · 976b3858
      Petr Machata authored
      Since cleaning up the effects of br_multicast_new_port_group() just
      consists of delisting and freeing the memory, the function
      br_mdb_add_group_star_g() inlines the corresponding code. In the following
      patches, number of per-port and per-port-VLAN MDB entries is going to be
      maintained, and that counter will have to be updated. Because that logic
      is going to be hidden in the br_multicast module, introduce a new hook
      intended to again remove a newly-created group.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Move extack-setting to br_multicast_new_port_group() · 1c85b80b
      Petr Machata authored
      Now that br_multicast_new_port_group() takes an extack argument, move
      setting the extack there. The downside is that the error messages end
      up being less specific (the function cannot distinguish between (S,G)
      and (*,G) groups). However, the alternative is to check in the caller
      whether the callee set the extack, and if it didn't, set it. But that
      is only done when the callee is not exactly known. (E.g. in case of a
      notifier invocation.)
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: Add extack to br_multicast_new_port_group() · 60977a0c
      Petr Machata authored
      Make it possible to set an extack in br_multicast_new_port_group().
      Eventually, this function will check for per-port and per-port-vlan
      MDB maximums, and will use the extack to communicate the reason for
      the bounce.
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60977a0c
    • Petr Machata's avatar
      net: bridge: Set strict_start_type at two policies · c00041cf
      Petr Machata authored
      Make any attributes newly-added to br_port_policy or vlan_tunnel_policy
      parsed strictly, to prevent userspace from passing garbage. Note that this
      patchset only touches the former policy. The latter was adjusted for
      completeness' sake. There do not appear to be other _deprecated calls
      with non-NULL policies.
      Suggested-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c00041cf
    • David S. Miller's avatar
      Merge branch 'sparx5-PSFP-support' · 8b7018fa
      David S. Miller authored
      Daniel Machon says:
      
      ====================
      net: Add support for PSFP in Sparx5
      
      ================================================================================
      Add support for Per-Stream Filtering and Policing (802.1Q-2018, 8.6.5.1).
      ================================================================================
      
      The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
      identified by ISDX (Ingress Service Index, frame metadata), and maps
      ISDX to streams.
      
      Flow meters are also classified by ISDX, and implemented using service
      policers (Service Dual Leaky Buckets, SDLB). Leaky buckets are linked
      together in a leak chain of a leak group. Leak groups are preconfigured to serve
      buckets within a certain rate interval.
      
      Stream gates are time-based policers used by PSFP. Frames are dropped
      based on the gate state (OPEN/CLOSE), whose state will be altered based
      on the Gate Control List (GCL) and current PTP time. Apart from
      time-based policing, stream gates can alter egress queue selection for
      the frames that pass through the Gate. This is done through Internal
      Priority Selector (IPS). Stream gates are mapped from stream filters.
      
      Support for the tc actions gate and police has been added to the VCAP IS0 set of
      supported actions.
      
      Examples:
      
      // tc filter with gate action
      $ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1001 protocol \
      802.1q flower skip_sw vlan_id 100 action gate base-time 0 sched-entry open \
      700000 7 8m sched-entry close 300000 action goto chain 1200000
      
      // tc filter with police action
      $ tc filter add dev eth1 ingress chain 1100000 prio 1 handle 1002 protocol \
      802.1q flower skip_sw vlan_id 100 action police rate 1gbit burst 8096      \
      conform-exceed drop action goto chain 1200000
      
      ================================================================================
      Patches
      ================================================================================
      Patch #1:  Adds new register needed for PSFP.
      Patch #2:  Adds resource pools to control PSFP needed chip resources.
      Patch #3:  Adds support for SDLB's needed for flow-meters.
      Patch #4:  Adds support for service policers.
      Patch #5:  Adds support for PSFP flow-meters, using service policers.
      Patch #6:  Adds a new function to calculate basetime, required by stream gates.
      Patch #7:  Adds support for PSFP stream gates.
      Patch #8:  Adds support for PSFP stream filters.
      Patch #9:  Adds a function to initialize flow-meters, stream gates and stream
                 filters.
      Patch #10: Adds the required flower code to configure PSFP using the tc command.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b7018fa
    • Daniel Machon's avatar
      sparx5: add support for configuring PSFP via tc · 6ebf182b
      Daniel Machon authored
      Add support for tc actions gate and police, in order to implement
      support for configuring PSFP through tc.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6ebf182b
    • Daniel Machon's avatar
      net: microchip: sparx5: initialize PSFP · e116b19d
      Daniel Machon authored
      Initialize the SDLB's, stream gates and stream filters.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e116b19d
    • Daniel Machon's avatar
      net: microchip: sparx5: add support for PSFP stream filters · ae3e691f
      Daniel Machon authored
      Add support for configuring PSFP stream filters (IEEE 802.1Q-2018,
      8.6.5.1.1).
      
      The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
      identified by ISDX (Ingress Service Index, frame metadata), and maps
      ISDX to streams.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae3e691f
    • Daniel Machon's avatar
      net: microchip: sparx5: add support for PSFP stream gates · c70a5e2c
      Daniel Machon authored
      Add support for configuring PSFP stream gates (IEEE 802.1Q-2018,
      8.6.5.1.2).
      
      Stream gates are time-based policers used by PSFP. Frames are dropped
      based on the gate state (OPEN/CLOSE), whose state will be altered based
      on the Gate Control List (GCL) and current PTP time. Apart from
      time-based policing, stream gates can alter egress queue selection for
      the frames that pass through the Gate. This is done through Internal
      Priority Selector (IPS). Stream gates are mapped from stream filters.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c70a5e2c
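
The gate behaviour described in the commit above can be illustrated with a small software model. This is only a sketch of how a cyclic Gate Control List could be evaluated against a clock, not the Sparx5 hardware or driver logic. The names `GateEntry` and `gate_state` are hypothetical, and the model assumes the schedule simply repeats from the base time onward.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GateEntry:
    """One gate control list entry (hypothetical model, not a driver struct)."""
    is_open: bool
    interval_ns: int
    ipv: Optional[int] = None  # internal priority override, if any


def gate_state(gcl: Sequence[GateEntry], base_time_ns: int,
               now_ns: int) -> Optional[GateEntry]:
    """Return the GCL entry active at now_ns, or None before base_time_ns."""
    cycle_ns = sum(entry.interval_ns for entry in gcl)
    if cycle_ns <= 0:
        raise ValueError("gate control list must have a positive cycle time")
    if now_ns < base_time_ns:
        return None  # the schedule has not started yet

    # Position inside the current cycle, then walk the entries in order.
    offset = (now_ns - base_time_ns) % cycle_ns
    for entry in gcl:
        if offset < entry.interval_ns:
            return entry
        offset -= entry.interval_ns
    raise AssertionError("unreachable: offset is always within the cycle")
```

With a two-entry list such as `[GateEntry(True, 700_000, ipv=7), GateEntry(False, 300_000)]`, the gate is open for the first 700000 ns of every 1000000 ns cycle and closed for the rest, so a frame arriving while the closed entry is active would be dropped.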
    • Daniel Machon's avatar
      net: microchip: sparx5: add function for calculating PTP basetime · 9e02131e
      Daniel Machon authored
      Add a new function for calculating PTP basetime, required by the stream
      gate scheduler to calculate gate state (open / close).
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e02131e
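
The commit above does not spell out how the basetime is calculated. One common approach for cyclic schedules, shown here purely as an assumption, is to advance a base time that already lies in the past by whole cycles until it falls after the current time. The function name `next_base_time` is hypothetical.

```python
def next_base_time(base_time_ns: int, cycle_time_ns: int, now_ns: int) -> int:
    """Return the first cycle start strictly after now_ns.

    Assumed approach: keep the requested base time if it is still in the
    future, otherwise step forward by whole cycles so the schedule stays
    phase-aligned with the original base time.
    """
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    if base_time_ns > now_ns:
        return base_time_ns

    elapsed_ns = now_ns - base_time_ns
    cycles = elapsed_ns // cycle_time_ns + 1  # round up to the next boundary
    return base_time_ns + cycles * cycle_time_ns
```

For example, with a base time of 0 and a 1000000 ns cycle, a current time of 2500000 ns yields a new base time of 3000000 ns.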
    • Daniel Machon's avatar
      net: microchip: sparx5: add support for PSFP flow-meters · d2185e79
      Daniel Machon authored
      Add support for configuring PSFP flow-meters (IEEE 802.1Q-2018,
      8.6.5.1.3).
      
      The VCAP CLM (VCAP IS0 ingress classifier) classifies streams,
      identified by ISDX (Ingress Service Index, frame metadata), and maps
      ISDX to flow-meters. SDLB's provide the flow-meter parameters.
      Signed-off-by: Daniel Machon <daniel.machon@microchip.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2185e79
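
To make the flow-meter idea more concrete, the sketch below models a single leaky bucket policer in software. It is a simplification under stated assumptions: the SDLBs mentioned above are dual leaky buckets implemented in hardware, whereas this model has one bucket, and the class name `LeakyBucket` and its parameters are hypothetical.

```python
class LeakyBucket:
    """Single leaky bucket policer (illustrative model only)."""

    def __init__(self, rate_bps: int, burst_bytes: int) -> None:
        self.rate_bps = rate_bps        # leak rate in bits per second
        self.burst_bytes = burst_bytes  # bucket capacity in bytes
        self.level_bytes = 0.0          # current fill level
        self.last_ns = 0                # time of the last update

    def conforms(self, now_ns: int, frame_bytes: int) -> bool:
        """Return True if the frame fits in the bucket, False if it exceeds."""
        # Drain the bucket for the time elapsed since the last frame.
        elapsed_ns = max(0, now_ns - self.last_ns)
        leaked_bytes = elapsed_ns * self.rate_bps / 8 / 1e9
        self.level_bytes = max(0.0, self.level_bytes - leaked_bytes)
        self.last_ns = max(self.last_ns, now_ns)

        if self.level_bytes + frame_bytes > self.burst_bytes:
            return False  # exceeding frame; a policer would drop it
        self.level_bytes += frame_bytes
        return True
```

A bucket created as `LeakyBucket(1_000_000_000, 8096)` accepts five back-to-back 1500-byte frames at time 0 and rejects the sixth, because the fill level would pass the 8096-byte burst size before anything has leaked out.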