1. 13 Feb, 2021 40 commits
    • skbuff: introduce {,__}napi_build_skb() which reuses NAPI cache heads · f450d539
      Alexander Lobakin authored
      Instead of just bulk-flushing skbuff_heads queued up through
      napi_consume_skb() or __kfree_skb_defer(), try to reuse them
      on allocation path.
      If the cache is empty on allocation, bulk-allocate the first
      16 elements, which is more efficient than per-skb allocation.
      If the cache is full on freeing, bulk-wipe the second half of
      the cache (32 elements).
      This also includes custom KASAN poisoning/unpoisoning to be
      double sure there are no use-after-free cases.
      
      To not change current behaviour, introduce a new function,
      napi_build_skb(), to optionally use a new approach later
      in drivers.
      
      Note on selected bulk size, 16:
       - this equals XDP_BULK_QUEUE_SIZE, DEV_MAP_BULK_SIZE
         and especially VETH_XDP_BATCH, which is also used to
         bulk-allocate skbuff_heads and was tested on powerful
         setups;
       - this also showed the best performance in the actual
         test series (from the array of {8, 16, 32}).
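The refill and flush policy described above can be modelled in userspace C. This is only a sketch under stated assumptions: the 64-entry capacity is inferred from "second half of the cache (32 elements)", `malloc()` and `free()` stand in for the kernel's bulk slab calls, and `head_cache`, `cache_get()` and `cache_put()` are hypothetical names rather than the kernel's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SIZE 64                /* assumed capacity; half is then 32 */
#define CACHE_BULK 16                /* bulk refill size named in the commit */
#define CACHE_HALF (CACHE_SIZE / 2)

/* Hypothetical per-CPU cache of free heads. */
struct head_cache {
	size_t count;
	void *heads[CACHE_SIZE];
};

/* Take one head; when the cache is empty, refill CACHE_BULK heads first. */
static inline void *cache_get(struct head_cache *c, size_t head_size)
{
	if (c->count == 0) {
		while (c->count < CACHE_BULK) {
			void *p = malloc(head_size);

			if (!p)
				break;
			c->heads[c->count++] = p;
		}
		if (c->count == 0)
			return NULL;
	}
	return c->heads[--c->count];
}

/* Queue one head for reuse; when the cache fills up, free the upper half. */
static inline void cache_put(struct head_cache *c, void *head)
{
	assert(c->count < CACHE_SIZE);
	c->heads[c->count++] = head;
	if (c->count == CACHE_SIZE) {
		while (c->count > CACHE_HALF)
			free(c->heads[--c->count]);
	}
}
```

Freeing only half of a full cache leaves 32 heads ready for the next allocations, so a burst of frees is not immediately followed by a bulk refill.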
      
      Suggested-by: Edward Cree <ecree.xilinx@gmail.com> # Divide on two halves
      Suggested-by: Eric Dumazet <edumazet@google.com>   # KASAN poisoning
      Cc: Dmitry Vyukov <dvyukov@google.com>             # Help with KASAN
      Cc: Paolo Abeni <pabeni@redhat.com>                # Reduced batch size
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: move NAPI cache declarations upper in the file · 50fad4b5
      Alexander Lobakin authored
      NAPI cache structures will be used for allocating skbuff_heads,
      so move their declarations a bit upper.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: remove __kfree_skb_flush() · fec6e49b
      Alexander Lobakin authored
      This function isn't really needed, as the NAPI skb queue gets bulk-freed
      anyway when there's no more room, and it may even reduce the efficiency
      of bulk operations.
      It will be even less needed after reusing skb cache on allocation path,
      so remove it and this way lighten network softirqs a bit.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: use __build_skb_around() in __alloc_skb() · f9d6725b
      Alexander Lobakin authored
      Just call __build_skb_around() instead of open-coding it.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: simplify __alloc_skb() a bit · df1ae022
      Alexander Lobakin authored
      Use unlikely() annotations for skbuff_head and data similarly to the
      two other allocation functions and remove totally redundant goto.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: make __build_skb_around() return void · 483126b3
      Alexander Lobakin authored
      __build_skb_around() can never fail and always returns passed skb.
      Make it return void to simplify and optimize the code.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: simplify kmalloc_reserve() · ef28095f
      Alexander Lobakin authored
      Ever since the introduction of __kmalloc_reserve(), "ip" argument
      hasn't been used. _RET_IP_ is embedded inside
      kmalloc_node_track_caller().
      Remove the redundant macro and rename the function after it.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • skbuff: move __alloc_skb() next to the other skb allocation functions · 5381b23d
      Alexander Lobakin authored
      In preparation before reusing several functions in all three skb
      allocation variants, move __alloc_skb() next to the
      __netdev_alloc_skb() and __napi_alloc_skb().
      No functional changes.
      Signed-off-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'Xilinx-axienet-updates' · 773dc50d
      David S. Miller authored
      Robert Hancock says:
      
      ====================
      Xilinx axienet updates
      
      Updates to the Xilinx AXI Ethernet driver to add support for an additional
      ethtool operation, and to support dynamic switching between 1000BaseX and
      SGMII interface modes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: axienet: Support dynamic switching between 1000BaseX and SGMII · 6c8f06bb
      Robert Hancock authored
      Newer versions of the Xilinx AXI Ethernet core (specifically version 7.2 or
      later) allow the core to be configured with a PHY interface mode of "Both",
      allowing either 1000BaseX or SGMII modes to be selected at runtime. Add
      support for this in the driver to allow better support for applications
      which can use both fiber and copper SFP modules.
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dt-bindings: net: xilinx_axienet: add xlnx,switch-x-sgmii attribute · eceac9d2
      Robert Hancock authored
      Document the new xlnx,switch-x-sgmii attribute which is used to indicate
      that the Ethernet core supports dynamic switching between 1000BaseX and
      SGMII.
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: axienet: hook up nway_reset ethtool operation · 66b51663
      Robert Hancock authored
      Hook up the nway_reset ethtool operation to the corresponding phylink
      function so that "ethtool -r" can be supported.
      Signed-off-by: Robert Hancock <robert.hancock@calian.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tcp-mem-pressure-vs-SO_RCVLOWAT' · 762d17b9
      David S. Miller authored
      Eric Dumazet says:
      
      ====================
      tcp: mem pressure vs SO_RCVLOWAT
      
      First patch fixes an issue for applications using SO_RCVLOWAT
      to reduce context switches.
      
      Second patch is a cleanup.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: factorize logic into tcp_epollin_ready() · 05dc72ab
      Eric Dumazet authored
      Both tcp_data_ready() and tcp_stream_is_readable() share the same logic.
      
      Add tcp_epollin_ready() helper to avoid duplication.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Arjun Roy <arjunroy@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix SO_RCVLOWAT related hangs under mem pressure · f969dc5a
      Eric Dumazet authored
      While commit 24adbc16 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
      fixed an issue with a too-small sk_rcvbuf for a given sk_rcvlowat constraint,
      it did not address the issue caused by memory pressure.
      
      1) If we are under memory pressure and socket receive queue is empty.
      First incoming packet is allowed to be queued, after commit
      76dfa608 ("tcp: allow one skb to be received per socket under memory pressure")
      
      But we do not send EPOLLIN yet, in case tcp_data_ready() sees sk_rcvlowat
      is bigger than skb length.
      
      2) Then, when next packet comes, it is dropped, and we directly
      call sk->sk_data_ready().
      
      3) If application is using poll(), tcp_poll() will then use
      tcp_stream_is_readable() and decide the socket receive queue is
      not yet filled, so nothing will happen.
      
      Even when sender retransmits packets, phases 2) & 3) repeat
      and flow is effectively frozen, until memory pressure is off.
      
      Fix is to consider tcp_under_memory_pressure() to take care
      of global memory pressure or memcg pressure.
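The readiness decision behind this fix can be modelled as below. This is a simplified sketch, not the kernel function: `rx_state` and `epollin_ready()` are hypothetical names, and the real helper works on socket fields and checks further conditions that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of the receive-side socket state. */
struct rx_state {
	int queued_bytes;        /* bytes queued and not yet read */
	int rcvlowat;            /* SO_RCVLOWAT target, at least 1 */
	bool under_mem_pressure; /* global or memcg memory pressure */
};

/* Decide whether the reader should be woken up (EPOLLIN). */
static inline bool epollin_ready(const struct rx_state *s)
{
	assert(s->rcvlowat >= 1);

	if (s->queued_bytes <= 0)
		return false;

	/*
	 * Under memory pressure further packets may be dropped, so waiting
	 * for rcvlowat bytes could stall forever: report readiness as soon
	 * as anything is queued.
	 */
	return s->queued_bytes >= s->rcvlowat || s->under_mem_pressure;
}
```

Using one predicate for both the wakeup path and the poll path avoids the mismatch described in steps 2) and 3), where one side signals and the other still reports "not readable".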
      
      Fixes: 24adbc16 ("tcp: fix SO_RCVLOWAT hangs with fat skbs")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Arjun Roy <arjunroy@google.com>
      Suggested-by: Wei Wang <weiwan@google.com>
      Reviewed-by: Wei Wang <weiwan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue · 5cdaf9d6
      David S. Miller authored
      Tony Nguyen says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2021-02-12
      
      This series contains updates to i40e, ice, and ixgbe drivers.
      
      Maciej does cleanups on the following drivers.
      For i40e, removes redundant check for XDP prog, cleans up no longer
      relevant information, and removes an unused function argument.
      For ice, removes local variable use, instead returning values directly.
      Moves skb pointer from buffer to ring and removes an unneeded check for
      xdp_prog in zero copy path. Also removes a redundant MTU check when
      changing it.
      For i40e, ice, and ixgbe, stores the rx_offset in the Rx ring as
      the value is constant so there's no need for continual calls.
      
      Bjorn folds a decrement into a while statement.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tc-mpls-selftests' · 7aceeb73
      David S. Miller authored
      Guillaume Nault says:
      
      ====================
      selftests: tc: Test tc-flower's MPLS features
      
      A couple of patches for exercising the MPLS filters of tc-flower.
      
      Patch 1 tests basic MPLS matching features: those that only work on the
      first label stack entry (that is, the mpls_label, mpls_tc, mpls_bos and
      mpls_ttl options).
      
      Patch 2 tests the more generic "mpls" and "lse" options, which allow
      matching MPLS fields beyond the first stack entry.
      
      In both patches, special care is taken to skip these new tests for
      incompatible versions of tc.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc: Add generic mpls matching support for tc-flower · c09bfd9a
      Guillaume Nault authored
      Add tests in tc_flower.sh for generic matching on MPLS Label Stack
      Entries. The label, tc, bos and ttl fields are tested for the first
      and second labels. For each field, the minimal and maximal values are
      tested (the former at depth 1 and the latter at depth 2).
      There are also tests for matching the presence of a label stack entry
      at a given depth.
      
      In order to reduce the amount of code, all "lse" subcommands are tested
      in match_mpls_lse_test(). Action "continue" is used, so that test
      packets are evaluated by all filters. Then, we can verify if each
      filter matched the expected number of packets.
      
      Some versions of tc-flower produced invalid json output when dumping
      MPLS filters with depth > 1. Skip the test if tc isn't recent enough.
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: tc: Add basic mpls_* matching support for tc-flower · 203ee5cd
      Guillaume Nault authored
      Add tests in tc_flower.sh for mpls_label, mpls_tc, mpls_bos and
      mpls_ttl. For each keyword, test the minimal and maximal values.
      
      Selectively skip these new mpls tests for tc versions that don't
      support them.
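The minimal and maximal values mentioned above follow from the field widths of an MPLS label stack entry (20-bit label, 3-bit TC, 1-bit bottom-of-stack, 8-bit TTL). A small sketch of packing one entry, with `mpls_lse()` being a hypothetical helper name used only for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define MPLS_LABEL_MAX 0xfffffu /* 20 bits: 1048575 */
#define MPLS_TC_MAX    0x7u     /* 3 bits */
#define MPLS_BOS_MAX   0x1u     /* 1 bit */
#define MPLS_TTL_MAX   0xffu    /* 8 bits */

/* Pack the four fields into one 32-bit entry, in host byte order. */
static inline uint32_t mpls_lse(uint32_t label, uint32_t tc,
				uint32_t bos, uint32_t ttl)
{
	assert(label <= MPLS_LABEL_MAX && tc <= MPLS_TC_MAX);
	assert(bos <= MPLS_BOS_MAX && ttl <= MPLS_TTL_MAX);

	return (label << 12) | (tc << 9) | (bos << 8) | ttl;
}
```

Testing each keyword at 0 and at its `*_MAX` value exercises both ends of every bit field.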
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'brport-flags' · 4098ced4
      David S. Miller authored
      Vladimir Oltean says:
      
      ====================
      Cleanup in brport flags switchdev offload for DSA
      
      The initial goal of this series was to have better support for
      standalone ports mode on the DSA drivers like ocelot/felix and sja1105.
      This turned out to require some API adjustments in both directions:
      to the information presented to and by the switchdev notifier, and to
      the API presented to the switch drivers by the DSA layer.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: offload bridge port flags to device · 4d942354
      Vladimir Oltean authored
      The chip can configure unicast flooding, broadcast flooding and learning.
      Learning is per port, while flooding is per {ingress, egress} port pair
      and we need to configure the same value for all possible ingress ports
      towards the requested one.
      
      While multicast flooding is not officially supported, we can hack it by
      using a feature of the second generation (P/Q/R/S) devices, which is that
      FDB entries are maskable, and multicast addresses always have an odd
      first octet. So by putting a match-all for 01:00:00:00:00:00 addr and
      01:00:00:00:00:00 mask at the end of the FDB, we make sure that it is
      always checked last, and does not take precedence in front of any other
      MDB. So it behaves effectively as an unknown multicast entry.
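The masked match described above can be illustrated as follows. This sketch assumes the multicast (group) bit is the least-significant bit of the first octet, i.e. 01:00:00:00:00:00 when the address is held as a 48-bit integer. The function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 01:00:00:00:00:00 as a 48-bit value: only the group bit is set. */
#define MCAST_BIT 0x010000000000ULL

/* A maskable FDB entry matches when the masked bits are equal. */
static inline bool fdb_masked_match(uint64_t dmac, uint64_t addr, uint64_t mask)
{
	assert((dmac >> 48) == 0); /* MAC addresses are 48 bits wide */
	return (dmac & mask) == (addr & mask);
}

/* Catch-all entry: address and mask both equal to the group bit. */
static inline bool matches_unknown_mcast(uint64_t dmac)
{
	return fdb_masked_match(dmac, MCAST_BIT, MCAST_BIT);
}
```

Because only one bit is compared, the entry matches every multicast and broadcast destination and no unicast one, which is why it has to sit after all more specific entries.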
      
      For the first generation switches, this feature is not available, so
      unknown multicast will always be treated the same as unknown unicast.
      So the only thing we can do is request the user to offload the settings
      for these 2 flags in tandem, i.e.
      
      ip link set swp2 type bridge_slave flood off
      Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
      ip link set swp2 type bridge_slave flood off mcast_flood off
      ip link set swp2 type bridge_slave mcast_flood on
      Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: offload bridge port flags to device · 421741ea
      Vladimir Oltean authored
      We should not be unconditionally enabling address learning, since doing
      that is actively detrimental when a port is standalone and not offloading
      a bridge. Namely, if a port in the switch is standalone and others are
      offloading the bridge, then we could enter a situation where we learn an
      address towards the standalone port, but the bridged ports could not
      forward the packet there, because the CPU is the only path between the
      standalone and the bridged ports. The solution of course is to not
      enable address learning unless the bridge asks for it.
      
      We need to set up the initial port flags for no learning and flooding
      everything, and also when the port joins and leaves the bridge.
      The flood configuration was already configured ok for standalone mode
      in ocelot_init, we just need to disable learning in ocelot_init_port.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: use separate flooding PGID for broadcast · b360d94f
      Vladimir Oltean authored
      In preparation of offloading the bridge port flags which have
      independent settings for unknown multicast and for broadcast, we should
      also start reserving one destination Port Group ID for the flooding of
      broadcast packets, to allow configuring it individually.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: felix: restore multicast flood to CPU when NPI tagger reinitializes · 6edb9e8d
      Vladimir Oltean authored
      ocelot_init sets up PGID_MC to include the CPU port module, and that is
      fine, but the ocelot-8021q tagger removes the CPU port module from the
      unknown multicast replicator. So after a transition from the default
      ocelot tagger towards ocelot-8021q and then again towards ocelot,
      multicast flooding towards the CPU port module will be disabled.
      
      Fixes: e21268ef ("net: dsa: felix: perform switch setup for tag_8021q")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: act as passthrough for bridge port flags · a8b659e7
      Vladimir Oltean authored
      There are multiple ways in which a PORT_BRIDGE_FLAGS attribute can be
      expressed by the bridge through switchdev, and not all of them can be
      emulated by DSA mid-layer API at the same time.
      
      One possible configuration is when the bridge offloads the port flags
      using a mask that has a single bit set - therefore only one feature
      should change. However, DSA currently groups together unicast and
      multicast flooding in the .port_egress_floods method, which limits our
      options when we try to add support for turning off broadcast flooding:
      do we extend .port_egress_floods with a third parameter which b53 and
      mv88e6xxx will ignore? But that means that the DSA layer, which
      currently implements the PRE_BRIDGE_FLAGS attribute all by itself, will
      see that .port_egress_floods is implemented, and will report that all 3
      types of flooding are supported - not necessarily true.
      
      Another configuration is when the user specifies more than one flag at
      the same time, in the same netlink message. If we were to create one
      individual function per offloadable bridge port flag, we would limit the
      expressiveness of the switch driver of refusing certain combinations of
      flag values. For example, a switch may not have an explicit knob for
      flooding of unknown multicast, just for flooding in general. In that
      case, the only correct thing to do is to allow changes to BR_FLOOD and
      BR_MCAST_FLOOD in tandem, and never allow mismatched values. But having
      a separate .port_set_unicast_flood and .port_set_multicast_flood would
      not allow the driver to possibly reject that.
      
      Also, DSA doesn't consider it necessary to inform the driver that a
      SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
      just calls .port_egress_floods for the CPU port. When we'll add support
      for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
      problem because the flood settings will need to be held statefully in
      the DSA middle layer, otherwise changing the mrouter port attribute will
      impact the flooding attribute. And that's _assuming_ that the underlying
      hardware doesn't have anything else to do when a multicast router
      attaches to a port than flood unknown traffic to it.  If it does, there
      will need to be a dedicated .port_set_mrouter anyway.
      
      So we need to let the DSA drivers see the exact form that the bridge
      passes this switchdev attribute in, otherwise we are standing in the
      way. Therefore we also need to use this form of language when
      communicating to the driver that it needs to configure its initial
      (before bridge join) and final (after bridge leave) port flags.
      
      The b53 and mv88e6xxx drivers are converted to the passthrough API and
      their implementation of .port_egress_floods is split into two: a
      function that configures unicast flooding and another for multicast.
      The mv88e6xxx implementation is quite hairy, and it turns out that
      the implementations of unknown unicast flooding are actually the same
      for 6185 and for 6352:
      
      behind the confusing names actually lie two individual bits:
      NO_UNKNOWN_MC -> FLOOD_UC = 0x4 = BIT(2)
      NO_UNKNOWN_UC -> FLOOD_MC = 0x8 = BIT(3)
      
      so there was no reason to entangle them in the first place.
      
      Whereas the 6185 writes to MV88E6185_PORT_CTL0_FORWARD_UNKNOWN of
      PORT_CTL0, which has the exact same bit index. I have left the
      implementations separate though, for the only reason that the names are
      different enough to confuse me, since I am not able to double-check with
      a user manual. The multicast flooding setting for 6185 is in a different
      register than for 6352 though.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: pass flags and mask to both {PRE_,}BRIDGE_FLAGS attributes · e18f4c18
      Vladimir Oltean authored
      This switchdev attribute offers a counterproductive API for a driver
      writer, because although br_switchdev_set_port_flag gets passed a
      "flags" and a "mask", those are passed piecemeal to the driver, so while
      the PRE_BRIDGE_FLAGS listener knows what changed because it has the
      "mask", the BRIDGE_FLAGS listener doesn't, because it only has the final
      value. But certain drivers can offload only certain combinations of
      settings, like for example they cannot change unicast flooding
      independently of multicast flooding - they must be both on or both off.
      The way the information is passed to switchdev makes drivers not
      expressive enough, and unable to reject this request ahead of time, in
      the PRE_BRIDGE_FLAGS notifier, so they are forced to reject it during
      the deferred BRIDGE_FLAGS attribute, where the rejection is currently
      ignored.
      
      This patch also changes drivers to make use of the "mask" field for edge
      detection when possible.
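With both "flags" and "mask" available up front, a driver can reject an unsupported combination before anything is committed. The sketch below is illustrative only: the `X_*` bit values and `pre_bridge_flags()` are hypothetical, and it assumes "flags" carries the complete new flag word while "mask" marks the bits being changed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bits, not the kernel's BR_* values. */
#define X_FLOOD       (1UL << 0)
#define X_MCAST_FLOOD (1UL << 1)
#define X_LEARNING    (1UL << 2)
#define X_SUPPORTED   (X_FLOOD | X_MCAST_FLOOD | X_LEARNING)

/*
 * Validation for hardware with a single flooding knob: unicast and
 * multicast flooding must end up with the same value.
 */
static inline int pre_bridge_flags(unsigned long flags, unsigned long mask)
{
	assert(mask != 0); /* callers only notify on an actual change */

	if (mask & ~X_SUPPORTED)
		return -EINVAL;

	if (mask & (X_FLOOD | X_MCAST_FLOOD)) {
		bool ucast = flags & X_FLOOD;
		bool mcast = flags & X_MCAST_FLOOD;

		if (ucast != mcast)
			return -EINVAL;
	}
	return 0;
}
```

Without the mask the check could not tell whether a flooding bit is being touched at all, and without the full flag word it could not compare the two resulting values.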
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: configure better brport flags when ports leave the bridge · 5e38c158
      Vladimir Oltean authored
      For a DSA switch port operating in standalone mode, address learning
      doesn't make much sense since that is a bridge function. In fact,
      address learning even breaks setups such as this one:
      
         +---------------------------------------------+
         |                                             |
         | +-------------------+                       |
         | |        br0        |    send      receive  |
         | +--------+-+--------+ +--------+ +--------+ |
         | |        | |        | |        | |        | |
         | |  swp0  | |  swp1  | |  swp2  | |  swp3  | |
         | |        | |        | |        | |        | |
         +-+--------+-+--------+-+--------+-+--------+-+
                |         ^           |          ^
                |         |           |          |
                |         +-----------+          |
                |                                |
                +--------------------------------+
      
      because if the switch has a single FDB (can offload a single bridge)
      then source address learning on swp3 can "steal" the source MAC address
      of swp2 from br0's FDB, because learning frames coming from swp2 will be
      done twice: first on the swp1 ingress port, second on the swp3 ingress
      port. So the hardware FDB will become out of sync with the software
      bridge, and when swp2 tries to send one more packet towards swp1, the
      ASIC will attempt to short-circuit the forwarding path and send it
      directly to swp3 (since that's the last port it learned that address on),
      which it obviously can't, because swp3 operates in standalone mode.
      
      So DSA drivers operating in standalone mode should still configure a
      list of bridge port flags even when they are standalone. Currently DSA
      attempts to call dsa_port_bridge_flags with 0, which disables egress
      flooding of unknown unicast and multicast, something which doesn't make
      much sense. For the switches that implement .port_egress_floods - b53
      and mv88e6xxx, it probably doesn't matter too much either, since they
      can possibly inject traffic from the CPU into a standalone port,
      regardless of MAC DA, even if egress flooding is turned off for that
      port, but certainly not all DSA switches can do that - sja1105, for
      example, can't. So it makes sense to use a better common default there,
      such as "flood everything".
      
      It should also be noted that what DSA calls "dsa_port_bridge_flags()"
      is a degenerate name for just calling .port_egress_floods(), since
      nothing else is implemented - not learning, in particular. But disabling
      address learning, something that this driver is also coding up for, will
      be supported by individual drivers once .port_egress_floods is replaced
      with a more generic .port_bridge_flags.
      
      Previous attempts to code up this logic have been in the common bridge
      layer, but as pointed out by Ido Schimmel, there are corner cases that
      are missed when doing that:
      https://patchwork.kernel.org/project/netdevbpf/patch/20210209151936.97382-5-olteanv@gmail.com/
      
      So, at least for now, let's leave DSA in charge of setting port flags
      before and after the bridge join and leave.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: don't print in br_switchdev_set_port_flag · 078bbb85
      Vladimir Oltean authored
      For the netlink interface, propagate errors through extack rather than
      simply printing them to the console. For the sysfs interface, we still
      print to the console, but at least that's one layer higher than in
      switchdev, which also allows us to silently ignore the offloading of
      flags if that is ever needed in the future.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: offload all port flags at once in br_setport · 304ae3bf
      Vladimir Oltean authored
      If for example this command:
      
      ip link set swp0 type bridge_slave flood off mcast_flood off learning off
      
      succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
      BR_LEARNING, there would be no attempt to revert the partial state in
      any way. Arguably, if the user changes more than one flag through the
      same netlink command, this one _should_ be all or nothing, which means
      it should be passed through switchdev as all or nothing.
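The all-or-nothing behaviour argued for above can be sketched like this. It is a userspace illustration and not the bridge code: `struct port`, `port_set_flags()`, the `X_*` bits and both sample callbacks are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
#define X_FLOOD       (1UL << 0)
#define X_MCAST_FLOOD (1UL << 1)
#define X_LEARNING    (1UL << 2)

struct port {
	unsigned long flags;
};

/* Offload callback: receives the complete new flag word and the changed bits. */
typedef int (*offload_fn)(unsigned long flags, unsigned long mask);

/* Sample callback that accepts every request. */
static inline int offload_accept_all(unsigned long flags, unsigned long mask)
{
	(void)flags;
	(void)mask;
	return 0;
}

/* Sample callback for hardware that cannot change address learning. */
static inline int offload_no_learning(unsigned long flags, unsigned long mask)
{
	(void)flags;
	return (mask & X_LEARNING) ? -EOPNOTSUPP : 0;
}

/* Apply every requested flag change in one step, or none of them. */
static inline int port_set_flags(struct port *p, unsigned long val,
				 unsigned long mask, offload_fn offload)
{
	unsigned long wanted = (p->flags & ~mask) | (val & mask);
	unsigned long changed = p->flags ^ wanted;
	int err;

	assert(offload != NULL);
	if (!changed)
		return 0;

	err = offload(wanted, changed);
	if (err)
		return err; /* p->flags was never modified */

	p->flags = wanted;
	return 0;
}
```

Since the callback sees the whole request at once, a failure on one flag leaves the other flags untouched as well, and no partial state needs to be reverted.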
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: propagate extack to port attributes · 4c08c586
      Vladimir Oltean authored
      When a struct switchdev_attr is notified through switchdev, there is no
      way to report informational messages, unlike for struct switchdev_obj.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • octeontx2: Fix condition. · b0aae0bd
      David S. Miller authored
      Fixes: 93efb0c6 ("octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipa-cleanups' · 4b47ad00
      David S. Miller authored
      Alex Elder says:
      
      ====================
      net: ipa: some more cleanup
      
      Version 3 of this series uses dev_err_probe() in the second patch,
      as suggested by Heiner Kallweit.
      
      Version 2 was sent to ensure the series was based on current
      net-next/master, and added copyright updates to files touched.
      
      The original introduction is below.
      
      This is another fairly innocuous set of cleanup patches.
      
      The first was motivated by a bug found that would affect IPA v4.5.
      It maintains a new GSI address pointer; one is the "raw" (original
      mapped) address, and the other will have been adjusted if necessary
      for use on newer platforms.
      
      The second just quiets some unnecessary noise during early probe.
      
      The third fixes some errors that show up when IPA_VALIDATION is
      enabled.
      
      The last two just create helper functions to improve readability.
      
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b47ad00
    • Alex Elder's avatar
      net: ipa: introduce gsi_channel_initialized() · 6170b6da
      Alex Elder authored
      Create a simple helper function that indicates whether a channel has
      been initialized.  This abstracts/hides the details of how this is
      determined.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6170b6da
    • Alex Elder's avatar
      net: ipa: introduce ipa_table_hash_support() · a266ad6b
      Alex Elder authored
      Introduce a new function to abstract the knowledge of whether hashed
      routing and filter tables are supported for a given IPA instance.
      
      IPA v4.2 is the only one that doesn't support hashed tables (now
      and for the foreseeable future), but the name of the helper function
      is better for explaining what's going on.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a266ad6b
    • Alex Elder's avatar
      net: ipa: fix register write command validation · 2d65ed76
      Alex Elder authored
      In ipa_cmd_register_write_valid() we verify that values we will
      supply to a REGISTER_WRITE IPA immediate command will fit in
      the fields that need to hold them.  This patch fixes some issues
      in that function and ipa_cmd_register_write_offset_valid().
      
      The dev_err() call in ipa_cmd_register_write_offset_valid() has
      some printf format errors:
        - The name of the register (corresponding to the string format
          specifier) was not supplied.
        - The IPA base offset and offset need to be supplied separately to
          match the other format specifiers.
      Also make the ~0 constant used there to compute the maximum
      supported offset value explicitly unsigned.
      
      There are two other issues in ipa_cmd_register_write_valid():
        - There's no need to check the hash flush register for platforms
          (like IPA v4.2) that do not support hashed tables
        - The highest possible endpoint number, whose status register
          offset is computed, is COUNT - 1, not COUNT.
      
      Fix these problems, and add some additional commentary.
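
The two arithmetic points above (an explicitly unsigned all-ones constant, and COUNT - 1 as the highest endpoint number) can be illustrated with a small userspace sketch. The helper names and the field width used in the example are hypothetical and are not taken from the IPA driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: does `offset` fit in a command field that is
 * `field_bits` wide?  The all-ones constant is explicitly unsigned:
 * right-shifting a negative signed value (plain ~0 is -1) is
 * implementation-defined in C and usually keeps all bits set.
 */
static int sketch_offset_fits(uint32_t offset, unsigned int field_bits)
{
	uint32_t offset_max;

	assert(field_bits >= 1 && field_bits <= 32);
	offset_max = ~(uint32_t)0 >> (32 - field_bits);

	return offset <= offset_max;
}

/*
 * Hypothetical helper: with `count` endpoints numbered from zero,
 * the highest valid endpoint number is count - 1, not count.
 */
static unsigned int sketch_highest_endpoint(unsigned int count)
{
	assert(count > 0);

	return count - 1;
}
```

For an assumed 16-bit offset field, sketch_offset_fits(0xffff, 16) is true and sketch_offset_fits(0x10000, 16) is false.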
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d65ed76
    • Alex Elder's avatar
      net: ipa: use dev_err_probe() in ipa_clock.c · 4c7ccfcd
      Alex Elder authored
      When initializing the IPA core clock and interconnects, it's
      possible we'll get an EPROBE_DEFER error.  This isn't really an
      error; it just means we need to be re-probed later.
      
      Use dev_err_probe() to report the error rather than dev_err().
      This avoids polluting the log with these "error" messages.
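
The behaviour relied on here can be modelled in userspace roughly as follows. This is only a sketch: sketch_err_probe() and the counter are hypothetical stand-ins, and the real dev_err_probe() takes a struct device and a format string. The value 517 is the kernel-internal EPROBE_DEFER code, defined locally because userspace <errno.h> does not provide it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Kernel-internal errno value, defined here for the sketch only. */
#define SKETCH_EPROBE_DEFER 517

/* Counts messages emitted at "error" level, for illustration. */
static int sketch_error_lines;

/*
 * Hypothetical model: log at error level unless the error is a probe
 * deferral, and always hand the error code back to the caller.
 */
static int sketch_err_probe(int err, const char *what)
{
	assert(what != NULL);

	if (err == -SKETCH_EPROBE_DEFER)
		return err;	/* not a real failure: stay quiet */

	fprintf(stderr, "error %d: %s\n", err, what);
	sketch_error_lines++;

	return err;
}
```

Because the error code is returned unchanged, a caller can write `return sketch_err_probe(ret, "...")` on its failure path whether or not the probe was merely deferred.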
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c7ccfcd
    • Alex Elder's avatar
      net: ipa: use a separate pointer for adjusted GSI memory · 571b1e7e
      Alex Elder authored
      This patch actually fixes a bug, though it doesn't affect the two
      platforms supported currently.  The fix implements GSI memory
      pointers a bit differently.
      
      For IPA version 4.5 and above, the address space for almost all GSI
      registers is adjusted downward by a fixed amount.  This is currently
      handled by adjusting the I/O virtual address pointer after it has
      been mapped.  The bug is that the pointer is not "de-adjusted" as it
      should be when it's unmapped.
      
      This patch fixes that error, but it does so by maintaining one "raw"
      pointer for the mapped memory range.  This is assigned when the
      memory is mapped and used to unmap the memory.  This pointer is also
      used to access the two registers that do *not* sit in the "adjusted"
      memory space.
      
      Rather than adjusting *that* pointer, we maintain a separate pointer
      that's an adjusted copy of the "raw" pointer, and that is used for
      most GSI register accesses.
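
A minimal userspace sketch of the two-pointer scheme described above. The struct, field, and function names, the adjustment amount, and the fake "mapping" are all hypothetical stand-ins; a static array replaces real I/O memory so the sketch can run anywhere.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ADJUST 0x400u	/* hypothetical adjustment amount */

static unsigned char sketch_arena[0x2000];	/* stands in for I/O space */
static unsigned char *sketch_mapped;		/* address handed out by map */

static unsigned char *sketch_map(void)
{
	sketch_mapped = sketch_arena + 0x1000;
	return sketch_mapped;
}

/* Unmapping succeeds only for the exact address that was mapped. */
static int sketch_unmap(unsigned char *p)
{
	if (p == NULL || p != sketch_mapped)
		return -1;
	sketch_mapped = NULL;
	return 0;
}

struct sketch_gsi {
	unsigned char *virt_raw;	/* as mapped; used for unmapping */
	unsigned char *virt;		/* adjusted copy; most accesses */
};

static void sketch_gsi_init(struct sketch_gsi *gsi, int needs_adjust)
{
	assert(gsi != NULL);

	gsi->virt_raw = sketch_map();
	gsi->virt = needs_adjust ? gsi->virt_raw - SKETCH_ADJUST
				 : gsi->virt_raw;
}

static int sketch_gsi_exit(struct sketch_gsi *gsi)
{
	int ret = sketch_unmap(gsi->virt_raw);	/* never the adjusted one */

	gsi->virt = NULL;
	gsi->virt_raw = NULL;

	return ret;
}
```

In this model, passing the adjusted pointer to sketch_unmap() fails, which mirrors the bug described: the adjusted address is not the one that was mapped.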
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      571b1e7e
    • David S. Miller's avatar
      Merge tag 'mac80211-next-for-net-next-2021-02-12' of... · 21cc70c7
      David S. Miller authored
      Merge tag 'mac80211-next-for-net-next-2021-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
      
      Johannes Berg says:
      
      ====================
      Last set of updates:
       * more minstrel work from Felix to reduce the
         probing overhead
       * QoS for nl80211 control port frames
       * STBC injection support
       * and a couple of small fixes
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21cc70c7
    • Gustavo A. R. Silva's avatar
      octeontx2-pf: Fix out-of-bounds read in otx2_get_fecparam() · 93efb0c6
      Gustavo A. R. Silva authored
      Code at line 967 implies that rsp->fwdata.supported_fec may be up to 4:
      
       967: if (rsp->fwdata.supported_fec <= FEC_MAX_INDEX)
      
      If rsp->fwdata.supported_fec evaluates to 4, then there is an
      out-of-bounds read at line 971 because fec is an array with
      a maximum of 4 elements:
      
       954         const int fec[] = {
       955                 ETHTOOL_FEC_OFF,
       956                 ETHTOOL_FEC_BASER,
       957                 ETHTOOL_FEC_RS,
       958                 ETHTOOL_FEC_BASER | ETHTOOL_FEC_RS};
       959 #define FEC_MAX_INDEX 4
      
       971: fecparam->fec = fec[rsp->fwdata.supported_fec];
      
      Fix this by properly indexing fec[] with rsp->fwdata.supported_fec - 1.
      In this case the proper indexes 0 to 3 are used when
      rsp->fwdata.supported_fec takes the values 1 to 4, respectively.
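
The indexing problem can be reproduced in a standalone sketch. The SKETCH_FEC_* constants are stand-in values rather than the kernel's ETHTOOL_FEC_* definitions, sketch_fec_lookup() is a hypothetical helper, and returning a "none" value for 0 and for out-of-range input is an assumption of this sketch, not something stated in this commit message.

```c
#include <assert.h>

/* Stand-in flag values for illustration only. */
enum {
	SKETCH_FEC_NONE  = 1,
	SKETCH_FEC_OFF   = 4,
	SKETCH_FEC_RS    = 8,
	SKETCH_FEC_BASER = 16,
};

#define FEC_MAX_INDEX 4

static const int sketch_fec[] = {
	SKETCH_FEC_OFF,
	SKETCH_FEC_BASER,
	SKETCH_FEC_RS,
	SKETCH_FEC_BASER | SKETCH_FEC_RS,
};

/* The table has exactly FEC_MAX_INDEX entries: valid indexes are 0..3. */
static_assert(sizeof(sketch_fec) / sizeof(sketch_fec[0]) == FEC_MAX_INDEX,
	      "table size must match FEC_MAX_INDEX");

/*
 * Hypothetical helper: map a 1-based supported_fec value (1..4) onto
 * the 0-based table.  Indexing with the raw value would read one
 * element past the end of the table when supported_fec == 4.
 */
static int sketch_fec_lookup(unsigned int supported_fec)
{
	if (supported_fec == 0 || supported_fec > FEC_MAX_INDEX)
		return SKETCH_FEC_NONE;	/* assumed fallback */

	return sketch_fec[supported_fec - 1];
}
```

With this mapping, a value of 4 selects the last entry (BASER | RS) instead of reading past the array.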
      
      Fixes: d0cf9503 ("octeontx2-pf: ethtool fec mode support")
      Addresses-Coverity-ID: 1501722 ("Out-of-bounds read")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      93efb0c6
    • Colin Ian King's avatar
      octeontx2-af: Fix spelling mistake "recievd" -> "received" · a6e0ee35
      Colin Ian King authored
      There is a spelling mistake in the text in array rpm_rx_stats_fields,
      fix it.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6e0ee35