  1. 08 Feb, 2023 6 commits
    • net/mlx5: Store page counters in a single array · c3bdbaea
      Maher Sanalla authored
      Currently, an independent page counter is used for tracking memory usage
      for each function type such as VF, PF and host PF (DPU).
      
      For better code readability, use a single array that stores
      the number of allocated memory pages for each function type.
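
      The idea of one array indexed by function type can be sketched as below. This is a minimal Python model, not the driver's C code; the names FuncType and PageCounters are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class FuncType(IntEnum):
    # Hypothetical names for the function types listed in the commit message.
    PF = 0
    VF = 1
    HOST_PF = 2


class PageCounters:
    """One array slot per function type instead of one variable per type."""

    def __init__(self):
        self.pages = [0] * len(FuncType)

    def give(self, func_type, npages):
        # Pages handed to the given function type.
        self.pages[func_type] += npages

    def reclaim(self, func_type, npages):
        # Pages returned by the given function type.
        self.pages[func_type] -= npages

    def total(self):
        return sum(self.pages)
```

      With this layout, adding a new function type only extends the enum; the accounting code stays the same.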
      
      Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: IPoIB, Show unknown speed instead of error · 8aa5f171
      Dragos Tatulea authored
      ethtool returns an error when the speed of the IPoIB interface is unknown:
      
      $ ethtool ib0
      netlink error: failed to retrieve link settings
      netlink error: Invalid argument
      netlink error: failed to retrieve link settings
      netlink error: Invalid argument
      Settings for ib0:
      Link detected: no
      
      After this change, ethtool will return success and show "unknown speed":
      
      $ ethtool ib0
      Settings for ib0:
      Supported ports: [  ]
      Supported link modes:   Not reported
      Supported pause frame use: No
      Supports auto-negotiation: No
      Supported FEC modes: Not reported
      Advertised link modes:  Not reported
      Advertised pause frame use: No
      Advertised auto-negotiation: No
      Advertised FEC modes: Not reported
      Speed: Unknown!
      Duplex: Full
      Auto-negotiation: off
      Port: Other
      PHYAD: 0
      Transceiver: internal
      Link detected: no
      
      Fixes: eb234ee9 ("net/mlx5e: IPoIB, Add support for get_link_ksettings in ethtool")
      Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
      Reviewed-by: Gal Pressman <gal@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix crash unsetting rx-vlan-filter in switchdev mode · 8974aa96
      Amir Tzin authored
      Moving to switchdev mode with rx-vlan-filter on and then setting it off
      causes the kernel to crash because fs->vlan is freed during the NIC
      profile cleanup flow.
      
      RX VLAN filtering is not supported in switchdev mode so unset it when
      changing to switchdev and restore its value when switching back to
      legacy.
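
      The unset-and-restore behavior described above can be modeled roughly as follows. This is a simplified Python sketch, assuming the requested value is remembered separately from the value in effect; the Netdev class and its methods are hypothetical and do not mirror the driver's actual functions.

```python
RX_VLAN_FILTER = "rx-vlan-filter"


class Netdev:
    """Toy model: 'wanted' is what the user asked for, 'active' is in effect."""

    def __init__(self):
        self.switchdev = False
        self.wanted = set()
        self.active = set()

    def _update_features(self):
        allowed = set(self.wanted)
        if self.switchdev:
            # Not supported in switchdev mode, so it is masked out here.
            allowed.discard(RX_VLAN_FILTER)
        self.active = allowed

    def set_feature(self, name, on):
        if on:
            self.wanted.add(name)
        else:
            self.wanted.discard(name)
        self._update_features()

    def set_mode(self, switchdev):
        self.switchdev = switchdev
        # Re-evaluating on every mode change restores the wanted value
        # when going back to legacy.
        self._update_features()
```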
      
      trace:
      [] RIP: 0010:mlx5e_disable_cvlan_filter+0x43/0x70
      [] set_feature_cvlan_filter+0x37/0x40 [mlx5_core]
      [] mlx5e_handle_feature+0x3a/0x60 [mlx5_core]
      [] mlx5e_set_features+0x6d/0x160 [mlx5_core]
      [] __netdev_update_features+0x288/0xa70
      [] ethnl_set_features+0x309/0x380
      [] ? __nla_parse+0x21/0x30
      [] genl_family_rcv_msg_doit.isra.17+0x110/0x150
      [] genl_rcv_msg+0x112/0x260
      [] ? features_reply_size+0xe0/0xe0
      [] ? genl_family_rcv_msg_doit.isra.17+0x150/0x150
      [] netlink_rcv_skb+0x4e/0x100
      [] genl_rcv+0x24/0x40
      [] netlink_unicast+0x1ab/0x290
      [] netlink_sendmsg+0x257/0x4f0
      [] sock_sendmsg+0x5c/0x70
      
      Fixes: cb67b832 ("net/mlx5e: Introduce SRIOV VF representors")
      Signed-off-by: Amir Tzin <amirtz@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Bridge, fix ageing of peer FDB entries · da0c5242
      Vlad Buslov authored
      The SWITCHDEV_FDB_ADD_TO_BRIDGE event handler that updates the FDB entry
      'lastuse' field is only executed for the eswitch that owns the entry.
      However, once a peer entry has processed packets at least once, its
      hardware counter 'used' value stays greater than the entry's 'lastuse'
      from that point on, which prevents the FDB entry from being aged out.
      
      Process the event on all eswitch instances.
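
      A rough Python model of why the peer entry never ages out, and how processing the event on every instance changes that. The FdbEntry class, the looks_active() comparison and the on_add_to_bridge() helper are hypothetical simplifications of the behavior described above, not the driver's code.

```python
class FdbEntry:
    def __init__(self):
        self.lastuse = 0   # software timestamp refreshed by the event handler
        self.hw_used = 0   # last-use time reported by the hardware counter


def looks_active(entry):
    # The entry is treated as recently used while the counter is newer.
    return entry.hw_used > entry.lastuse


def on_add_to_bridge(entries, owner, now, all_instances):
    """Model of the handler refreshing 'lastuse' on one or all eswitches."""
    for eswitch, entry in entries.items():
        if all_instances or eswitch == owner:
            entry.lastuse = now
```

      With all_instances=False, a peer entry whose counter fired once keeps looks_active() true forever; with all_instances=True its 'lastuse' catches up and normal ageing can resume.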
      
      Fixes: ff9b7521 ("net/mlx5: Bridge, support LAG")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: DR, Fix potential race in dr_rule_create_rule_nic · 288d85e0
      Yevgeny Kliteynik authored
      Selecting builder should be protected by the lock to prevent the case
      where a new rule sets a builder in the nic_matcher while the previous
      rule is still using the nic_matcher.
      
      Fix this issue and clean up the error flow.
      
      Fixes: b9b81e1e ("net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Reviewed-by: Alex Vesker <valex@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Update rx ring hw mtu upon each rx-fcs flag change · 1e662209
      Adham Faris authored
      rq->hw_mtu is used in function en_rx.c/mlx5e_skb_from_cqe_mpwrq_linear()
      to catch oversized packets. If FCS is concatenated to the end of the
      packet then the check should be updated accordingly.
      
      Rx ring initialization (mlx5e_init_rxq_rq()) is invoked for every new set
      of channels, as part of mlx5e_safe_switch_params(), without knowing
      whether it runs with the default configuration or not. The current
      rq->hw_mtu initialization assumes the default configuration and ignores
      the params->scatter_fcs_en flag state.
      Fix this by accounting for the params->scatter_fcs_en flag state during
      rq->hw_mtu initialization.
      
      In addition, updating the rq->hw_mtu value during ingress traffic might
      lead to packet drops and an increase of the oversize_pkts_sw_drop counter
      for no good reason. Hence we remove this optimization and switch the set
      of channels with a new one, to make sure we don't get false positives on
      the oversize_pkts_sw_drop counter.
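
      The effect of the FCS flag on the oversize check can be illustrated with a small Python sketch. The overhead formula here (Ethernet header plus one VLAN tag plus FCS) is an assumption built from standard Ethernet constants, and rq_hw_mtu() is a hypothetical helper; the commit message itself only says that the flag state must be accounted for.

```python
ETH_HLEN = 14     # Ethernet header length in bytes
VLAN_HLEN = 4     # one 802.1Q tag
ETH_FCS_LEN = 4   # frame check sequence


def rq_hw_mtu(sw_mtu, scatter_fcs_en):
    """Largest frame the rx ring should accept, assuming the layout above."""
    hw_mtu = sw_mtu + ETH_HLEN + VLAN_HLEN + ETH_FCS_LEN
    if not scatter_fcs_en:
        # FCS is not delivered with the packet, so it must not be counted.
        hw_mtu -= ETH_FCS_LEN
    return hw_mtu
```

      For a 1500-byte software MTU this model gives 1518 bytes with the flag off and 1522 bytes with it on, so a limit computed for the default configuration would flag FCS-carrying frames as oversized.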
      
      Fixes: 102722fc ("net/mlx5e: Add support for RXFCS feature flag")
      Signed-off-by: Adham Faris <afaris@nvidia.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 07 Feb, 2023 4 commits
    • devlink: change port event netdev notifier from per-net to global · 565b4824
      Jiri Pirko authored
      Currently only the network namespace of the devlink instance is monitored
      for port events. If a netdev is moved to a different namespace and then
      unregistered, NETDEV_PRE_UNINIT is missed, which triggers the
      following WARN_ON in devl_port_unregister():
      WARN_ON(devlink_port->type != DEVLINK_PORT_TYPE_NOTSET);
      
      Fix this by changing the netdev notifier from per-net to global so no
      event is missed.
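
      The difference between a per-namespace and a global notifier can be modeled as below. This is a toy Python sketch; NotifierChains and its methods are hypothetical stand-ins, not the kernel's notifier API.

```python
class NotifierChains:
    """Toy model of per-namespace and global netdev notifier chains."""

    def __init__(self):
        self.per_net = {}        # namespace name -> list of callbacks
        self.global_chain = []   # callbacks that see every namespace

    def register_per_net(self, netns, callback):
        self.per_net.setdefault(netns, []).append(callback)

    def register_global(self, callback):
        self.global_chain.append(callback)

    def emit(self, netns, event, netdev):
        # An event is delivered to listeners of the namespace the netdev
        # currently lives in, plus all global listeners.
        for callback in self.per_net.get(netns, []) + self.global_chain:
            callback(event, netdev)
```

      A listener registered only for the devlink instance's namespace sees nothing once the netdev has moved elsewhere, while a global listener still receives NETDEV_PRE_UNINIT.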
      
      Fixes: 02a68a47 ("net: devlink: track netdev with devlink_port assigned")
      Signed-off-by: Jiri Pirko <jiri@nvidia.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://lore.kernel.org/r/20230206094151.2557264-1-jiri@resnulli.us
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • selftests: ocelot: tc_flower_chains: make test_vlan_ingress_modify() more comprehensive · bbb253b2
      Vladimir Oltean authored
      We have two IS1 filters of the OCELOT_VCAP_KEY_ANY key type (the one with
      "action vlan pop" and the one with "action vlan modify") and one of the
      OCELOT_VCAP_KEY_IPV4 key type (the one with "action skbedit priority").
      But we have no IS1 filter with the OCELOT_VCAP_KEY_ETYPE key type, and
      there was an uncaught breakage there.
      
      To increase test coverage, convert one of the OCELOT_VCAP_KEY_ANY
      filters to OCELOT_VCAP_KEY_ETYPE, by making the filter also match on the
      MAC SA of the traffic sent by mausezahn, $h1_mac.
      
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-2-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: mscc: ocelot: fix VCAP filters not matching on MAC with "protocol 802.1Q" · f964f839
      Vladimir Oltean authored
      Alternative short title: don't instruct the hardware to match on
      EtherType with "protocol 802.1Q" flower filters. It doesn't work for the
      reasons detailed below.
      
      With a command such as the following:
      
      tc filter add dev $swp1 ingress chain $(IS1 2) pref 3 \
      	protocol 802.1Q flower skip_sw vlan_id 200 src_mac $h1_mac \
      	action vlan modify id 300 \
      	action goto chain $(IS2 0 0)
      
      the created filter is set by ocelot_flower_parse_key() to be of type
      OCELOT_VCAP_KEY_ETYPE, and etype is set to {value=0x8100, mask=0xffff}.
      This gets propagated all the way to is1_entry_set() which commits it to
      hardware (the VCAP_IS1_HK_ETYPE field of the key). Compare this to the
      case where src_mac isn't specified - the key type is OCELOT_VCAP_KEY_ANY,
      and is1_entry_set() doesn't populate VCAP_IS1_HK_ETYPE.
      
      The problem is that for VLAN-tagged frames, the hardware interprets the
      ETYPE field as holding the encapsulated VLAN protocol. So the above
      filter will only match those packets which have an encapsulated protocol
      of 0x8100, rather than all packets with VLAN ID 200 and the given src_mac.
      
      This can occur because, although we have a block of code in
      ocelot_flower_parse_key() which sets "match_protocol" to false when
      VLAN keys are present, that code executes too late.
      There is another block of code, which executes for Ethernet addresses,
      and has a "goto finished_key_parsing" and skips the VLAN header parsing.
      By skipping it, "match_protocol" remains with the value it was
      initialized with, i.e. "true", and "proto" is set to f->common.protocol,
      or 0x8100.
      
      The concept of ignoring some keys rather than erroring out when they are
      present but can't be offloaded is dubious in itself, but is present
      since the initial commit fe3490e6 ("net: mscc: ocelot: Hardware
      ofload for tc flower filter"), and it's outside of the scope of this
      patch to change that.
      
      The problem was introduced when the driver started to interpret the
      flower filter's protocol, and populate the VCAP filter's ETYPE field
      based on it.
      
      To fix this, it is sufficient to move the code that parses the VLAN keys
      earlier than the "goto finished_key_parsing" instruction. This will
      ensure that if we have a flower filter with both VLAN and Ethernet
      address keys, it won't match on ETYPE 0x8100, because the VLAN key
      parsing sets "match_protocol = false".
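
      The ordering problem can be reproduced with a small Python model of the control flow described above. parse_key() and its parameters are hypothetical; only the order of the two parsing steps and the early exit are taken from the commit message.

```python
ETH_P_8021Q = 0x8100


def parse_key(protocol, has_eth_addrs, has_vlan, vlan_first):
    """Return the EtherType the filter would match on, or None.

    vlan_first=False models the old order (Ethernet addresses, then VLAN);
    vlan_first=True models the fixed order.
    """
    match_protocol = True
    steps = ["vlan", "eth_addrs"] if vlan_first else ["eth_addrs", "vlan"]
    for step in steps:
        if step == "vlan" and has_vlan:
            match_protocol = False
        elif step == "eth_addrs" and has_eth_addrs:
            break  # models "goto finished_key_parsing"
    return protocol if match_protocol else None
```

      With both VLAN and Ethernet address keys, the old order returns 0x8100 because the VLAN step is skipped, while the new order returns None, so ETYPE is left unpopulated.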
      
      Fixes: 86b956de ("net: mscc: ocelot: support matching on EtherType")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230205192409.1796428-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • net: dsa: mt7530: don't change PVC_EG_TAG when CPU port becomes VLAN-aware · 0b6d6425
      Vladimir Oltean authored
      Frank reports that in a mt7530 setup where some ports are standalone and
      some are in a VLAN-aware bridge, 8021q uppers of the standalone ports
      lose their VLAN tag on xmit, as seen by the link partner.
      
      This seems to occur because once the other ports join the VLAN-aware
      bridge, mt7530_port_vlan_filtering() also calls
      mt7530_port_set_vlan_aware(ds, cpu_dp->index), and this affects the way
      that the switch processes the traffic of the standalone port.
      
      Relevant is the PVC_EG_TAG bit. The MT7530 documentation says about it:
      
      EG_TAG: Incoming Port Egress Tag VLAN Attribution
      0: disabled (system default)
      1: consistent (keep the original ingress tag attribute)
      
      My interpretation is that this setting applies on the ingress port, and
      "disabled" is basically the normal behavior, where the egress tag format
      of the packet (tagged or untagged) is decided by the VLAN table
      (MT7530_VLAN_EGRESS_UNTAG or MT7530_VLAN_EGRESS_TAG).
      
      But there is also an option of overriding the system default behavior,
      and for the egress tagging format of packets to be decided not by the
      VLAN table, but simply by copying the ingress tag format (if ingress was
      tagged, egress is tagged; if ingress was untagged, egress is untagged;
      aka "consistent"). This is useful in 2 scenarios:
      
      - VLAN-unaware bridge ports will always encounter a miss in the VLAN
        table. They should forward a packet as-is, though. So we use
        "consistent" there. See commit e045124e ("net: dsa: mt7530: fix
        tagged frames pass-through in VLAN-unaware mode").
      
      - Traffic injected from the CPU port. The operating system is in god
        mode; if it wants a packet to exit as VLAN-tagged, it sends it as
        VLAN-tagged. Otherwise it sends it as VLAN-untagged*.
      
      *This is true only if we don't consider the bridge TX forwarding offload
      feature, which mt7530 doesn't support.
      
      So for now, make the CPU port always stay in "consistent" mode to allow
      software VLANs to be forwarded to their egress ports with the VLAN tag
      intact, and not stripped.
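
      The two EG_TAG behaviors, as interpreted in the message above, reduce to a small decision function. This Python sketch is only a model of that interpretation; egress_is_tagged() and its parameter names are hypothetical.

```python
def egress_is_tagged(ingress_tagged, eg_tag_consistent, vlan_table_tagged):
    """Decide whether a frame leaves the switch VLAN-tagged.

    eg_tag_consistent=False models "disabled": the VLAN table decides.
    eg_tag_consistent=True models "consistent": copy the ingress format.
    """
    if eg_tag_consistent:
        return ingress_tagged
    return vlan_table_tagged
```

      For a tagged frame injected from the CPU port toward a standalone port, "disabled" with an untagging VLAN table entry strips the tag, whereas "consistent" keeps it.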
      
      Link: https://lore.kernel.org/netdev/trinity-e6294d28-636c-4c40-bb8b-b523521b00be-1674233135062@3c-app-gmx-bs36/
      Fixes: e045124e ("net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode")
      Reported-by: Frank Wunderlich <frank-w@public-files.de>
      Tested-by: Frank Wunderlich <frank-w@public-files.de>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Link: https://lore.kernel.org/r/20230205140713.1609281-1-vladimir.oltean@nxp.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  3. 06 Feb, 2023 4 commits
    • net: USB: Fix wrong-direction WARNING in plusb.c · 811d5811
      Alan Stern authored
      The syzbot fuzzer detected a bug in the plusb network driver: A
      zero-length control-OUT transfer was treated as a read instead of a
      write.  In modern kernels this error provokes a WARNING:
      
      usb 1-1: BOGUS control dir, pipe 80000280 doesn't match bRequestType c0
      WARNING: CPU: 0 PID: 4645 at drivers/usb/core/urb.c:411
      usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411
      Modules linked in:
      CPU: 1 PID: 4645 Comm: dhcpcd Not tainted
      6.2.0-rc6-syzkaller-00050-g9f266cca #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
      01/12/2023
      RIP: 0010:usb_submit_urb+0x14a7/0x1880 drivers/usb/core/urb.c:411
      ...
      Call Trace:
       <TASK>
       usb_start_wait_urb+0x101/0x4b0 drivers/usb/core/message.c:58
       usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
       usb_control_msg+0x320/0x4a0 drivers/usb/core/message.c:153
       __usbnet_read_cmd+0xb9/0x390 drivers/net/usb/usbnet.c:2010
       usbnet_read_cmd+0x96/0xf0 drivers/net/usb/usbnet.c:2068
       pl_vendor_req drivers/net/usb/plusb.c:60 [inline]
       pl_set_QuickLink_features drivers/net/usb/plusb.c:75 [inline]
       pl_reset+0x2f/0xf0 drivers/net/usb/plusb.c:85
       usbnet_open+0xcc/0x5d0 drivers/net/usb/usbnet.c:889
       __dev_open+0x297/0x4d0 net/core/dev.c:1417
       __dev_change_flags+0x587/0x750 net/core/dev.c:8530
       dev_change_flags+0x97/0x170 net/core/dev.c:8602
       devinet_ioctl+0x15a2/0x1d70 net/ipv4/devinet.c:1147
       inet_ioctl+0x33f/0x380 net/ipv4/af_inet.c:979
       sock_do_ioctl+0xcc/0x230 net/socket.c:1169
       sock_ioctl+0x1f8/0x680 net/socket.c:1286
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x197/0x210 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The fix is to call usbnet_write_cmd() instead of usbnet_read_cmd() and
      remove the USB_DIR_IN flag.
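
      The mismatch behind the WARNING can be checked with a short Python model. It assumes the check treats a zero-length control transfer as OUT regardless of the direction bit in bRequestType, and that bit 7 of the pipe value encodes the pipe direction; the helper names are hypothetical, and the "fixed" pipe value in the usage note is simply the logged one with the IN bit cleared.

```python
USB_DIR_IN = 0x80       # direction bit in bRequestType and in the pipe value
USB_TYPE_VENDOR = 0x40  # vendor-specific request type


def pipe_is_out(pipe):
    return (pipe & USB_DIR_IN) == 0


def request_is_out(b_request_type, w_length):
    # Assumption: a transfer with no data stage counts as OUT.
    return (b_request_type & USB_DIR_IN) == 0 or w_length == 0


def direction_mismatch(pipe, b_request_type, w_length):
    return pipe_is_out(pipe) != request_is_out(b_request_type, w_length)
```

      The logged values (pipe 80000280, bRequestType c0, zero length) produce a mismatch in this model; an OUT pipe such as 0x80000200 with bRequestType 0x40 does not.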
      
      Reported-and-tested-by: syzbot+2a0e7abd24f1eb90ce25@syzkaller.appspotmail.com
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Fixes: 090ffa9d ("[PATCH] USB: usbnet (9/9) module for pl2301/2302 cables")
      CC: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/00000000000052099f05f3b3e298@google.com/
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: microchip: sparx5: fix PTP init/deinit not checking all ports · d7d94b26
      Casper Andersson authored
      Check all ports instead of just port_count ports. PTP init was only
      checking ports 0 to port_count. If the hardware ports are not mapped
      starting from 0 then they would be missed, e.g. if only ports 20-30 were
      mapped it would attempt to init ports 0-10, resulting in NULL pointers
      when attempting to timestamp. Now it will init all mapped ports.
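
      The port-range issue can be shown with a minimal Python sketch. The 32-slot port table and the init_ptp_ports() helper are hypothetical; only the "ports 20-30 mapped" scenario comes from the commit message.

```python
def init_ptp_ports(ports, port_count, check_all):
    """Return the indices of ports that get PTP state initialized.

    ports is a table with None for unmapped slots. check_all=False models
    the old loop over the first port_count slots; check_all=True models
    the fixed loop over every slot.
    """
    limit = len(ports) if check_all else port_count
    return [i for i in range(limit) if ports[i] is not None]


# Hypothetical 32-slot table with only ports 20-30 mapped.
ports = [None] * 32
for index in range(20, 31):
    ports[index] = "port%d" % index
port_count = sum(p is not None for p in ports)  # 11 mapped ports
```

      In this model the old loop only visits slots 0-10 and initializes nothing, while the fixed loop reaches all eleven mapped ports.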
      
      Fixes: 70dfe25c ("net: sparx5: Update extraction/injection for timestamping")
      Signed-off-by: Casper Andersson <casper.casan@gmail.com>
      Reviewed-by: Horatiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • uapi: add missing ip/ipv6 header dependencies for linux/stddef.h · 03702d4d
      Herton R. Krzesinski authored
      Since commit 58e0be1e ("net: use struct_group to copy ip/ipv6
      header addresses"), ip and ipv6 headers started to use the __struct_group
      definition, which is defined at include/uapi/linux/stddef.h. However,
      linux/stddef.h isn't explicitly included in include/uapi/linux/{ip,ipv6}.h,
      which breaks the build of the xskxceiver bpf selftest if you install the
      uapi headers in the system:
      
      $ make V=1 xskxceiver -C tools/testing/selftests/bpf
      ...
      make: Entering directory '(...)/tools/testing/selftests/bpf'
      gcc -g -O0 -rdynamic -Wall -Werror (...)
      In file included from xskxceiver.c:79:
      /usr/include/linux/ip.h:103:9: error: expected specifier-qualifier-list before ‘__struct_group’
        103 |         __struct_group(/* no tag */, addrs, /* no attrs */,
            |         ^~~~~~~~~~~~~~
      ...
      
      Include the missing <linux/stddef.h> dependency in ip.h and do the
      same for the ipv6.h header.
      
      Fixes: 58e0be1e ("net: use struct_group to copy ip/ipv6 header addresses")
      Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
      Reviewed-by: Carlos O'Donell <carlos@redhat.com>
      Tested-by: Carlos O'Donell <carlos@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neigh: make sure used and confirmed times are valid · c1d2ecdf
      Julian Anastasov authored
      Entries can linger in the cache without a timer for days, thanks to
      the gc_thresh1 limit. As a result, without traffic, the confirmed
      time can become outdated and appear to be in the future. Later,
      on traffic, NUD_STALE entries can switch to NUD_DELAY and start
      the timer, which can see the invalid confirmed time and wrongly
      switch to the NUD_REACHABLE state instead of NUD_PROBE. As a result,
      the timer is set many days in the future. This is more visible on
      32-bit platforms with a higher HZ value.
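
      How an old timestamp can appear to lie in the future follows from wrapping jiffies comparisons. The Python sketch below models a time_after()-style comparison on a signed long of a given width; the helper names and the sample values are illustrative assumptions, not taken from the patch.

```python
def time_after(a, b, bits=32):
    """True if timestamp a is considered later than b.

    Models the signed test (long)(b - a) < 0 on a `bits`-wide long.
    """
    diff = (b - a) & ((1 << bits) - 1)
    if diff >= 1 << (bits - 1):
        diff -= 1 << bits
    return diff < 0


def wrap_horizon_days(hz, bits=32):
    # Age (in days) after which a stale timestamp flips to "in the future".
    return (1 << (bits - 1)) / hz / 86400
```

      With 32-bit longs and HZ=1000 the horizon is a little under 25 days, so an entry that sat untouched for longer than that compares as confirmed "later than now"; with 64-bit longs the horizon is far out of reach.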
      
      Why is this a problem? While we expect unused entries to expire,
      such entries stay in the REACHABLE state for too long, locked in
      the cache. They are not expired normally, only when the cache is full.
      
      The problem and the wrong state change were reported by Zhang Changzhong:
      
      172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
      
      350520 seconds have elapsed since this entry was last updated, but it is
      still in the REACHABLE state (base_reachable_time_ms is 30000),
      preventing lladdr from being updated through probe.
      
      Fix it by ensuring the timer is started with valid used/confirmed
      times. Considering that the valid time range is LONG_MAX jiffies,
      we try not to go too far into the past while we are in the
      DELAY/PROBE state. There are also places that need the
      used/updated times to be validated while the timer is not running.
      
      Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 04 Feb, 2023 7 commits
  5. 03 Feb, 2023 1 commit
  6. 02 Feb, 2023 18 commits