1. 24 Apr, 2024 14 commits
    • net: dsa: mt7530: use priv->ds->num_ports instead of MT7530_NUM_PORTS · 318c1944
      Arınç ÜNAL authored
      Use priv->ds->num_ports on all for loops which configure the switch
      registers. In the future, the value of MT7530_NUM_PORTS will depend on
      priv->id. Therefore, this change prepares the subdriver for a simpler
      implementation.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: get rid of mac_port_validate member of mt753x_info · aa16e1fc
      Arınç ÜNAL authored
      The mac_port_validate member of the mt753x_info structure is not being
      used, remove it. Improve the member description section in the process.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: refactor MT7530_PMEEECR_P() · 99acfa82
      Arınç ÜNAL authored
      The MT7530_PMEEECR_P() register is on MT7530, MT7531, and the switch on the
      MT7988 SoC. Rename the definition for them to MT753X_PMEEECR_P(). Use the
      FIELD_PREP and FIELD_GET macros. Rename GET_LPI_THRESH() and
      SET_LPI_THRESH() to LPI_THRESH_GET() and LPI_THRESH_SET().
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
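As a rough illustration of what the FIELD_PREP and FIELD_GET conversion expresses, the following is a Python model of the kernel's GENMASK(), FIELD_PREP(), and FIELD_GET() macros, not driver code. The lowercase helper names and the bit range chosen for LPI_THRESH_MASK (bits 15 to 4) are assumptions made for this sketch.

```python
def genmask(high: int, low: int) -> int:
    """Model of GENMASK(high, low): bits low..high (inclusive) are set."""
    return ((1 << (high - low + 1)) - 1) << low


def _shift(mask: int) -> int:
    """Position of the lowest set bit in mask."""
    return (mask & -mask).bit_length() - 1


def field_prep(mask: int, value: int) -> int:
    """Model of FIELD_PREP(): shift value into the field described by mask."""
    return (value << _shift(mask)) & mask


def field_get(mask: int, reg: int) -> int:
    """Model of FIELD_GET(): extract the field described by mask from reg."""
    return (reg & mask) >> _shift(mask)


# Assumed field position, for illustration only.
LPI_THRESH_MASK = genmask(15, 4)


def lpi_thresh_set(value: int) -> int:
    """Place a threshold value into its (assumed) register field."""
    return field_prep(LPI_THRESH_MASK, value)


def lpi_thresh_get(reg: int) -> int:
    """Read the threshold back out of a register value."""
    return field_get(LPI_THRESH_MASK, reg)
```

With a single mask definition, the setter and getter cannot drift apart, since both derive the shift from the same mask.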
    • net: dsa: mt7530: get rid of function sanity check · 379f7bf8
      Arınç ÜNAL authored
      Get rid of checking whether functions are filled properly. priv->info, which
      is an mt753x_info structure, is filled and checked before this check, so
      checking again whether it's filled properly is unnecessary.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: define MAC speed capabilities per switch model · 6512204b
      Arınç ÜNAL authored
      With the support of the MT7988 SoC switch, the MAC speed capabilities
      defined on mt753x_phylink_get_caps() won't apply to all switch models
      anymore. Move them to more appropriate locations instead of overwriting
      config->mac_capabilities.
      
      Remove the comment on mt753x_phylink_get_caps() as it's become invalid with
      the support of MT7531 and MT7988 SoC switch.
      
      Add break to case 6 of mt7988_mac_port_get_caps() to be explicit.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: return mt7530_setup_mdio & mt7531_setup_common on error · 7bf06bcd
      Arınç ÜNAL authored
      The mt7530_setup_mdio() and mt7531_setup_common() functions should be
      checked for errors. Return if the functions return a non-zero value.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: move MT753X_MTRAP operations for MT7530 · 377174c5
      Arınç ÜNAL authored
      On MT7530, the media-independent interfaces of port 5 and 6 are controlled
      by the MT7530_P5_DIS and MT7530_P6_DIS bits of the hardware trap. Deal with
      these bits only when the relevant port is being enabled or disabled. This
      ensures that these ports will be disabled when they are not in use.
      
      Do not set MT7530_CHG_TRAP on mt7530_setup_port5() as that's already being
      done on mt7530_setup().
      
      Instead of globally setting MT7530_P5_MAC_SEL, clear it, then set it only
      on the appropriate case.
      
      If PHY muxing is detected, clear MT7530_P5_DIS before calling
      mt7530_setup_port5().
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: refactor MT7530_HWTRAP and MT7530_MHWTRAP · 7c8d1402
      Arınç ÜNAL authored
      The MT7530_HWTRAP and MT7530_MHWTRAP registers are on MT7530 and MT7531.
      It's called hardware trap on MT7530, software trap on MT7531. That's
      because some bits of the trap on MT7530 cannot be modified by software
      whilst all bits of the trap on MT7531 can. Rename the definitions for them
      to MT753X_TRAP and MT753X_MTRAP. Add MT7530 and MT7531 prefixes to the
      definitions specific to the switch model.
      
      Remove the extra parentheses from MT7530_XTAL_40MHZ and MT7530_XTAL_20MHZ.
      
      Rename MHWTRAP_PHY0_SEL, MHWTRAP_MANUAL, and MHWTRAP_PHY_ACCESS to be on
      par with the "MT7621 Giga Switch Programming Guide v0.3" document.
      
      Make an enumeration for the XTAL frequency. Set the data type of the xtal
      variable on mt7531_pll_setup() to it.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: refactor MT7530_MFC and MT7531_CFC, add MT7531_QRY_FFP · 9c7401dc
      Arınç ÜNAL authored
      The MT7530_MFC register is on MT7530, MT7531, and the switch on the MT7988
      SoC. Rename it to MT753X_MFC. Bits 7 to 0 differ between MT7530 and
      MT7531/MT7988. Add MT7530 prefix to these definitions, and define the
      IGMP/MLD Query Frame Flooding Ports mask for MT7531.
      
      Rename the cases of MIRROR_MASK to MIRROR_PORT_MASK.
      
      Move mt753x_mirror_port_get() and mt753x_port_mirror_set() to mt7530.h as
      macros.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: rename mt753x_bpdu_port_fw enum to mt753x_to_cpu_fw · 7603a0c7
      Arınç ÜNAL authored
      The mt753x_bpdu_port_fw enum is globally used for manipulating the process
      of deciding the forwardable ports, specifically concerning the CPU port(s).
      Therefore, rename it and the values in it to mt753x_to_cpu_fw.
      
      Change FOLLOW_MFC to SYSTEM_DEFAULT to be on par with the switch documents.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: rename p5_intf_sel and use only for MT7530 switch · eeaf9acb
      Arınç ÜNAL authored
      The p5_intf_sel pointer is used to store the information of whether PHY
      muxing is used or not. PHY muxing is a feature specific to port 5 of the
      MT7530 switch. Do not use it for other switch models.
      
      Rename the pointer to p5_mode to store the mode the port is being used in.
      Rename the p5_interface_select enum to mt7530_p5_mode, the string
      representation to mt7530_p5_mode_str, and the enum elements.
      
      If PHY muxing is not detected, the default mode, GMAC5, will be used.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: refactor MT7530_PMCR_P() · 883ea1c0
      Arınç ÜNAL authored
      The MT7530_PMCR_P() registers are on MT7530, MT7531, and the switch on the
      MT7988 SoC. Rename the definition for them to MT753X_PMCR_P(). Bit 15 is
      for MT7530 only. Add MT7530 prefix to the definition for bit 15.
      
      Use GENMASK and FIELD_PREP for PMCR_IFG_XMIT().
      
      Rename PMCR_TX_EN and PMCR_RX_EN to PMCR_MAC_TX_EN and PMCR_MAC_RX_EN to
      follow the naming on the "MT7621 Giga Switch Programming Guide v0.3",
      "MT7531 Reference Manual for Development Board v1.0", and "MT7988A Wi-Fi 7
      Generation Router Platform: Datasheet (Open Version) v0.1" documents.
      
      These documents show that PMCR_RX_FC_EN is at bit 5. Correct this along
      with renaming it to PMCR_FORCE_RX_FC_EN, and the same for PMCR_TX_FC_EN.
      
      Remove PMCR_SPEED_MASK which doesn't have a use.
      
      Rename the force mode definitions for MT7531 to FORCE_MODE. Add MASK at the
      end for the mask that includes all force mode definitions.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mt7530: disable EEE abilities on failure on MT7531 and MT7988 · 385c22ee
      Arınç ÜNAL authored
      The MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 bits let the
      PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits determine the 1G/100 EEE
      abilities of the MAC. If MT7531_FORCE_EEE1G and MT7531_FORCE_EEE100 are
      unset, the abilities are left to be determined by PHY auto polling.
      
      The commit 40b5d2f1 ("net: dsa: mt7530: Add support for EEE features")
      made it so that the PMCR_FORCE_EEE1G and PMCR_FORCE_EEE100 bits are set on
      mt753x_phylink_mac_link_up(). But it did not set the MT7531_FORCE_EEE1G and
      MT7531_FORCE_EEE100 bits. Because of this, the EEE abilities will be
      determined by PHY auto polling, regardless of the result of phy_init_eee().
      
      Define these bits and add them to the MT7531_FORCE_MODE mask which is set
      in mt7531_setup_common(). With this, there won't be any EEE abilities set
      when phy_init_eee() returns a negative value.
      
      Thanks to Russell for explaining when phy_init_eee() could return a
      negative value below.
      
      Looking at phy_init_eee(), it could return a negative value when:
      
      1. phydev->drv is NULL
      2. if genphy_c45_eee_is_active() returns negative
      3. if genphy_c45_eee_is_active() returns zero, it returns -EPROTONOSUPPORT
      4. if phy_set_bits_mmd() fails (e.g. communication error with the PHY)
      
      If we then look at genphy_c45_eee_is_active(), then:
      
      genphy_c45_read_eee_adv() and genphy_c45_read_eee_lpa() propagate their
      non-zero return values; otherwise this function returns zero or a positive
      integer.
      
      If we then look at genphy_c45_read_eee_adv(), then a failure of
      phy_read_mmd() would cause a negative value to be returned.
      
      Looking at genphy_c45_read_eee_lpa(), the same is true.
      
      So, it can be summarised as:
      
      - phydev->drv is NULL
      - there is a communication error accessing the PHY
      - EEE is not active
      
      otherwise, it returns zero on success.
      
      If one wishes to determine whether an error occurred vs EEE not being
      supported through negotiation for the negotiated speed, it returns
      -EPROTONOSUPPORT in the latter case. Other error codes mean either the
      driver has been unloaded or a communication error.
      
      In conclusion, determining the EEE abilities by PHY auto polling shouldn't
      result in having any EEE abilities enabled, when one of the last two
      situations in the summary happens. And it seems that if phydev->drv is
      NULL, there would be bigger problems with the device than a broken link. So
      this is not a bugfix.
      Signed-off-by: Arınç ÜNAL <arinc.unal@arinc9.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour: fix neigh_master_filtered() · 1c04b46c
      Eric Dumazet authored
      If we no longer hold RTNL, we must use netdev_master_upper_dev_get_rcu()
      instead of netdev_master_upper_dev_get().
      
      Fixes: ba0f7806 ("neighbour: no longer hold RTNL in neigh_dump_info()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Link: https://lore.kernel.org/r/20240421185753.1808077-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  2. 23 Apr, 2024 26 commits
    • Merge branch 'selftests-drv-net-support-testing-with-a-remote-system' · 8d03c153
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      selftests: drv-net: support testing with a remote system
      
      Implement support for tests which require access to a remote system /
      endpoint which can generate traffic.
      This series concludes the "groundwork" for upstream driver tests.
      
      I wanted to support the three models which came up in discussions:
       - SW testing with netdevsim
       - "local" testing with two ports on the same system in a loopback
       - "remote" testing via SSH
      so there is a tiny bit of an abstraction which wraps up how "remote"
      commands are executed. Otherwise hopefully there's nothing surprising.
      
      I'm only adding a ping test. I had a bigger one written but I was
      worried we'll get into discussing the details of the test itself
      and how I chose to hack up netdevsim, instead of the test infra...
      So that test will be a follow up :)
      
      v4: https://lore.kernel.org/all/20240418233844.2762396-1-kuba@kernel.org
      v3: https://lore.kernel.org/all/20240417231146.2435572-1-kuba@kernel.org
      v2: https://lore.kernel.org/all/20240416004556.1618804-1-kuba@kernel.org
      v1: https://lore.kernel.org/all/20240412233705.1066444-1-kuba@kernel.org
      ====================
      
      Link: https://lore.kernel.org/r/20240420025237.3309296-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: drv-net: add require_XYZ() helpers for validating env · f1e68a1a
      Jakub Kicinski authored
      Wrap typical checks, like whether a given command used by the test
      is available, in helpers.
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-8-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
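A minimal sketch of what such a helper could look like, assuming a skip exception that the test runner turns into a SKIP result. The names `SkipTest` and `require_cmd` are hypothetical and chosen for this example; they are not taken from the selftest library.

```python
import shutil


class SkipTest(Exception):
    """Hypothetical exception a runner would report as SKIP, not as a failure."""


def require_cmd(name: str) -> str:
    """Return the full path of a command, or skip the test if it is missing."""
    path = shutil.which(name)
    if path is None:
        raise SkipTest(f"Test requires command: {name}")
    return path
```

A test would call `require_cmd("socat")` first, so a missing tool shows up as a skipped case with a readable reason instead of an obscure failure later on.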
    • selftests: drv-net: add a TCP ping test case (and useful helpers) · 31611cea
      Jakub Kicinski authored
      More complex tests often have to spawn a background process,
      like a server which will respond to requests or tcpdump.
      
      Add support for creating such processes using the with keyword:
      
        with bkg("my-daemon", ..):
           # my-daemon is alive in this block
      
      My initial thought was to add this support to cmd() directly
      but it runs the command in the constructor, so by the time
      we __enter__ it's too late to make sure we used "background=True".
      
      Second useful helper transplanted from net_helper.sh is
      wait_port_listen().
      
      The test itself uses socat, which insists on v6 addresses
      being wrapped in [], it's not the only command which requires
      this format, so add the wrapped address to env. The hope
      is to save test code from checking if address is v6.
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-7-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
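The background-process pattern described in this commit can be sketched as a small context manager. This is an illustrative stand-in built on `subprocess.Popen`, not the selftest library's `bkg` class; the argv-list constructor and the `SLEEPER` demo command are assumptions made for the example.

```python
import subprocess
import sys


class bkg:
    """Run a command in the background for the duration of a with block."""

    def __init__(self, argv):
        # Only remember the command here; the process starts in __enter__.
        self.argv = argv
        self.proc = None

    def __enter__(self):
        self.proc = subprocess.Popen(
            self.argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        return self

    def __exit__(self, exc_type, exc, tb):
        # Stop the process when the block ends, escalating if it ignores us.
        self.proc.terminate()
        try:
            self.proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            self.proc.kill()
            self.proc.wait()
        return False  # never swallow exceptions raised inside the block


# Demo command: a Python child that would otherwise sleep for 30 seconds.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]
```

Starting the process in `__enter__` rather than in the constructor is what distinguishes this from a helper that runs its command as soon as it is constructed.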
    • selftests: net: support matching cases by name prefix · 01b43164
      Jakub Kicinski authored
      While writing tests with a lot more cases I got tired of having
      to jump back and forth to add the name of the test to the ksft_run()
      list. Most unittest frameworks do some name matching, e.g. assume
      that functions with names starting with test_ are test cases.
      
      Support similar flow in ksft_run(). Let the author list the desired
      prefixes. globals() need to be passed explicitly, IDK how to work
      around that.
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-6-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
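Prefix-based discovery of this kind can be sketched in a few lines. The function name `collect_cases` and its parameters are hypothetical; the sketch only shows why the caller has to pass `globals()` explicitly: a library function cannot see the namespace of the test module that calls it.

```python
def collect_cases(globs: dict, prefixes) -> list:
    """Return callables from globs whose names start with one of prefixes."""
    cases = []
    for name, value in globs.items():
        if not callable(value):
            continue
        if any(name.startswith(prefix) for prefix in prefixes):
            cases.append(value)
    return cases
```

A test file would then call something like `collect_cases(globals(), ("test_",))` at the bottom, and newly added `test_*` functions are picked up without touching a list.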
    • selftests: drv-net: add a trivial ping test · a48a87c0
      Jakub Kicinski authored
      Add a very simple test for testing with a remote system.
      Both IPv4 and IPv6 connectivity is optional; a later change
      will add checks to skip tests based on available addresses.
      
      Using netdevsim:
      
       $ ./run_kselftest.sh -t drivers/net:ping.py
       TAP version 13
       1..1
       # timeout set to 45
       # selftests: drivers/net: ping.py
       # KTAP version 1
       # 1..2
       # ok 1 ping.test_v4
       # ok 2 ping.test_v6
       # # Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
       ok 1 selftests: drivers/net: ping.py
      
      Command line SSH:
      
       $ NETIF=virbr0 REMOTE_TYPE=ssh REMOTE_ARGS=root@192.168.122.123 \
          LOCAL_V4=192.168.122.1 REMOTE_V4=192.168.122.123 \
          ./tools/testing/selftests/drivers/net/ping.py
       KTAP version 1
       1..2
       ok 1 ping.test_v4
       ok 2 ping.test_v6 # SKIP Test requires IPv6 connectivity
       # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0
      
      Existing devices placed in netns (and using net.config):
      
       $ cat drivers/net/net.config
       NETIF=veth0
       REMOTE_TYPE=netns
       REMOTE_ARGS=red
       LOCAL_V4="192.168.1.1"
       REMOTE_V4="192.168.1.2"
      
       $ ./run_kselftest.sh -t drivers/net:ping.py
       TAP version 13
       1..1
       # timeout set to 45
       # selftests: drivers/net: ping.py
       # KTAP version 1
       # 1..2
       # ok 1 ping.test_v4
       # ok 2 ping.test_v6 # SKIP Test requires IPv6 connectivity
       # # Totals: pass:1 fail:0 xfail:1 xpass:0 skip:0 error:0
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-5-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: drv-net: construct environment for running tests which require an endpoint · 1880f272
      Jakub Kicinski authored
      Nothing surprising here, hopefully. Wrap the variables from
      the environment into a class or spawn a netdevsim based env
      and pass it to the tests.
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-4-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: drv-net: factor out parsing of the env · 54338929
      Jakub Kicinski authored
      The tests with a remote end will use a different class,
      for clarity, but will also need to parse the env.
      So factor parsing the env out to a function.
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-3-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
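To make the idea concrete, here is a sketch of a standalone parser for KEY=VALUE files such as the net.config shown in the ping test commit. The function name `parse_env` and its error handling are assumptions for this example, not the selftest library's code.

```python
import shlex


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines into a dict, honouring quotes and # comments."""
    env = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        # shlex strips quotes and drops trailing comments for us.
        tokens = shlex.split(line, comments=True)
        if not tokens:
            continue  # blank line or comment-only line
        if len(tokens) != 1 or "=" not in tokens[0]:
            raise ValueError(f"line {lineno}: expected KEY=VALUE, got {line!r}")
        key, _, value = tokens[0].partition("=")
        env[key] = value
    return env
```

Keeping the parser as a plain function means both the local-only environment class and the one with a remote end can share it.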
    • selftests: drv-net: define endpoint structures · 1a20a9a0
      Jakub Kicinski authored
      Define the remote endpoint "model". To execute most meaningful device
      driver tests we need to be able to communicate with a remote system,
      and have it send traffic to the device under test.
      
      Various test environments will have different requirements.
      
      0) "Local" netdevsim-based testing can simply use net namespaces.
      netdevsim supports connecting two devices now, to form a veth-like
      construct.
      
      1) Similarly on hosts with multiple NICs, the NICs may be connected
      together with a loopback cable or internal device loopback.
      One interface may be placed into separate netns, and tests
      would proceed much like in the netdevsim case. Note that
      the loopback config or the moving of one interface
      into a netns is not expected to be part of selftest code.
      
      2) Some systems may need to communicate with the remote endpoint
      via SSH.
      
      3) Last but not least, the environment may have its own custom
      communication method.
      
      Fundamentally we only need two operations:
       - run a command remotely
       - deploy a binary (if some tool we need is built as part of kselftests)
      
      Wrap these two in a class. Use dynamic loading to load the Remote
      class. This will allow very easy definition of other communication
      methods without bothering upstream code base.
      
      Stick to the "simple" / "no unnecessary abstractions" model for
      referring to the remote endpoints. The host / remote object are
      passed as an argument to the usual cmd() or ip() invocation.
      For example:
      
       ip("link show", json=True, host=remote)
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20240420025237.3309296-2-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
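The "run a command remotely" half of this model can be sketched as follows. All class and function names here are hypothetical, the sketch only builds the argv that would be executed instead of running anything, and a plain dict lookup stands in for the dynamic loading the commit describes. Deploying a binary is left out.

```python
import shlex


class NetnsRemote:
    """Remote endpoint living in a network namespace on the same host."""

    def __init__(self, args: str):
        self.netns = args

    def wrap(self, command: str) -> list:
        # Prefix the command so it executes inside the namespace.
        return ["ip", "netns", "exec", self.netns] + shlex.split(command)


class SshRemote:
    """Remote endpoint reached over SSH."""

    def __init__(self, args: str):
        self.target = args

    def wrap(self, command: str) -> list:
        # ssh passes the command string to the remote shell as-is.
        return ["ssh", self.target, command]


# Stand-in for dynamic loading: map REMOTE_TYPE values to classes.
REMOTE_TYPES = {"netns": NetnsRemote, "ssh": SshRemote}


def make_remote(kind: str, args: str):
    """Instantiate the endpoint class selected by REMOTE_TYPE."""
    try:
        cls = REMOTE_TYPES[kind]
    except KeyError:
        raise ValueError(f"unknown REMOTE_TYPE: {kind}") from None
    return cls(args)
```

Because every endpoint type exposes the same small interface, a helper like `cmd()` only needs an optional host argument and does not have to know how the remote side is reached.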
    • Merge branch 'netdev-support-dumping-a-single-netdev-in-qstats' · b2c8599f
      Jakub Kicinski authored
      Jakub Kicinski says:
      
      ====================
      netdev: support dumping a single netdev in qstats
      
      I was writing a test for page pool which depended on qstats,
      and got tired of having to filter dumps in user space.
      Add support for dumping stats for a single netdev.
      
      To get there we first need to add full support for extack
      in dumps (and fix a dump error handling bug in YNL, sent
      separately to the net tree).
      ====================
      
      Link: https://lore.kernel.org/r/20240420023543.3300306-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • selftests: drv-net: test dumping qstats per device · 23710925
      Jakub Kicinski authored
      Add a test for dumping qstats device by device.
      
      ksft framework grows a ksft_raises() helper, to be used
      under with, which should be familiar to unittest users.
      
      Link: https://lore.kernel.org/r/20240420023543.3300306-5-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
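For readers who have not used `unittest`'s `assertRaises`, a helper used under `with` to expect an exception can be sketched like this. The class name `raises` and its behaviour on a missing exception (raising `AssertionError`) are assumptions for this example; the real `ksft_raises()` may report failures differently.

```python
class raises:
    """Context manager that expects the block to raise a given exception."""

    def __init__(self, expected):
        self.expected = expected
        self.exception = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            raise AssertionError(f"{self.expected.__name__} was not raised")
        if issubclass(exc_type, self.expected):
            self.exception = exc  # keep it so the test can inspect details
            return True  # swallow the expected exception
        return False  # let unexpected exceptions propagate
```

Keeping the caught exception on the context object lets a test check details afterwards, for instance an error code carried by the exception.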
    • netlink: support all extack types in dumps · 8af4f604
      Jakub Kicinski authored
      Note that when this commit message refers to netlink dump
      it only means the actual dumping part, the parsing / dump
      start is handled by the same code as "doit".
      
      Commit 4a19edb6 ("netlink: Pass extack to dump handlers")
      added support for returning extack messages from dump handlers,
      but left out other extack info, e.g. bad attribute.
      
      This used to be fine because until YNL we had little practical
      use for the machine readable attributes, and only messages were
      used in practice.
      
      YNL flips the preference 180 degrees, it's now much more useful
      to point to a bad attr with NL_SET_BAD_ATTR() than type
      an English message saying "attribute XYZ is $reason-why-bad".
      
      Support all of extack. The fact that extack only gets added if
      it fits remains unaddressed.
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240420023543.3300306-4-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netlink: move extack writing helpers · 652332e3
      Jakub Kicinski authored
      Next change will need them in netlink_dump_done(), pure move.
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://lore.kernel.org/r/20240420023543.3300306-3-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • netdev: support dumping a single netdev in qstats · ce05d0f2
      Jakub Kicinski authored
      Having to filter the right ifindex in the tests is a bit tedious.
      Add support for dumping qstats for a single ifindex.
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240420023543.3300306-2-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • af_unix: Don't access successor in unix_del_edges() during GC. · 1af2dfac
      Kuniyuki Iwashima authored
      syzbot reported use-after-free in unix_del_edges().  [0]
      
      What the repro does is basically repeat the following quickly.
      
        1. pass a fd of an AF_UNIX socket to itself
      
          socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0
          sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET,
                                         cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0
      
        2. pass other fds of AF_UNIX sockets to the socket above
      
          socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0
          sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET,
                                         cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0
      
        3. close all sockets
      
      Here, two skb are created, and every unix_edge->successor is the first
      socket.  Then, __unix_gc() will garbage-collect the two skb:
      
        (a) free skb with self-referencing fd
        (b) free skb holding other sockets
      
      After (a), the self-referencing socket will be scheduled to be freed
      later by the delayed_fput() task.
      
      syzbot repeated the sequences above (1. ~ 3.) quickly and triggered
      the task concurrently while GC was running.
      
      So, at (b), the socket was already freed, and accessing it was illegal.
      
      unix_del_edges() accesses the receiver socket as edge->successor to
      optimise GC.  However, we should not do it during GC.
      
      Garbage-collecting sockets does not change the shape of the rest
      of the graph, so we need not call unix_update_graph() to update
      unix_graph_grouped when we purge skb.
      
      However, if we clean up all loops in the unix_walk_scc_fast() path,
      unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc()
      will call unix_walk_scc_fast() continuously even though there is no
      socket to garbage-collect.
      
      To keep that optimisation while fixing UAF, let's add the same
      updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast()
      as done in unix_walk_scc() and __unix_walk_scc().
      
      Note that when unix_del_edges() is called from other places, the
      receiver socket is always alive:
      
        - sendmsg: the successor's sk_refcnt is bumped by sock_hold()
                   unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM
      
        - recvmsg: the successor is the receiver, and its fd is alive
      
      [0]:
      BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline]
      BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline]
      BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237
      Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099
      
      CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
      Workqueue: events_unbound __unix_gc
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
       print_address_description mm/kasan/report.c:377 [inline]
       print_report+0x169/0x550 mm/kasan/report.c:488
       kasan_report+0x143/0x180 mm/kasan/report.c:601
       unix_edge_successor net/unix/garbage.c:109 [inline]
       unix_del_edge net/unix/garbage.c:165 [inline]
       unix_del_edges+0x148/0x630 net/unix/garbage.c:237
       unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298
       unix_detach_fds net/unix/af_unix.c:1811 [inline]
       unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826
       skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127
       skb_release_all net/core/skbuff.c:1138 [inline]
       __kfree_skb net/core/skbuff.c:1154 [inline]
       kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190
       __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline]
       __skb_queue_purge include/linux/skbuff.h:3256 [inline]
       __unix_gc+0x1732/0x1830 net/unix/garbage.c:575
       process_one_work kernel/workqueue.c:3218 [inline]
       process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      Allocated by task 14427:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       unpoison_slab_object mm/kasan/common.c:312 [inline]
       __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338
       kasan_slab_alloc include/linux/kasan.h:201 [inline]
       slab_post_alloc_hook mm/slub.c:3897 [inline]
       slab_alloc_node mm/slub.c:3957 [inline]
       kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964
       sk_prot_alloc+0x58/0x210 net/core/sock.c:2074
       sk_alloc+0x38/0x370 net/core/sock.c:2133
       unix_create1+0xb4/0x770
       unix_create+0x14e/0x200 net/unix/af_unix.c:1034
       __sock_create+0x490/0x920 net/socket.c:1571
       sock_create net/socket.c:1622 [inline]
       __sys_socketpair+0x33e/0x720 net/socket.c:1773
       __do_sys_socketpair net/socket.c:1822 [inline]
       __se_sys_socketpair net/socket.c:1819 [inline]
       __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 1805:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
       poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
       __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
       kasan_slab_free include/linux/kasan.h:184 [inline]
       slab_free_hook mm/slub.c:2190 [inline]
       slab_free mm/slub.c:4393 [inline]
       kmem_cache_free+0x145/0x340 mm/slub.c:4468
       sk_prot_free net/core/sock.c:2114 [inline]
       __sk_destruct+0x467/0x5f0 net/core/sock.c:2208
       sock_put include/net/sock.h:1948 [inline]
       unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665
       unix_release+0x91/0xc0 net/unix/af_unix.c:1049
       __sock_release net/socket.c:659 [inline]
       sock_close+0xbc/0x240 net/socket.c:1421
       __fput+0x406/0x8b0 fs/file_table.c:422
       delayed_fput+0x59/0x80 fs/file_table.c:445
       process_one_work kernel/workqueue.c:3218 [inline]
       process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299
       worker_thread+0x86d/0xd70 kernel/workqueue.c:3380
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
      
      The buggy address belongs to the object at ffff888079c6e000
       which belongs to the cache UNIX of size 1920
      The buggy address is located 1600 bytes inside of
       freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)
      
      Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593
      Fixes: 77e5593a ("af_unix: Skip GC if no cycle exists.")
      Fixes: fd863448 ("af_unix: Try not to hold unix_gc_lock during accept().")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      1af2dfac
    • Merge branch 'net-ipa-eight-simple-cleanups' · 0ff1db48
      Paolo Abeni authored
      Alex Elder says:
      
      ====================
      net: ipa: eight simple cleanups
      
      This series contains a mix of cleanups, some dating back to
      December, 2022.  Version 1 was based on an older version of
      net-next/main; this version has simply been rebased.
      
      The first two make it so the IPA SUSPEND interrupt only gets enabled
      when necessary.  That makes it possible in the third patch to call
      device_init_wakeup() during an earlier phase of initialization, and
      remove two functions.
      
      The next patch removes IPA register definitions that are never used.
      The fifth patch makes ipa_table_hash_support() a real function, so
      the IPA structure only needs to be declared rather than defined when
      that file is parsed.
      
      The sixth patch fixes improper argument names in two function
      declarations.  The seventh removes the declaration for a function
      that does not exist, and makes ipa_cmd_init() actually get called.
      And the last one eliminates ipa_version_supported(), in favor of
      just deciding that if a device is probed because its compatible
      matches, that device is assumed to be supported.
      ====================
      
      Link: https://lore.kernel.org/r/20240419151800.2168903-1-elder@linaro.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      0ff1db48
    • net: ipa: kill ipa_version_supported() · dfdd70e2
      Alex Elder authored
      The only place ipa_version_supported() is called is in the probe
      function.  The version comes from the match data.  Rather than
      checking the version validity separately, just consider anything
      that has match data to be supported.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      dfdd70e2
    • net: ipa: fix two minor ipa_cmd problems · 319b6d4e
      Alex Elder authored
      In "ipa_cmd.h", ipa_cmd_data_valid() is declared, but that function
      does not exist.  So delete that declaration.
      
      Also, for some reason ipa_cmd_init() never gets called.  It isn't
      really critical--it just validates that some memory offsets and a
      size can be represented in some register fields, and they won't fail
      with current data.  Regardless, call the function in ipa_probe().
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      319b6d4e
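The kind of check described for ipa_cmd_init() above (confirming that memory offsets and a size can be represented in register fields) can be sketched roughly as follows. This is only an illustration: `field_fits()`, `struct mem_region_sketch`, `cmd_layout_valid()` and the field widths used with them are hypothetical and are not taken from the IPA driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if val can be stored in a register field
 * that is `width` bits wide (1..32). */
static bool field_fits(uint32_t val, unsigned int width)
{
	assert(width >= 1 && width <= 32);

	if (width == 32)
		return true;	/* every 32-bit value fits */

	return val < (UINT32_C(1) << width);
}

/* Hypothetical description of a memory region used by a command. */
struct mem_region_sketch {
	uint32_t offset;
	uint32_t size;
};

/* Validate that both the offset and the size of a region fit in the
 * (assumed) field widths before they are ever written to hardware. */
static bool cmd_layout_valid(const struct mem_region_sketch *region,
			     unsigned int offset_bits,
			     unsigned int size_bits)
{
	return field_fits(region->offset, offset_bits) &&
	       field_fits(region->size, size_bits);
}
```

Running such a check once at probe time means a bad configuration is reported early instead of silently truncating a value later.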
    • net: ipa: fix two bogus argument names · f2e4e9ea
      Alex Elder authored
      In "ipa_endpoint.h", two function declarations have bogus argument
      names.  Fix these.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f2e4e9ea
    • net: ipa: make ipa_table_hash_support() a real function · b81565b7
      Alex Elder authored
      With the exception of ipa_table_hash_support(), nothing defined in
      "ipa_table.h" requires the full definition of the IPA structure.
      
      Change that function to be a "real" function rather than an inline,
      to avoid requiring the IPA structure to be defined.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      b81565b7
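The difference between an inline function and a "real" one, as described in the commit above, can be illustrated with a small single-file sketch. The upper part stands in for a header that only forward-declares the structure; the lower part stands in for the source file that sees the full definition. The names `sketch_table_hash_support()` and `enum sketch_ipa_version`, and the contents of `struct ipa` here, are assumptions made for the example; the only fact taken from this log is that IPA v4.2 does not support table hashing.

```c
#include <assert.h>
#include <stdbool.h>

/* ---- What the header needs: a declaration, not a definition ---- */

struct ipa;	/* incomplete type; its members are unknown here */

/* Only a pointer is passed, so the incomplete type is sufficient. */
bool sketch_table_hash_support(const struct ipa *ipa);

/* ---- What the source file sees: the full structure ---- */

enum sketch_ipa_version {
	SKETCH_IPA_V4_0,
	SKETCH_IPA_V4_2,
	SKETCH_IPA_V4_5,
};

/* Hypothetical, heavily reduced definition of the structure. */
struct ipa {
	enum sketch_ipa_version version;
};

/* The body dereferences the pointer, so it must live where the
 * structure is fully defined. An inline version in the header would
 * force every includer to see the definition as well. */
bool sketch_table_hash_support(const struct ipa *ipa)
{
	return ipa->version != SKETCH_IPA_V4_2;
}
```

The trade-off is a real function call instead of inlined code, in exchange for a header that no longer depends on the full structure definition.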
    • net: ipa: remove unneeded FILT_ROUT_HASH_EN definitions · 5043d6b1
      Alex Elder authored
      The FILT_ROUT_HASH_EN register is only used for IPA v4.2.  There,
      routing and filter table hashing are not supported, and so the
      register must be written to disable the feature.  No other version
      uses this register, so its definition can be removed.  If we need to
      use these some day (for example, explicitly enable the feature) this
      commit can be reverted.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      5043d6b1
    • net: ipa: call device_init_wakeup() earlier · 19790951
      Alex Elder authored
      Currently, enabling wakeup for the IPA device doesn't occur until
      the setup phase of initialization (in ipa_power_setup()).
      
      There is no need to delay doing that, however.  We can conveniently
      do it during the config phase, in ipa_interrupt_config(), where we
      enable power management wakeup mode for the IPA interrupt.
      
      Moving the device_init_wakeup() out of ipa_power_setup() leaves that
      function empty, so it can just be eliminated.
      
      Similarly, rearrange all of the matching inverse calls, disabling
      device wakeup in ipa_interrupt_deconfig() and removing that function
      as well.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      19790951
    • net: ipa: only enable the SUSPEND IPA interrupt when needed · 6f370026
      Alex Elder authored
      Only enable the SUSPEND IPA interrupt type when at least one
      endpoint has that interrupt enabled.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      6f370026
    • net: ipa: maintain bitmap of suspend-enabled endpoints · 2eca7344
      Alex Elder authored
      Keep track of which endpoints have the SUSPEND IPA interrupt enabled
      in a variable-length bitmap.  This will be used in the next patch to
      allow the SUSPEND interrupt type to be disabled except when needed.
      Signed-off-by: Alex Elder <elder@linaro.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      2eca7344
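The two SUSPEND-interrupt commits above describe tracking suspend-enabled endpoints in a variable-length bitmap and keeping the interrupt type enabled only while at least one bit is set. A minimal user-space sketch of that idea follows. It does not reproduce the driver code: `struct suspend_tracker`, its functions, and the `irq_type_enabled` flag (which stands in for actually enabling the interrupt type in hardware) are all hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical tracker; not the driver's real data structure. */
struct suspend_tracker {
	unsigned long *enabled;	/* one bit per endpoint */
	size_t nbits;		/* number of endpoints tracked */
	bool irq_type_enabled;	/* models the SUSPEND interrupt type */
};

static size_t tracker_words(size_t nbits)
{
	return (nbits + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD;
}

/* Allocate a zeroed bitmap sized for endpoint_count endpoints. */
static bool tracker_init(struct suspend_tracker *t, size_t endpoint_count)
{
	assert(endpoint_count > 0);

	t->enabled = calloc(tracker_words(endpoint_count), sizeof(*t->enabled));
	if (!t->enabled)
		return false;
	t->nbits = endpoint_count;
	t->irq_type_enabled = false;
	return true;
}

static void tracker_exit(struct suspend_tracker *t)
{
	free(t->enabled);
	t->enabled = NULL;
}

static bool tracker_empty(const struct suspend_tracker *t)
{
	for (size_t i = 0; i < tracker_words(t->nbits); i++)
		if (t->enabled[i])
			return false;
	return true;
}

/* Mark an endpoint; turn the interrupt type on for the first one. */
static void suspend_enable(struct suspend_tracker *t, size_t id)
{
	bool was_empty = tracker_empty(t);

	assert(id < t->nbits);
	t->enabled[id / SKETCH_BITS_PER_WORD] |= 1UL << (id % SKETCH_BITS_PER_WORD);
	if (was_empty)
		t->irq_type_enabled = true;
}

/* Clear an endpoint; turn the interrupt type off after the last one. */
static void suspend_disable(struct suspend_tracker *t, size_t id)
{
	assert(id < t->nbits);
	t->enabled[id / SKETCH_BITS_PER_WORD] &= ~(1UL << (id % SKETCH_BITS_PER_WORD));
	if (tracker_empty(t))
		t->irq_type_enabled = false;
}
```

In kernel code the bitmap helpers from `<linux/bitmap.h>` would normally replace the hand-rolled word arithmetic shown here.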
    • Merge branch 'net-stmmac-fix-mac-capabilities-procedure' · 57f15912
      Paolo Abeni authored
      Serge Semin says:
      
      ====================
      net: stmmac: Fix MAC-capabilities procedure
      
      This series grew out of the discussions around Yanteng's recent series
      adding support for the Loongson LS7A1000, LS2K1000, LS7A2000 and LS2K2000
      MACs:
      Link: https://lore.kernel.org/netdev/fu3f6uoakylnb6eijllakeu5i4okcyqq7sfafhp5efaocbsrwe@w74xe7gb6x7p
      
      In particular, Yanteng's patchset needed to implement the Loongson
      MAC-specific constraints applied to the link speed and link duplex mode.
      The discussion with Russell resulted in the following preliminary patch:
      Link: https://lore.kernel.org/netdev/df31e8bcf74b3b4ddb7ddf5a1c371390f16a2ad5.1712917541.git.siyanteng@loongson.cn
      
      The patch above was a temporary solution used by Yanteng for further
      development and to move on with the ongoing review. This patchset is a
      refactored version of that single patch, with the formatting required for
      fixes patches.
      
      The main part of the series was already merged at the v1 stage. What
      remains are the cleanup patches, which rename the
      stmmac_ops::phylink_get_caps() callback to stmmac_ops::update_caps() and
      move the MAC-capabilities init/re-init to the phylink MAC-capabilities
      getter.
      
      Link: https://lore.kernel.org/netdev/20240412180340.7965-1-fancer.lancer@gmail.com/
      Changelog v2:
      - Add a new patch (Romain):
        [PATCH net-next v2 1/2] net: stmmac: Rename phylink_get_caps() callback to update_caps()
      - Resubmit the leftover patches to net-next tree (Paolo).
      
      Link: https://lore.kernel.org/netdev/20240417140013.12575-1-fancer.lancer@gmail.com/
      Changelog v3:
      - Just resubmit (Jakub).
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      ====================
      
      Link: https://lore.kernel.org/r/20240419090357.5547-1-fancer.lancer@gmail.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      57f15912
    • net: stmmac: Move MAC caps init to phylink MAC caps getter · f951a649
      Serge Semin authored
      After a set of recent fixes, the stmmac_phy_setup() and
      stmmac_reinit_queues() methods have ended up with some duplicated code.
      Get rid of the duplication by moving the MAC-capabilities
      initialization to the PHYLINK MAC-capabilities getter. The getter is
      called during each network device interface open/close cycle, so the
      MAC-capabilities will be initialized in the generic device open procedure
      and on Tx/Rx queue re-initialization, as the original code semantics
      imply.
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      f951a649
    • net: stmmac: Rename phylink_get_caps() callback to update_caps() · dc144bae
      Serge Semin authored
      Since recent commits, the stmmac_ops::phylink_get_caps() callback is no
      longer responsible for getting the phylink MAC capabilities; it
      merely updates the MAC capabilities in the mac_device_info::link::caps
      field. Rename the callback to match what the method does now.
      Signed-off-by: Serge Semin <fancer.lancer@gmail.com>
      Reviewed-by: Romain Gantois <romain.gantois@bootlin.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      dc144bae