1. 20 Jun, 2022 19 commits
    • tcp: Introduce tcp_read_skb() · 04919bed
      Cong Wang authored
      This patch introduces tcp_read_skb() based on tcp_read_sock(),
      in preparation for the next patch, which actually introduces
      a new sock ops.
      
      TCP is special here, because it has tcp_read_sock() which is
      mainly used by splice(). tcp_read_sock() supports partial read
      and arbitrary offset, neither of which is needed for sockmap.
      Signed-off-by: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20220615162014.89193-2-xiyou.wangcong@gmail.com
      04919bed
    • Merge branch 'mlxsw-unified-bridge-conversion-part-1' · 4336487e
      David S. Miller authored
      Ido Schimmel says:
      
      ====================
      mlxsw: Unified bridge conversion - part 1/6
      
      This set starts converting mlxsw to the unified bridge model and mainly
      adds new device registers and extends existing ones that will be used in
      follow-up patchsets.
      
      High-level summary
      ==================
      
      The unified bridge model is a new way of managing low-level device
      objects such as filtering identifiers (FIDs). The conversion moves a lot
      of logic out of the device's firmware towards the driver, but its main
      selling point is that it overcomes various scalability issues related
      to the number of entries that need to be programmed to the device.
      
      The only (intended) user-visible changes of the conversion are
      improved resource utilization and the ability to support more router
      interfaces (RIFs) in Spectrum-{2,3}.
      
      Details
      =======
      
      Commit 50853808 ("Merge branch
      'mlxsw-Prepare-for-VLAN-aware-bridge-w-VxLAN'") converted mlxsw to
      emulate 802.1Q FIDs (represent VLANs in a VLAN-aware bridge) using
      802.1D FIDs (represent VLAN-unaware bridges). This was necessary because
      at that time VNI could not be assigned to 802.1Q FIDs, which effectively
      meant that mlxsw could not support VXLAN with VLAN-aware bridges.
      
      The downside of this approach is that multiple {Port,VID}->FID entries
      are required in order to classify incoming traffic to a FID, as opposed
      to a single VID->FID entry that can be used with actual 802.1Q FIDs.
      
      For example, if 10 ports are members in the same VLAN-aware bridge and
      the same 100 VLANs are configured on each port, then only 100 VID->FID
      entries are required with 802.1Q FIDs, whereas 1000 {Port,VID}->FID
      entries are required with emulated 802.1Q FIDs.
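
The arithmetic in this example can be checked with a small model. This is only an illustrative sketch: the function name and dictionary keys are made up for the example and are not part of the mlxsw driver.

```python
from typing import Dict


def classification_entries(num_ports: int, num_vlans: int) -> Dict[str, int]:
    """Count ingress classification entries for one VLAN-aware bridge.

    Hypothetical helper for illustration; not driver code.
    """
    return {
        # Actual 802.1Q FIDs: one VID->FID entry per VLAN, shared by all ports.
        "vid_to_fid": num_vlans,
        # Emulated 802.1Q FIDs: one {Port,VID}->FID entry per port and VLAN.
        "port_vid_to_fid": num_ports * num_vlans,
    }


counts = classification_entries(num_ports=10, num_vlans=100)
print(counts)
```

The emulated scheme grows with the product of ports and VLANs, while the 802.1Q scheme grows with the number of VLANs only.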
      
      The above limitation is the result of various assumptions that were made
      in the design of the API that was exposed to software. In the unified
      bridge model the API is much more "raw" and therefore avoids these
      assumptions, allowing software to configure the device in a more
      efficient manner.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4336487e
    • mlxsw: reg: Add support for VLAN RIF as part of RITR register · b3820922
      Amit Cohen authored
      Router interfaces (RIFs) constructed on top of VLAN-aware bridges are of
      "VLAN" type, whereas RIFs constructed on top of VLAN-unaware bridges are
      of "FID" type.
      
      In other words, the RIF type is derived from the underlying FID type.
      VLAN RIFs are used on top of 802.1Q FIDs, whereas FID RIFs are used on
      top of 802.1D FIDs.
      
      Currently 802.1Q FIDs are emulated using 802.1D FIDs, and therefore VLAN
      RIFs are emulated using FID RIFs.
      
      As part of converting the driver to use unified bridge, 802.1Q FIDs and
      VLAN RIFs will be used.
      
      Add the relevant fields to RITR register, add pack() function for VLAN
      RIF and rename one field to fit the internal name.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b3820922
    • mlxsw: Add support for egress FID classification after decapsulation · 1b1c198c
      Amit Cohen authored
      As preparation for unified bridge model, add support for VNI->FID mapping
      via SVFA register.
      
      When performing VXLAN encapsulation, the VXLAN header needs to contain a
      VNI. This VNI is derived from the FID classification performed on
      ingress, through which the ingress RIF is also determined.
      
      Similarly, when performing VXLAN decapsulation, the FID of the packet
      needs to be determined. This FID is derived from VNI classification
      performed during decapsulation.
      
      In the old model, both entries (i.e., FID->VNI and VNI->FID) were
      configured via SFMR.vni.
      
      In the new model, where ingress is separated from egress, ingress
      configuration (VNI->FID) is performed via SVFA, while SFMR only
      configures egress (FID->VNI).
      
      Add 'vni' field to SVFA, add new mapping table - VNI to FID, add new
      pack() function for VNI mapping and edit the comment in SFMR.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b1c198c
    • mlxsw: reg: Add egress FID field to RITR register · ad9592c0
      Amit Cohen authored
      RITR configures the router interface table. As preparation for unified
      bridge model, add egress FID field to RITR.
      
      After routing, a packet has to perform a layer-2 lookup using the
      destination MAC it got from the routing and a FID.
      In the new model, the egress FID is configured by RITR for both sub-port
      and FID RIFs.
      
      Add 'efid' field to sub-port router interface and update FID router
      interface related comment.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ad9592c0
    • mlxsw: reg: Add Router Egress Interface to VID Register · 27f0b6ce
      Amit Cohen authored
      The REIV maps {egress router interface (eRIF), egress_port} -> {vlan ID}.
      As preparation for unified bridge model, add REIV register for future use.
      
      In the past, firmware would take care of the above mentioned mapping,
      but in the new model this should be done by software using REIV register.
      
      REIV register supports a simultaneous update of 256 ports using
      'port_page' field. When 'port_page'=0 the records represent ports
      0-255, when 'port_page'=1 the records represent ports 256-511 and so
      on.
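
The paging scheme described above can be sketched as follows. The helper name is hypothetical, and treating the position inside a page as `port % 256` is an assumption drawn from the port ranges given in the text, not from the register definition.

```python
from typing import Tuple

PORTS_PER_PAGE = 256  # ports covered by one 'port_page', per the text above


def reiv_locate(local_port: int) -> Tuple[int, int]:
    """Return (port_page, record index within the page) for a local port.

    Hypothetical helper; the record index is an assumed layout.
    """
    if local_port < 0:
        raise ValueError("local_port must be non-negative")
    return divmod(local_port, PORTS_PER_PAGE)


for port in (0, 255, 256, 511):
    print(port, reiv_locate(port))
```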
      
      The register is reserved while using the legacy model.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27f0b6ce
    • mlxsw: reg: Replace MID related fields in SFGC register · 48bca94f
      Amit Cohen authored
      SFGC register maps {packet type, bridge type} -> {MID base, table type}.
      As preparation for unified bridge model, remove 'mid' field and add
      'mid_base' field.
      
      The MID index (index to PGT table which maps MID to local port list and
      SMPE index) is a result of 'mid_base' + 'fid_offset'. In the legacy
      bridge model, firmware configures 'mid_base'. In the new model, however,
      software is responsible for configuring it via SFGC register.
      
      The 'mid_base' is configured per {packet type, bridge type}, for
      example, for {Unicast, .1Q}, {Broadcast, .1D}.
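
A minimal sketch of the 'mid_base' + 'fid_offset' computation. The table contents and base values below are invented for illustration; only the addition itself comes from the text.

```python
from typing import Dict, Tuple

# Hypothetical per-{packet type, bridge type} bases, as software might
# program them via SFGC. The numeric values are made up.
MID_BASE: Dict[Tuple[str, str], int] = {
    ("unicast", ".1Q"): 0,
    ("broadcast", ".1D"): 1024,
}


def mid_index(packet_type: str, bridge_type: str, fid_offset: int) -> int:
    """MID index = 'mid_base' for the {packet type, bridge type} + 'fid_offset'."""
    return MID_BASE[(packet_type, bridge_type)] + fid_offset


print(mid_index("broadcast", ".1D", fid_offset=5))
```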
      
      Add the field 'mid_base' to SFGC register and increase the length of the
      register accordingly.
      
      Remove the field 'mid', as it is currently ignored by the device; its
      use is an old leftover.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48bca94f
    • mlxsw: reg: Add flood related field to SFMR register · 94536249
      Amit Cohen authored
      SFMR register creates and configures FIDs. As preparation for unified
      bridge model, add a required field for future use.
      
      The PGT (Port Group) table maps multicast ID (MID) to
      {local port list, SMPE index} on Spectrum-1 and to {local port list} on
      the other ASICs.
      
      In the legacy model, software did not interact with this table directly.
      Instead, it was accessed by firmware in response to registers such as
      SFTR and SMID.
      In the new model, the SFTR register is deprecated and software has full
      control over the PGT table using the SMID register.
      
      The configuration of MDB entries (using SFD) is unchanged, but flooding
      configuration is completely different.
      SFGC register maps {packet type, bridge type} -> {MID base, table type},
      then with FID and FID-offset which are configured via SFMR, the MID index
      is obtained.
      
      Add the field 'flood_bridge_type' to SFMR so that software can
      distinguish between 802.1Q FIDs and vFIDs using the two supported types.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      94536249
    • mlxsw: reg: Add VID related fields to SFD register · 485c281c
      Amit Cohen authored
      SFD register configures FDB table. As preparation for unified bridge model,
      add some required fields for future use.
      
      In the new model, firmware no longer configures the egress VID, this
      responsibility is moved to software. For layer 2 this means that software
      needs to determine the egress VID for both unicast and multicast.
      
      For unicast FDB records and unicast LAG FDB records, the VID needs to be
      set via new fields in SFD - 'set_vid' and 'vid'.
      
      Add the two mentioned fields for future use.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      485c281c
    • mlxsw: reg: Add SMPE related fields to SFMR register · 92e4e543
      Amit Cohen authored
      SFMR register creates and configures FIDs. As preparation for unified
      bridge model, add some required fields for future use.
      
      The device includes two main tables to support layer 2 multicast (i.e.,
      MDB and flooding). These are the PGT (Port Group Table) and the
      MPE (Multicast Port Egress) table.
      - PGT is {MID -> (bitmap of local_port, SMPE index)}
      - MPE is {(Local port, SMPE index) -> eVID}
      
      In Spectrum-2 and later ASICs, the SMPE index is an attribute of the FID
      and programmed via new fields in SFMR register - 'smpe_valid' and 'smpe'.
      
      Add the two mentioned fields for future use and increase the length of
      the register accordingly.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      92e4e543
    • mlxsw: Add SMPE related fields to SMID2 register · 894b98d5
      Amit Cohen authored
      SMID register maps multicast ID (MID) into a list of local ports.
      As preparation for unified bridge model, add some required fields for
      future use.
      
      The device includes two main tables to support layer 2 multicast (i.e.,
      MDB and flooding). These are the PGT (Port Group Table) and the
      MPE (Multicast Port Egress) table.
      - PGT is {MID -> (bitmap of local_port, SMPE index)}
      - MPE is {(Local port, SMPE index) -> eVID}
      
      In Spectrum-1, both indexes into the MPE table (local port and SMPE) are
      derived from the PGT table. Therefore, the SMPE index needs to be
      programmed as part of the PGT entry via new fields in SMID - 'smpe_valid'
      and 'smpe'.
      
      Add the two mentioned fields for future use and align the callers of
      mlxsw_reg_smid2_pack() to pass zeros for SMPE fields.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      894b98d5
    • mlxsw: reg: Add Switch Multicast Port to Egress VID Register · e0f071c5
      Amit Cohen authored
      The SMPE register maps {egress_port, SMPE index} -> VID.
      
      The device includes two main tables to support layer 2 multicast (i.e.,
      MDB and flooding). These are the PGT (Port Group Table) and the
      MPE (Multicast Port Egress) table.
      - PGT is {MID -> (bitmap of local_port, SMPE index)}
      - MPE is {(Local port, SMPE index) -> eVID}
      
      In Spectrum-1, the index into the MPE table - called switch multicast to
      port egress VID (SMPE) - is derived from the PGT entry, whereas in
      Spectrum-2 and later ASICs it is derived from the FID.
      
      In the legacy model, software did not interact with this table as it was
      completely hidden in firmware. In the new model, software needs to
      populate the table itself in order to map from {Local port, SMPE index} to
      an egress VID. This is done using the SMPE register.
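
The two-table lookup described above can be modelled in a few lines. This is a sketch under stated assumptions: the class and function names, the dictionary-based tables and the sample values are all hypothetical and say nothing about how the device or the driver stores these tables.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class PgtEntry:
    """One PGT row: MID -> (local ports, optional SMPE index)."""
    local_ports: FrozenSet[int]
    smpe: Optional[int] = None  # carried by the PGT entry in the Spectrum-1 style


def egress_vids(
    pgt: Dict[int, PgtEntry],
    mpe: Dict[Tuple[int, int], int],
    mid: int,
    fid_smpe: Optional[int] = None,
) -> Dict[int, int]:
    """Return {local port: egress VID} for one multicast ID."""
    entry = pgt[mid]
    # Spectrum-1 style: the SMPE index is derived from the PGT entry.
    # Spectrum-2 and later: the SMPE index is derived from the FID (fid_smpe).
    smpe = entry.smpe if fid_smpe is None else fid_smpe
    if smpe is None:
        raise ValueError("no SMPE index available for this lookup")
    # MPE maps {(local port, SMPE index) -> eVID}.
    return {port: mpe[(port, smpe)] for port in sorted(entry.local_ports)}


pgt = {7: PgtEntry(local_ports=frozenset({1, 2}), smpe=3)}
mpe = {(1, 3): 10, (2, 3): 20, (1, 4): 30, (2, 4): 40}
print(egress_vids(pgt, mpe, mid=7))              # SMPE taken from the PGT entry
print(egress_vids(pgt, mpe, mid=7, fid_smpe=4))  # SMPE taken from the FID
```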
      
      Add the register for future use.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Danielle Ratson <danieller@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e0f071c5
    • mlxsw: reg: Add ingress RIF related fields to SVFA register · dd326565
      Amit Cohen authored
      SVFA register controls the VID to FID mapping and {Port, VID} to FID
      mapping for virtualized ports. As preparation for unified bridge model,
      add some required fields for future use.
      
      On ingress, after ingress ACL, a packet needs to be classified to a FID.
      The key for this lookup can be one of:
      1. VID. When port is not in virtual mode.
      2. {RQ, VID}. When port is in virtual mode.
      3. FID. When FID was set by ingress ACL.
      
      Since RITR no longer performs ingress configuration, the ingress RIF for
      the first two entry types needs to be set via new fields in SVFA -
      'irif_v' and 'irif'.
      
      Add the two mentioned fields for future use and increase the length of
      the register accordingly.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dd326565
    • mlxsw: reg: Add ingress RIF related fields to SFMR register · e459466a
      Amit Cohen authored
      SFMR register creates and configures FIDs. As preparation for unified
      bridge model, add some required fields for future use.
      
      On ingress, after ingress ACL, a packet needs to be classified to a FID.
      The key for this lookup can be one of:
      1. VID. When port is not in virtual mode.
      2. {RQ, VID}. When port is in virtual mode.
      3. FID. When FID was set by ingress ACL.
         For example, via VR_AND_FID_ACTION.
      
      Since RITR no longer performs ingress configuration, the ingress RIF for
      the last entry type needs to be set via new fields in SFMR - 'irif_v'
      and 'irif'.
      
      Add the two mentioned fields for future use.
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e459466a
    • mlxsw: reg: Add 'flood_rsp' field to SFMR register · 02d23c95
      Amit Cohen authored
      SFMR register creates and configures FIDs. As preparation for unified
      bridge model, add a field for future use.
      
      In the new model, RITR no longer configures the rFID used for sub-port RIFs
      and it has to be created by software via SFMR. Such FIDs need to be created
      with special flood indication using 'flood_rsp' field. When set, this bit
      instructs the device to manage the flooding entries for this FID in a
      reserved part of the port group table (PGT).
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      02d23c95
    • vmxnet3: disable overlay offloads if UPT device does not support · a56b158a
      Ronak Doshi authored
      Commit 6f91f4ba ("vmxnet3: add support for capability registers")
      added support for capability registers. These registers are used
      to advertise capabilities of the device.
      
      The patch updated the dev_caps to disable outer checksum offload if
      PTCR register does not support it. However, it did not update the
      other overlay offloads. This patch fixes this issue.
      
      Fixes: 6f91f4ba ("vmxnet3: add support for capability registers")
      Signed-off-by: Ronak Doshi <doshir@vmware.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a56b158a
    • Merge branch 'raw-rcu-fixes' · 6f9d7046
      David S. Miller authored
      Kuniyuki Iwashima says:
      
      ====================
      raw: Fix nits of RCU conversion series.
      
      The first patch fixes a build error introduced by commit ba44f818
      ("raw: use more conventional iterators"). That commit has not landed
      in the net tree, so this series is targeted to net-next.  The second
      patch replaces some hlist functions with sk's helper macros.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f9d7046
    • raw: Use helpers for the hlist_nulls variant. · f289c02b
      Kuniyuki Iwashima authored
      hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry() have dedicated
      macros for sk.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f289c02b
    • raw: Fix mixed declarations error in raw_icmp_error(). · 5da39e31
      Kuniyuki Iwashima authored
      The trailing semicolon causes a compiler error, so let's remove it.
      
      net/ipv4/raw.c: In function ‘raw_icmp_error’:
      net/ipv4/raw.c:266:2: error: ISO C90 forbids mixed declarations and code [-Werror=declaration-after-statement]
        266 |  struct hlist_nulls_head *hlist;
            |  ^~~~~~
      
      Fixes: ba44f818 ("raw: use more conventional iterators")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5da39e31
  2. 19 Jun, 2022 14 commits
  3. 18 Jun, 2022 7 commits
    • ping: convert to RCU lookups, get rid of rwlock · dbca1596
      Eric Dumazet authored
      Using rwlock in networking code is extremely risky.
      Writers can starve if enough readers are constantly
      grabbing the rwlock.
      
      I thought rwlocks were at fault and sent this patch:
      
      https://lkml.org/lkml/2022/6/17/272
      
      But Peter and Linus essentially told me rwlock had to be unfair.
      
      We need to get rid of rwlock in networking code.
      
      Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dbca1596
    • ax25: use GFP_KERNEL in ax25_dev_device_up() · f0623340
      Peter Lafreniere authored
      ax25_dev_device_up() is only called during device setup, which is
      done in user context. In addition, ax25_dev_device_up()
      unconditionally calls ax25_register_dev_sysctl(), which already
      allocates with GFP_KERNEL.
      
      Since it is allowed to sleep in this function, here we change
      ax25_dev_device_up() to use GFP_KERNEL to reduce unnecessary
      out-of-memory errors.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Peter Lafreniere <pjlafren@mtu.edu>
      Link: https://lore.kernel.org/r/20220616152333.9812-1-pjlafren@mtu.edu
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      f0623340
    • Xiang wangx's avatar
      f691b4d8
    • ppp: Fix typo in comment · 959edef6
      Xiang wangx authored
      Delete the redundant word 'the'.
      Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
      Link: https://lore.kernel.org/r/20220616142624.3397-1-wangxiang@cdjrlc.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      959edef6
    • nfp: add support for .get_pauseparam() · 382f99c4
      Yinjun Zhang authored
      Show correct pause frame parameters for nfp. These parameters cannot
      be configured, so .set_pauseparam() is not implemented. With this
      change:
      
       #ethtool --show-pause enp1s0np0
       Pause parameters for enp1s0np0:
       Autonegotiate:  off
       RX:             on
       TX:             on
      Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
      Signed-off-by: Simon Horman <simon.horman@corigine.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220616133358.135305-1-simon.horman@corigine.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      382f99c4
    • net: dsa: ar9331: fix potential dead lock on mdio access · 7a49f219
      Oleksij Rempel authored
      Rework MDIO locking to avoid potential circular locking:
      
       WARNING: possible circular locking dependency detected
       5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5 Not tainted
       ------------------------------------------------------
       kworker/u2:4/68 is trying to acquire lock:
       81f3c83c (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}, at: regmap_write+0x50/0x8c
      
       but task is already holding lock:
       81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&bus->mdio_lock){+.+.}-{4:4}:
              lock_acquire+0x2d4/0x360
              __mutex_lock+0xf8/0x384
              mutex_lock_nested+0x2c/0x38
              mdiobus_write+0x44/0x80
              ar9331_sw_bus_write+0x50/0xe4
              _regmap_raw_write_impl+0x604/0x724
              _regmap_bus_raw_write+0x9c/0xb4
              _regmap_write+0xdc/0x1a0
              _regmap_update_bits+0xf4/0x118
              _regmap_select_page+0x108/0x138
              _regmap_raw_read+0x25c/0x288
              _regmap_bus_read+0x60/0x98
              _regmap_read+0xd4/0x1b0
              _regmap_update_bits+0xc4/0x118
              regmap_update_bits_base+0x64/0x8c
              ar9331_sw_irq_bus_sync_unlock+0x40/0x6c
              __irq_set_handler+0x7c/0xac
              ar9331_sw_irq_map+0x48/0x7c
              irq_domain_associate+0x174/0x208
              irq_create_mapping_affinity+0x1a8/0x230
              ar9331_sw_probe+0x22c/0x388
              mdio_probe+0x44/0x70
              really_probe+0x200/0x424
              __driver_probe_device+0x290/0x298
              driver_probe_device+0x54/0xe4
              __device_attach_driver+0xe4/0x130
              bus_for_each_drv+0xb4/0xd8
              __device_attach+0x104/0x1a4
              bus_probe_device+0x48/0xc4
              device_add+0x600/0x800
              mdio_device_register+0x68/0xa0
              of_mdiobus_register+0x2bc/0x3c4
              ag71xx_probe+0x6e4/0x984
              platform_probe+0x78/0xd0
              really_probe+0x200/0x424
              __driver_probe_device+0x290/0x298
              driver_probe_device+0x54/0xe4
              __driver_attach+0x17c/0x190
              bus_for_each_dev+0x8c/0xd0
              bus_add_driver+0x110/0x228
              driver_register+0xe4/0x12c
              do_one_initcall+0x104/0x2a0
              kernel_init_freeable+0x250/0x288
              kernel_init+0x34/0x130
              ret_from_kernel_thread+0x14/0x1c
      
       -> #0 (ar9331:1005:(&ar9331_mdio_regmap_config)->lock){+.+.}-{4:4}:
              check_noncircular+0x88/0xc0
              __lock_acquire+0x10bc/0x18bc
              lock_acquire+0x2d4/0x360
              __mutex_lock+0xf8/0x384
              mutex_lock_nested+0x2c/0x38
              regmap_write+0x50/0x8c
              ar9331_sw_mbus_read+0x74/0x1b8
              __mdiobus_read+0x90/0xec
              mdiobus_read+0x50/0x78
              get_phy_device+0xa0/0x18c
              fwnode_mdiobus_register_phy+0x120/0x1d4
              of_mdiobus_register+0x244/0x3c4
              devm_of_mdiobus_register+0xe8/0x100
              ar9331_sw_setup+0x16c/0x3a0
              dsa_register_switch+0x7dc/0xcc0
              ar9331_sw_probe+0x370/0x388
              mdio_probe+0x44/0x70
              really_probe+0x200/0x424
              __driver_probe_device+0x290/0x298
              driver_probe_device+0x54/0xe4
              __device_attach_driver+0xe4/0x130
              bus_for_each_drv+0xb4/0xd8
              __device_attach+0x104/0x1a4
              bus_probe_device+0x48/0xc4
              deferred_probe_work_func+0xf0/0x10c
              process_one_work+0x314/0x4d4
              worker_thread+0x2a4/0x354
              kthread+0x134/0x13c
              ret_from_kernel_thread+0x14/0x1c
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&bus->mdio_lock);
                                      lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock);
                                      lock(&bus->mdio_lock);
         lock(ar9331:1005:(&ar9331_mdio_regmap_config)->lock);
      
        *** DEADLOCK ***
      
       5 locks held by kworker/u2:4/68:
        #0: 81c04eb4 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4
        #1: 81f0de78 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1e4/0x4d4
        #2: 81f0a880 (&dev->mutex){....}-{4:4}, at: __device_attach+0x40/0x1a4
        #3: 80c8aee0 (dsa2_mutex){+.+.}-{4:4}, at: dsa_register_switch+0x5c/0xcc0
        #4: 81f60494 (&bus->mdio_lock){+.+.}-{4:4}, at: mdiobus_read+0x40/0x78
      
       stack backtrace:
       CPU: 0 PID: 68 Comm: kworker/u2:4 Not tainted 5.19.0-rc1-ar9331-00017-g3ab364c7c48c #5
       Workqueue: events_unbound deferred_probe_work_func
       Stack : 00000056 800d4638 81f0d64c 00000004 00000018 00000000 80a20000 80a20000
               80937590 81ef3858 81f0d760 3913578a 00000005 8045e824 81f0d600 a8db84cc
               00000000 00000000 80937590 00000a44 00000000 00000002 00000001 ffffffff
               81f0d6a4 80982d7c 0000000f 20202020 80a20000 00000001 80937590 81ef3858
               81f0d760 3913578a 00000005 00000005 00000000 03bd0000 00000000 80e00000
               ...
       Call Trace:
       [<80069db0>] show_stack+0x94/0x130
       [<8045e824>] dump_stack_lvl+0x54/0x8c
       [<800c7fac>] check_noncircular+0x88/0xc0
       [<800ca068>] __lock_acquire+0x10bc/0x18bc
       [<800cb478>] lock_acquire+0x2d4/0x360
       [<807b84c4>] __mutex_lock+0xf8/0x384
       [<807b877c>] mutex_lock_nested+0x2c/0x38
       [<804ea640>] regmap_write+0x50/0x8c
       [<80501e38>] ar9331_sw_mbus_read+0x74/0x1b8
       [<804fe9a0>] __mdiobus_read+0x90/0xec
       [<804feac4>] mdiobus_read+0x50/0x78
       [<804fcf74>] get_phy_device+0xa0/0x18c
       [<804ffeb4>] fwnode_mdiobus_register_phy+0x120/0x1d4
       [<805004f0>] of_mdiobus_register+0x244/0x3c4
       [<804f0c50>] devm_of_mdiobus_register+0xe8/0x100
       [<805017a0>] ar9331_sw_setup+0x16c/0x3a0
       [<807355c8>] dsa_register_switch+0x7dc/0xcc0
       [<80501468>] ar9331_sw_probe+0x370/0x388
       [<804ff0c0>] mdio_probe+0x44/0x70
       [<804d1848>] really_probe+0x200/0x424
       [<804d1cfc>] __driver_probe_device+0x290/0x298
       [<804d1d58>] driver_probe_device+0x54/0xe4
       [<804d2298>] __device_attach_driver+0xe4/0x130
       [<804cf048>] bus_for_each_drv+0xb4/0xd8
       [<804d200c>] __device_attach+0x104/0x1a4
       [<804d026c>] bus_probe_device+0x48/0xc4
       [<804d108c>] deferred_probe_work_func+0xf0/0x10c
       [<800a0ffc>] process_one_work+0x314/0x4d4
       [<800a17fc>] worker_thread+0x2a4/0x354
       [<800a9a54>] kthread+0x134/0x13c
       [<8006306c>] ret_from_kernel_thread+0x14/0x1c
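
The report above boils down to two code paths taking the same pair of locks in opposite order. A toy sketch of the lock-order bookkeeping that exposes such a cycle is shown here; it is not how lockdep is implemented, and the class, method names and shortened lock names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class LockOrderGraph:
    """Records 'held -> acquired' edges and flags orderings that form a cycle."""

    def __init__(self) -> None:
        self.edges: DefaultDict[str, Set[str]] = defaultdict(set)

    def _reachable(self, src: str, dst: str) -> bool:
        # Iterative depth-first search over the recorded ordering edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held: str, new: str) -> bool:
        """Record that `new` is taken while `held` is held.

        Returns True if `held` is already reachable from `new`, i.e. the
        new edge closes a circular dependency.
        """
        cycle = self._reachable(new, held)
        self.edges[held].add(new)
        return cycle


graph = LockOrderGraph()
# Chain #1 in the report: the regmap path ends up in mdiobus_write().
first = graph.acquire(held="regmap_lock", new="mdio_lock")
# Chain #0 in the report: mdiobus_read() ends up in regmap_write().
second = graph.acquire(held="mdio_lock", new="regmap_lock")
print(first, second)
```

The second acquisition is the one that would be reported, since it inverts an ordering that was already recorded.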
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/r/20220616112550.877118-1-o.rempel@pengutronix.de
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      7a49f219
    • Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 9fb424c4
      Jakub Kicinski authored
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2022-06-17
      
      We've added 72 non-merge commits during the last 15 day(s) which contain
      a total of 92 files changed, 4582 insertions(+), 834 deletions(-).
      
      The main changes are:
      
      1) Add 64 bit enum value support to BTF, from Yonghong Song.
      
      2) Implement support for sleepable BPF uprobe programs, from Delyan Kratunov.
      
      3) Add new BPF helpers to issue and check TCP SYN cookies without binding to a
         socket especially useful in synproxy scenarios, from Maxim Mikityanskiy.
      
      4) Fix libbpf's internal USDT address translation logic for shared libraries as
         well as uprobe's symbol file offset calculation, from Andrii Nakryiko.
      
      5) Extend libbpf to provide an API for textual representation of the various
         map/prog/attach/link types and use it in bpftool, from Daniel Müller.
      
      6) Provide BTF line info for RV64 and RV32 JITs, and fix a put_user bug in the
         core seen in 32 bit when storing BPF function addresses, from Pu Lehui.
      
      7) Fix libbpf's BTF pointer size guessing by adding a list of various aliases
         for 'long' types, from Douglas Raillard.
      
      8) Fix bpftool to re-add setting rlimit since probing for memcg-based accounting
         has been unreliable and caused a regression on COS, from Quentin Monnet.
      
      9) Fix UAF in BPF cgroup's effective program computation triggered upon BPF link
         detachment, from Tadeusz Struk.
      
      10) Fix bpftool build bootstrapping during cross compilation which was pointing
          to the wrong AR process, from Shahab Vahedi.
      
      11) Fix logic bug in libbpf's is_pow_of_2 implementation, from Yuze Chi.
      
      12) BPF hash map optimization to avoid grabbing spinlocks of all CPUs when there
          is no free element. Also add a benchmark as reproducer, from Feng Zhou.
      
      13) Fix bpftool's codegen to bail out when there's no BTF, from Michael Mullin.
      
      14) Various minor cleanup and improvements all over the place.
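
Item 11 above concerns a power-of-two check. For reference, the usual bit trick for such a check can be sketched as follows. This is a generic illustration, not libbpf's code, and it makes no claim about what the original bug was.

```python
def is_pow_of_2(x: int) -> bool:
    """True if x is a positive power of two.

    A power of two has exactly one bit set, so clearing the lowest set
    bit with x & (x - 1) must leave zero.
    """
    return x > 0 and (x & (x - 1)) == 0


print([n for n in range(1, 20) if is_pow_of_2(n)])
```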
      
      * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
        bpf: Fix bpf_skc_lookup comment wrt. return type
        bpf: Fix non-static bpf_func_proto struct definitions
        selftests/bpf: Don't force lld on non-x86 architectures
        selftests/bpf: Add selftests for raw syncookie helpers in TC mode
        bpf: Allow the new syncookie helpers to work with SKBs
        selftests/bpf: Add selftests for raw syncookie helpers
        bpf: Add helpers to issue and check SYN cookies in XDP
        bpf: Allow helpers to accept pointers with a fixed size
        bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
        selftests/bpf: add tests for sleepable (uk)probes
        libbpf: add support for sleepable uprobe programs
        bpf: allow sleepable uprobe programs to attach
        bpf: implement sleepable uprobes by chaining gps
        bpf: move bpf_prog to bpf.h
        libbpf: Fix internal USDT address translation logic for shared libraries
        samples/bpf: Check detach prog exist or not in xdp_fwd
        selftests/bpf: Avoid skipping certain subtests
        selftests/bpf: Fix test_varlen verification failure with latest llvm
        bpftool: Do not check return value from libbpf_set_strict_mode()
        Revert "bpftool: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK"
        ...
      ====================
      
      Link: https://lore.kernel.org/r/20220617220836.7373-1-daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9fb424c4