1. 13 Jan, 2022 7 commits
    • David S. Miller's avatar
      Merge branch 'smc-race-fixes' · 3ba8c625
      David S. Miller authored
      Wen Gu says:
      
      ====================
      net/smc: Fixes for race in smc link group termination
      
      We encountered some crashes recently and they are caused by the
      race between the access and free of link/link group in abnormal
      smc link group termination. The crashes can be reproduced in
      frequent abnormal link group termination, like setting RNICs up/down.
      
      This set of patches tries to fix this by extending the life cycle
      of link/link group to ensure that they won't be referred to after
      cleared or freed.
      
      v1 -> v2:
      - Improve some comments.
      
      - Move codes of waking up lgrs_deleted wait queue from smc_lgr_free()
        to __smc_lgr_free().
      
      - Move codes of waking up links_deleted wait queue from smcr_link_clear()
        to __smcr_link_clear().
      
      - Move codes of smc_ibdev_cnt_dec() and put_device() from smcr_link_clear()
        to __smcr_link_clear()
      
      - Move smc_lgr_put() to the end of __smcr_link_clear().
      
      - Call smc_lgr_put() after 'out' tag in smcr_link_init() when link
        initialization fails.
      
      - Modify the location where smc connection holds the lgr or link.
      
          before:
            * hold lgr in smc_lgr_register_conn().
            * hold link in smcr_lgr_conn_assign_link().
          after:
            * hold both lgr and link in smc_conn_create().
      
        Modify the location to symmetrical with the place where smc connections
        put the lgr or link, which is smc_conn_free().
      
      - Initialize conn->freed as zero in smc_conn_create().
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3ba8c625
    • Wen Gu's avatar
      net/smc: Resolve the race between link group access and termination · 61f434b0
      Wen Gu authored
      We encountered some crashes caused by the race between the access
      and the termination of link groups.
      
      Here are some of panic stacks we met:
      
      1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002f0
       Workqueue: smc_hs_wq smc_listen_work [smc]
       RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
       Call Trace:
        <TASK>
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        smc_listen_work+0x783/0x1220 [smc]
        ? finish_task_switch+0xc4/0x2e0
        ? process_one_work+0x1ad/0x3c0
        process_one_work+0x1ad/0x3c0
        worker_thread+0x4c/0x390
        ? rescuer_thread+0x320/0x320
        kthread+0x149/0x190
        ? set_kthread_struct+0x40/0x40
        ret_from_fork+0x1f/0x30
        </TASK>
      
      smc_listen_work()                abnormal case like port error
      ---------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      smc_clc_wait_msg()              |
       |- access conn->lgr (panic)    |
      
      2) Race between smc_setsockopt() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002e8
       RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
       Call Trace:
        <TASK>
        __sys_setsockopt+0xfc/0x190
        __x64_sys_setsockopt+0x20/0x30
        do_syscall_64+0x34/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
        </TASK>
      
      smc_setsockopt()                 abnormal case like port error
      --------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      mod_delayed_work()              |
       |- access conn->lgr (panic)    |
      
      There are some other panic places and they are caused by the
      similar reason as described above, which is accessing link
      group after termination, thus getting a NULL pointer or invalid
      resource.
      
      Currently, there seems to be no synchronization between the
      link group access and a sudden termination of it. This patch
      tries to fix this by introducing reference count of link group
      and not freeing link group until reference count is zero.
      
      Link group might be referred to by links or smc connections. So
      the operation to the link group reference count can be concluded
      as follows:
      
      object          [hold or initialized as 1]       [put]
      -------------------------------------------------------------------
      link group      smc_lgr_create()                 smc_lgr_free()
      connections     smc_conn_create()                smc_conn_free()
      links           smcr_link_init()                 smcr_link_clear()
      
      Througth this way, we extend the life cycle of link group and
      ensure it is longer than the life cycle of connections and links
      above it, so that avoid invalid access to link group after its
      termination.
      Signed-off-by: default avatarWen Gu <guwen@linux.alibaba.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      61f434b0
    • Li Zhijian's avatar
      kselftests/net: adapt the timeout to the largest runtime · de0e4447
      Li Zhijian authored
      timeout in settings is used by each case under the same directory, so
      it should adapt to the maximum runtime.
      
      A normally running net/fib_nexthops.sh may be killed by this unsuitable
      timeout. Furthermore, since the defect[1] of kselftests framework,
      net/fib_nexthops.sh which might take at least (300 * 4) seconds would
      block the whole kselftests framework previously.
      $ git grep -w 'sleep 300' tools/testing/selftests/net
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      
      Enlarge the timeout by plus 300 based on the obvious largest runtime
      to avoid the blocking.
      
      [1]: https://www.spinics.net/lists/kernel/msg4185370.htmlSigned-off-by: default avatarZhou Jie <zhoujie2011@fujitsu.com>
      Signed-off-by: default avatarLi Zhijian <lizhijian@fujitsu.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de0e4447
    • Vladimir Oltean's avatar
      net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port · 33cb0ff3
      Vladimir Oltean authored
      Since commit b3964807 ("net: mscc: ocelot: disable flow control on
      NPI interface"), flow control should be disabled on the DSA CPU port
      when used in NPI mode.
      
      However, the commit blamed in the Fixes: tag below broke this, because
      it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
      for the DSA CPU port.
      
      This issue became noticeable since the device tree update from commit
      8fcea7be ("arm64: dts: ls1028a: mark internal links between Felix
      and ENETC as capable of flow control").
      
      The solution is to check whether this is the currently configured NPI
      port from ocelot_phylink_mac_link_up(), and to not modify the statically
      disabled PAUSE frame transmission if it is.
      
      When the port is configured for lossless mode as opposed to tail drop
      mode, but the link partner (DSA master) doesn't observe the transmitted
      PAUSE frames, the switch termination throughput is much worse, as can be
      seen below.
      
      Before:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  28.4 MBytes   238 Mbits/sec  357   22.6 KBytes
      [  5]   1.00-2.00   sec  33.6 MBytes   282 Mbits/sec  426   19.8 KBytes
      [  5]   2.00-3.00   sec  34.0 MBytes   285 Mbits/sec  343   21.2 KBytes
      [  5]   3.00-4.00   sec  32.9 MBytes   276 Mbits/sec  354   22.6 KBytes
      [  5]   4.00-5.00   sec  32.3 MBytes   271 Mbits/sec  297   18.4 KBytes
      ^C[  5]   5.00-5.06   sec  2.05 MBytes   270 Mbits/sec   45   19.8 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-5.06   sec   163 MBytes   271 Mbits/sec  1822             sender
      [  5]   0.00-5.06   sec  0.00 Bytes  0.00 bits/sec                  receiver
      
      After:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec  259    143 KBytes
      [  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec  329    144 KBytes
      [  5]   2.00-3.00   sec   112 MBytes   936 Mbits/sec  255    144 KBytes
      [  5]   3.00-4.00   sec   110 MBytes   927 Mbits/sec  355    105 KBytes
      [  5]   4.00-5.00   sec   110 MBytes   926 Mbits/sec  350    156 KBytes
      [  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec  305    148 KBytes
      [  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec  320    143 KBytes
      [  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec  273   97.6 KBytes
      [  5]   8.00-9.00   sec   109 MBytes   913 Mbits/sec  299    141 KBytes
      [  5]   9.00-10.00  sec   110 MBytes   922 Mbits/sec  287    146 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  1.08 GBytes   926 Mbits/sec  3032             sender
      [  5]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec                  receiver
      
      Fixes: de274be3 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
      Reported-by: default avatarXiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      33cb0ff3
    • Colin Ian King's avatar
      atm: iphase: remove redundant pointer skb · d7b43034
      Colin Ian King authored
      The pointer skb is redundant, it is assigned a value that is never
      read and hence can be removed. Cleans up clang scan warning:
      
      drivers/atm/iphase.c:205:18: warning: Although the value stored
      to 'skb' is used in the enclosing expression, the value is never
      actually read from 'skb' [deadcode.DeadStores]
      Signed-off-by: default avatarColin Ian King <colin.i.king@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d7b43034
    • Maxim Mikityanskiy's avatar
      sch_api: Don't skip qdisc attach on ingress · de2d807b
      Maxim Mikityanskiy authored
      The attach callback of struct Qdisc_ops is used by only a few qdiscs:
      mq, mqprio and htb. qdisc_graft() contains the following logic
      (pseudocode):
      
          if (!qdisc->ops->attach) {
              if (ingress)
                  do ingress stuff;
              else
                  do egress stuff;
          }
          if (!ingress) {
              ...
              if (qdisc->ops->attach)
                  qdisc->ops->attach(qdisc);
          } else {
              ...
          }
      
      As we see, the attach callback is not called if the qdisc is being
      attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
      mqprio, since they contain a check that they are attached to TC_H_ROOT,
      and they can't be attached to TC_H_INGRESS anyway.
      
      However, the commit cited below added the attach callback to htb. It is
      needed for the hardware offload, but in the non-offload mode it
      simulates the "do egress stuff" part of the pseudocode above. The
      problem is that when htb is attached to ingress, neither "do ingress
      stuff" nor attach() is called. It results in an inconsistency, and the
      following message is printed to dmesg:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      This commit addresses the issue by running "do ingress stuff" in the
      ingress flow even in the attach callback is present, which is fine,
      because attach isn't going to be called afterwards.
      
      The bug was found by syzbot and reported by Eric.
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: default avatarMaxim Mikityanskiy <maximmi@nvidia.com>
      Reported-by: default avatarEric Dumazet <edumazet@google.com>
      Reviewed-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de2d807b
    • Pawel Dembicki's avatar
      net: qmi_wwan: add ZTE MF286D modem 19d2:1485 · 078c6a1c
      Pawel Dembicki authored
      Modem from ZTE MF286D is an Qualcomm MDM9250 based 3G/4G modem.
      
      T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  3 Spd=5000 MxCh= 0
      D:  Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  1
      P:  Vendor=19d2 ProdID=1485 Rev=52.87
      S:  Manufacturer=ZTE,Incorporated
      S:  Product=ZTE Technologies MSM
      S:  SerialNumber=MF286DZTED000000
      C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=896mA
      A:  FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00
      I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host
      E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
      E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
      E:  Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      Signed-off-by: default avatarPawel Dembicki <paweldembicki@gmail.com>
      Acked-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      078c6a1c
  2. 12 Jan, 2022 29 commits
  3. 11 Jan, 2022 4 commits
    • Saeed Mahameed's avatar
      Revert "net: vertexcom: default to disabled on kbuild" · 7d6019b6
      Saeed Mahameed authored
      This reverts commit 6bf950a8.
      
      To align with other vendors, NET_VENDOR configs are supposed to be ON by
      default, while their drivers should default to OFF.
      Suggested-by: default avatarJakub Kicinski <kuba@kernel.org>
      Signed-off-by: default avatarSaeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20220110205246.66298-1-saeed@kernel.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      7d6019b6
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: typo NULL check in _clone() function · 51edb2ff
      Pablo Neira Ayuso authored
      This should check for NULL in case memory allocation fails.
      Reported-by: default avatarJulian Wiedmann <jwiedmann.dev@gmail.com>
      Fixes: 3b9e2ea6 ("netfilter: nft_limit: move stateful fields out of expression data")
      Fixes: 37f319f3 ("netfilter: nft_connlimit: move stateful fields out of expression data")
      Fixes: 33a24de3 ("netfilter: nft_last: move stateful fields out of expression data")
      Fixes: ed0a0c60 ("netfilter: nft_quota: move stateful fields out of expression data")
      Signed-off-by: default avatarPablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20220110194817.53481-1-pablo@netfilter.orgSigned-off-by: default avatarJakub Kicinski <kuba@kernel.org>
      51edb2ff
    • Linus Torvalds's avatar
      Merge tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fe8152b3
      Linus Torvalds authored
      Pull device properties framework updates from Rafael Wysocki:
       "These update the handling of software nodes and graph properties, and
        the MAINTAINERS entry for the former.
      
        Specifics:
      
         - Remove device_add_properties() which does not work correctly if
           software nodes holding additional device properties are shared or
           reused (Heikki Krogerus).
      
         - Fix nargs_prop property handling for software nodes (Clément
           Léger).
      
         - Update documentation of ACPI device properties (Sakari Ailus).
      
         - Update the handling of graph properties in the generic framework to
           match the DT case (Sakari Ailus).
      
         - Update software nodes entry in MAINTAINERS (Andy Shevchenko)"
      
      * tag 'devprop-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        software node: Update MAINTAINERS data base
        software node: fix wrong node passed to find nargs_prop
        device property: Drop fwnode_graph_get_remote_node()
        device property: Use fwnode_graph_for_each_endpoint() macro
        device property: Implement fwnode_graph_get_endpoint_count()
        Documentation: ACPI: Update references
        Documentation: ACPI: Fix data node reference documentation
        device property: Fix documentation for FWNODE_GRAPH_DEVICE_DISABLED
        device property: Fix fwnode_graph_devcon_match() fwnode leak
        device property: Remove device_add_properties() API
        driver core: Don't call device_remove_properties() from device_del()
        PCI: Convert to device_create_managed_software_node()
      fe8152b3
    • Linus Torvalds's avatar
      Merge tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · fe2437cc
      Linus Torvalds authored
      Pull thermal control updates from Rafael Wysocki:
       "These add a new driver for Renesas RZ/G2L TSU, update a few existing
        thermal control drivers and clean up the tmon utility.
      
        Specifics:
      
         - Add new TSU driver and DT bindings for the Renesas RZ/G2L platform
           (Biju Das).
      
         - Fix missing check when calling reset_control_deassert() in the
           rz2gl thermal driver (Biju Das).
      
         - In preparation for FORTIFY_SOURCE performing compile-time and
           run-time field bounds checking for memcpy(), avoid intentionally
           writing across neighboring fields in the int340x thermal control
           driver (Kees Cook).
      
         - Fix RFIM mailbox write commands handling in the int340x thermal
           control driver (Sumeet Pawnikar).
      
         - Fix PM issue occurring in the iMX thermal control driver during
           suspend/resume by implementing PM runtime support in it (Oleksij
           Rempel).
      
         - Add 'const' annotation to thermal_cooling_ops in the Intel
           powerclamp driver (Rikard Falkeborn).
      
         - Fix missing ADC bit set in the iMX8MP thermal driver to enable the
           sensor (Paul Gerber).
      
         - Drop unused local variable definition from tmon (ran jianping)"
      
      * tag 'thermal-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thermal/drivers/int340x: Fix RFIM mailbox write commands
        thermal/drivers/rz2gl: Add error check for reset_control_deassert()
        thermal/drivers/imx8mm: Enable ADC when enabling monitor
        thermal/drivers: Add TSU driver for RZ/G2L
        dt-bindings: thermal: Document Renesas RZ/G2L TSU
        thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
        thermal/drivers/imx: Implement runtime PM support
        thermal: tools: tmon: remove unneeded local variable
        thermal: int340x: Use struct_group() for memcpy() region
      fe2437cc