1. 13 Jan, 2022 9 commits
    • net/smc: Introduce a new conn->lgr validity check helper · ea89c6c0
      Wen Gu authored
      It is no longer suitable to identify whether an smc connection
      is registered in a link group by checking whether conn->lgr
      is NULL, because conn->lgr won't be reset even if the connection
      is unregistered from a link group.
      
      So this patch introduces a new helper, smc_conn_lgr_valid(), and
      replaces all the conn->lgr checks in the original implementation
      with the new helper to determine whether conn->lgr is valid to use.
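A minimal userspace sketch of why a NULL check alone stops working. The structs are hypothetical simplifications, and the sketch assumes (this is not stated in the message above) that validity is tracked by a token, here `alert_token_local`, that is non-zero only while the connection is registered and is cleared on unregistration while `conn->lgr` is left in place. `register_conn()` and `unregister_conn()` are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structs. */
struct smc_link_group {
	int id;
};

struct smc_connection {
	struct smc_link_group *lgr; /* no longer reset to NULL on unregister */
	uint32_t alert_token_local; /* assumed: non-zero only while registered */
};

/* Hypothetical registration helper for the sketch. */
static void register_conn(struct smc_connection *conn,
			  struct smc_link_group *lgr, uint32_t token)
{
	conn->lgr = lgr;
	conn->alert_token_local = token;
}

/* Hypothetical unregistration helper: conn->lgr is intentionally kept. */
static void unregister_conn(struct smc_connection *conn)
{
	conn->alert_token_local = 0;
}

/* Validity check: a non-NULL lgr alone is not enough. */
static inline bool smc_conn_lgr_valid(const struct smc_connection *conn)
{
	return conn->lgr != NULL && conn->alert_token_local != 0;
}
```

After `unregister_conn()`, `conn->lgr` still points at the link group, so only the helper gives the right answer.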
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: annotate races around fqdir->dead and fqdir->high_thresh · 91341fa0
      Eric Dumazet authored
      Both fields can be read/written without synchronization, so
      add proper accessors and documentation.
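In the kernel, such annotations are typically READ_ONCE()/WRITE_ONCE(); the closest userspace analogue is a relaxed C11 atomic load or store. The sketch below is a hypothetical model (`struct fqdir_model` and the accessor names are illustrative, not the kernel's) showing the shape of "proper accessors": every lockless access to the two fields goes through a single helper.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the two fields accessed without locks. */
struct fqdir_model {
	atomic_long high_thresh;
	atomic_bool dead;
};

/* Each accessor performs one relaxed atomic load or store, so concurrent
 * unsynchronized readers and writers do not constitute a data race. */
static long fqdir_high_thresh_read(struct fqdir_model *f)
{
	return atomic_load_explicit(&f->high_thresh, memory_order_relaxed);
}

static void fqdir_high_thresh_write(struct fqdir_model *f, long val)
{
	atomic_store_explicit(&f->high_thresh, val, memory_order_relaxed);
}

static bool fqdir_is_dead(struct fqdir_model *f)
{
	return atomic_load_explicit(&f->dead, memory_order_relaxed);
}

static void fqdir_mark_dead(struct fqdir_model *f)
{
	atomic_store_explicit(&f->dead, true, memory_order_relaxed);
}
```

Relaxed ordering is chosen here because the annotation is about single-access atomicity, not about ordering against other memory operations.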
      
      Fixes: d5dd8879 ("inet: fix various use-after-free in defrags units")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'smc-race-fixes' · 3ba8c625
      David S. Miller authored
      Wen Gu says:
      
      ====================
      net/smc: Fixes for race in smc link group termination
      
      We encountered some crashes recently and they are caused by the
      race between the access and free of link/link group in abnormal
      smc link group termination. The crashes can be reproduced in
      frequent abnormal link group termination, like setting RNICs up/down.
      
      This set of patches tries to fix this by extending the life cycle
      of link/link group to ensure that they won't be referred to after
      being cleared or freed.
      
      v1 -> v2:
      - Improve some comments.
      
      - Move the code that wakes up the lgrs_deleted wait queue from
        smc_lgr_free() to __smc_lgr_free().
      
      - Move the code that wakes up the links_deleted wait queue from
        smcr_link_clear() to __smcr_link_clear().
      
      - Move smc_ibdev_cnt_dec() and put_device() from smcr_link_clear()
        to __smcr_link_clear().
      
      - Move smc_lgr_put() to the end of __smcr_link_clear().
      
      - Call smc_lgr_put() after the 'out' label in smcr_link_init() when
        link initialization fails.
      
      - Modify the location where smc connection holds the lgr or link.
      
          before:
            * hold lgr in smc_lgr_register_conn().
            * hold link in smcr_lgr_conn_assign_link().
          after:
            * hold both lgr and link in smc_conn_create().
      
        This makes the hold location symmetrical with the place where smc
        connections put the lgr or link, which is smc_conn_free().
      
      - Initialize conn->freed to zero in smc_conn_create().
      ====================
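The "after" arrangement, where both references are taken in one place and dropped in one place, can be modelled as below. This is a hypothetical sketch with bare integer counters (the real code uses kernel refcounting and locking), and treating `freed` as a guard that makes the free path drop the references at most once is an assumption of this sketch, not something the changelog states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical objects with bare reference counters (no atomics, no locks). */
struct lgr_model  { int refcnt; };
struct link_model { int refcnt; };

struct conn_model {
	struct lgr_model  *lgr;
	struct link_model *lnk;
	bool freed;
};

/* Take both references together, at connection creation. */
static void conn_create(struct conn_model *conn, struct lgr_model *lgr,
			struct link_model *lnk)
{
	conn->lgr = lgr;
	conn->lnk = lnk;
	conn->freed = false; /* mirrors "initialize conn->freed to zero" */
	lgr->refcnt++;
	lnk->refcnt++;
}

/* Drop exactly the references conn_create() took, at most once
 * (the once-only guard is an assumption of this sketch). */
static void conn_free(struct conn_model *conn)
{
	if (conn->freed)
		return;
	conn->freed = true;
	conn->lnk->refcnt--;
	conn->lgr->refcnt--;
}
```

Keeping hold and put in a single create/free pair makes it easy to check by inspection that every hold has exactly one matching put.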
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/smc: Resolve the race between link group access and termination · 61f434b0
      Wen Gu authored
      We encountered some crashes caused by the race between the access
      and the termination of link groups.
      
      Here are some of the panic stacks we hit:
      
      1) Race between smc_clc_wait_msg() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002f0
       Workqueue: smc_hs_wq smc_listen_work [smc]
       RIP: 0010:smc_clc_wait_msg+0x3eb/0x5c0 [smc]
       Call Trace:
        <TASK>
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        ? smc_clc_send_accept+0x45/0xa0 [smc]
        smc_listen_work+0x783/0x1220 [smc]
        ? finish_task_switch+0xc4/0x2e0
        ? process_one_work+0x1ad/0x3c0
        process_one_work+0x1ad/0x3c0
        worker_thread+0x4c/0x390
        ? rescuer_thread+0x320/0x320
        kthread+0x149/0x190
        ? set_kthread_struct+0x40/0x40
        ret_from_fork+0x1f/0x30
        </TASK>
      
      smc_listen_work()                abnormal case like port error
      ---------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      smc_clc_wait_msg()              |
       |- access conn->lgr (panic)    |
      
      2) Race between smc_setsockopt() and __smc_lgr_terminate()
      
       BUG: kernel NULL pointer dereference, address: 00000000000002e8
       RIP: 0010:smc_setsockopt+0x17a/0x280 [smc]
       Call Trace:
        <TASK>
        __sys_setsockopt+0xfc/0x190
        __x64_sys_setsockopt+0x20/0x30
        do_syscall_64+0x34/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
        </TASK>
      
      smc_setsockopt()                 abnormal case like port error
      --------------------------------------------------------------
                                      | __smc_lgr_terminate()
                                      |  |- smc_conn_kill()
                                      |      |- smc_lgr_unregister_conn()
                                      |          |- set conn->lgr = NULL
      mod_delayed_work()              |
       |- access conn->lgr (panic)    |
      
      Other panic sites exist, and they share a similar cause to the
      ones described above: accessing the link group after termination,
      thus getting a NULL pointer or an invalid resource.
      
      Currently, there seems to be no synchronization between link group
      access and a sudden termination of the link group. This patch tries
      to fix this by introducing a reference count for the link group and
      not freeing the link group until the reference count drops to zero.
      
      A link group may be referred to by links or smc connections, so the
      operations on the link group reference count can be summarized as
      follows:
      
      object          [hold or initialized as 1]       [put]
      -------------------------------------------------------------------
      link group      smc_lgr_create()                 smc_lgr_free()
      connections     smc_conn_create()                smc_conn_free()
      links           smcr_link_init()                 smcr_link_clear()
      
      In this way, we extend the life cycle of the link group and ensure
      it is longer than the life cycle of the connections and links above
      it, which avoids invalid access to the link group after its
      termination.
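The hold/put table above can be modelled as a plain reference count where the link group is only released when the last holder drops it. This is a hypothetical single-threaded sketch: `lgr_hold()`/`lgr_put()` stand in for the kernel's counterparts (the message names smc_lgr_put()), the `freed` flag stands in for actually releasing memory, and the per-row wrappers are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical link group model: released only when the last reference goes. */
struct lgr_model {
	int refcnt;
	bool freed; /* stands in for the memory actually being released */
};

static void lgr_hold(struct lgr_model *lgr)
{
	lgr->refcnt++;
}

static void lgr_put(struct lgr_model *lgr)
{
	if (--lgr->refcnt == 0)
		lgr->freed = true; /* the real code would free the lgr here */
}

/* Rows of the table above, expressed as hold/put pairs. */
static void lgr_create(struct lgr_model *lgr)
{
	lgr->refcnt = 1; /* "initialized as 1" */
	lgr->freed = false;
}

static void lgr_free(struct lgr_model *lgr)   { lgr_put(lgr); }  /* initial ref */
static void link_init(struct lgr_model *lgr)  { lgr_hold(lgr); }
static void link_clear(struct lgr_model *lgr) { lgr_put(lgr); }
static void conn_create(struct lgr_model *lgr) { lgr_hold(lgr); }
static void conn_free(struct lgr_model *lgr)  { lgr_put(lgr); }
```

A termination that calls `lgr_free()` while a link and a connection still refer to the group only drops the initial reference; the group outlives both of them.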
      Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kselftests/net: adapt the timeout to the largest runtime · de0e4447
      Li Zhijian authored
      The timeout in the settings file applies to every test case in the
      same directory, so it should accommodate the maximum runtime.
      
      A normally running net/fib_nexthops.sh may be killed by this unsuitable
      timeout. Furthermore, because of a defect [1] in the kselftests
      framework, net/fib_nexthops.sh, which can take at least (300 * 4)
      seconds, could previously block the whole kselftests framework.
      
      $ git grep -w 'sleep 300' tools/testing/selftests/net
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      tools/testing/selftests/net/fib_nexthops.sh:    sleep 300
      
      Increase the timeout to the obvious largest runtime plus 300 seconds
      to avoid the blocking.
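Reading "the obvious largest runtime" as the four `sleep 300` calls listed above (an interpretation, since the script does other work too), the enlarged timeout works out as:

```latex
T_{\text{timeout}} = \underbrace{4 \times 300\ \text{s}}_{\text{largest runtime}} + 300\ \text{s} = 1500\ \text{s}
```

In the kselftests `settings` file this would correspond to a line of the form `timeout=1500`; that value is derived from the arithmetic above rather than quoted from the patch.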
      
      [1]: https://www.spinics.net/lists/kernel/msg4185370.html
      
      Signed-off-by: Zhou Jie <zhoujie2011@fujitsu.com>
      Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: don't let phylink re-enable TX PAUSE on the NPI port · 33cb0ff3
      Vladimir Oltean authored
      Since commit b3964807 ("net: mscc: ocelot: disable flow control on
      NPI interface"), flow control should be disabled on the DSA CPU port
      when used in NPI mode.
      
      However, the commit blamed in the Fixes: tag below broke this, because
      it allowed felix_phylink_mac_link_up() to overwrite SYS_PAUSE_CFG_PAUSE_ENA
      for the DSA CPU port.
      
      This issue became noticeable since the device tree update from commit
      8fcea7be ("arm64: dts: ls1028a: mark internal links between Felix
      and ENETC as capable of flow control").
      
      The solution is to check whether this is the currently configured NPI
      port from ocelot_phylink_mac_link_up(), and to not modify the statically
      disabled PAUSE frame transmission if it is.
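The shape of that check can be sketched with a hypothetical model: `struct ocelot_model`, `MODEL_NUM_PORTS` and `model_mac_link_up()` are illustrative names, and the `pause_ena` array merely stands in for the per-port SYS_PAUSE_CFG_PAUSE_ENA field. The link-up path applies the phylink-resolved TX pause setting to every port except the configured NPI port.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_PORTS 6 /* hypothetical port count */

/* Hypothetical model: per-port PAUSE transmit enable plus the NPI port index. */
struct ocelot_model {
	int npi;                         /* currently configured NPI port */
	bool pause_ena[MODEL_NUM_PORTS]; /* stands in for SYS_PAUSE_CFG_PAUSE_ENA */
};

/* Link-up handler: apply the resolved TX pause setting, except on the NPI
 * port, whose PAUSE frame transmission stays statically disabled. */
static void model_mac_link_up(struct ocelot_model *ocelot, int port,
			      bool tx_pause)
{
	if (port != ocelot->npi)
		ocelot->pause_ena[port] = tx_pause;
}
```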
      
      When the port is configured for lossless mode as opposed to tail drop
      mode, but the link partner (DSA master) doesn't observe the transmitted
      PAUSE frames, the switch termination throughput is much worse, as can be
      seen below.
      
      Before:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 37504 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  28.4 MBytes   238 Mbits/sec  357   22.6 KBytes
      [  5]   1.00-2.00   sec  33.6 MBytes   282 Mbits/sec  426   19.8 KBytes
      [  5]   2.00-3.00   sec  34.0 MBytes   285 Mbits/sec  343   21.2 KBytes
      [  5]   3.00-4.00   sec  32.9 MBytes   276 Mbits/sec  354   22.6 KBytes
      [  5]   4.00-5.00   sec  32.3 MBytes   271 Mbits/sec  297   18.4 KBytes
      ^C[  5]   5.00-5.06   sec  2.05 MBytes   270 Mbits/sec   45   19.8 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-5.06   sec   163 MBytes   271 Mbits/sec  1822             sender
      [  5]   0.00-5.06   sec  0.00 Bytes  0.00 bits/sec                  receiver
      
      After:
      
      root@debian:~# iperf3 -c 192.168.100.2
      Connecting to host 192.168.100.2, port 5201
      [  5] local 192.168.100.1 port 49470 connected to 192.168.100.2 port 5201
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec   112 MBytes   941 Mbits/sec  259    143 KBytes
      [  5]   1.00-2.00   sec   110 MBytes   920 Mbits/sec  329    144 KBytes
      [  5]   2.00-3.00   sec   112 MBytes   936 Mbits/sec  255    144 KBytes
      [  5]   3.00-4.00   sec   110 MBytes   927 Mbits/sec  355    105 KBytes
      [  5]   4.00-5.00   sec   110 MBytes   926 Mbits/sec  350    156 KBytes
      [  5]   5.00-6.00   sec   110 MBytes   925 Mbits/sec  305    148 KBytes
      [  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec  320    143 KBytes
      [  5]   7.00-8.00   sec   110 MBytes   925 Mbits/sec  273   97.6 KBytes
      [  5]   8.00-9.00   sec   109 MBytes   913 Mbits/sec  299    141 KBytes
      [  5]   9.00-10.00  sec   110 MBytes   922 Mbits/sec  287    146 KBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bitrate         Retr
      [  5]   0.00-10.00  sec  1.08 GBytes   926 Mbits/sec  3032             sender
      [  5]   0.00-10.00  sec  1.08 GBytes   925 Mbits/sec                  receiver
      
      Fixes: de274be3 ("net: dsa: felix: set TX flow control according to the phylink_mac_link_up resolution")
      Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: iphase: remove redundant pointer skb · d7b43034
      Colin Ian King authored
      The pointer skb is redundant: it is assigned a value that is never
      read, and hence it can be removed. This cleans up a clang scan warning:
      
      drivers/atm/iphase.c:205:18: warning: Although the value stored
      to 'skb' is used in the enclosing expression, the value is never
      actually read from 'skb' [deadcode.DeadStores]
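The pattern that this clang checker flags can be illustrated with hypothetical code (this is not the iphase.c source): a pointer is assigned inside a condition, the condition uses the value, but the variable itself is never read again, so it can be dropped without changing behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-slot "queue", used only to illustrate the diagnostic. */
struct item { int val; };

static struct item *dequeue(struct item **slot)
{
	struct item *it = *slot;

	*slot = NULL;
	return it;
}

/* Before: 'it' is stored inside the loop condition but never read again,
 * which clang's deadcode.DeadStores checker reports. */
static int drain_before(struct item **slot)
{
	struct item *it;
	int n = 0;

	while ((it = dequeue(slot)) != NULL)
		n++;
	return n;
}

/* After: the redundant pointer is gone; behavior is unchanged. */
static int drain_after(struct item **slot)
{
	int n = 0;

	while (dequeue(slot) != NULL)
		n++;
	return n;
}
```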
      Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_api: Don't skip qdisc attach on ingress · de2d807b
      Maxim Mikityanskiy authored
      The attach callback of struct Qdisc_ops is used by only a few qdiscs:
      mq, mqprio and htb. qdisc_graft() contains the following logic
      (pseudocode):
      
          if (!qdisc->ops->attach) {
              if (ingress)
                  do ingress stuff;
              else
                  do egress stuff;
          }
          if (!ingress) {
              ...
              if (qdisc->ops->attach)
                  qdisc->ops->attach(qdisc);
          } else {
              ...
          }
      
      As we see, the attach callback is not called if the qdisc is being
      attached to ingress (TC_H_INGRESS). That wasn't a problem for mq and
      mqprio, since they contain a check that they are attached to TC_H_ROOT,
      and they can't be attached to TC_H_INGRESS anyway.
      
      However, the commit cited below added the attach callback to htb. It is
      needed for the hardware offload, but in the non-offload mode it
      simulates the "do egress stuff" part of the pseudocode above. The
      problem is that when htb is attached to ingress, neither "do ingress
      stuff" nor attach() is called. It results in an inconsistency, and the
      following message is printed to dmesg:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 2
      
      This commit addresses the issue by running "do ingress stuff" in the
      ingress flow even if the attach callback is present, which is fine,
      because attach isn't going to be called afterwards.
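The difference can be modelled by recording which branch runs. This is a hypothetical, compressed model of the pseudocode above: `struct graft_trace` and both functions are illustrative, the elided "..." parts are ignored, and the shape of the fixed version is inferred from the description rather than copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trace of which parts of the graft logic ran. */
struct graft_trace {
	bool ingress_stuff;
	bool egress_stuff;
	bool attach_called;
};

/* Before: with an attach callback on ingress, nothing runs at all. */
static void graft_before(bool has_attach, bool ingress, struct graft_trace *t)
{
	if (!has_attach) {
		if (ingress)
			t->ingress_stuff = true;
		else
			t->egress_stuff = true;
	}
	if (!ingress) {
		if (has_attach)
			t->attach_called = true;
	}
}

/* After: ingress setup no longer depends on attach being absent. */
static void graft_after(bool has_attach, bool ingress, struct graft_trace *t)
{
	if (!ingress) {
		if (has_attach)
			t->attach_called = true;
		else
			t->egress_stuff = true;
	} else {
		t->ingress_stuff = true;
	}
}
```

For a qdisc such as htb with an attach callback grafted on ingress, `graft_before()` leaves every flag false, which is the inconsistency described above.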
      
      The bug was found by syzbot and reported by Eric.
      
      Fixes: d03b195b ("sch_htb: Hierarchical QoS hardware offload")
      Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
      Reported-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qmi_wwan: add ZTE MF286D modem 19d2:1485 · 078c6a1c
      Pawel Dembicki authored
      The modem in the ZTE MF286D is a Qualcomm MDM9250-based 3G/4G modem.
      
      T:  Bus=02 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  3 Spd=5000 MxCh= 0
      D:  Ver= 3.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS= 9 #Cfgs=  1
      P:  Vendor=19d2 ProdID=1485 Rev=52.87
      S:  Manufacturer=ZTE,Incorporated
      S:  Product=ZTE Technologies MSM
      S:  SerialNumber=MF286DZTED000000
      C:* #Ifs= 7 Cfg#= 1 Atr=80 MxPwr=896mA
      A:  FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00
      I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=02 Prot=ff Driver=rndis_host
      E:  Ad=82(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host
      E:  Ad=81(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=01(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=83(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=02(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=85(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=84(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=03(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      E:  Ad=87(I) Atr=03(Int.) MxPS=  10 Ivl=32ms
      E:  Ad=86(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=04(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 5 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      E:  Ad=88(I) Atr=03(Int.) MxPS=   8 Ivl=32ms
      E:  Ad=8e(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=0f(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=usbfs
      E:  Ad=05(O) Atr=02(Bulk) MxPS=1024 Ivl=0ms
      E:  Ad=89(I) Atr=02(Bulk) MxPS=1024 Ivl=0ms
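The listing shows that only interface 5 is bound to qmi_wwan, so the driver needs to match on vendor ID, product ID and interface number together. The qmi_wwan device table typically does this with `QMI_FIXED_INTF()` entries; the sketch below is a hypothetical userspace model of such a match (the struct and function names are illustrative, not the driver's code).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed-interface USB match entry. */
struct fixed_intf_match {
	uint16_t vendor;
	uint16_t product;
	uint8_t ifnum; /* only this interface is claimed */
};

/* From the listing above: 19d2:1485, interface 5 (Driver=qmi_wwan). */
static const struct fixed_intf_match matches[] = {
	{ 0x19d2, 0x1485, 5 }, /* ZTE MF286D */
};

static bool claims_interface(uint16_t vid, uint16_t pid, uint8_t ifnum)
{
	for (size_t i = 0; i < sizeof(matches) / sizeof(matches[0]); i++)
		if (matches[i].vendor == vid && matches[i].product == pid &&
		    matches[i].ifnum == ifnum)
			return true;
	return false;
}
```

Matching on the interface number leaves interfaces 0 to 4 and 6 to the rndis_host, option and usbfs bindings shown in the listing.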
      Signed-off-by: Pawel Dembicki <paweldembicki@gmail.com>
      Acked-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Jan, 2022 29 commits
  3. 11 Jan, 2022 2 commits