    net/smc: Prevent smc_release() from long blocking · 5c15b312
    D. Wythe authored
    In an nginx/wrk benchmark, the client hangs with high probability in
    cases like the following (it takes several minutes to exit):
    
    server: smc_run nginx
    
    client: smc_run wrk -c 10000 -t 1 http://server
    
    Client hangs with the following backtrace:
    
     0 [ffffa7ce80f3bbf8] __schedule at ffffffff9f9e0d5f
     1 [ffffa7ce80f3bc88] schedule at ffffffff9f9e10e6
     2 [ffffa7ce80f3bca0] schedule_timeout at ffffffff9f9e3f3c
     3 [ffffa7ce80f3bd20] wait_for_common at ffffffff9f9e19de
     4 [ffffa7ce80f3bd80] __flush_work at ffffffff9f0fe013
     5 [ffffa7ce80f3bdf0] smc_release at ffffffffc0697d24 [smc]
     6 [ffffa7ce80f3be20] __sock_release at ffffffff9f802e2d
     7 [ffffa7ce80f3be40] sock_close at ffffffff9f802eb1
     8 [ffffa7ce80f3be48] __fput at ffffffff9f334f93
     9 [ffffa7ce80f3be78] task_work_run at ffffffff9f101ff5
    10 [ffffa7ce80f3bea0] do_exit at ffffffff9f0e5012
    11 [ffffa7ce80f3bf10] do_group_exit at ffffffff9f0e592a
    12 [ffffa7ce80f3bf38] __x64_sys_exit_group at ffffffff9f0e5994
    13 [ffffa7ce80f3bf40] do_syscall_64 at ffffffff9f9d4373
    14 [ffffa7ce80f3bf50] entry_SYSCALL_64_after_hwframe at ffffffff9fa0007c
    
    This issue is caused by flush_work(), which smc_release() uses to wait
    for smc_connect_work() to finish. When many smc_connect_work() items are
    pending, or all executing ones are stalled, smc_release() has to block
    until a worker becomes free and runs its own work item, which amounts to
    waiting for another smc_connect_work() to finish.
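A minimal userspace model may help show why this wait can be long. It is not kernel code and assumes a FIFO queue served by a fixed number of workers, where item i occupies a worker for dur[i] ticks and the releasing socket's own connect work is item `target`. The functions finish_tick(), flush_wait() and cancel_wait() are hypothetical; they only mimic the waiting behaviour of flush_work() (wait until the item has run) and cancel_work_sync() (drop the item if it is still pending, otherwise wait for it) when called at tick 0.

```c
#include <assert.h>

#define MAX_WORKERS 64

/* Hypothetical model, not kernel code: items are dispatched in FIFO order,
 * each to the worker that becomes idle first. Returns the tick at which
 * item `target` finishes and stores the tick at which it starts in *start. */
static long finish_tick(const long *dur, int n, int workers, int target,
                        long *start)
{
    long free_at[MAX_WORKERS] = {0}; /* tick at which each worker is idle */

    assert(workers > 0 && workers <= MAX_WORKERS);
    assert(target >= 0 && target < n);

    for (int i = 0; i <= target; i++) {
        int w = 0;
        for (int k = 1; k < workers; k++) /* pick the earliest-idle worker */
            if (free_at[k] < free_at[w])
                w = k;
        long s = free_at[w];
        free_at[w] = s + dur[i];
        if (i == target) {
            *start = s;
            return s + dur[i];
        }
    }
    return -1; /* only reached for an invalid target */
}

/* flush_work()-like: always waits until the item has finished running. */
static long flush_wait(const long *dur, int n, int workers, int target)
{
    long start = 0;
    return finish_tick(dur, n, workers, target, &start);
}

/* cancel_work_sync()-like: an item still pending at tick 0 is dropped
 * without waiting; an item already running at tick 0 is waited for. */
static long cancel_wait(const long *dur, int n, int workers, int target)
{
    long start = 0;
    long end = finish_tick(dur, n, workers, target, &start);
    return start > 0 ? 0 : end;
}
```

With four workers each stalled for 1000 ticks and the releasing socket's own item last in an eight-item queue (dur = {1000, 1000, 1000, 1000, 10, 10, 10, 10}, target 7), flush_wait() reports 1010 ticks while cancel_wait() reports 0: the flush pays for other sockets' stalled work, the cancel does not.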
    
    To fix this, make two changes:
    
    1. Cancel smc_connect_work() items that are still idle (pending) in the
       workqueue, and wait only for those that are already executing.
       cancel_work_sync() does exactly this, so it replaces flush_work().
    
    2. Since smc_connect() holds a reference for passive closing, release
       that reference if smc_connect_work() has been cancelled, because the
       cancelled work will never run to release it.
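The two changes can be modelled in userspace as below. This is a hedged sketch, not the patch itself: struct toy_sock, toy_connect(), toy_connect_work_done(), toy_cancel_work_sync() and toy_release() are hypothetical stand-ins. It assumes, as change 2 implies, that smc_connect() takes one reference (sock_hold()) on behalf of the work and that smc_connect_work() drops it (sock_put()) when it finishes. toy_cancel_work_sync() mirrors the documented return value of cancel_work_sync(): true if the work was pending and has been cancelled, false otherwise, after waiting for a running instance.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fix; not the kernel API. */
enum toy_work_state { TOY_IDLE, TOY_PENDING, TOY_RUNNING };

struct toy_sock {
    int refcnt;                       /* stands in for the sock refcount */
    enum toy_work_state connect_work; /* stands in for smc->connect_work */
};

/* Like smc_connect(): hold a reference for the work, then queue it. */
static void toy_connect(struct toy_sock *s)
{
    assert(s->connect_work == TOY_IDLE);
    s->refcnt++;                      /* sock_hold() */
    s->connect_work = TOY_PENDING;
}

/* Like the end of smc_connect_work() (assumed): drop the work's reference. */
static void toy_connect_work_done(struct toy_sock *s)
{
    s->connect_work = TOY_IDLE;
    s->refcnt--;                      /* sock_put() */
}

/* Like cancel_work_sync(): dequeue a pending item and return true; for a
 * running item, "wait" for it (here: run it to completion) and return false. */
static bool toy_cancel_work_sync(struct toy_sock *s)
{
    if (s->connect_work == TOY_PENDING) {
        s->connect_work = TOY_IDLE;   /* never ran, reference still held */
        return true;
    }
    if (s->connect_work == TOY_RUNNING)
        toy_connect_work_done(s);
    return false;
}

/* Like the patched release path: cancel, and drop the reference only if
 * the work was cancelled before it could drop it itself. */
static void toy_release(struct toy_sock *s)
{
    if (toy_cancel_work_sync(s))
        s->refcnt--;                  /* sock_put() for the cancelled work */
}
```

Whether the work was still pending or already running, the reference count returns to its starting value after toy_release(); without the extra decrement there, the cancelled case would leak one reference, which is what change 2 prevents.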
    
    Fixes: 24ac3a08 ("net/smc: rebuild nonblocking connect")
    Reported-by: Tony Lu <tonylu@linux.alibaba.com>
    Tested-by: Dust Li <dust.li@linux.alibaba.com>
    Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
    Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
    Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
    Acked-by: Karsten Graul <kgraul@linux.ibm.com>
    Link: https://lore.kernel.org/r/1639571361-101128-1-git-send-email-alibuda@linux.alibaba.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>