MDEV-30370 Fixing spider hang when server aborts
This is Kentoku's patch for MDEV-22979 (e6e41f04 + 22a00977), which also fixes MDEV-30370.

It changes the wait in the first sts thread, which waits for server start before executing the init queries for spider, to a timed wait. It also sets the flag init_command to false when the sts thread is being freed. With these changes the sts thread checks the flag at regular intervals and aborts the init queries once it finds that init_command is false. This avoids the deadlock that causes the hang in MDEV-30370.

The patch also fixes MDEV-22979 for 10.4, but not for 10.5. I have not tested higher versions for MDEV-22979.

The change was also tested against MDEV-29904 to guard against regression, given that MDEV-27233 is a similar problem and its patch caused the regression. The test passes for 10.4-11.0. However, this ad hoc test only works consistently when placed in the main test suite. Spider tests should not be placed in the main suite, so the test is not included in this commit. A patch for MDEV-27912 should fix this problem and allow a proper test for MDEV-29904.

See the comments in the Jira tickets MDEV-30370 and MDEV-29904 for the ad hoc test case used for this commit.
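
As an illustration only, the timed wait and flag check described above can be sketched in standard C++ as follows. This is not the spider code: the names StsState, wait_for_server_start, mark_server_started and free_sts_thread are hypothetical, and the sketch assumes the freeing side only clears the flag and cannot signal the waiter, which is why the waiter has to wake up on its own.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the state shared with the first sts thread.
// None of these names are taken from the spider source.
struct StsState {
  std::mutex mtx;
  std::condition_variable cond;
  bool server_started = false;           // guarded by mtx
  std::atomic<bool> init_command{true};  // cleared when the thread is freed
};

// Waits for server start, waking up every `interval` to re-check the flag.
// Returns true if the init queries should run, false if they must be aborted.
bool wait_for_server_start(StsState &st, std::chrono::milliseconds interval) {
  std::unique_lock<std::mutex> lock(st.mtx);
  while (!st.server_started) {
    if (!st.init_command.load())
      return false;                    // thread is being freed: abort
    st.cond.wait_for(lock, interval);  // timed wait instead of an unbounded wait
  }
  return st.init_command.load();
}

// Signals that the server has started (the normal path).
void mark_server_started(StsState &st) {
  {
    std::lock_guard<std::mutex> guard(st.mtx);
    st.server_started = true;
  }
  st.cond.notify_all();
}

// Assumed behaviour of the freeing side: it only flips the flag and does not
// signal the condition variable, so the waiter notices at its next timeout.
void free_sts_thread(StsState &st) {
  st.init_command.store(false);
}
```

With an unbounded wait, a waiter in this sketch would never return if the server aborted before signalling start. With the timed wait, it returns false within roughly one interval after free_sts_thread is called.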