BUG#14458232 - CRASH IN THD_IS_TRANSACTION_ACTIVE DURING THREAD POOLING STRESS TEST

PROBLEM:
Connection stress tests that consist of concurrent KILL CONNECTION statements interleaved with mysql ping queries cause a mysqld server that uses the thread pool scheduler to crash.

FIX:
Killing a connection involves shutting down and closing the client socket, and this can cause EPOLLHUP (or EPOLLERR) events to be queued and then handled after the connection object (THD) has been disarmed and while it is being cleaned up. We disarm the connection by setting the epoll mask to zero, which is meant to ensure that no more events arrive, release the ownership of the waiting thread that collects events, and then clean up the THD object. However, according to the Linux kernel epoll source code (http://lxr.linux.no/linux+*/fs/eventpoll.c#L1771), EPOLLHUP (or EPOLLERR) cannot be masked, even if the epoll mask is set to zero. So we now disarm the connection by removing the client fd from the epoll set via EPOLL_CTL_DEL, which prevents any query processing handler from being executed or queued to the client ctx queue.

There is also a race condition that involves the following threads:
1) Thread X is executing KILL CONNECTION Y; it is in THD::awake and is using mysys_var (holding LOCK_thd_data).
2) Thread Y is executing in tp_process_event and is being killed.
3) Thread Z receives the KILL flag internally and may call the tp_thd_cleanup function, which sets the thread session variable and changes mysys_var.

The fix for this race is to set the thread session variable under LOCK_thd_data. We also do not call THD::awake if the thread to be killed is found in the thread list but its KILL_CONNECTION flag is already set, thus avoiding any possible concurrent cleanup.

This patch was approved by Mikael Ronstrom via email review.
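
The epoll behaviour described under FIX can be reproduced outside the server. The following is a minimal, Linux-only sketch and not the patch itself: a pipe stands in for the client socket, and poll_once, HangupResult and observe_hangup are hypothetical names introduced only for this illustration. It registers a descriptor with an event mask of zero, hangs up the peer, and compares what epoll_wait reports before and after EPOLL_CTL_DEL.

```cpp
// Illustration only: assumes Linux; a pipe replaces the client socket.
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Polls epfd once without blocking. Returns the reported event mask,
// 0 if nothing is ready, or -1 on error.
static int poll_once(int epfd) {
  struct epoll_event out = {};
  int n = epoll_wait(epfd, &out, 1, 0);
  assert(n <= 1);  // maxevents is 1
  if (n < 0) return -1;
  return n == 0 ? 0 : static_cast<int>(out.events);
}

// Hypothetical result holder for this sketch.
struct HangupResult {
  int mask_zero;  // events reported while the epoll mask is zero
  int after_del;  // events reported after EPOLL_CTL_DEL
};

// Hypothetical helper: both fields stay -1 if the setup fails.
static HangupResult observe_hangup() {
  HangupResult r = {-1, -1};
  int fds[2];
  if (pipe(fds) != 0) return r;
  int epfd = epoll_create1(0);
  if (epfd < 0) {
    close(fds[0]);
    close(fds[1]);
    return r;
  }

  struct epoll_event ev = {};
  ev.events = 0;  // "disarmed": no events requested
  ev.data.fd = fds[0];
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
    close(fds[1]);  // peer goes away, which raises a hangup on fds[0]
    fds[1] = -1;
    r.mask_zero = poll_once(epfd);
    // Removing the fd from the epoll set is what actually stops delivery.
    if (epoll_ctl(epfd, EPOLL_CTL_DEL, fds[0], nullptr) == 0)
      r.after_del = poll_once(epfd);
  }

  if (fds[1] >= 0) close(fds[1]);
  close(fds[0]);
  close(epfd);
  return r;
}
```

Calling observe_hangup() should yield a mask_zero value with the EPOLLHUP bit set even though no events were requested, and an after_del value of 0 once the descriptor has been removed from the epoll set.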