MDEV-22979 MDEV-27233 MDEV-28218 Fixing spider init bugs
We introduce a new set of synchronisation primitives ("query ready") that are signaled when the server is ready to take queries, and make spider initialisation use them instead of the primitives associated with server start. During spider initialisation, if the server is query ready, proceed with the initialisation synchronously (requiring the sql service). Otherwise, leave the init to the first sts bg thread as usual. When creating a spider handler before the spider init is done, if the server is not query ready, just give up and return NULL. If the server is query ready, then wait for a specific amount of time (5s) for the spider init to be done, and on timeout give up and return NULL. How this fixes / prevents various init bugs: MDEV-27233 (spider installed and used in init file). Before - spider_create_handler() called by read_init_file() waits for bg thread to finish init, whereas the bg thread is waiting for server start. After - spider init is run synchronously because the server is query ready when read_init_file(). No funny business with concurrency. MDEV-29904 (spider loaded with plugin-load-add). Before (before the original patch to MDEV-27233 was reverted) - fully synchronous spider init fails due to dependencies on Aria / acl init. After - spider init is run async, and as soon as the server is query ready, the init proceeds and succeeds. MDEV-22979 (mysqld --bootstrap --plugin-load-add=ha_spider). Before - bootstrap queries causes call to spider_create_handler(), which waits for bg thread to finish init, whereas bg thread waits for server start. After - spider init is run async, waiting for server to be query ready. spider_create_hander() notices bg thread still trying to init spider, and server is not query ready so it gives up. MDEV-28218 (spider is loaded in a query after server start, immediately before another query dropping a spider table). Before - bg thread is created to init spider async, while the server proceeds to the next query that tries to drop a spider system table. race condition causes the situation where the server holds the mdl lock to the table and waits for the bg thread in spider_create_handler(), and the bg thread waiting for the mdl lock to create the table. After - since server has already started, spider init is run synchronously, thus no concurrency issues. Same applies to the variant of MDEV-28218 represented by mdev_28218_init_file where the queries are in an init file. Another variant of MDEV-28218 represented by mdev_28218_mixed loads spider with plugin-load-add and executes the drop table query in an init_file. Spider init is async because the server was not query ready. As soon as the the server becomes query ready the server tries to execute the drop table query. It waits on the init in spider_create_handler(), but because of the deadlock it times out after 5 seconds. The drop table query did not find any table, but spider init then proceeds as it finally acquires the mdl lock, and creates the table. The moral of the story is to advise users not to mix plugin-load-add=ha_spider with init_file containing queries related to spider system tables. Queries using spider engine to create tables are still ok because there's no deadlock. MDEV-30370 (mysqld --wsrep-recover --plugin-load-add=spider). The timed wait in the patch for MDEV-30370 still applies, so it still passes. More on the design: Compared to the server start primitives, the query ready primitives are signaled earlier ("query ready point") to avoid unnecessary wait and deadlocks. The spider initialisation does not only require other plugins like Aria to be initialised first, but also acl init (due to its creation of udfs) as well as the creation of the `mysql` database (due to its system tables being in this database). Thus the point where it can go ahead with the initialisation should be after the bootstrap. By the time the server can accept queries by reading from init files, it can definitely accept all spider init queries. So the point of spider initialisation can be before this point. Looking at the lines between boostrap and reading init file, the most natural position is just before the call to read_init_file(), where we can also give the primitives a most meaningful name "query ready".
Showing
Please register or sign in to comment