MDEV-30232: rpl.rpl_gtid_crash fails sporadically in BB (0c249ad7) · Commits · nexedi / MariaDB

Commit 0c249ad7 authored Apr 16, 2024 by Kristian Nielsen

MDEV-30232: rpl.rpl_gtid_crash fails sporadically in BB

The root cause of the failure is a bug in the Linux network stack:

  https://lore.kernel.org/netdev/87sf0ldk41.fsf@urd.knielsen-hq.org/T/#u

If the slave does a connect(2) at the exact same time that kill -9 of the
master process closes the listening socket, the FIN or RST packet is lost in
the kernel, and the slave ends up timing out waiting for the initial
communication from the server. This timeout is controlled by
--slave-net-timeout, which defaults to 120 seconds, so
include/master_gtid_wait.inc times out first and the test fails.
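The stall is easier to see in isolation. The sketch below is written in Python rather than taken from the MariaDB C++ sources, and the helper `wait_for_greeting` is hypothetical. It connects to a local listener that never accepts or writes. The kernel still completes the TCP handshake, so from the client's side this stands in for the lost FIN/RST: the connection looks alive, no greeting arrives, and only the read timeout (the role `--slave-net-timeout` plays for the slave) ends the wait.

```python
import socket
import time
from typing import Optional


def wait_for_greeting(host: str, port: int, net_timeout: float) -> Optional[bytes]:
    """Hypothetical helper, not MariaDB code: connect, then wait for the
    server's first packet.

    Returns the first bytes received, b"" if the peer closed the connection
    (FIN seen), or None if net_timeout expired first.
    """
    # The timeout applies to connect() and to every later recv() on this socket.
    with socket.create_connection((host, port), timeout=net_timeout) as conn:
        try:
            return conn.recv(4)
        except socket.timeout:
            # Neither data nor FIN/RST arrived: the stall described above.
            return None


if __name__ == "__main__":
    # A listener that never accepts stands in for the lost FIN/RST: the
    # handshake completes via the backlog, but no greeting is ever sent.
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    start = time.monotonic()
    result = wait_for_greeting("127.0.0.1", port, net_timeout=0.5)
    print(result, f"after {time.monotonic() - start:.2f}s")
    server.close()
```

With the default of 120 seconds the same wait would last two minutes, which is why lowering the timeout for this one test lets the slave give up and reconnect well before include/master_gtid_wait.inc does.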

Work around this problem by reducing --slave-net-timeout for this test
case. If the problem turns up in other tests, we can consider reducing the
default value for all tests.

Signed-off-by: Kristian Nielsen <knielsen@knielsen-hq.org>

parent 4a2e0345


-		--master-retry-count=100
+		--master-retry-count=100 --slave-net-timeout=10
