Commit 860f4383 authored by unknown's avatar unknown

Fix for BUG##24415: Instance manager test im_daemon_life_cycle

fails randomly.

The problem was that the test case used command line tool (mysql)
without specifying connect_timeout argument. In some cases,
this lead to hanging of the test case.

The fix is to specify --connect_timeout=1 when starting mysql.

Also, the patch contains polishing and various cleanups to simplify
analyzing of the problems further.

The patch affects only test suite, no server codebase has been
touched.


mysql-test/lib/mtr_im.pl:
  Remember PID of the IM-spawner -- a process, that is
  used to fork IM-angel.
mysql-test/lib/mtr_io.pl:
  Trim \n from the PID.
mysql-test/lib/mtr_process.pl:
  Don't complain if it was IM-spawner, who died.
mysql-test/r/im_daemon_life_cycle.result:
  Update the result file.
mysql-test/r/im_life_cycle.result:
  Update the result file.
mysql-test/t/im_daemon_life_cycle.imtest:
  Polishing: add more comments, be more verbose.
mysql-test/t/im_life_cycle.imtest:
  Polishing: be more verbose.
mysql-test/t/im_utils.imtest:
  Polishing: be more verbose.
mysql-test/t/kill_n_check.sh:
  Log messages to the extrenal file so that they can be analyzed
  if test case failed.
mysql-test/t/wait_for_process.sh:
  Log messages to the extrenal file so that they can be analyzed
  if test case failed.
mysql-test/t/wait_for_socket.sh:
  Log messages to the extrenal file so that they can be analyzed
  if test case failed.
mysql-test/t/log.sh:
  Dummy script to facilitate logging from test-scripts.
mysql-test/t/utils.sh:
  A bunch of auxilary functions to facilitate logging.
parent 963a874a
......@@ -582,7 +582,7 @@ sub mtr_im_start($$) {
mtr_add_arg($args, $opt);
}
$im->{'pid'} =
$im->{'spawner_pid'} =
mtr_spawn(
$::exe_im, # path to the executable
$args, # cmd-line args
......@@ -593,7 +593,7 @@ sub mtr_im_start($$) {
{ append_log_file => 1 } # append log files
);
unless ( $im->{'pid'} )
unless ( $im->{'spawner_pid'} )
{
mtr_error('Could not start Instance Manager.')
}
......
......@@ -39,6 +39,7 @@ sub mtr_get_pid_from_file ($) {
# Read pid number from file
my $pid= <FILE>;
chomp $pid;
close FILE;
return $pid if $pid=~ /^(\d+)/;
......
......@@ -937,6 +937,12 @@ sub check_expected_crash_and_restart($)
}
}
}
if ($::instance_manager->{'spawner_pid'} eq $ret_pid)
{
return;
}
mtr_warning("check_expected_crash_and_restart couldn't find an entry for pid: $ret_pid");
}
......
......@@ -6,7 +6,7 @@ instance_name status
mysqld1 online
mysqld2 offline
Killing the process...
Sleeping...
Waiting...
Success: the process was restarted.
Success: server is ready to accept connection on socket.
......@@ -16,7 +16,7 @@ Success: server is ready to accept connection on socket.
START INSTANCE mysqld2;
Success: the process has been started.
Killing the process...
Sleeping...
Waiting...
Success: the process was restarted.
Success: server is ready to accept connection on socket.
SHOW INSTANCE STATUS mysqld1;
......
......@@ -39,7 +39,7 @@ ERROR HY000: Bad instance name. Check that the instance with such a name exists
-- 1.1.6.
--------------------------------------------------------------------
Killing the process...
Sleeping...
Waiting...
Success: the process was restarted.
SHOW INSTANCES;
instance_name status
......@@ -52,7 +52,7 @@ mysqld2 offline
START INSTANCE mysqld2;
Success: the process has been started.
Killing the process...
Sleeping...
Waiting...
Success: the process was killed.
--------------------------------------------------------------------
......
......@@ -6,59 +6,73 @@
#
###########################################################################
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle im_daemon_life_cycle.imtest started.
###########################################################################
--source include/im_check_env.inc
# Turn on reconnect, not on by default anymore
# Turn on reconnect, not on by default anymore.
--enable_reconnect
###########################################################################
#
# The main daemon-life-cycle test case -- check that IM-angel will restart
# IM-main if it got killed:
# - kill IM-main and check that IM-angel will restart it;
# - wait for IM-main to start accepting connections before continue test
# case;
#
###########################################################################
# Kill the IM main process and check that the IM Angel will restart the main
# process.
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_PATH_PID restarted 30
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Main-test: starting...
###########################################################################
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Killing IM-main...
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_PATH_PID restarted 30 im_daemon_life_cycle
# Wait for IM to start accepting connections.
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Waiting for IM-main to start accepting connections...
--exec $MYSQL_TEST_DIR/t/wait_for_socket.sh $EXE_MYSQL $IM_PATH_SOCK $IM_USERNAME $IM_PASSWORD '' 30 im_daemon_life_cycle
--exec $MYSQL_TEST_DIR/t/wait_for_socket.sh $EXE_MYSQL $IM_PATH_SOCK $IM_USERNAME $IM_PASSWORD '' 30
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Main-test: done.
###########################################################################
#
# BUG#12751: Instance Manager: client hangs
# - start nonguarded instance (mysqld2);
# - kill IM-main and get it restarted by IM-angel;
# - check that guarded instance (mysqld1) is accepting connections.
# - check that non-guarded instance (mysqld2) were not stopped.
#
###########################################################################
--echo
--echo --------------------------------------------------------------------
--echo -- Test for BUG#12751
--echo --------------------------------------------------------------------
# Give some time to begin accepting connections after restart.
# FIXME: race condition here.
--sleep 3
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle BUG12751: starting...
# 1. Start mysqld;
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: starting...
START INSTANCE mysqld2;
# FIXME: START INSTANCE should be synchronous.
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started
# 2. Restart IM-main: kill it and IM-angel will restart it; wait for IM to
# start accepting connections again.
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: waiting to start...
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started im_daemon_life_cycle
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_PATH_PID restarted 30
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: started.
--exec $MYSQL_TEST_DIR/t/wait_for_socket.sh $EXE_MYSQL $IM_PATH_SOCK $IM_USERNAME $IM_PASSWORD '' 30
# 2. Restart IM-main;
# 3. Issue some statement -- connection should be re-established.
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Killing IM-main...
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_PATH_PID restarted 30 im_daemon_life_cycle
# Give some time to begin accepting connections after restart.
# FIXME: race condition here.
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Waiting for IM-main to start accepting connections...
--exec $MYSQL_TEST_DIR/t/wait_for_socket.sh $EXE_MYSQL $IM_PATH_SOCK $IM_USERNAME $IM_PASSWORD '' 30 im_daemon_life_cycle
# 3. Issue some statement -- connection should be re-established.
--sleep 3
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle Checking that IM-main processing commands...
--replace_column 3 VERSION
SHOW INSTANCE STATUS mysqld1;
......@@ -67,6 +81,13 @@ SHOW INSTANCE STATUS mysqld1;
# So, if it we do not stop it, it will be stopped by mysql-test-run.pl with
# warning.
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: stopping...
STOP INSTANCE mysqld2;
# FIXME: STOP INSTANCE should be synchronous.
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: waiting to stop...
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped im_daemon_life_cycle
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle mysqld2: stopped.
###########################################################################
--exec $MYSQL_TEST_DIR/t/log.sh im_daemon_life_cycle BUG12751: done.
......@@ -25,7 +25,7 @@
START INSTANCE mysqld2;
# FIXME: START INSTANCE should be synchronous.
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started im_life_cycle
# FIXME: Result of SHOW INSTANCES here is not deterministic unless START
# INSTANCE is synchronous. Even waiting for mysqld to start by looking at
......@@ -58,7 +58,7 @@ SHOW VARIABLES LIKE 'port';
STOP INSTANCE mysqld2;
# FIXME: STOP INSTANCE should be synchronous.
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped im_life_cycle
# FIXME: Result of SHOW INSTANCES here is not deterministic unless START
# INSTANCE is synchronous. Even waiting for mysqld to start by looking at
......@@ -121,7 +121,7 @@ STOP INSTANCE mysqld3;
--echo -- 1.1.6.
--echo --------------------------------------------------------------------
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_MYSQLD1_PATH_PID restarted 30
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_MYSQLD1_PATH_PID restarted 30 im_life_cycle
# Give some time to IM to detect that mysqld was restarted. It should be
# longer than monitoring interval.
......@@ -143,7 +143,7 @@ SHOW INSTANCES;
START INSTANCE mysqld2;
# FIXME: START INSTANCE should be synchronous.
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started im_life_cycle
# FIXME: Result of SHOW INSTANCES here is not deterministic unless START
# INSTANCE is synchronous. Even waiting for mysqld to start by looking at
......@@ -151,7 +151,7 @@ START INSTANCE mysqld2;
# mysqld has started.
# SHOW INSTANCES;
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_MYSQLD2_PATH_PID killed 10
--exec $MYSQL_TEST_DIR/t/kill_n_check.sh $IM_MYSQLD2_PATH_PID killed 10 im_life_cycle
# FIXME: Result of SHOW INSTANCES here is not deterministic unless START
# INSTANCE is synchronous. Even waiting for mysqld to start by looking at
......
......@@ -31,10 +31,10 @@ SHOW INSTANCE OPTIONS mysqld2;
#
START INSTANCE mysqld2;
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 started im_utils
STOP INSTANCE mysqld2;
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped
--exec $MYSQL_TEST_DIR/t/wait_for_process.sh $IM_MYSQLD2_PATH_PID 30 stopped im_utils
#
# Check 'SHOW LOG FILES' command:
......
......@@ -2,74 +2,101 @@
###########################################################################
# NOTE: this script returns 0 (success) even in case of failure. This is
# because this script is executed under mysql-test-run[.pl] and it's better to
# examine particular problem in log file, than just having said that the test
# case has failed.
# NOTE: this script returns 0 (success) even in case of failure (except for
# usage-error). This is because this script is executed under
# mysql-test-run[.pl] and it's better to examine particular problem in log
# file, than just having said that the test case has failed.
###########################################################################
basename=`basename "$0"`
dirname=`dirname "$0"`
###########################################################################
. "$dirname/utils.sh"
###########################################################################
check_restart()
{
if [ ! -r "$pid_path" ]; then
log_debug "No '$pid_path' found."
user_msg='the process was killed'
return 1
fi
new_pid=`cat "$pid_path" 2>/dev/null`
err_code=$?
if [ $? -eq 0 -a "$original_pid" = "$new_pid" ]; then
log_debug "err_code: $err_code; original_pid: $original_pid; new_pid: $new_pid."
if [ $err_code -eq 0 -a "$original_pid" = "$new_pid" ]; then
log_debug "The process was not restarted."
user_msg='the process was not restarted'
return 1
fi
log_debug "The process was restarted."
user_msg='the process was restarted'
return 0
}
###########################################################################
if [ $# -ne 3 ]; then
echo "Usage: kill_n_check.sh <pid file path> killed|restarted <timeout>"
exit 0
if [ $# -ne 4 ]; then
echo "Usage: $basename <pid file path> killed|restarted <timeout> <test id>"
exit 1
fi
pid_path="$1"
expected_result="$2"
total_timeout="$3"
test_id="$4"
log_file="$MYSQLTEST_VARDIR/log/$test_id.log"
log_debug "-- $basename: starting --"
log_debug "pid_path: '$pid_path'"
log_debug "expected_result: '$expected_result'"
log_debug "total_timeout: '$total_timeout'"
log_debug "test_id: '$test_id'"
log_debug "log_file: '$log_file'"
###########################################################################
if [ "$expected_result" != 'killed' -a \
"$expected_result" != 'restarted' ]; then
echo "Error: invalid second argument ('killed' or 'restarted' expected)."
exit 0
log_error "Invalid second argument ($expected_result): 'killed' or 'restarted' expected."
quit 0
fi
if [ -z "$pid_path" ]; then
echo "Error: invalid PID path ($pid_path)."
exit 0
log_error "Invalid PID path ($pid_path)."
quit 0
fi
if [ ! -r "$pid_path" ]; then
echo "Error: PID file ($pid_path) does not exist."
exit 0
log_error "PID file ($pid_path) does not exist."
quit 0
fi
if [ -z "$total_timeout" ]; then
echo "Error: timeout is not specified."
exit 0
log_error "Timeout is not specified."
quit 0
fi
###########################################################################
original_pid=`cat "$pid_path"`
log_debug "original_pid: $original_pid."
echo "Killing the process..."
log_info "Killing the process..."
kill -9 $original_pid
###########################################################################
echo "Sleeping..."
log_info "Waiting..."
if [ "$expected_result" = "restarted" ]; then
......@@ -79,37 +106,42 @@ if [ "$expected_result" = "restarted" ]; then
while true; do
log_debug "cur_attempt: $cur_attempt."
if check_restart; then
echo "Success: $user_msg."
exit 0
log_info "Success: $user_msg."
quit 0
fi
[ $cur_attempt -ge $total_timeout ] && break
log_debug "Sleeping for 1 second..."
sleep 1
cur_attempt=`expr $cur_attempt + 1`
done
echo "Error: $user_msg."
exit 0
log_error "$user_msg."
quit 0
else # $expected_result == killed
# Here we have to sleep for some long time to ensure that the process will
# not be restarted.
log_debug "Sleeping for $total_timeout seconds..."
sleep $total_timeout
new_pid=`cat "$pid_path" 2>/dev/null`
log_debug "new_pid: $new_pid."
if [ "$new_pid" -a "$new_pid" -ne "$original_pid" ]; then
echo "Error: the process was restarted."
log_error "The process was restarted."
else
echo "Success: the process was killed."
log_info "Success: the process was killed."
fi
exit 0
quit 0
fi
#!/bin/bash
###########################################################################
basename=`basename "$0"`
dirname=`dirname "$0"`
###########################################################################
. "$dirname/utils.sh"
###########################################################################
if [ $# -lt 2 ]; then
echo "Usage: $basename <test id> log message ..."
exit 1
fi
test_id="$1"
log_file="$MYSQLTEST_VARDIR/log/$test_id.log"
shift
log_debug "$*"
###########################################################################
#
# This file provides utility functions and is included by other scripts.
#
# The following global variables must be set before calling functions from this
# file:
# - basename -- base name of the calling script (main application);
# - log_file -- where to store log records;
#
###########################################################################
log()
{
[ -z "$log_file" ] && return;
log_level="$1"
log_msg="$2"
ts=`date`
echo "[$ts] [$basename] [$log_level] $log_msg" >> "$log_file";
}
###########################################################################
log_debug()
{
log 'DEBUG' "$1"
}
###########################################################################
log_info()
{
log 'INFO' "$1"
echo "$1"
}
###########################################################################
log_error()
{
log 'ERROR' "$1"
echo "Error: $1"
}
###########################################################################
quit()
{
exit_status="$1"
log_debug "-- $basename: finished (exit_status: $exit_status) --"
exit $exit_status
}
......@@ -2,34 +2,40 @@
###########################################################################
pid_path="$1"
total_attempts="$2"
event="$3"
# NOTE: this script returns 0 (success) even in case of failure (except for
# usage-error). This is because this script is executed under
# mysql-test-run[.pl] and it's better to examine particular problem in log
# file, than just having said that the test case has failed.
case "$3" in
started)
check_fn='check_started';
;;
###########################################################################
stopped)
check_fn='check_stopped';
;;
basename=`basename "$0"`
dirname=`dirname "$0"`
*)
echo "Error: invalid third argument ('started' or 'stopped' expected)."
exit 0
esac
###########################################################################
. "$dirname/utils.sh"
###########################################################################
check_started()
{
[ ! -r "$pid_path" ] && return 1
if [ ! -r "$pid_path" ]; then
log_debug "No PID-file ($pid_path) found -- not started."
return 1
fi
new_pid=`cat "$pid_path" 2>/dev/null`
err_code=$?
log_debug "err_code: $err_code; new_pid: $new_pid."
[ $? -eq 0 -a "$original_pid" = "$new_pid" ] && return 1
if [ $? -ne 0 -o -z "$new_pid" ]; then
log_debug "The process was not started."
return 1
fi
log_debug "The process was started."
return 0
}
......@@ -37,30 +43,72 @@ check_started()
check_stopped()
{
[ -r "$pid_path" ] && return 1
if [ -r "$pid_path" ]; then
log_debug "PID-file '$pid_path' exists -- not stopped."
return 1
fi
log_debug "No PID-file ($pid_path) found -- stopped."
return 0
}
###########################################################################
if [ $# -ne 4 ]; then
echo "Usage: $basename <pid file path> <total attempts> started|stopped <test id>"
exit 1
fi
pid_path="$1"
total_attempts="$2"
event="$3"
test_id="$4"
log_file="$MYSQLTEST_VARDIR/log/$test_id.log"
log_debug "-- $basename: starting --"
log_debug "pid_path: '$pid_path'"
log_debug "total_attempts: '$total_attempts'"
log_debug "event: '$event'"
log_debug "test_id: '$test_id'"
log_debug "log_file: '$log_file'"
###########################################################################
case "$event" in
started)
check_fn='check_started';
;;
stopped)
check_fn='check_stopped';
;;
*)
log_error "Invalid third argument ('started' or 'stopped' expected)."
quit 0
esac
###########################################################################
cur_attempt=1
while true; do
log_debug "cur_attempt: $cur_attempt."
if ( eval $check_fn ); then
echo "Success: the process has been $event."
exit 0
log_info "Success: the process has been $event."
quit 0
fi
[ $cur_attempt -ge $total_attempts ] && break
log_debug "Sleeping for 1 second..."
sleep 1
cur_attempt=`expr $cur_attempt + 1`
done
echo "Error: the process has not been $event in $total_attempts secs."
exit 0
log_error "The process has not been $event in $total_attempts secs."
quit 0
......@@ -2,9 +2,25 @@
###########################################################################
if [ $# -ne 6 ]; then
echo "Usage: wait_for_socket.sh <executable path> <socket path> <username> <password> <db> <timeout>"
exit 0
# NOTE: this script returns 0 (success) even in case of failure (except for
# usage-error). This is because this script is executed under
# mysql-test-run[.pl] and it's better to examine particular problem in log
# file, than just having said that the test case has failed.
###########################################################################
basename=`basename "$0"`
dirname=`dirname "$0"`
###########################################################################
. "$dirname/utils.sh"
###########################################################################
if [ $# -ne 7 ]; then
echo "Usage: wait_for_socket.sh <executable path> <socket path> <username> <password> <db> <timeout> <test id>"
exit 1
fi
client_exe="$1"
......@@ -13,41 +29,57 @@ username="$3"
password="$4"
db="$5"
total_timeout="$6"
test_id="$7"
log_file="$MYSQLTEST_VARDIR/log/$test_id.log"
log_debug "-- $basename: starting --"
log_debug "client_exe: '$client_exe'"
log_debug "socket_path: '$socket_path'"
log_debug "username: '$username'"
log_debug "password: '$password'"
log_debug "db: '$db'"
log_debug "total_timeout: '$total_timeout'"
log_debug "test_id: '$test_id'"
log_debug "log_file: '$log_file'"
###########################################################################
if [ -z "$client_exe" ]; then
echo "Error: invalid path to client executable ($client_exe)."
exit 0;
log_error "Invalid path to client executable ($client_exe)."
quit 0;
fi
if [ ! -x "$client_exe" ]; then
echo "Error: client by path '$client_exe' is not available."
exit 0;
log_error "Client by path '$client_exe' is not available."
quit 0;
fi
if [ -z "$socket_path" ]; then
echo "Error: invalid socket patch."
exit 0
log_error "Invalid socket patch ($socket_path)."
quit 0
fi
###########################################################################
client_args="--silent --socket=$socket_path "
client_args="--silent --connect_timeout=1 --socket=$socket_path "
[ -n "$username" ] && client_args="$client_args --user=$username "
[ -n "$password" ] && client_args="$client_args --password=$password "
[ -n "$db" ] && client_args="$client_args $db"
log_debug "client_args: '$client_args'"
###########################################################################
cur_attempt=1
while true; do
log_debug "cur_attempt: $cur_attempt."
if ( echo 'quit' | "$client_exe" $client_args >/dev/null 2>&1 ); then
echo "Success: server is ready to accept connection on socket."
exit 0
log_info "Success: server is ready to accept connection on socket."
quit 0
fi
[ $cur_attempt -ge $total_timeout ] && break
......@@ -58,5 +90,5 @@ while true; do
done
echo "Error: server does not accept connections after $total_timeout seconds."
exit 0
log_error "Server does not accept connections after $total_timeout seconds."
quit 0
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment