    Bug#11751149 - TRYING TO START MYSQL WHILE ANOTHER INSTANCE
                   IS STARTING: CONFUSING ERROR
    Shishir Jaiswal authored (commit e00810b9)
    
    DESCRIPTION
    ===========
    When the mysql server has processed transactions but not
    yet committed them and then shuts down abnormally (crash,
    external kill, etc.), the storage engine must run recovery.
    Recovery takes place the next time the mysql server is
    started (through either mysqld or mysqld_safe).
    
    If another instance of mysqld_safe is started while the 1st
    server is in the middle of recovery, the 2nd instance may
    kill the 1st one after a moment.
    
    ANALYSIS
    ========
    In the "while true" loop, a check done after the server
    stops looks for the pid file to determine whether the
    shutdown was normal. If the file is absent, the server
    exited gracefully and removed it.
    
    However, if the file is present, the script simply assumes
    that it is a leftover of the "current" server. It fails to
    consider that it could be a valid pid file belonging to
    another running mysql server.
    
    We need to add more checks in the latter case. The script
    should extract the PID from the existing file and check
    whether that process is running. If it is, an older
    instance of the mysql server is running and the script
    should abort.
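
    The check described above can be sketched as follows. This
    is a minimal illustration, not the code from the patch: the
    function name pid_file_owner_alive is hypothetical, and
    "kill -0" is assumed as the liveness test (what @CHECK_PID@
    actually expands to depends on the platform and build).

```shell
#!/bin/sh
# Hypothetical helper, not taken from mysqld_safe.sh.
# Returns 0 if the pid file names a process we can signal, 1 otherwise.
pid_file_owner_alive() {
  pid_file=$1
  [ -f "$pid_file" ] || return 1       # no pid file: nothing to check
  pid=$(cat "$pid_file" 2>/dev/null)
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;           # empty or non-numeric content
  esac
  # kill -0 sends no signal; it only tests whether the process exists
  # and may be signalled. It also fails if the process belongs to
  # another user, so this is an approximation of "alive".
  kill -0 "$pid" 2>/dev/null
}
```

    A caller would abort when the function returns 0 and treat
    the file as stale otherwise.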
    
    FIX
    ===
    Check whether the process is alive by adding a @CHECK_PID@
    test in this case, and abort if it is. The detailed logic
    is as follows:
    
    - The mysqld_safe script quits at startup as soon as it
    finds an active PID, i.e. a mysql server that is already
    running.
    - The PID file is created after InnoDB recovery, so in rare
    cases (when the PID file has not been created yet) more
    than one server can come up. Even then, the others have to
    wait until the 1st server releases the InnoDB lock it
    holds. These servers will either TIMEOUT waiting for the
    InnoDB lock, or find afterwards (by reading $pid_file) that
    the 1st server is already running and abort.
    - The core fix is that we now check the status of the mysql
    server process (alive or not) after the server stops
    running within the loop of "run -> shutdown/kill/abort ->
    run ... ", so that only the script that owns the mysql
    server is able to bring it down if required.
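
    The decision taken after the server stops can be sketched
    as below. The function name decide_after_stop and the three
    result strings are hypothetical, "kill -0" is again an
    assumed stand-in for @CHECK_PID@, and treating a stale pid
    file as "restart" is one reading of the loop described
    above rather than a quote from the script.

```shell
#!/bin/sh
# Hypothetical sketch; names are illustrative, not from mysqld_safe.sh.
# Prints what the supervising loop should do once its server has stopped.
decide_after_stop() {
  pid_file=$1
  if [ ! -f "$pid_file" ]; then
    echo "normal-shutdown"   # graceful exit removed the file: leave the loop
    return 0
  fi
  pid=$(cat "$pid_file" 2>/dev/null)
  case "$pid" in
    ''|*[!0-9]*) pid= ;;     # ignore empty or non-numeric content
  esac
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "abort"             # the file belongs to another live server
    return 0
  fi
  echo "restart"             # stale file left behind by a crashed server
}
```

    Only the "restart" branch leads to another iteration, so a
    script never acts on a server it did not start.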
    
    NOTE
    ====
    Removed the deletion of the pid file and socket file from
    the entry of the loop, as under a race condition the 2nd
    instance could delete files created by the 1st instance.
    Compensated for this by deleting these files at the end of
    the loop.
    Reverted the changes made in the patch for Bug#16776528, so
    once this patch is pushed the concept of mysqld_safe.pid
    goes away altogether. This was required because the script
    was deleting another instance's mysqld_safe.pid, allowing
    multiple mysqld_safe instances to run in parallel. This
    patch fixes Bug#16776528 as well, since the resources are
    guarded anyway by the InnoDB lock + our planned 5.7 patch.
mysqld_safe.sh 26.5 KB