Commit 12bdd58c authored by Libing Song, committed by Sergei Golubchik

MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction

Description
===========
When a transaction commits, it copies the binlog events from
binlog cache to binlog file. Very large transactions
(e.g. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.

The solution in this patch is to rename the binlog cache file to
a binlog file instead of copying it, if the committing transaction
has a large binlog cache. Rename is a very fast operation, so it
does not block other transactions for a long time.
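
The difference between the two commit paths can be illustrated with a
small Python model. This is only a sketch under stated assumptions, not
the server code: the function names commit_by_copy() and
commit_by_rename() and the file names are hypothetical, and plain files
stand in for the binlog and the binlog cache.

```python
import os
import tempfile


def commit_by_copy(cache_path, binlog_path, chunk_size=64 * 1024):
    """Append the cache contents to the binlog; cost grows with cache size."""
    with open(cache_path, "rb") as src, open(binlog_path, "ab") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)


def commit_by_rename(cache_path, new_binlog_path):
    """Turn the cache file itself into the next binlog file.

    On the same filesystem this only updates directory metadata, so the
    time spent does not depend on how much data the cache holds.
    """
    os.replace(cache_path, new_binlog_path)


def demo():
    with tempfile.TemporaryDirectory() as d:
        cache = os.path.join(d, "ML_cache")          # hypothetical name
        binlog = os.path.join(d, "bin.000001")       # hypothetical name
        next_binlog = os.path.join(d, "bin.000002")  # hypothetical name
        with open(binlog, "wb") as f:
            f.write(b"old-events")
        with open(cache, "wb") as f:
            f.write(b"x" * (1024 * 1024))
        commit_by_copy(cache, binlog)
        copied_size = os.path.getsize(binlog)
        commit_by_rename(cache, next_binlog)
        return copied_size, os.path.getsize(next_binlog), os.path.exists(cache)
```

In the server both paths run while holding LOCK_log, which is why the
size-independent cost of the rename matters for concurrent commits.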

Design
======
* binlog_large_commit_threshold
  type: ulonglong
  scope: global
  dynamic: yes
  default: 128MB

  Only binlog cache temporary files larger than this threshold
  (128MB by default) are renamed to a binlog file.

* #binlog_cache_files directory
  To support rename, all binlog cache temporary files are now managed
  as normal files. The `#binlog_cache_files` directory is in the same
  directory as the binlog files. It is created at server startup if it
  doesn't exist. Otherwise, all files in the directory are deleted at
  startup.

  The temporary files are named with an ML_ prefix followed by the
  memory address of the binlog_cache_data object, which guarantees
  that each name is unique.

* Reserve space
  To support the rename feature, enough space must be reserved at the
  beginning of the binlog cache file. The space is required for the
  Format description, Gtid list, checkpoint and Gtid events when
  renaming it to a binlog file.

  binlog_cache_data's cache_log is directly accessed by binlog log,
  online alter and wsrep, so it is not easy to update all the code.
  Thus the binlog cache will not reserve space if it is not a session
  binlog cache or if a wsrep session is enabled.

  - m_file_reserved_bytes
    Stores the number of bytes reserved at the beginning of the cache
    file. It is initialized in write_prepare() and cleared by reset().

    The reserved file header is hidden from callers, so nothing
    changes for them. E.g.
    - get_byte_position() still returns the length of the binlog data
      written to the cache, not the file length.
    - truncate(0) truncates the file to m_file_reserved_bytes, not
      to 0.

  - write_prepare()
    write_prepare() is called every time anything is written into
    the cache. It calls init_file_reserved_bytes() to create the
    cache file (if it doesn't exist) and reserve suitable space if
    the data written exceeds the buffer's size.
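
The way the reserved header stays invisible to callers can be modelled
with a toy Python class. This is a sketch, not the binlog_cache_data
implementation: the class name is hypothetical, an in-memory buffer
stands in for the temporary file, and the header is reserved on the
first write rather than only when the data outgrows the buffer.

```python
import io


class CacheWithReservedHeader:
    """Toy cache whose first reserved bytes are hidden from callers."""

    def __init__(self, reserved_bytes=4096):
        self._file = io.BytesIO()
        self._wanted = reserved_bytes
        self._reserved = 0  # plays the role of m_file_reserved_bytes

    def write_prepare(self):
        # Reserve the header once, before the first payload byte.
        if self._reserved == 0:
            self._file.write(b"\0" * self._wanted)
            self._reserved = self._wanted

    def write(self, data):
        self.write_prepare()
        self._file.seek(0, io.SEEK_END)
        self._file.write(data)

    def file_length(self):
        return len(self._file.getvalue())

    def get_byte_position(self):
        # Length of the binlog data only, not of the whole file.
        return self.file_length() - self._reserved

    def truncate(self, pos):
        # Position 0 for the caller is offset _reserved in the file.
        self._file.truncate(self._reserved + pos)

    def reset(self):
        self._file = io.BytesIO()
        self._reserved = 0
```

After write(b"abc") on a fresh object, get_byte_position() reports 3
while the underlying file holds 4096 + 3 bytes, and truncate(0) leaves
the 4096 reserved bytes in place.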

* Binlog_commit_by_rotate
  It encapsulates the code for renaming a binlog cache temporary
  file to a binlog file.
  - should_commit_by_rotate()
    It is called by write_transaction_to_binlog_events() to check
    whether a binlog cache should be renamed to a binlog file.
  - commit()
    This is the entry point to rename a binlog cache and commit the
    transaction. Both rename and commit are protected by LOCK_log,
    so no other transaction can write anything into the renamed
    binlog before it.

    Rename happens in a rotation. After the new binlog file is generated,
    replace_binlog_file() is called to:
    - copy data from the new binlog file to its binlog cache file.
    - write the Gtid event.
    - rename the binlog cache file to the binlog file.

    After that, the rotation continues and completes. Then the
    transaction is committed in a separate group by itself. Its cache
    file is detached and its cache log is reset before calling
    trx_group_commit_with_engines(), so only the Xid event is written.
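
The threshold check and the three steps of replace_binlog_file() can be
sketched in Python as follows. This is a simplified model under stated
assumptions, not the server code: the Python functions only borrow the
names from the description above, their signatures are hypothetical,
byte strings stand in for real events, and the unused part of the
reserved area is filled with placeholder zero bytes.

```python
import os
import tempfile

DEFAULT_THRESHOLD = 128 * 1024 * 1024  # default binlog_large_commit_threshold


def should_commit_by_rotate(cache_bytes, threshold=DEFAULT_THRESHOLD):
    """Only caches larger than the threshold are candidates for rename."""
    return cache_bytes > threshold


def replace_binlog_file(cache_path, new_binlog_path, reserved_bytes, gtid_event):
    """Model of the rename step performed during rotation.

    Returns False without touching anything when the reserved area is
    too small for the header plus the Gtid event.
    """
    # 1. Read what the rotation already wrote to the new binlog file.
    with open(new_binlog_path, "rb") as f:
        header = f.read()
    needed = len(header) + len(gtid_event)
    if needed > reserved_bytes:
        return False
    with open(cache_path, "r+b") as f:
        f.seek(0)
        f.write(header)       # copy the header into the reserved area
        f.write(gtid_event)   # 2. write the Gtid event after it
        f.write(b"\0" * (reserved_bytes - needed))  # placeholder padding
    # 3. The cache file becomes the binlog file.
    os.replace(cache_path, new_binlog_path)
    return True


def demo():
    with tempfile.TemporaryDirectory() as d:
        cache = os.path.join(d, "ML_cache")     # hypothetical name
        binlog = os.path.join(d, "bin.000002")  # hypothetical name
        with open(binlog, "wb") as f:
            f.write(b"HDR")
        with open(cache, "wb") as f:
            f.write(b"\0" * 16 + b"EVENTS")
        ok = replace_binlog_file(cache, binlog, 16, b"GTID")
        with open(binlog, "rb") as f:
            return ok, f.read(), os.path.exists(cache)
```

The tests in this commit describe further conditions that prevent the
rename (for example a checksum setting that differs from
binlog_checksum, or binlog_legacy_event_pos being enabled); they are
left out of this model.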
parent 34139685
......@@ -67,7 +67,7 @@ SET(SQL_EMBEDDED_SOURCES emb_qcache.cc libmysqld.c lib_sql.cc
../sql/item_subselect.cc ../sql/item_sum.cc ../sql/item_timefunc.cc
../sql/item_xmlfunc.cc ../sql/item_jsonfunc.cc
../sql/json_schema.cc ../sql/json_schema_helper.cc
- ../sql/key.cc ../sql/lock.cc ../sql/log.cc
+ ../sql/key.cc ../sql/lock.cc ../sql/log.cc ../sql/log_cache.cc
../sql/log_event.cc ../sql/log_event_server.cc
../sql/mf_iocache.cc ../sql/my_decimal.cc
../sql/net_serv.cc ../sql/opt_range.cc
......
......@@ -47,14 +47,14 @@
--thread-pool-oversubscribe=#
How many additional active worker threads in a group are
allowed
@@ -1555,8 +1567,8 @@ The following specify which files/extra groups are read (specified before remain
automatically convert it to an on-disk MyISAM or Aria
table
-t, --tmpdir=name Path for temporary files. Several paths may be specified,
- separated by a colon (:), in this case they are used in a
- round-robin fashion
+ separated by a semicolon (;), in this case they are used
+ in a round-robin fashion
@@ -1597,8 +1597,8 @@
background for binlogging by user threads are placed in a
separate location (see `binlog_large_commit_threshold`
option). Several paths may be specified, separated by a
- colon (:), in this case they are used in a round-robin
- fashion
+ semicolon (;), in this case they are used in a
+ round-robin fashion
--transaction-alloc-block-size=#
Allocation block size for transactions to be stored in
binary log
......
......@@ -109,6 +109,16 @@ The following specify which files/extra groups are read (specified before remain
--binlog-ignore-db=name
Tells the master that updates to the given database
should not be logged to the binary log
--binlog-large-commit-threshold=#
Increases transaction concurrency for large transactions
(i.e. those with sizes larger than this value) by using
the large transaction's cache file as a new binary log,
and rotating the active binary log to the large
transaction's cache file at commit time. This avoids the
default commit logic that copies the transaction cache
data to the end of the active binary log file while
holding a lock that prevents other transactions from
binlogging
--binlog-legacy-event-pos
Fill in the end_log_pos field of _all_ events in the
binlog, even when doing so costs performance. Can be used
......@@ -620,7 +630,9 @@ The following specify which files/extra groups are read (specified before remain
--max-binlog-cache-size=#
Sets the total size of the transactional cache
--max-binlog-size=# Binary log will be rotated automatically when the size
- exceeds this value
+ exceeds this value, unless
+ `binlog_large_commit_threshold` causes rotation
+ prematurely
--max-binlog-stmt-cache-size=#
Sets the total size of the statement cache
--max-binlog-total-size=#
......@@ -1590,9 +1602,12 @@ The following specify which files/extra groups are read (specified before remain
temporary table exceeds this size, MariaDB will
automatically convert it to an on-disk MyISAM or Aria
table
- -t, --tmpdir=name Path for temporary files. Several paths may be specified,
- separated by a colon (:), in this case they are used in a
- round-robin fashion
+ -t, --tmpdir=name Path for temporary files. Files that are created in
+ background for binlogging by user threads are placed in a
+ separate location (see `binlog_large_commit_threshold`
+ option). Several paths may be specified, separated by a
+ colon (:), in this case they are used in a round-robin
+ fashion
--transaction-alloc-block-size=#
Allocation block size for transactions to be stored in
binary log
......@@ -1651,6 +1666,7 @@ binlog-format MIXED
binlog-gtid-index TRUE
binlog-gtid-index-page-size 4096
binlog-gtid-index-span-min 65536
binlog-large-commit-threshold 134217728
binlog-legacy-event-pos FALSE
binlog-optimize-thread-scheduling TRUE
binlog-row-event-max-size 8192
......
......@@ -160,16 +160,17 @@ ERROR HY000: Global temporary space limit reached
#
set @save_max_tmp_total_space_usage=@@global.max_tmp_total_space_usage;
set @@global.max_tmp_total_space_usage=64*1024*1024;
- set @@max_tmp_session_space_usage=1179648;
+ set @@max_tmp_session_space_usage=1179648+65536;
select @@max_tmp_session_space_usage;
@@max_tmp_session_space_usage
- 1179648
+ 1245184
set @save_aria_repair_threads=@@aria_repair_threads;
set @@aria_repair_threads=2;
set @save_max_heap_table_size=@@max_heap_table_size;
set @@max_heap_table_size=16777216;
CREATE TABLE t1 (a CHAR(255),b INT,INDEX (b));
INSERT INTO t1 SELECT SEQ,SEQ FROM seq_1_to_100000;
+ set @@max_tmp_session_space_usage=1179648;
SELECT * FROM t1 UNION SELECT * FROM t1;
ERROR HY000: Local temporary space limit reached
DROP TABLE t1;
......@@ -205,11 +206,13 @@ ERROR HY000: Local temporary space limit reached
#
connect c1, localhost, root,,;
set @@binlog_format=row;
- CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=MyISAM;
+ CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=InnoDB;
+ BEGIN;
INSERT INTO t1 SELECT NOW() FROM seq_1_to_6000;
SET max_tmp_session_space_usage = 64*1024;
SELECT * FROM information_schema.ALL_PLUGINS LIMIT 2;
ERROR HY000: Local temporary space limit reached
+ ROLLBACK;
drop table t1;
connection default;
disconnect c1;
......
......@@ -215,7 +215,8 @@ select count(distinct concat(seq,repeat('x',1000))) from seq_1_to_1000;
set @save_max_tmp_total_space_usage=@@global.max_tmp_total_space_usage;
set @@global.max_tmp_total_space_usage=64*1024*1024;
- set @@max_tmp_session_space_usage=1179648;
+ # Binlog cache reserve 4096 bytes at the begin of the temporary file.
+ set @@max_tmp_session_space_usage=1179648+65536;
select @@max_tmp_session_space_usage;
set @save_aria_repair_threads=@@aria_repair_threads;
set @@aria_repair_threads=2;
......@@ -224,6 +225,7 @@ set @@max_heap_table_size=16777216;
CREATE TABLE t1 (a CHAR(255),b INT,INDEX (b));
INSERT INTO t1 SELECT SEQ,SEQ FROM seq_1_to_100000;
+ set @@max_tmp_session_space_usage=1179648;
--error 200
SELECT * FROM t1 UNION SELECT * FROM t1;
DROP TABLE t1;
......@@ -266,11 +268,16 @@ SELECT MIN(VARIABLE_VALUE) OVER (), NTILE(1) OVER (), MAX(VARIABLE_NAME) OVER ()
connect(c1, localhost, root,,);
set @@binlog_format=row;
- CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=MyISAM;
+ CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=InnoDB;
+ # Binlog cache file will be truncated at commit, thus keep the the transaction
+ # to keep binlog cache temporary file large enough
+ BEGIN;
INSERT INTO t1 SELECT NOW() FROM seq_1_to_6000;
SET max_tmp_session_space_usage = 64*1024;
--error 200
SELECT * FROM information_schema.ALL_PLUGINS LIMIT 2;
+ ROLLBACK;
drop table t1;
connection default;
disconnect c1;
......
RESET MASTER;
#
# binlog cache file is created in #binlog_cache_files directory
# and it is deleted at disconnect
#
connect con1,localhost,root,,;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
# list binlog_cache_files/
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
# list #binlog_cache_files/
ML_BINLOG_CACHE_FILE
SET debug_sync = "thread_end SIGNAL signal.thread_end";
disconnect con1;
connection default;
SET debug_sync = "now WAIT_FOR signal.thread_end";
# binlog cache file is deleted at disconnection
# list #binlog_cache_files/
#
# Reserved space is not big enough, rename will not happen. But rotate
# will succeed.
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,simulate_required_size_too_big';
UPDATE t1 SET c1 = repeat('2', 5242880);
include/assert.inc [Binlog is rotated, but rename is not executed.]
#
# Error happens when renaming binlog cache to binlog file, rename will
# not happen. Since the original binlog is delete, the rotate will failed
# too. binlog will be closed.
#
SET debug = 'd,simulate_rename_binlog_cache_to_binlog_error';
UPDATE t1 SET c1 = repeat('3', 5242880);
ERROR HY000: Can't open file: './master-bin.000004' (errno: 1 "Operation not permitted")
SELECT count(*) FROM t1 WHERE c1 like "3%";
count(*)
0
# Binlog is closed
show master status;
File Position Binlog_Do_DB Binlog_Ignore_DB
# restart
show master status;
File Position Binlog_Do_DB Binlog_Ignore_DB
master-bin.000004 # <Binlog_Do_DB> <Binlog_Ignore_DB>
#
# Crash happens before rename the file
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,binlog_commit_by_rotate_crash_before_rename';
UPDATE t1 SET c1 = repeat('4', 5242880);
ERROR HY000: Lost connection to server during query
# One cache file left afte crash
# list #binlog_cache_files/
ML_BINLOG_CACHE_FILE
non_binlog_cache
# restart
# The cache file is deleted at startup.
# list #binlog_cache_files/
non_binlog_cache
include/assert_grep.inc [warning: non_binlog_cache file is in #binlog_cache_files/]
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000005 # Format_desc # # SERVER_VERSION, BINLOG_VERSION
master-bin.000005 # Gtid_list # # [#-#-#]
#
# Crash happens just after rotation is finished, binlog commit is not
# started yet, so there is no Xid_log_event in the log, no garbage at
# the end of the file.
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
BEGIN;
UPDATE t1 SET c1 = repeat('5', 5242880);
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('6', 5242880);
UPDATE t1 SET c1 = repeat('7', 5242880);
ROLLBACK TO SAVEPOINT s1;
INSERT INTO t1 VALUES('a');
SET debug = 'd,binlog_commit_by_rotate_crash_after_rotate';
COMMIT;
ERROR HY000: Lost connection to server during query
# No cache file left afte crash
# list #binlog_cache_files/
# restart
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000006 # Format_desc # # SERVER_VERSION, BINLOG_VERSION
master-bin.000006 # Gtid_list # # [#-#-#]
master-bin.000006 # Gtid # # BEGIN GTID #-#-#
master-bin.000006 # Annotate_rows # # UPDATE t1 SET c1 = repeat('5', 5242880)
master-bin.000006 # Table_map # # table_id: # (test.t1)
master-bin.000006 # Update_rows_v1 # # table_id: #
master-bin.000006 # Update_rows_v1 # # table_id: # flags: STMT_END_F
master-bin.000006 # Query # # SAVEPOINT `s1`
master-bin.000006 # Annotate_rows # # INSERT INTO t1 VALUES('a')
master-bin.000006 # Table_map # # table_id: # (test.t1)
master-bin.000006 # Write_rows_v1 # # table_id: # flags: STMT_END_F
call mtr.add_suppression(".*Turning logging off for the whole duration.*");
call mtr.add_suppression(".*non_binlog_cache is in #binlog_cache_files/.*");
DROP TABLE t1;
################################################################################
# MDEV-32014 Rename binlog cache to binlog file
#
# It verifies that the rename logic is handled correct if error happens.
################################################################################
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
RESET MASTER;
--echo #
--echo # binlog cache file is created in #binlog_cache_files directory
--echo # and it is deleted at disconnect
--echo #
--connect(con1,localhost,root,,)
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
--echo # list binlog_cache_files/
--let $datadir= `SELECT @@datadir`
--list_files $datadir/#binlog_cache_files
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
SET debug_sync = "thread_end SIGNAL signal.thread_end";
--disconnect con1
--connection default
# Wait until the connection is closed completely.
SET debug_sync = "now WAIT_FOR signal.thread_end";
--echo # binlog cache file is deleted at disconnection
--echo # list #binlog_cache_files/
--list_files $datadir/#binlog_cache_files
--echo #
--echo # Reserved space is not big enough, rename will not happen. But rotate
--echo # will succeed.
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,simulate_required_size_too_big';
UPDATE t1 SET c1 = repeat('2', 5242880);
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000002' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos < 4096
--let $assert_text= Binlog is rotated, but rename is not executed.
--source include/assert.inc
--echo #
--echo # Error happens when renaming binlog cache to binlog file, rename will
--echo # not happen. Since the original binlog is delete, the rotate will failed
--echo # too. binlog will be closed.
--echo #
SET debug = 'd,simulate_rename_binlog_cache_to_binlog_error';
--error ER_CANT_OPEN_FILE
UPDATE t1 SET c1 = repeat('3', 5242880);
SELECT count(*) FROM t1 WHERE c1 like "3%";
--echo # Binlog is closed
--source include/show_master_status.inc
--source include/restart_mysqld.inc
--source include/show_master_status.inc
--echo #
--echo # Crash happens before rename the file
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,binlog_commit_by_rotate_crash_before_rename';
--source include/expect_crash.inc
--error 2013
UPDATE t1 SET c1 = repeat('4', 5242880);
--write_file $datadir/#binlog_cache_files/non_binlog_cache
It is not a binlog cache file
EOF
--echo # One cache file left afte crash
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
--source include/start_mysqld.inc
--echo # The cache file is deleted at startup.
--echo # list #binlog_cache_files/
--list_files $datadir/#binlog_cache_files
--let $assert_text= warning: non_binlog_cache file is in #binlog_cache_files/
--let $assert_file= $MYSQLTEST_VARDIR/log/mysqld.1.err
--let $assert_select= non_binlog_cache.*#binlog_cache_files/
--let $assert_count= 1
--let $assert_only_after= CURRENT_TEST: binlog.binlog_commit_by_rotate_atomic
--source include/assert_grep.inc
--remove_file $datadir/#binlog_cache_files/non_binlog_cache
--let $binlog_file= LAST
--let $binlog_start= 4
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
--echo #
--echo # Crash happens just after rotation is finished, binlog commit is not
--echo # started yet, so there is no Xid_log_event in the log, no garbage at
--echo # the end of the file.
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
BEGIN;
UPDATE t1 SET c1 = repeat('5', 5242880);
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('6', 5242880);
UPDATE t1 SET c1 = repeat('7', 5242880);
ROLLBACK TO SAVEPOINT s1;
INSERT INTO t1 VALUES('a');
SET debug = 'd,binlog_commit_by_rotate_crash_after_rotate';
--source include/expect_crash.inc
--error 2013
COMMIT;
--echo # No cache file left afte crash
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
--source include/start_mysqld.inc
--let $binlog_file= master-bin.000006
--let $binlog_start= 4
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
call mtr.add_suppression(".*Turning logging off for the whole duration.*");
call mtr.add_suppression(".*non_binlog_cache is in #binlog_cache_files/.*");
DROP TABLE t1;
RESET MASTER;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
SET @saved_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
UPDATE t1 SET c1 = repeat('2', 5242880);
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000002 # Gtid # # BEGIN GTID #-#-#
master-bin.000002 # Annotate_rows # # UPDATE t1 SET c1 = repeat('2', 5242880)
master-bin.000002 # Table_map # # table_id: # (test.t1)
master-bin.000002 # Update_rows_v1 # # table_id: #
master-bin.000002 # Update_rows_v1 # # table_id: # flags: STMT_END_F
master-bin.000002 # Xid # # COMMIT /* XID */
SET GLOBAL binlog_large_commit_threshold = @saved_threshold;
DROP TABLE t1;
--source include/have_file_key_management_plugin.inc
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
RESET MASTER;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
SET @saved_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
UPDATE t1 SET c1 = repeat('2', 5242880);
--let $binlog_file= LAST
--let $skip_checkpoint_events=1
--source include/show_binlog_events.inc
SET GLOBAL binlog_large_commit_threshold = @saved_threshold;
DROP TABLE t1;
include/master-slave.inc
[connection master]
# Prepare
SET @saved_binlog_large_commit_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET @saved_binlog_checksum= @@GLOBAL.binlog_checksum;
SET GLOBAL binlog_checksum = "NONE";
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
CREATE TABLE t2 (c1 LONGTEXT) ENGINE = MyISAM;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
# Not renamed to binlog, since the binlog cache is not larger than the
# threshold. And it should works well after ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('1', 5242880);
ROLLBACK TO SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('3', 5242880);
UPDATE t1 SET c1 = repeat('4', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
include/assert.inc [Binlog is not rotated]
#
# Test binlog cache rename to binlog file with checksum off
#
include/sync_slave_sql_with_master.inc
include/stop_slave.inc
SET @saved_binlog_large_commit_threshold = @@GLOBAL.binlog_large_commit_threshold;
SET @saved_slave_parallel_workers = @@GLOBAL.slave_parallel_workers;
SET @saved_slave_parallel_mode = @@GLOBAL.slave_parallel_mode;
SET @saved_slave_parallel_max_queued = @@GLOBAL.slave_parallel_max_queued;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET GLOBAL slave_parallel_max_queued = 100 * 1024 * 1024;
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_mode = "aggressive";
include/start_slave.inc
BEGIN;
DELETE FROM t1;
connection master;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
# Transaction cache can be renamed and works well with ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
ROLLBACK TO s1;
UPDATE t1 SET c1 = repeat('3', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('4', 5242880);
UPDATE t1 SET c1 = repeat('5', 5242880);
UPDATE t1 SET c1 = repeat('6', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
INSERT INTO t1 VALUES("after_update_t1");
include/assert.inc [Rename is executed.]
# statement cache can be renamed
connection master;
BEGIN;
UPDATE t2 SET c1 = repeat('4', 5242880);
INSERT INTO t1 VALUES("after_update_t2");
COMMIT;
include/assert.inc [Rename is executed.]
connection slave;
ROLLBACK;
connection master;
include/sync_slave_sql_with_master.inc
include/assert.inc [Rename is executed.]
include/assert.inc [Rename is executed.]
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000002 # Gtid # # BEGIN GTID #-#-#
slave-bin.000002 # Annotate_rows # # UPDATE t1 SET c1 = repeat('3', 5242880)
slave-bin.000002 # Table_map # # table_id: # (test.t1)
slave-bin.000002 # Update_rows_v1 # # table_id: #
slave-bin.000002 # Update_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000002 # Query # # SAVEPOINT `s2`
slave-bin.000002 # Xid # # COMMIT /* XID */
slave-bin.000002 # Gtid # # BEGIN GTID #-#-#
slave-bin.000002 # Annotate_rows # # INSERT INTO t1 VALUES("after_update_t1")
slave-bin.000002 # Table_map # # table_id: # (test.t1)
slave-bin.000002 # Write_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000002 # Xid # # COMMIT /* XID */
slave-bin.000002 # Rotate # # slave-bin.000003;pos=POS
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000003 # Gtid # # BEGIN GTID #-#-#
slave-bin.000003 # Annotate_rows # # UPDATE t2 SET c1 = repeat('4', 5242880)
slave-bin.000003 # Table_map # # table_id: # (test.t2)
slave-bin.000003 # Update_rows_v1 # # table_id: #
slave-bin.000003 # Update_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000003 # Query # # COMMIT
slave-bin.000003 # Gtid # # BEGIN GTID #-#-#
slave-bin.000003 # Annotate_rows # # INSERT INTO t1 VALUES("after_update_t2")
slave-bin.000003 # Table_map # # table_id: # (test.t1)
slave-bin.000003 # Write_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000003 # Xid # # COMMIT /* XID */
include/stop_slave.inc
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL slave_parallel_workers = @saved_slave_parallel_workers;
SET GLOBAL slave_parallel_max_queued = @saved_slave_parallel_max_queued;
SET GLOBAL slave_parallel_mode = @saved_slave_parallel_mode;
include/start_slave.inc
# CREATE SELECT works well
connection master;
CREATE TABLE t3 SELECT * FROM t1;
include/assert.inc [Rename is executed.]
CREATE TABLE t4 SELECT * FROM t2;
include/assert.inc [Rename is executed.]
# XA statement works well
XA START "test-a-long-xid========================================";
UPDATE t1 SET c1 = repeat('1', 5242880);
XA END "test-a-long-xid========================================";
XA PREPARE "test-a-long-xid========================================";
XA COMMIT "test-a-long-xid========================================";
include/assert.inc [Rename is executed.]
XA START "test-xid";
UPDATE t1 SET c1 = repeat('2', 5242880);
XA END "test-xid";
XA COMMIT "test-xid" ONE PHASE;
include/assert.inc [Rename is executed.]
#
# It works well in the situation that binlog header is larger than
# IO_SIZE and binlog file's buffer.
#
FLUSH BINARY LOGS;
SET SESSION server_id = 1;
UPDATE t1 SET c1 = repeat('3', 5242880);
include/assert.inc [Rename is executed.]
#
# RESET MASTER should work well. It also verifies binlog checksum mechanism.
#
include/rpl_reset.inc
#
# Test binlog cache rename to binlog file with checksum on
#
SET GLOBAL binlog_checksum = "CRC32";
# It will not rename the cache to file, since the cache's checksum was
# initialized when reset the cache at the end of previous transaction.
UPDATE t1 SET c1 = repeat('5', 5242880);
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If the cache's checksum is not same
# to binlog_checksum
#
BEGIN;
UPDATE t1 SET c1 = repeat('6', 5242880);
SET GLOBAL binlog_checksum = "NONE";
COMMIT;
include/assert.inc [Binlog is not rotated]
BEGIN;
UPDATE t1 SET c1 = repeat('7', 5242880);
SET GLOBAL binlog_checksum = "CRC32";
COMMIT;
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If both stmt and trx cache are not empty
#
UPDATE t1, t2 SET t1.c1 = repeat('8', 5242880), t2.c1 = repeat('7', 5242880);
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If binlog_legacy_event_pos is on
#
SET GLOBAL binlog_legacy_event_pos = ON;
UPDATE t1 SET c1 = repeat('9', 5242880);
SET GLOBAL binlog_legacy_event_pos = OFF;
include/assert.inc [Binlog is not rotated]
DROP TABLE t1, t2, t3, t4;
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL binlog_checksum = @saved_binlog_checksum;
include/rpl_end.inc
################################################################################
# MDEV-32014 Rename binlog cache to binlog file
#
# It verifies that the binlog caches which are larger
# than binlog_large_commit_threshold can be move to a binlog file
# successfully. With a successful rename,
# - it rotates the binlog and the cache is renamed to the new binlog file
# - an ignorable event is generated just after the Gtid_log_event of the
# transaction to take the reserved spaces which is unused.
#
# It also verifies that rename is not supported in below cases
# though the cache is larger than the threshold
# - both statement and transaction cache should be flushed.
# - the cache's checksum option is not same to binlog_checksum
# - binlog_legacy_event_pos is enabled.
################################################################################
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/master-slave.inc
--echo # Prepare
SET @saved_binlog_large_commit_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET @saved_binlog_checksum= @@GLOBAL.binlog_checksum;
SET GLOBAL binlog_checksum = "NONE";
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
CREATE TABLE t2 (c1 LONGTEXT) ENGINE = MyISAM;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
--echo # Not renamed to binlog, since the binlog cache is not larger than the
--echo # threshold. And it should works well after ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('1', 5242880);
ROLLBACK TO SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('3', 5242880);
UPDATE t1 SET c1 = repeat('4', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000003"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Test binlog cache rename to binlog file with checksum off
--echo #
--source include/sync_slave_sql_with_master.inc
--source include/stop_slave.inc
SET @saved_binlog_large_commit_threshold = @@GLOBAL.binlog_large_commit_threshold;
SET @saved_slave_parallel_workers = @@GLOBAL.slave_parallel_workers;
SET @saved_slave_parallel_mode = @@GLOBAL.slave_parallel_mode;
SET @saved_slave_parallel_max_queued = @@GLOBAL.slave_parallel_max_queued;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET GLOBAL slave_parallel_max_queued = 100 * 1024 * 1024;
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_mode = "aggressive";
--source include/start_slave.inc
# Block all DML on slave
BEGIN;
DELETE FROM t1;
--connection master
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
--echo # Transaction cache can be renamed and works well with ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
ROLLBACK TO s1;
UPDATE t1 SET c1 = repeat('3', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('4', 5242880);
UPDATE t1 SET c1 = repeat('5', 5242880);
UPDATE t1 SET c1 = repeat('6', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
INSERT INTO t1 VALUES("after_update_t1");
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000004' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo # statement cache can be renamed
--connection master
BEGIN;
UPDATE t2 SET c1 = repeat('4', 5242880);
INSERT INTO t1 VALUES("after_update_t2");
COMMIT;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000005' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--connection slave
# UPDATE t2 should be waiting for prior transactions to commit.
let $wait_condition=
SELECT count(*) = 1 FROM information_schema.processlist
WHERE State = "Waiting for prior transaction to commit";
--source include/wait_condition.inc
ROLLBACK;
--connection master
--source include/sync_slave_sql_with_master.inc
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'slave-bin.000002' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'slave-bin.000003' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--let $binlog_file= slave-bin.000002
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
--let $binlog_file= slave-bin.000003
--source include/show_binlog_events.inc
--source include/stop_slave.inc
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL slave_parallel_workers = @saved_slave_parallel_workers;
SET GLOBAL slave_parallel_max_queued = @saved_slave_parallel_max_queued;
SET GLOBAL slave_parallel_mode = @saved_slave_parallel_mode;
--source include/start_slave.inc
--echo # CREATE SELECT works well
--connection master
CREATE TABLE t3 SELECT * FROM t1;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000006' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
CREATE TABLE t4 SELECT * FROM t2;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000007' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo # XA statement works well
XA START "test-a-long-xid========================================";
UPDATE t1 SET c1 = repeat('1', 5242880);
XA END "test-a-long-xid========================================";
XA PREPARE "test-a-long-xid========================================";
XA COMMIT "test-a-long-xid========================================";
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000008' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
XA START "test-xid";
UPDATE t1 SET c1 = repeat('2', 5242880);
XA END "test-xid";
XA COMMIT "test-xid" ONE PHASE;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000009' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo #
--echo # It works well in the situation that binlog header is larger than
--echo # IO_SIZE and binlog file's buffer.
--echo #
--disable_query_log
# make Gtid_list_event larger than 64K(binlog file's buffer)
--let $server_id= 100000
while ($server_id < 104096)
{
eval SET SESSION server_id = $server_id;
eval UPDATE t1 SET c1 = "$server_id" LIMIT 1;
--inc $server_id
}
--enable_query_log
# After flush, reserved space should be updated.
FLUSH BINARY LOGS;
SET SESSION server_id = 1;
UPDATE t1 SET c1 = repeat('3', 5242880);
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000011' LIMIT 4, End_log_pos, 4)
# 69632 is 68K (17 * 4096), which is larger than the binlog file's 64K buffer
--let $assert_cond= $gtid_end_pos = 69632
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo #
--echo # RESET MASTER should work well. It also verifies binlog checksum mechanism.
--echo #
--source include/rpl_reset.inc
--echo #
--echo # Test binlog cache rename to binlog file with checksum on
--echo #
SET GLOBAL binlog_checksum = "CRC32";
--echo # It will not rename the cache to file, since the cache's checksum was
--echo # initialized when reset the cache at the end of previous transaction.
UPDATE t1 SET c1 = repeat('5', 5242880);
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000002"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If the cache's checksum is not same
--echo # to binlog_checksum
--echo #
BEGIN;
UPDATE t1 SET c1 = repeat('6', 5242880);
SET GLOBAL binlog_checksum = "NONE";
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000003"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
BEGIN;
UPDATE t1 SET c1 = repeat('7', 5242880);
SET GLOBAL binlog_checksum = "CRC32";
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If both stmt and trx cache are not empty
--echo #
UPDATE t1, t2 SET t1.c1 = repeat('8', 5242880), t2.c1 = repeat('7', 5242880);
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If binlog_legacy_event_pos is on
--echo #
SET GLOBAL binlog_legacy_event_pos = ON;
UPDATE t1 SET c1 = repeat('9', 5242880);
SET GLOBAL binlog_legacy_event_pos = OFF;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
# cleanup
DROP TABLE t1, t2, t3, t4;
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL binlog_checksum = @saved_binlog_checksum;
--let $binlog_file=
--let $skip_checkpoint_events=0
--source include/rpl_end.inc
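The `End_log_pos` values asserted in this test (4096 in the common case, 69632 once the Gtid list outgrows 64K) are both multiples of 4096. That is consistent with the reserved header space being rounded up to a multiple of `IO_SIZE`. The sketch below only illustrates that arithmetic. `kIoSize` and `round_up_to_io_size` are hypothetical names, and the real computation lives in `binlog_cache_reserved_size()`, which is not shown in this diff and may differ in detail.

```cpp
#include <cstdint>

// Assumed value: IO_SIZE is 4096 in MariaDB, matching the block size of
// max_binlog_size shown in the sysvar results.
constexpr std::uint64_t kIoSize = 4096;

// Hypothetical helper: round a header length up to the next multiple of
// kIoSize. Lengths that are already a multiple are returned unchanged.
constexpr std::uint64_t round_up_to_io_size(std::uint64_t header_len)
{
  return (header_len + kIoSize - 1) / kIoSize * kIoSize;
}
```

Under this assumption, any header between 65537 and 69632 bytes reserves 69632 bytes, which is the value the large Gtid list case asserts.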
......@@ -462,6 +462,16 @@ NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_LARGE_COMMIT_THRESHOLD
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Increases transaction concurrency for large transactions (i.e. those with sizes larger than this value) by using the large transaction's cache file as a new binary log, and rotating the active binary log to the large transaction's cache file at commit time. This avoids the default commit logic that copies the transaction cache data to the end of the active binary log file while holding a lock that prevents other transactions from binlogging
NUMERIC_MIN_VALUE 10485760
NUMERIC_MAX_VALUE 18446744073709551615
NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_OPTIMIZE_THREAD_SCHEDULING
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BOOLEAN
......@@ -1905,7 +1915,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME MAX_BINLOG_SIZE
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value, unless `binlog_large_commit_threshold` causes rotation prematurely
NUMERIC_MIN_VALUE 4096
NUMERIC_MAX_VALUE 1073741824
NUMERIC_BLOCK_SIZE 4096
......@@ -3995,7 +4005,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME TMPDIR
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE VARCHAR
VARIABLE_COMMENT Path for temporary files. Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
VARIABLE_COMMENT Path for temporary files. Files that are created in background for binlogging by user threads are placed in a separate location (see `binlog_large_commit_threshold` option). Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
NUMERIC_MIN_VALUE NULL
NUMERIC_MAX_VALUE NULL
NUMERIC_BLOCK_SIZE NULL
......
......@@ -492,6 +492,16 @@ NUMERIC_BLOCK_SIZE NULL
ENUM_VALUE_LIST NULL
READ_ONLY YES
COMMAND_LINE_ARGUMENT NULL
VARIABLE_NAME BINLOG_LARGE_COMMIT_THRESHOLD
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Increases transaction concurrency for large transactions (i.e. those with sizes larger than this value) by using the large transaction's cache file as a new binary log, and rotating the active binary log to the large transaction's cache file at commit time. This avoids the default commit logic that copies the transaction cache data to the end of the active binary log file while holding a lock that prevents other transactions from binlogging
NUMERIC_MIN_VALUE 10485760
NUMERIC_MAX_VALUE 18446744073709551615
NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_LEGACY_EVENT_POS
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BOOLEAN
......@@ -2105,7 +2115,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME MAX_BINLOG_SIZE
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value, unless `binlog_large_commit_threshold` causes rotation prematurely
NUMERIC_MIN_VALUE 4096
NUMERIC_MAX_VALUE 1073741824
NUMERIC_BLOCK_SIZE 4096
......@@ -4865,7 +4875,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME TMPDIR
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE VARCHAR
VARIABLE_COMMENT Path for temporary files. Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
VARIABLE_COMMENT Path for temporary files. Files that are created in background for binlogging by user threads are placed in a separate location (see `binlog_large_commit_threshold` option). Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
NUMERIC_MIN_VALUE NULL
NUMERIC_MAX_VALUE NULL
NUMERIC_BLOCK_SIZE NULL
......
......@@ -107,7 +107,7 @@ SET (SQL_SOURCE
hostname.cc init.cc item.cc item_buff.cc item_cmpfunc.cc
item_create.cc item_func.cc item_geofunc.cc item_row.cc
item_strfunc.cc item_subselect.cc item_sum.cc item_timefunc.cc
key.cc log.cc lock.cc
key.cc log.cc log_cache.cc lock.cc
log_event.cc log_event_server.cc
rpl_record.cc rpl_reporting.cc
mf_iocache.cc my_decimal.cc
......
......@@ -600,9 +600,12 @@ class binlog_cache_mngr;
class binlog_cache_data;
struct rpl_gtid;
struct wait_for_commit;
class Binlog_commit_by_rotate;
class MYSQL_BIN_LOG: public TC_LOG, private Event_log
{
friend Binlog_commit_by_rotate;
#ifdef HAVE_PSI_INTERFACE
/** The instrumentation key to use for @ LOCK_index. */
PSI_mutex_key m_key_LOCK_index;
......@@ -756,18 +759,24 @@ class MYSQL_BIN_LOG: public TC_LOG, private Event_log
new_file() is locking. new_file_without_locking() does not acquire
LOCK_log.
*/
int new_file_impl();
int new_file_impl(bool commit_by_rotate);
void do_checkpoint_request(ulong binlog_id);
int write_transaction_or_stmt(group_commit_entry *entry, uint64 commit_id);
int write_transaction_or_stmt(group_commit_entry *entry, uint64 commit_id,
bool commit_by_rotate);
int queue_for_group_commit(group_commit_entry *entry);
bool write_transaction_to_binlog_events(group_commit_entry *entry);
bool write_transaction_with_group_commit(group_commit_entry *entry);
void write_transaction_handle_error(group_commit_entry *entry);
void trx_group_commit_leader(group_commit_entry *leader);
void trx_group_commit_with_engines(group_commit_entry *leader,
group_commit_entry *tail,
bool commit_by_rotate);
bool is_xidlist_idle_nolock();
void update_gtid_index(uint32 offset, rpl_gtid gtid);
public:
void purge(bool all);
int new_file_without_locking();
int new_file_without_locking(bool commit_by_rotate);
/*
A list of struct xid_count_per_binlog is used to keep track of how many
XIDs are in prepared, but not committed, state in each binlog. And how
......@@ -997,7 +1006,8 @@ class MYSQL_BIN_LOG: public TC_LOG, private Event_log
enum cache_type io_cache_type_arg,
ulong max_size,
bool null_created,
bool need_mutex);
bool need_mutex,
bool commit_by_rotate = false);
bool open_index_file(const char *index_file_name_arg,
const char *log_name, bool need_mutex);
/* Use this to start writing a new log file */
......@@ -1037,7 +1047,8 @@ class MYSQL_BIN_LOG: public TC_LOG, private Event_log
bool is_active(const char* log_file_name);
bool can_purge_log(const char *log_file_name, bool interactive);
int update_log_index(LOG_INFO* linfo, bool need_update_threads);
int rotate(bool force_rotate, bool* check_purge);
int rotate(bool force_rotate, bool *check_purge,
bool commit_by_rotate= false);
void checkpoint_and_purge(ulong binlog_id);
int rotate_and_purge(bool force_rotate, DYNAMIC_ARRAY* drop_gtid_domain= NULL);
/**
......@@ -1117,6 +1128,7 @@ class MYSQL_BIN_LOG: public TC_LOG, private Event_log
bool is_xidlist_idle();
bool write_gtid_event(THD *thd, bool standalone, bool is_transactional,
uint64 commit_id,
bool commit_by_rotate,
bool has_xid= false, bool ro_1pc= false);
int read_state_from_file();
int write_state_to_file();
......
#include "my_global.h"
#include "log_cache.h"
#include "handler.h"
#include "my_sys.h"
#include "mysql/psi/mysql_file.h"
#include "mysql/service_wsrep.h"
const char *BINLOG_CACHE_DIR= "#binlog_cache_files";
char binlog_cache_dir[FN_REFLEN];
extern uint32 binlog_cache_reserved_size();
bool binlog_cache_data::init_file_reserved_bytes()
{
// The session's cache file has not been created yet, so create it here.
if (cache_log.file == -1)
{
char name[FN_REFLEN];
/* Cache file is named with PREFIX + binlog_cache_data object's address */
snprintf(name, FN_REFLEN, "%s/%s_%llu", cache_log.dir, cache_log.prefix,
(ulonglong) this);
if ((cache_log.file=
mysql_file_open(0, name, O_CREAT | O_RDWR, MYF(MY_WME))) < 0)
{
sql_print_error("Failed to open binlog cache temporary file %s", name);
cache_log.error= -1;
return true;
}
}
#ifdef WITH_WSREP
/*
WSREP code accesses cache_log directly, so don't reserve space if WSREP is
on.
*/
if (unlikely(wsrep_on(current_thd)))
return false;
#endif
m_file_reserved_bytes= binlog_cache_reserved_size();
cache_log.pos_in_file= m_file_reserved_bytes;
cache_log.seek_not_done= 1;
return false;
}
void binlog_cache_data::detach_temp_file()
{
mysql_file_close(cache_log.file, MYF(0));
cache_log.file= -1;
reset();
}
extern void ignore_db_dirs_append(const char *dirname_arg);
bool init_binlog_cache_dir()
{
size_t length;
uint max_tmp_file_name_len=
2 /* prefix */ + 10 /* max len of thread_id */ + 1 /* underline */;
ignore_db_dirs_append(BINLOG_CACHE_DIR);
dirname_part(binlog_cache_dir, log_bin_basename, &length);
/*
Must ensure the full name of the tmp file is shorter than FN_REFLEN, to
avoid overflowing the name buffer in write and commit.
*/
if (length + strlen(BINLOG_CACHE_DIR) + max_tmp_file_name_len >= FN_REFLEN)
{
sql_print_error("Could not create binlog cache dir %s%s. It is too long.",
binlog_cache_dir, BINLOG_CACHE_DIR);
return true;
}
memcpy(binlog_cache_dir + length, BINLOG_CACHE_DIR,
strlen(BINLOG_CACHE_DIR));
binlog_cache_dir[length + strlen(BINLOG_CACHE_DIR)]= 0;
MY_DIR *dir_info= my_dir(binlog_cache_dir, MYF(0));
if (!dir_info)
{
/* Make a dir for binlog cache temp files if it does not exist. */
if (my_mkdir(binlog_cache_dir, 0777, MYF(0)) < 0)
{
sql_print_error("Could not create binlog cache dir %s.",
binlog_cache_dir);
return true;
}
return false;
}
/* Try to delete all cache files in the directory. */
for (uint i= 0; i < dir_info->number_of_files; i++)
{
FILEINFO *file= dir_info->dir_entry + i;
if (strncmp(file->name, LOG_PREFIX, strlen(LOG_PREFIX)))
{
sql_print_warning("%s is in %s/, but it is not a binlog cache file",
file->name, BINLOG_CACHE_DIR);
continue;
}
char file_path[FN_REFLEN];
fn_format(file_path, file->name, binlog_cache_dir, "",
MYF(MY_REPLACE_DIR));
my_delete(file_path, MYF(0));
}
my_dirend(dir_info);
return false;
}
......@@ -22,6 +22,16 @@ static constexpr my_off_t MY_OFF_T_UNDEF= ~0ULL;
/** Truncate cache log files bigger than this */
static constexpr my_off_t CACHE_FILE_TRUNC_SIZE = 65536;
/**
Create binlog cache directory if it doesn't exist, otherwise delete all
files existing in the directory.
@retval false The directory was initialized successfully.
@retval true The directory could not be initialized.
*/
bool init_binlog_cache_dir();
extern char binlog_cache_dir[FN_REFLEN];
/*
Helper classes to store non-transactional and transactional data
......@@ -35,7 +45,7 @@ class binlog_cache_data
before_stmt_pos(MY_OFF_T_UNDEF), m_pending(0), status(0),
incident(FALSE), precompute_checksums(precompute_checksums),
saved_max_binlog_cache_size(0), ptr_binlog_cache_use(0),
ptr_binlog_cache_disk_use(0)
ptr_binlog_cache_disk_use(0), m_file_reserved_bytes(0)
{
/*
Read the current checksum setting. We will use this setting to decide
......@@ -50,6 +60,10 @@ class binlog_cache_data
~binlog_cache_data()
{
DBUG_ASSERT(empty());
if (cache_log.file != -1 && !encrypt_tmp_files)
unlink(my_filename(cache_log.file));
close_cached_file(&cache_log);
}
......@@ -67,7 +81,7 @@ class binlog_cache_data
bool empty() const
{
return (pending() == NULL &&
(my_b_write_tell(&cache_log) == 0 ||
(my_b_write_tell(&cache_log) - m_file_reserved_bytes == 0 ||
((status & (LOGGED_ROW_EVENT | LOGGED_CRITICAL)) == 0)));
}
......@@ -97,6 +111,8 @@ class binlog_cache_data
bool truncate_file= (cache_log.file != -1 &&
my_b_write_tell(&cache_log) >
MY_MIN(CACHE_FILE_TRUNC_SIZE, binlog_stmt_cache_size));
// m_file_reserved_bytes must be reset to 0, before truncate.
m_file_reserved_bytes= 0;
truncate(0,1); // Forget what's in cache
checksum_opt= !precompute_checksums ? BINLOG_CHECKSUM_ALG_OFF :
(enum_binlog_checksum_alg)binlog_checksum_options;
......@@ -112,7 +128,8 @@ class binlog_cache_data
my_off_t get_byte_position() const
{
return my_b_tell(&cache_log);
DBUG_ASSERT(cache_log.type == WRITE_CACHE);
return my_b_tell(&cache_log) - m_file_reserved_bytes;
}
my_off_t get_prev_position() const
......@@ -172,6 +189,81 @@ class binlog_cache_data
status|= status_arg;
}
/**
This function is called every time anything is written into the
cache_log. To support renaming the binlog cache to a binlog file, the
cache_log must be initialized with reserved space.
*/
bool write_prepare(size_t write_length)
{
/* Data will exceed the buffer size in this write */
if (unlikely(cache_log.write_pos + write_length > cache_log.write_end &&
cache_log.pos_in_file == 0))
{
/* Only the session's binlog cache needs to reserve space. */
if (cache_log.dir == binlog_cache_dir && !encrypt_tmp_files)
return init_file_reserved_bytes();
}
return false;
}
/**
For the session's binlog cache, callers must call this function to skip
the reserved space before reading the cache file.
*/
bool init_for_read()
{
return reinit_io_cache(&cache_log, READ_CACHE, m_file_reserved_bytes, 0, 0);
}
/**
For the session's binlog cache, callers must call this function to get
the actual data length.
*/
my_off_t length_for_read() const
{
DBUG_ASSERT(cache_log.type == READ_CACHE);
return cache_log.end_of_file - m_file_reserved_bytes;
}
/**
This function returns the cache file's actual length, which includes the
reserved space.
*/
my_off_t temp_file_length()
{
return my_b_tell(&cache_log);
}
uint32 file_reserved_bytes() { return m_file_reserved_bytes; }
/**
Flush and sync the data of the file into storage.
@retval true Error happens
@retval false Succeeds
*/
bool sync_temp_file()
{
DBUG_ASSERT(cache_log.file != -1);
if (my_b_flush_io_cache(&cache_log, 1) ||
mysql_file_sync(cache_log.file, MYF(0)))
return true;
return false;
}
/**
Return the name of the cache file.
*/
const char *temp_file_name() { return my_filename(cache_log.file); }
/**
It is called after renaming the cache file to a binlog file. The file
now is a binlog file, so detach it from the binlog cache.
*/
void detach_temp_file();
/*
Cache to store data before copying it to the binary log.
*/
......@@ -253,6 +345,12 @@ class binlog_cache_data
*/
ulong *ptr_binlog_cache_disk_use;
/*
Stores the bytes reserved at the beginning of the cache file. It is
0 in cases where reserved space is not supported. See write_prepare().
*/
uint32 m_file_reserved_bytes {0};
/*
It truncates the cache to a certain position. This includes deleting the
pending event.
......@@ -266,12 +364,18 @@ class binlog_cache_data
delete pending();
set_pending(0);
}
my_bool res __attribute__((unused))=
reinit_io_cache(&cache_log, WRITE_CACHE, pos, 0, reset_cache);
my_bool res __attribute__((unused))= reinit_io_cache(
&cache_log, WRITE_CACHE, pos + m_file_reserved_bytes, 0, reset_cache);
DBUG_ASSERT(res == 0);
cache_log.end_of_file= saved_max_binlog_cache_size;
}
/**
Reserve the required space at the beginning of the temporary file. It will
create the temporary file if it doesn't exist.
*/
bool init_file_reserved_bytes();
binlog_cache_data& operator=(const binlog_cache_data& info);
binlog_cache_data(const binlog_cache_data& info);
};
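The changes to `empty()`, `get_byte_position()` and `truncate()` above all apply the same rule: callers keep working with logical positions, and the reserved header bytes are added or subtracted at the file boundary. The toy model below shows only that offset arithmetic. `CacheOffsets` and its members are hypothetical names and it does no I/O; it is not the real `binlog_cache_data`.

```cpp
#include <cstdint>

// Toy model of the offset bookkeeping only.
struct CacheOffsets
{
  std::uint64_t physical_pos = 0;  // what my_b_tell() would report
  std::uint32_t reserved = 0;      // plays the role of m_file_reserved_bytes

  // Payload written by the session; only the physical position advances.
  void write(std::uint64_t len) { physical_pos += len; }

  // Reserve header space: data buffered so far is shifted past the header.
  void reserve(std::uint32_t bytes)
  {
    reserved = bytes;
    physical_pos += bytes;
  }

  // Logical position seen by callers, as in get_byte_position().
  std::uint64_t byte_position() const { return physical_pos - reserved; }

  // Truncating to a logical position seeks past the reserved region.
  void truncate(std::uint64_t logical_pos)
  {
    physical_pos = logical_pos + reserved;
  }

  // Mirrors the "tell - reserved == 0" part of empty().
  bool empty() const { return byte_position() == 0; }
};
```

With 100 bytes written and 4096 bytes reserved, the physical position is 4196 while the logical position stays at 100, so savepoint positions recorded before the reservation remain valid.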
......@@ -3336,6 +3336,14 @@ class Gtid_log_event: public Log_event
uint64 sa_seq_no; // start alter identifier for CA/RA
#ifdef MYSQL_SERVER
event_xid_t xid;
/*
Pad the event to this size if it is not zero. It is only used for renaming
a binlog cache to a binlog file. Space is reserved for the gtid event
and the events at the beginning of the binlog file. Any space left over
after the events are written is padded into the gtid event with zeros.
*/
uint64 pad_to_size;
#else
event_mysql_xid_t xid;
#endif
......@@ -3400,6 +3408,11 @@ class Gtid_log_event: public Log_event
static const uchar FL_EXTRA_THREAD_ID= 16; // thread_id like in BEGIN Query
#ifdef MYSQL_SERVER
static const uint max_data_length= GTID_HEADER_LEN + 2 + sizeof(XID)
+ 1 /* flags_extra: */
+ 4 /* Extra Engines */
+ 4 /* FL_EXTRA_THREAD_ID */;
Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
uint16 flags, bool is_transactional, uint64 commit_id,
bool has_xid= false, bool is_ro_1pc= false);
......
......@@ -29,6 +29,7 @@
#include "unireg.h"
#include "log_event.h"
#include "log_cache.h"
#include "sql_base.h" // close_thread_tables
#include "sql_cache.h" // QUERY_CACHE_FLAGS_SIZE
#include "sql_locale.h" // MY_LOCALE, my_locale_by_number, my_locale_en_US
......@@ -690,6 +691,9 @@ void Log_event::init_show_field_list(THD *thd, List<Item>* field_list)
int Log_event_writer::write_internal(const uchar *pos, size_t len)
{
DBUG_ASSERT(!ctx || encrypt_or_write == &Log_event_writer::encrypt_and_write);
if (cache_data && cache_data->write_prepare(len))
return 1;
if (my_b_safe_write(file, pos, len))
{
DBUG_PRINT("error", ("write to log failed: %d", my_errno));
......@@ -2839,7 +2843,7 @@ Gtid_log_event::Gtid_log_event(THD *thd_arg, uint64 seq_no_arg,
bool ro_1pc)
: Log_event(thd_arg, flags_arg, is_transactional),
seq_no(seq_no_arg), commit_id(commit_id_arg), domain_id(domain_id_arg),
flags2((standalone ? FL_STANDALONE : 0) |
pad_to_size(0), flags2((standalone ? FL_STANDALONE : 0) |
(commit_id_arg ? FL_GROUP_COMMIT_ID : 0)),
flags_extra(0), extra_engines(0),
thread_id(thd_arg->variables.pseudo_thread_id)
......@@ -2959,10 +2963,7 @@ Gtid_log_event::peek(const uchar *event_start, size_t event_len,
bool
Gtid_log_event::write(Log_event_writer *writer)
{
uchar buf[GTID_HEADER_LEN + 2 + sizeof(XID)
+ 1 /* flags_extra: */
+ 4 /* Extra Engines */
+ 4 /* FL_EXTRA_THREAD_ID */];
uchar buf[max_data_length];
size_t write_len= 13;
int8store(buf, seq_no);
......@@ -3042,6 +3043,27 @@ Gtid_log_event::write(Log_event_writer *writer)
bzero(buf+write_len, GTID_HEADER_LEN-write_len);
write_len= GTID_HEADER_LEN;
}
if (unlikely(pad_to_size > write_len))
{
if (write_header(writer, pad_to_size) ||
write_data(writer, buf, write_len))
return true;
pad_to_size-= write_len;
char pad_buf[IO_SIZE];
/* Zero the whole buffer; pad_to_size may be larger than IO_SIZE. */
bzero(pad_buf, sizeof(pad_buf));
while (pad_to_size)
{
uint64 size= pad_to_size >= IO_SIZE ? IO_SIZE : pad_to_size;
if (write_data(writer, pad_buf, size))
return true;
pad_to_size-= size;
}
return write_footer(writer);
}
return write_header(writer, write_len) ||
write_data(writer, buf, write_len) ||
write_footer(writer);
......
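The padding branch of `Gtid_log_event::write()` emits zero bytes in chunks of at most `IO_SIZE` until the event reaches `pad_to_size`. A standalone sketch of that loop follows. `append_zero_padding` and `kChunk` are hypothetical names and the output goes to a `std::string` rather than a `Log_event_writer`. The zero buffer is a fixed size no matter how much padding is requested.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

constexpr std::size_t kChunk = 4096;  // assumed to match IO_SIZE

// Hypothetical helper: append `pad` zero bytes to `out`, at most kChunk
// bytes per write, reusing one fixed-size zero buffer.
void append_zero_padding(std::string &out, std::uint64_t pad)
{
  const std::array<char, kChunk> zeros{};  // value-initialised: all zero
  while (pad > 0)
  {
    const std::size_t n =
        static_cast<std::size_t>(std::min<std::uint64_t>(pad, kChunk));
    out.append(zeros.data(), n);
    pad -= n;
  }
}
```

Because the buffer size never depends on the padding length, a request larger than one chunk is served by repeated writes from the same buffer.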
......@@ -120,7 +120,7 @@
#include "sp_cache.h"
#include "sql_reload.h" // reload_acl_and_cache
#include "sp_head.h" // init_sp_psi_keys
#include "log_cache.h"
#include <mysqld_default_groups.h>
#ifdef HAVE_POLL_H
......@@ -5612,6 +5612,8 @@ static int init_server_components()
mysql_mutex_unlock(log_lock);
if (unlikely(error))
unireg_abort(1);
if (unlikely(init_binlog_cache_dir()))
unireg_abort(1);
}
#ifdef HAVE_REPLICATION
......
......@@ -1820,7 +1820,8 @@ static Sys_var_on_access_global<Sys_var_ulong,
Sys_max_binlog_size(
"max_binlog_size",
"Binary log will be rotated automatically when the size exceeds this "
"value",
"value, unless `binlog_large_commit_threshold` causes rotation "
"prematurely",
GLOBAL_VAR(max_binlog_size), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(IO_SIZE, 1024*1024L*1024L), DEFAULT(1024*1024L*1024L),
BLOCK_SIZE(IO_SIZE), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(0),
......@@ -3276,7 +3277,10 @@ static Sys_var_ulonglong Sys_thread_stack(
BLOCK_SIZE(1024));
static Sys_var_charptr_fscs Sys_tmpdir(
"tmpdir", "Path for temporary files. Several paths may "
"tmpdir",
"Path for temporary files. Files that are created in background for "
"binlogging by user threads are placed in a separate location "
"(see `binlog_large_commit_threshold` option). Several paths may "
"be specified, separated by a "
#if defined(_WIN32)
"semicolon (;)"
......@@ -7414,3 +7418,18 @@ static Sys_var_enum Sys_block_encryption_mode(
"AES_ENCRYPT() and AES_DECRYPT() functions",
SESSION_VAR(block_encryption_mode), CMD_LINE(REQUIRED_ARG),
block_encryption_mode_values, DEFAULT(0));
extern ulonglong opt_binlog_commit_by_rotate_threshold;
static Sys_var_ulonglong Sys_binlog_large_commit_threshold(
"binlog_large_commit_threshold",
"Increases transaction concurrency for large transactions (i.e. "
"those with sizes larger than this value) by using the large "
"transaction's cache file as a new binary log, and rotating the "
"active binary log to the large transaction's cache file at commit "
"time. This avoids the default commit logic that copies the "
"transaction cache data to the end of the active binary log file "
"while holding a lock that prevents other transactions from "
"binlogging",
GLOBAL_VAR(opt_binlog_commit_by_rotate_threshold),
CMD_LINE(REQUIRED_ARG), VALID_RANGE(10 * 1024 * 1024, ULLONG_MAX),
DEFAULT(128 * 1024 * 1024), BLOCK_SIZE(1));
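Taken together, the variable description and the test cases earlier in this commit suggest when a commit is done by rotation: the cache must be larger than the threshold, its checksum setting must match `binlog_checksum`, the statement and transaction caches must not both hold data, and `binlog_legacy_event_pos` must be off. The predicate below restates those conditions. It is an illustrative assumption and not the code from `log.cc`, whose diff is not shown here. `CommitInfo`, its fields and `should_commit_by_rotate` are hypothetical names, and the real check may include further conditions.

```cpp
#include <cstdint>

// Hypothetical inputs; the field names are illustrative only.
struct CommitInfo
{
  std::uint64_t cache_bytes;   // size of the transaction's cache file
  std::uint64_t threshold;     // binlog_large_commit_threshold
  bool checksum_matches;       // cache checksum setting equals binlog_checksum
  bool both_caches_non_empty;  // stmt cache and trx cache both hold data
  bool legacy_event_pos;       // binlog_legacy_event_pos is ON
};

// Returns true when the conditions exercised by the test would allow the
// cache file to be renamed to a binlog file instead of being copied.
bool should_commit_by_rotate(const CommitInfo &c)
{
  return c.cache_bytes > c.threshold && c.checksum_matches &&
         !c.both_caches_non_empty && !c.legacy_event_pos;
}
```

Any single failing condition falls back to the default path, which copies the cache to the end of the active binlog.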