    MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction · 12bdd58c
    Libing Song authored
    
    Description
    ===========
    When a transaction commits, it copies its binlog events from the
    binlog cache to the binlog file. Very large transactions
    (e.g. gigabytes) can stall other transactions for a long time
    because the data is copied while holding LOCK_log, which blocks
    other commits from binlogging.
    
    The solution in this patch is to rename the binlog cache file to
    a binlog file instead of copying it, if the committing transaction
    has a large binlog cache. Rename is a very fast operation, so it
    doesn't block other transactions for a long time.
    
    Design
    ======
    * binlog_large_commit_threshold
      type: ulonglong
      scope: global
      dynamic: yes
      default: 128MB
    
      Only binlog cache temporary files larger than the threshold
      (128MB by default) are renamed to a binlog file.
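
      The size check can be pictured with a minimal sketch. The function and
      constant names below are hypothetical and are not taken from the patch.
      The strict "larger than" comparison is an assumption based on the
      wording above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the global binlog_large_commit_threshold
// variable (default 128MB, per the description above).
constexpr std::uint64_t kDefaultLargeCommitThreshold = 128ULL * 1024 * 1024;

// Returns true when the cache temporary file is large enough to be renamed
// instead of copied. Assumption: the comparison is strict, so a file of
// exactly the threshold size is still copied.
bool exceeds_large_commit_threshold(
    std::uint64_t cache_file_bytes,
    std::uint64_t threshold = kDefaultLargeCommitThreshold) {
  return cache_file_bytes > threshold;
}
```

      Because the variable is global and dynamic, a real check would read the
      current value at commit time rather than a compile-time constant.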
    
    * #binlog_cache_files directory
      To support rename, all binlog cache temporary files are now managed
      as normal files. The `#binlog_cache_files` directory is in the same
      directory as the binlog files. It is created at server startup if it
      doesn't exist. Otherwise, all files in the directory are deleted at
      startup.

      The temporary files are named with the ML_ prefix followed by the
      memory address of the binlog_cache_data object, which guarantees
      that each name is unique.
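
      A rough C++17 sketch of the startup handling and the naming scheme
      follows. It is an illustration only: prepare_cache_dir() and
      cache_file_name() are hypothetical helpers, and the hexadecimal
      formatting of the address is an assumption. Only the directory name,
      the ML_ prefix and the use of the object address come from the
      description above.

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Create the cache directory next to the binlog files, or empty it if it
// already exists, so stale files from a previous run are never reused.
fs::path prepare_cache_dir(const fs::path& binlog_dir) {
  fs::path dir = binlog_dir / "#binlog_cache_files";
  if (!fs::exists(dir)) {
    fs::create_directories(dir);
    return dir;
  }
  // Collect the entries first so nothing is removed while iterating.
  std::vector<fs::path> stale;
  for (const auto& entry : fs::directory_iterator(dir))
    stale.push_back(entry.path());
  for (const auto& p : stale)
    fs::remove_all(p);
  return dir;
}

// Build a file name from the ML_ prefix and the address of the owning
// object. Two live objects never share an address, so names cannot collide.
// The hex format is an assumption made for this sketch.
std::string cache_file_name(const void* owner) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "ML_%llx",
                static_cast<unsigned long long>(
                    reinterpret_cast<std::uintptr_t>(owner)));
  return buf;
}
```

      Clearing the directory at startup is what makes address-based names
      safe across restarts: an address may repeat in a new process, but no
      file from the old process survives to clash with it.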
    
    * Reserve space
      To support the rename feature, enough space must be reserved at the
      beginning of the binlog cache file. The space is required for the
      Format description, Gtid list, checkpoint and Gtid events when
      renaming it to a binlog file.

      binlog_cache_data's cache_log is directly accessed by binlog log,
      online alter and wsrep, so it is not easy to update all the code.
      Thus the binlog cache does not reserve space if it is not a session
      binlog cache or if a wsrep session is enabled.
    
      - m_file_reserved_bytes
        Stores the number of bytes reserved at the beginning of the cache file.
        It is initialized in write_prepare() and cleared by reset().

        The reserved file header is hidden from callers, so nothing
        changes for them. E.g.
        - get_byte_position() still returns the length of the binlog data
          written to the cache, not the file length.
        - truncate(0) truncates the file to m_file_reserved_bytes, not 0.
    
      - write_prepare()
        write_prepare() is called every time anything is about to be written
        into the cache. It calls init_file_reserved_bytes() to create
        the cache file (if it doesn't exist) and reserve suitable space if
        the data written exceeds the buffer's size.
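
      The offset arithmetic behind the hidden header can be shown with a toy
      in-memory model. This is a sketch under simplifying assumptions, not
      the patch's code: CacheSketch and stored_length() are hypothetical, a
      std::string stands in for both the buffer and the file, and the
      reserved size is passed in rather than computed.

```cpp
#include <cstddef>
#include <string>

// Toy model of a cache whose file starts with a reserved header that
// callers never see. Names mirror the description; the class is hypothetical.
class CacheSketch {
 public:
  CacheSketch(std::size_t buffer_size, std::size_t header_bytes)
      : buffer_size_(buffer_size), header_bytes_(header_bytes) {}

  void write(const std::string& data) {
    write_prepare(data.size());
    file_ += data;
  }

  // Length of binlog data written, excluding the reserved header.
  std::size_t get_byte_position() const {
    return file_.size() - m_file_reserved_bytes;
  }

  // Positions are relative to the data, so truncate(0) keeps the header.
  void truncate(std::size_t pos) {
    if (pos < get_byte_position())
      file_.resize(m_file_reserved_bytes + pos);
  }

  void reset() {
    file_.clear();
    m_file_reserved_bytes = 0;
  }

  // Raw stored length, including any reserved header.
  std::size_t stored_length() const { return file_.size(); }

 private:
  // Reserve the header once, at the point where the data first outgrows
  // the in-memory buffer and a file would have to be created.
  void write_prepare(std::size_t incoming) {
    if (m_file_reserved_bytes == 0 &&
        get_byte_position() + incoming > buffer_size_) {
      file_.insert(0, header_bytes_, '\0');
      m_file_reserved_bytes = header_bytes_;
    }
  }

  std::size_t buffer_size_;
  std::size_t header_bytes_;
  std::size_t m_file_reserved_bytes = 0;
  std::string file_;
};
```

      With a buffer size of 8 and a 4-byte header, writing 3 bytes and then
      8 more leaves 15 bytes stored while get_byte_position() reports 11,
      and truncate(0) shrinks the stored length to 4 rather than 0.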
    
    * Binlog_commit_by_rotate
      It encapsulates the code for renaming a binlog cache
      temporary file to a binlog file.
      - should_commit_by_rotate()
        It is called by write_transaction_to_binlog_events() to check whether
        a binlog cache should be renamed to a binlog file.
      - commit()
        This is the entry point for renaming a binlog cache and committing the
        transaction. Both rename and commit are protected by LOCK_log,
        so no other transaction can write anything into the renamed
        binlog before it.
    
        Rename happens during a rotation. After the new binlog file is generated,
        replace_binlog_file() is called to:
        - copy data from the new binlog file to the transaction's binlog cache file.
        - write the Gtid event.
        - rename the binlog cache file to the binlog file.
    
        After that, the rotation continues to completion. Then the transaction
        is committed in a separate group by itself. Its cache file is
        detached and its cache log is reset before calling
        trx_group_commit_with_engines(), so only the Xid event is written.
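
        The three steps performed by replace_binlog_file() can be sketched
        with plain file operations in C++17. This is only an illustration:
        replace_binlog_file_sketch() and its parameters are hypothetical,
        the Gtid event is an opaque byte string, and placing it directly
        after the copied header is an assumption. Locking, and the handling
        of any reserved bytes left unused, are outside the sketch.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

void replace_binlog_file_sketch(const fs::path& new_binlog,
                                const fs::path& cache_file,
                                std::size_t reserved_bytes,
                                const std::string& gtid_event) {
  // 1. Read what the rotation wrote at the start of the new binlog file.
  std::ifstream in(new_binlog, std::ios::binary);
  std::string header((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
  in.close();
  if (header.size() + gtid_event.size() > reserved_bytes)
    throw std::runtime_error("reserved space is too small");

  {
    // 2. Overwrite the reserved region of the cache file in place, leaving
    //    the transaction's events after it untouched.
    std::fstream out(cache_file,
                     std::ios::in | std::ios::out | std::ios::binary);
    out.seekp(0);
    out.write(header.data(), static_cast<std::streamsize>(header.size()));
    out.write(gtid_event.data(),
              static_cast<std::streamsize>(gtid_event.size()));
  }

  // 3. Rename the cache file over the binlog file; no event data is copied.
  fs::rename(cache_file, new_binlog);
}
```

        Only the small header is copied. The cost of the operation therefore
        does not grow with the size of the transaction, which is the point
        of committing by rotation.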
rpl_binlog_commit_by_rotate.test 9.58 KB