    MDEV-24142: Replace InnoDB rw_lock_t with sux_lock · 03ca6495
    Marko Mäkelä authored
    InnoDB buffer pool block and index tree latches depend on a
    special kind of read-update-write lock that allows reentrant
    (recursive) acquisition of the 'update' and 'write' locks
    as well as an upgrade from 'update' lock to 'write' lock.
    The 'update' lock allows any number of reader locks from
    other threads, but no concurrent 'update' or 'write' lock.
    
    If there were no requirement to support an upgrade from 'update'
    to 'write', we could compose the lock out of two srw_lock
    (implemented as any type of native rw-lock, such as SRWLOCK on
    Microsoft Windows). Removing this requirement is very difficult,
    so in commit f7e7f487d4b06695f91f6fbeb0396b9d87fc7bbf we
    added an 'update' mode to our srw_lock.
    
    Re-entrant or recursive locking is mostly needed when writing or
    freeing BLOB pages, but also in crash recovery or when merging
    buffered changes to an index page. The re-entrancy allows us to
    attach a previously acquired page to a sub-mini-transaction that
    will be committed before whatever else is holding the page latch.
    
    The SUX lock supports Shared ('read'), Update, and eXclusive ('write')
    locking modes. The S latches are not re-entrant, but a single S latch
    may be acquired even if the thread already holds a U latch.
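
The compatibility rules between the three modes can be sketched as follows. This is a toy model, not the InnoDB sux_lock: the class name toy_sux_lock and its try-only interface are assumptions made for illustration, and it does not block, track owners, or support re-entrancy.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical toy, NOT the InnoDB sux_lock: non-blocking try-locks only,
// with no owner tracking and no re-entrancy.
class toy_sux_lock
{
  std::mutex m;         // protects the fields below
  unsigned readers= 0;  // number of S holders
  bool u= false;        // whether a U holder exists
  bool x= false;        // whether an X holder exists
public:
  // S conflicts only with X; it is compatible with other S and with U.
  bool s_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (x) return false;
    readers++;
    return true;
  }
  // U conflicts with another U and with X, but not with S.
  bool u_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (u || x) return false;
    u= true;
    return true;
  }
  // X conflicts with everything.
  bool x_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (readers || u || x) return false;
    x= true;
    return true;
  }
  void s_unlock() { std::lock_guard<std::mutex> g(m); assert(readers); readers--; }
  void u_unlock() { std::lock_guard<std::mutex> g(m); assert(u); u= false; }
  void x_unlock() { std::lock_guard<std::mutex> g(m); assert(x); x= false; }
};
```

In this model, a holder of U can still take one S, while a second U or any X is refused until the conflicting holders are released.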
    
    The idea of the U latch is to allow a write of something that concurrent
    readers do not care about (such as the contents of BTR_SEG_LEAF,
    BTR_SEG_TOP and other page allocation metadata structures, or
    the MDEV-6076 PAGE_ROOT_AUTO_INC). (The PAGE_ROOT_AUTO_INC field
    is only updated when a dict_table_t for the table exists, and only
    read when a dict_table_t for the table is being added to dict_sys.)
    
    block_lock::u_lock_try(bool for_io=true) is used in buf_flush_page()
    to allow concurrent readers but no concurrent modifications while the
    page is being written to the data file. That latch will be released
    by buf_page_write_complete() in a different thread. Hence, we use
    the special lock owner value FOR_IO.
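
The role of a special owner value can be illustrated with a small model. Everything here is an assumption for illustration (the toy_u_latch type, the integer caller tokens, and the value chosen for FOR_IO); only the idea of a FOR_IO owner comes from the description above. The point is that the release check compares against FOR_IO rather than against the identity of the acquiring thread, so a different thread may release the latch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sentinel value; the real FOR_IO constant may differ.
constexpr std::uintptr_t FOR_IO= ~std::uintptr_t{0};

// Hypothetical model of a U latch whose owner field is either
// 0 (free), FOR_IO, or a nonzero token identifying the calling thread.
struct toy_u_latch
{
  std::atomic<std::uintptr_t> owner{0};

  // Try to acquire; on success record FOR_IO or the caller's token.
  bool u_lock_try(bool for_io, std::uintptr_t me)
  {
    std::uintptr_t expected= 0;
    return owner.compare_exchange_strong(expected, for_io ? FOR_IO : me);
  }

  // Release; return whether the recorded owner matched the expectation.
  bool u_unlock(bool for_io, std::uintptr_t me)
  {
    std::uintptr_t expected= for_io ? FOR_IO : me;
    return owner.compare_exchange_strong(expected, 0);
  }
};
```

With for_io=true, the token passed to u_unlock() is ignored, which models the write completion running in another thread. With for_io=false, a release by a different token is rejected.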
    
    The index_lock::u_lock() improves concurrency on operations that
    involve non-leaf index pages.
    
    The interface has been cleaned up a little. We will use
    x_lock_recursive() instead of x_lock() when we know that a
    lock is already held by the current thread. Similarly,
    a lock upgrade from U to X is only allowed via u_x_upgrade()
    or x_lock_upgraded() but not via x_lock().
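
The split between first acquisition, recursive acquisition, and upgrade can be sketched with assertions. This is a hypothetical ownership-only model, not the InnoDB code: the class toy_owner_lock, the helper accessors, and the exact preconditions are assumptions. The method names x_lock(), x_lock_recursive() and u_x_upgrade() are borrowed from the description above, but their real signatures and checks may differ, and a real upgrade would also have to wait for S holders.

```cpp
#include <cassert>
#include <thread>

// Hypothetical model, NOT the InnoDB implementation: it only tracks the
// owner and recursion counts, never blocks, and is not thread-safe.
class toy_owner_lock
{
  std::thread::id owner{};  // default-constructed id means "not held"
  unsigned u_count= 0;      // U acquisitions held by the owner
  unsigned x_count= 0;      // X acquisitions held by the owner
  bool held_by_me() const { return owner == std::this_thread::get_id(); }
public:
  bool is_free() const { return owner == std::thread::id{}; }
  unsigned x_holds() const { return x_count; }

  // First acquisition: the lock must not be held yet.
  void u_lock()
  { assert(is_free()); owner= std::this_thread::get_id(); u_count= 1; }
  void x_lock()
  { assert(is_free()); owner= std::this_thread::get_id(); x_count= 1; }

  // Re-entrant acquisition: the caller must already hold X (assumed rule).
  void x_lock_recursive()
  { assert(held_by_me() && x_count); x_count++; }

  // Upgrade: convert the caller's U holds into X holds.
  void u_x_upgrade()
  { assert(held_by_me() && u_count && !x_count); x_count= u_count; u_count= 0; }

  void x_unlock()
  {
    assert(held_by_me() && x_count);
    if (!--x_count && !u_count) owner= std::thread::id{};
  }
};
```

Keeping the entry points separate lets each one assert its own precondition, so an accidental recursive x_lock() or an implicit upgrade fails fast in a debug build.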
    
    We will disable the LatchDebug and sync_array interfaces to
    InnoDB rw-locks.
    
    The SEMAPHORES section of SHOW ENGINE INNODB STATUS output
    will no longer include any information about InnoDB rw-locks,
    only TTASEventMutex (cmake -DMUTEXTYPE=event) waits.
    This will make a part of the 'innotop' script dead code.
    
    The block_lock buf_block_t::lock will not be covered by any
    PERFORMANCE_SCHEMA instrumentation.
    
    SHOW ENGINE INNODB MUTEX and INFORMATION_SCHEMA.INNODB_MUTEXES
    will no longer output source code file names or line numbers.
    The dict_index_t::lock will be identified by index and table names,
    which should be much more useful. PERFORMANCE_SCHEMA lumps
    information about all dict_index_t::lock together under
    event_name='wait/synch/sxlock/innodb/index_tree_rw_lock'.
    
    buf_page_free(): Remove the file,line parameters. The sux_lock will
    not store such diagnostic information.
    
    buf_block_dbg_add_level(): Define as an empty macro, to be removed
    in a subsequent commit.
    
    Unless the build was configured with cmake -DPLUGIN_PERFSCHEMA=NO,
    the index_lock dict_index_t::lock will be instrumented via
    PERFORMANCE_SCHEMA. Similar to commit 1669c889,
    we will distinguish lock waits by registering shared_lock,exclusive_lock
    events instead of try_shared_lock,try_exclusive_lock.
    Actual 'try' operations will not be instrumented at all.
    
    rw_lock_list: Remove. After MDEV-24167, this only covered
    buf_block_t::lock and dict_index_t::lock. We will output their
    information by traversing buf_pool or dict_sys.
sync0arr.ic 2.82 KB