    MDEV-33515 log_sys.lsn_lock causes excessive context switching · bf0b82d2
    Marko Mäkelä authored
    The log_sys.lsn_lock is a heavily contended resource protecting a small
    critical section in log_sys.append_prepare(). On many processor
    microarchitectures, replacing the system-call-based log_sys.lsn_lock
    with a pure spin lock would perform worse under highly concurrent workloads,
    wasting a significant number of CPU cycles in the spin loop.
    
    On other microarchitectures, a significant amount of time is spent
    in native_queued_spin_lock_slowpath() in the Linux kernel, plus context
    switching between the user and kernel address spaces. This was
    pointed out by Steve Shaw from Intel Corporation.
    
    Depending on the workload and the hardware implementation, it may be
    better to use a pure spin lock in log_sys.append_prepare().
    We therefore introduce the parameter innodb_log_spin_wait_delay. The statement
    
    	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=50;
    
    would enable a spin lock that executes that many MY_RELAX_CPU()
    operations (such as the x86 PAUSE instruction) between successive
    attempts to acquire the spin lock. The system-call-based
    log_sys.lsn_lock, which is the default, can be restored with
    
    	SET GLOBAL INNODB_LOG_SPIN_WAIT_DELAY=0;
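    A minimal sketch of the two modes, assuming standard C++ only: std::mutex
    stands in for the system-call-based log_sys.lsn_lock (on Linux it typically
    sleeps in the kernel under contention), and the hypothetical relax_cpu()
    stands in for MY_RELAX_CPU(). The class name lsn_lock_sketch is
    illustrative, and the sketch does not handle switching modes while other
    threads hold or wait for the lock.

```cpp
#include <atomic>
#include <mutex>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Hypothetical stand-in for MY_RELAX_CPU(): PAUSE on x86, a compiler barrier elsewhere.
inline void relax_cpu()
{
#if defined(__x86_64__) || defined(__i386__)
  _mm_pause();
#else
  std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical lock whose mode is selected by the innodb_log_spin_wait_delay value.
class lsn_lock_sketch
{
  std::atomic<bool> spin_locked{false};
  std::mutex mtx;  // stands in for the system-call-based log_sys.lsn_lock

public:
  // delay == 0: block on the mutex.
  // delay > 0: spin, executing `delay` relax_cpu() calls between attempts.
  // Returns true if the spin lock (not the mutex) was acquired.
  bool lock(unsigned delay)
  {
    if (!delay)
    {
      mtx.lock();
      return false;
    }
    while (spin_locked.exchange(true, std::memory_order_acquire))
      for (unsigned i = 0; i < delay; i++)
        relax_cpu();
    return true;
  }

  // Releases the lock in the same mode in which it was acquired.
  void unlock(bool spun)
  {
    if (spun)
      spin_locked.store(false, std::memory_order_release);
    else
      mtx.unlock();
  }
};
```

    The caller keeps the return value of lock() and passes it to unlock(), so
    a lock acquired in one mode is always released in the same mode.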
    
    This patch also introduces #ifdef LOG_LATCH_DEBUG
    (part of cmake -DWITH_INNODB_EXTRA_DEBUG=ON) for more accurate
    tracking of log_sys.latch ownership, and reorganizes the fields of
    log_sys to improve the locality of reference and to reduce the
    chances of false sharing.
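    To illustrate the false-sharing part, here is a hypothetical layout, not
    the actual log_t definition: fields written together under the lock share
    one cache line, and read-mostly fields start on the next one. The 64-byte
    line size is an assumption, and apart from buf_free, which this commit
    mentions, the field names are illustrative.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumed cache line size (64 bytes is typical on x86-64).
constexpr std::size_t CACHE_LINE = 64;

// Hypothetical layout, not the actual log_t definition.
struct log_layout_sketch
{
  // Written together while holding the lock: keep them on one cache line,
  // so that a lock holder touches as few lines as possible.
  alignas(CACHE_LINE) std::atomic<std::uint64_t> lsn{0};
  std::atomic<std::size_t> buf_free{0};

  // Read-mostly fields start on the next cache line, so that readers do not
  // keep pulling away the line that the writers above are contending on.
  alignas(CACHE_LINE) std::size_t buf_size = 0;
  bool is_pmem = false;
};

static_assert(alignof(log_layout_sketch) == CACHE_LINE,
              "the struct takes the alignment of its over-aligned members");
```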
    
    When a spin lock is being used, it is maintained in the
    most significant bit of log_sys.buf_free. This is convenient because
    buf_free is itself one of the fields covered by the lock. For IA-32 and
    AMD64, we implement the spin lock via log_t::lsn_lock_bts(), employing the
    i386 LOCK BTS instruction. A straightforward std::atomic::fetch_or() would
    translate into an inefficient loop around LOCK CMPXCHG.
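    The following sketch shows how a lock flag in the most significant bit of
    a buffer offset could work. It assumes a 64-bit stand-in for buf_free and
    GCC or Clang on x86-64 for the LOCK BTS path (using the "=@ccc" flag output
    operand); other targets fall back to fetch_or(). The helper names are
    hypothetical, and this is not the code of log_t::lsn_lock_bts().

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for log_sys.buf_free: bit 63 is the spin lock flag,
// the remaining bits hold the current offset in the log buffer.
constexpr std::uint64_t LOCK_BIT = std::uint64_t{1} << 63;

// Atomically sets bit 63 of word; returns true if it was already set.
inline bool test_and_set_msb(std::atomic<std::uint64_t>& word)
{
#if defined(__GNUC__) && defined(__x86_64__)
  bool was_set;
  // LOCK BTS sets bit 63 and copies its previous value into the carry flag,
  // which the "=@ccc" flag output operand converts into a bool.
  // Assumes std::atomic<uint64_t> has the same representation as uint64_t.
  __asm__ __volatile__("lock btsq $63, %0"
                       : "+m"(*reinterpret_cast<std::uint64_t*>(&word)),
                         "=@ccc"(was_set)
                       :
                       : "memory");
  return was_set;
#else
  // Portable fallback; on x86 this may compile into a LOCK CMPXCHG loop.
  return (word.fetch_or(LOCK_BIT, std::memory_order_acquire) & LOCK_BIT) != 0;
#endif
}

// Reserves len bytes in the buffer and returns the previous offset.
inline std::uint64_t reserve(std::atomic<std::uint64_t>& buf_free,
                             std::uint64_t len)
{
  while (test_and_set_msb(buf_free))
  {
    // Busy-wait; a real implementation would execute
    // innodb_log_spin_wait_delay relax operations here.
  }
  const std::uint64_t offset =
      buf_free.load(std::memory_order_relaxed) & ~LOCK_BIT;
  // One release store publishes the new offset and clears the lock bit.
  buf_free.store(offset + len, std::memory_order_release);
  return offset;
}
```

    Because the offset is protected by the lock, the holder can publish the
    new offset and release the lock with a single store.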
    
    mtr_t::spin_wait_delay: The value of innodb_log_spin_wait_delay.
    
    mtr_t::finisher: Pointer to the currently used mtr_t::finish_write()
    implementation. This avoids introducing conditional branches.
    We no longer invoke log_sys.is_pmem() at the mini-transaction level,
    but only in log_write_up_to().
    
    mtr_t::finisher_update(): Update finisher when spin_wait_delay is
    changed from or to 0 (the spin lock is changed to log_sys.lsn_lock or
    vice versa).
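    A sketch of the function-pointer dispatch described above, with
    hypothetical, simplified signatures (the actual parameters of
    mtr_t::finish_write() are not given in this message). It assumes that
    finisher_update() runs while no mini-transaction is committing, because
    the sketch does not synchronize the pointer update.

```cpp
#include <cstddef>

enum class lock_kind { mutex, spin };

// Hypothetical, simplified stand-in for the mtr_t members described above.
struct mtr_sketch
{
  using finisher_fn = lock_kind (*)(std::size_t len);

  static unsigned spin_wait_delay;  // value of innodb_log_spin_wait_delay
  static finisher_fn finisher;      // currently selected write path

  // Each specialized path contains only the code for its own lock type,
  // so neither needs to test spin_wait_delay on every call.
  static lock_kind finish_write_mutex(std::size_t) { return lock_kind::mutex; }
  static lock_kind finish_write_spin(std::size_t) { return lock_kind::spin; }

  // Re-selects the path; meant to run when spin_wait_delay changes
  // from or to 0. Not synchronized with concurrent commit() calls.
  static void finisher_update()
  {
    finisher = spin_wait_delay ? &finish_write_spin : &finish_write_mutex;
  }

  // Indirect call instead of a branch on the lock mode.
  lock_kind commit(std::size_t len) { return finisher(len); }
};

unsigned mtr_sketch::spin_wait_delay = 0;
mtr_sketch::finisher_fn mtr_sketch::finisher = &mtr_sketch::finish_write_mutex;
```

    Setting spin_wait_delay to 50 and calling finisher_update() makes every
    subsequent commit() take the spin path; setting it back to 0 restores the
    mutex path.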