    MDEV-25404: ssux_lock_low: Introduce a separate writer mutex · 8751aa73
    Marko Mäkelä authored
    Having both readers and writers use a single lock word in
    futex system calls caused a performance regression compared to
    SRW_LOCK_DUMMY (mutex and 2 condition variables).
    A contributing factor is that we did not accurately keep
    track of the number of waiting threads and thus had to invoke
    system calls to wake up any waiting threads.
    
    SUX_LOCK_GENERIC: Renamed from SRW_LOCK_DUMMY. This is the
    original implementation, with rw_lock (std::atomic<uint32_t>),
    a mutex and two condition variables. Using a separate writer
    mutex (as described below) is not possible, because the mutex ownership
    in a buf_block_t::lock must be able to transfer from a write submitter
    thread to an I/O completion thread, and pthread_mutex_lock() may assume
    that the submitter thread is recursively acquiring the mutex that it
    already holds, while in reality the I/O completion thread is the real
    owner. POSIX does not define an interface for requesting a mutex to
    be non-recursive.
    
    On Microsoft Windows, srw_lock_low will remain a simple wrapper of
    SRWLOCK. On 32-bit Microsoft Windows, sizeof(SRWLOCK)=4 while
    sizeof(srw_lock_low)=8.
    
    On other platforms, srw_lock_low is an alias of ssux_lock_low,
    the Simple (non-recursive) Shared/Update/eXclusive lock.
    
    In the futex-based implementation of ssux_lock_low (Linux, OpenBSD,
    Microsoft Windows), we shall use a dedicated mutex for exclusive
    requests (writer), and have a WRITER flag in the 'readers' lock word
    to inform that a writer is holding the lock or waiting for the lock to
    be granted. When the WRITER flag is set, all lock requests must acquire
    the writer mutex. Normally, shared (S) lock requests simply perform a
    compare-and-swap on the 'readers' word.
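
    The S and X paths described above can be sketched as follows. This is
    a minimal illustration, not the code in srw_lock.cc: the class name
    ssux_sketch and the helper demo_count() are hypothetical, std::mutex
    stands in for the dedicated writer mutex, and a yield loop stands in
    for the futex wait and wake-up calls.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration; not the code in srw_lock.cc.
class ssux_sketch
{
  std::mutex writer;                 // stand-in for the dedicated writer mutex
  std::atomic<uint32_t> readers{0};  // count of S holders, plus the WRITER flag
  static constexpr uint32_t WRITER = 1U << 31;

public:
  // Fast path: increment the reader count unless the WRITER flag is set.
  bool rd_lock_try()
  {
    uint32_t lk = 0;
    while (!readers.compare_exchange_weak(lk, lk + 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
      if (lk & WRITER)
        return false;
    return true;
  }

  void rd_lock()
  {
    while (!rd_lock_try())
    {
      // Slow path: WRITER was set, so queue up on the writer mutex.
      // WRITER is only ever set by the mutex holder, so the retry
      // made while holding the mutex succeeds.
      std::lock_guard<std::mutex> g(writer);
      if (rd_lock_try())
        return;
    }
  }

  void rd_unlock() { readers.fetch_sub(1, std::memory_order_release); }

  void wr_lock()
  {
    writer.lock();
    // Announce the writer, then wait for the existing S holders to leave.
    readers.fetch_or(WRITER, std::memory_order_acquire);
    while (readers.load(std::memory_order_acquire) != WRITER)
      std::this_thread::yield();     // assumption: replaces a futex wait
  }

  void wr_unlock()
  {
    assert(readers.load(std::memory_order_relaxed) == WRITER);
    readers.store(0, std::memory_order_release);
    writer.unlock();
  }
};

// Four threads increment a plain counter under X and read it under S.
inline uint32_t demo_count()
{
  ssux_sketch lock;
  uint32_t counter = 0;
  std::vector<std::thread> threads;
  for (int t = 0; t < 4; t++)
    threads.emplace_back([&] {
      for (int i = 0; i < 10000; i++)
      {
        lock.wr_lock();
        counter++;
        lock.wr_unlock();
        lock.rd_lock();
        uint32_t seen = counter;     // consistent read under the S lock
        lock.rd_unlock();
        (void) seen;
      }
    });
  for (auto &th : threads)
    th.join();
  return counter;
}
```

    In this sketch an uncontended rd_lock() touches only the 'readers'
    word; the writer mutex is visited only after the WRITER flag has
    been observed.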
    
    Update locks are implemented as a combination of writer mutex
    and a normal counter in the 'readers' lock word. The writer mutex
    guarantees that U and X locks conflict with each other.
    Unlike SUX_LOCK_GENERIC, wr_u_downgrade() will not wake up any pending
    rd_lock() waits. They will wait until u_unlock() releases the writer mutex.
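
    A sketch of how U mode could combine the two pieces, under the same
    assumptions as the text above. The class name u_mode_sketch is
    hypothetical, std::mutex stands in for the writer mutex, and a yield
    loop replaces the futex wait. It is not the code in srw_lock.cc.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical illustration of U mode only; not the code in srw_lock.cc.
class u_mode_sketch
{
  std::mutex writer;                 // held for the duration of U or X
  std::atomic<uint32_t> readers{0};  // S/U holder count, plus the WRITER flag
  static constexpr uint32_t WRITER = 1U << 31;

public:
  bool rd_lock_try()
  {
    uint32_t lk = 0;
    while (!readers.compare_exchange_weak(lk, lk + 1,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
      if (lk & WRITER)
        return false;
    return true;
  }

  void rd_unlock() { readers.fetch_sub(1, std::memory_order_release); }

  // U = writer mutex (excludes other U and X) + one ordinary reader count.
  void u_lock()
  {
    writer.lock();
    readers.fetch_add(1, std::memory_order_acquire);
  }

  bool u_lock_try()
  {
    if (!writer.try_lock())
      return false;
    readers.fetch_add(1, std::memory_order_acquire);
    return true;
  }

  void u_unlock()
  {
    readers.fetch_sub(1, std::memory_order_release);
    writer.unlock();
  }

  // U -> X: trade our own count for the WRITER flag, then drain S holders.
  void u_wr_upgrade()
  {
    readers.fetch_add(WRITER - 1, std::memory_order_acquire);
    while (readers.load(std::memory_order_acquire) != WRITER)
      std::this_thread::yield();     // assumption: replaces a futex wait
  }

  // X -> U: clear WRITER and count ourselves as one holder again.
  // The writer mutex stays locked, so nothing queued on it is released.
  void wr_u_downgrade()
  {
    assert(readers.load(std::memory_order_relaxed) == WRITER);
    readers.store(1, std::memory_order_release);
  }
};
```

    Because wr_u_downgrade() leaves the mutex locked, any reader that
    queued on the mutex while WRITER was set stays blocked until
    u_unlock(), which matches the behaviour described above. New S
    requests that arrive after the downgrade take the compare-and-swap
    path as usual.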
    
    The ssux_lock_low is always wrapped by sux_lock (with a recursion count
    of U and X locks), used for dict_index_t::lock and buf_block_t::lock.
    Their memory footprint for the futex-based implementation will increase
    by sizeof(srw_mutex), or 4 bytes.
    
    This change addresses a performance regression in read-only benchmarks,
    such as sysbench oltp_read_only. Write performance also improved.
    
    On 32-bit Linux and OpenBSD, lock_sys_t::hash_table will allocate
    two hash table elements for each srw_lock (14 instead of 15 hash
    table cells per 64-byte cache line on IA-32). On Microsoft Windows,
    sizeof(SRWLOCK)==sizeof(void*) and there is no change.
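
    The cell counts for IA-32 follow from simple arithmetic, assuming
    4-byte hash table cells and a latch that grows from 4 to 8 bytes.
    The names cache_line, cell and usable_cells() are made up for this
    illustration.

```cpp
#include <cstddef>

// Assumed values for IA-32: 64-byte cache line, 4-byte hash table cell.
constexpr std::size_t cache_line = 64, cell = 4;

// Cells left in one cache line when the latch occupies latch_bytes,
// rounded up to a whole number of cells.
constexpr std::size_t usable_cells(std::size_t latch_bytes)
{
  return (cache_line - (latch_bytes + cell - 1) / cell * cell) / cell;
}

static_assert(usable_cells(4) == 15, "4-byte latch takes one cell");
static_assert(usable_cells(8) == 14, "8-byte latch takes two cells");
```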
    
    Reviewed by: Vladislav Vaintroub
    Tested by: Axel Schwenke and Vladislav Vaintroub
srw_lock.cc 10.8 KB