  1. 30 Nov, 2020 12 commits
    • MDEV-24308: Windows improvements · 56009336
      Marko Mäkelä authored
      This reverts commit e34e53b5
      and defines os_thread_sleep() as a macro on Windows.
    • MDEV-24142: Remove the LatchDebug interface to rw-locks · f0363b92
      Marko Mäkelä authored
      The latching order checks for rw-locks have not caught many bugs
      in the past few years and they are greatly complicating the code.
      
      The last time the debug checks were useful was in
      commit 59caf2c3 (MDEV-13485).
      
      The B-tree hang MDEV-14637 was not caught by LatchDebug,
      because the granularity of the checks is not sufficient
      to distinguish the levels of non-leaf B-tree pages.
      
      The interface was already made dead code by the parent commit.
    • MDEV-24142: Replace rw_lock_t with sux_lock · ae1a7d89
      Marko Mäkelä authored
      This will also disable the LatchDebug and sync_array interface
      to InnoDB rw-locks.
      
      FIXME: Pass line number information to PERFORMANCE_SCHEMA
      for dict_index_t::lock
      
      FIXME: Test and document the changes to innotop
      
      Both dict_index_t::lock and buf_block_t::lock depend on
      re-entrant 'update' and 'write' locks. The two main users
      of recursion are the change buffer merge and operations on
      BLOB columns. In both cases, a 'sub-mini-transaction' is being used.
      It is much easier to implement re-entrant locks than to implement
      the migration of blocks between mini-transactions.
      
      The 'update' lock mode allows any number of concurrent reads,
      but no concurrent 'update' or 'write'. While an 'update' lock is
      present, 'read' lock requests may be served immediately, even if
      'write' lock requests are pending. This allows the 'update' lock
      holder thread to acquire one redundant 'read' lock.
      
      An 'update' lock may be upgraded to 'write', but it involves a special
      operation, separate from x_lock(). All existing 'update' lock references
      in the mini-transaction must be changed to 'write'. This is because after
      an x_unlock() operation on mtr_t::commit(), u_unlock() would not be
      permitted. In the old rw_lock_t this was permitted, because the lock word
      itself included an X-lock re-entrancy count, in addition to including
      the S-lock count.
      
      mtr_t::page_lock(): Renamed from buf_page_mtr_lock().
      
      buf_page_try_get_func(): Try U latch only, to avoid recursive S latch.
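The compatibility rules between 'read', 'update' and 'write' described in this commit message can be illustrated with a minimal sketch. This is not the MariaDB sux_lock: the class name `sux_sketch` and all of its member names are hypothetical, only non-blocking "try" operations are modeled, and re-entrancy is left out entirely.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch of shared/update/exclusive compatibility.
// It is not the MariaDB sux_lock; it omits blocking waits and re-entrancy.
class sux_sketch
{
  std::mutex m;          // protects the fields below
  unsigned readers= 0;   // number of granted 'read' locks
  bool updater= false;   // whether an 'update' lock is granted
  bool writer= false;    // whether a 'write' lock is granted
public:
  // 'read' conflicts only with 'write'.
  bool s_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (writer) return false;
    readers++;
    return true;
  }
  void s_unlock() { std::lock_guard<std::mutex> g(m); readers--; }

  // 'update' allows concurrent 'read', but no other 'update' or 'write'.
  bool u_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (updater || writer) return false;
    updater= true;
    return true;
  }
  void u_unlock() { std::lock_guard<std::mutex> g(m); updater= false; }

  // 'write' conflicts with every other lock.
  bool x_lock_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (readers || updater || writer) return false;
    writer= true;
    return true;
  }
  void x_unlock() { std::lock_guard<std::mutex> g(m); writer= false; }

  // Upgrading is a separate operation from x_lock_try(): the caller must
  // hold the 'update' lock, and the upgrade succeeds only without readers.
  bool u_x_upgrade_try()
  {
    std::lock_guard<std::mutex> g(m);
    if (!updater || readers) return false;
    updater= false;
    writer= true;
    return true;
  }
};
```

In this model, a holder of the 'update' lock can still obtain a 'read' lock, while a second 'update' or a 'write' request is refused. After a successful upgrade the lock must be released with `x_unlock()`, which mirrors the remark above that u_unlock() would no longer be permitted.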
    • WIP remove rw_lock_list · 1d6beb3d
      Marko Mäkelä authored
      Only two interesting types of InnoDB rw_lock_t remain:
      buf_block_t::lock and dict_index_t::lock. We can remove more of
      the debug fields.
      
      FIXME: Iterate dict_index_t::lock via dict_sys everywhere
      
      FIXME: Move count_os_waits to dict_index_t, and introduce something
      similar for block mutexes
    • srw_lock, rw_lock: Implement update mode · 4b5bf654
      Marko Mäkelä authored
      FIXME: Pass the line number information to performance_schema
      
      FIXME: Remove PSI_RWLOCK_SHAREDLOCK and friends, just use
      PSI_RWLOCK_READLOCK and friends.
    • Add srw_mutex · b74dae03
      Marko Mäkelä authored
    • Merge 10.5 into 10.6 · cde525f9
      Marko Mäkelä authored
    • MDEV-24308: Revert for Windows · e34e53b5
      Marko Mäkelä authored
      For some reason, InnoDB debug tests on Windows fail due to rw_lock_t
      if the function call overhead for some os_thread_ code is removed.
      
      This change worked fine on Windows in combination with MDEV-24142.
    • fc6a7e90
    • MDEV-24167 fixup: Always derive srw_lock from rw_lock · 1fdc161d
      Marko Mäkelä authored
      Let us always base srw_lock on our own std::atomic<uint32_t>
      based rw_lock. In this way, we can extend the locks in a portable
      way across all platforms.
      
      We will use futex system calls where available:
      Linux, OpenBSD, and Microsoft Windows.
      
      Elsewhere, we will emulate futex with a mutex and a condition variable.
      
      Thanks to Daniel Black for testing this on OpenBSD.
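The fallback mentioned above, emulating a futex with a mutex and a condition variable, might look roughly like the following sketch. The class `futex_emulation` and its member names are hypothetical and are not taken from the MariaDB source. As with a real futex wait, `wait()` may return without the value having changed, so callers are expected to re-check the lock word in a loop.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical futex fallback; not the MariaDB implementation.
class futex_emulation
{
  std::mutex m;
  std::condition_variable cv;
public:
  // Sleep only if the word still holds the expected value; the check and
  // the start of the sleep are atomic with respect to wake_all().
  void wait(const std::atomic<uint32_t> &word, uint32_t expected)
  {
    std::unique_lock<std::mutex> l(m);
    if (word.load(std::memory_order_acquire) == expected)
      cv.wait(l);
  }

  // Wake every sleeping waiter. The caller changes the word first.
  // Passing through the mutex ensures that a waiter that has already
  // checked the word is sleeping before the notification is sent.
  void wake_all()
  {
    { std::lock_guard<std::mutex> g(m); }
    cv.notify_all();
  }
};
```

A waiting thread would typically run `while (word.load() == locked) f.wait(word, locked);`, and the releasing thread would store the new value into the word before calling `f.wake_all()`.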
    • Merge 10.5 into 10.6 · 565b0dd1
      Marko Mäkelä authored
    • MDEV-24308: Remove some os_thread_ functions · 8fa6e363
      Marko Mäkelä authored
      os_thread_pf(): Remove.
      
      os_thread_eq(), os_thread_yield(), os_thread_get_curr_id():
      Define as macros.
      
      ut_print_timestamp(), ut_sprintf_timestamp(): Simplify.
  2. 28 Nov, 2020 1 commit
  3. 27 Nov, 2020 1 commit
    • MDEV-24242 Query returns wrong result while using big_tables=1 · b92391d5
      Igor Babaev authored
      When executing set operations in a pipeline using only one temporary table,
      additional scans of intermediate results may be needed. The scans are
      performed using the rnd_next() handler function, which might
      leave the record buffers used for the temporary table in a state that
      is not suitable for subsequent writes into the table. For example, this
      happens with the Aria engine when the last call of rnd_next() encounters
      only deleted records. Thus a cleanup of the record buffers is needed after
      each such scan of the temporary table.
      
      Approved by Oleksandr Byelkin <sanja@mariadb.com>
  4. 26 Nov, 2020 7 commits
  5. 25 Nov, 2020 11 commits
    • MDEV-24280 InnoDB triggers too many independent periodic tasks · 657fcdf4
      Marko Mäkelä authored
      A side effect of MDEV-16264 is that a large number of threads will
      be created at server startup, to be destroyed after a minute or two.
      
      One source of such thread creation is srv_start_periodic_timer().
      InnoDB is creating 3 periodic tasks: srv_master_callback (1Hz),
      srv_error_monitor_task (1Hz), and srv_monitor_task (0.2Hz).
      
      It appears that we can merge srv_error_monitor_task and srv_monitor_task
      and have them invoked 4 times per minute (every 15 seconds). This will
      affect our ability to enforce innodb_fatal_semaphore_wait_threshold and
      some computations around BUF_LRU_STAT_N_INTERVAL.
      
      We could remove srv_master_callback along with the DROP TABLE queue
      at some point in the future. We must keep it independent
      of the innodb_fatal_semaphore_wait_threshold detection, because
      the background DROP TABLE queue could get stuck due to dict_sys
      being locked by another thread. For now, srv_master_callback
      must be invoked once per second, so that
      innodb_flush_log_at_timeout=1 can work.
      
      BUF_LRU_STAT_N_INTERVAL: Reduce the precision and extend the time
      from 50*1 second to 4*15 seconds.
      
      srv_error_monitor_timer: Remove.
      
      MAX_MUTEX_NOWAIT: Increase from 20*1 second to 2*15 seconds.
      
      srv_refresh_innodb_monitor_stats(): Avoid a repeated call to time(NULL).
      Change the interval to less than 60 seconds.
      
      srv_monitor(): Renamed from srv_monitor_task.
      
      srv_monitor_task(): Renamed from srv_error_monitor_task().
      Invoked only once every 15 seconds. Also invoke srv_monitor().
      Increase the fatal_cnt threshold from 10*1 second to 1*15 seconds.
      
      sync_array_print_long_waits_low(): Invoke time(NULL) only once.
      Remove a bogus message about printouts for 30 seconds. Those
      printouts were effectively already disabled in MDEV-16264
      (commit 5e62b6a5).
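The interval changes listed in this commit message can be written out as plain arithmetic. The factors below are taken from the message itself; the constant names are hypothetical and do not correspond to identifiers in the source tree.

```cpp
#include <cassert>

// Seconds between invocations of the error monitor task (from the message).
constexpr unsigned old_tick= 1;
constexpr unsigned new_tick= 15;

// BUF_LRU_STAT_N_INTERVAL: from 50*1 second to 4*15 seconds.
constexpr unsigned lru_stat_window_old= 50 * old_tick;
constexpr unsigned lru_stat_window_new= 4 * new_tick;

// MAX_MUTEX_NOWAIT: from 20*1 second to 2*15 seconds.
constexpr unsigned mutex_nowait_old= 20 * old_tick;
constexpr unsigned mutex_nowait_new= 2 * new_tick;

// fatal_cnt threshold: from 10*1 second to 1*15 seconds.
constexpr unsigned fatal_cnt_old= 10 * old_tick;
constexpr unsigned fatal_cnt_new= 1 * new_tick;

// Number of task invocations per minute for a given tick length.
constexpr unsigned invocations_per_minute(unsigned tick) { return 60 / tick; }
```

Each window grows by 5 to 10 seconds, while the merged task runs 4 times per minute instead of 60.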
    • MDEV-24278 InnoDB page cleaner keeps waking up on idle server · 7b1252c0
      Marko Mäkelä authored
      The purpose of the InnoDB page cleaner subsystem is to write out
      modified pages from the buffer pool to data files. When the
      innodb_max_dirty_pages_pct_lwm is not exceeded or
      innodb_adaptive_flushing=ON decides not to write out anything,
      the page cleaner should keep sleeping indefinitely until the state
      of the system changes: a dirty page is added to the buffer pool such
      that the page cleaner would no longer be idle.
      
      buf_flush_page_cleaner(): Explicitly note when the page cleaner is idle.
      When that happens, use mysql_cond_wait() instead of mysql_cond_timedwait().
      
      buf_flush_insert_into_flush_list(): Wake up the page cleaner if needed.
      
      innodb_max_dirty_pages_pct_update(),
      innodb_max_dirty_pages_pct_lwm_update():
      Wake up the page cleaner just in case.
      
      Note: buf_flush_ahead(), buf_flush_wait_flushed() and shutdown are
      already waking up the page cleaner thread.
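The idle handling described in this commit message can be modeled with a standard condition variable. The sketch below is only an illustration of the pattern: `page_cleaner_sketch` and its members are hypothetical names, it uses std::condition_variable rather than mysql_cond_wait(), and the timed wait used while the cleaner is busy is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical model; the names do not correspond to InnoDB symbols.
class page_cleaner_sketch
{
  std::mutex m;
  std::condition_variable cv;
  std::size_t dirty= 0;  // pages waiting to be written out
  bool idle= false;      // set while the cleaner sleeps without a timeout
public:
  // Called by a thread that dirties a page.
  // Returns whether the cleaner had to be signalled.
  bool add_dirty_page()
  {
    std::lock_guard<std::mutex> g(m);
    dirty++;
    if (!idle) return false;  // the cleaner is active; no wake-up needed
    idle= false;
    cv.notify_one();
    return true;
  }

  // Called by the cleaner. While nothing is dirty it sleeps without a
  // timeout, so an idle server causes no periodic wake-ups.
  // Returns the number of pages in the batch to write.
  std::size_t wait_for_work()
  {
    std::unique_lock<std::mutex> l(m);
    while (dirty == 0)
    {
      idle= true;
      cv.wait(l);
    }
    idle= false;
    std::size_t batch= dirty;
    dirty= 0;
    return batch;
  }
};
```

Because the idle flag is set and cleared under the same mutex that protects the dirty count, a page that is added just before the cleaner goes to sleep cannot be missed.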
    • MDEV-24270: Clarify some comments · f693b725
      Marko Mäkelä authored
    • Fix misspelling. · 2de95f7a
      Vladislav Vaintroub authored
      Kudos to Marko for finding it.
    • Cleanup. Remove obsolete comment · af98fddc
      Vladislav Vaintroub authored
    • Partially Revert "MDEV-24270: Collect multiple completed events at a time" · 78df9e37
      Vladislav Vaintroub authored
      This partially reverts commit 6479006e.
      
      Remove the constant tpool::aio::N_PENDING, which has no
      intrinsic meaning for the tpool.
    • 4a22056c
    • MDEV-24270: Collect multiple completed events at a time · 6479006e
      Marko Mäkelä authored
      tpool::aio::N_PENDING: Replaces OS_AIO_N_PENDING_IOS_PER_THREAD.
      This limits two similar things: the number of outstanding requests
      that a thread may io_submit(), and the number of completed requests
      collected at a time by io_getevents().
    • MDEV-24270 Misuse of io_getevents() causes wake-ups at least twice per second · 7a9405e3
      Marko Mäkelä authored
      In the asynchronous I/O interface, InnoDB is invoking io_getevents()
      with a timeout value of half a second, and requesting exactly 1 event
      at a time.
      
      The reason to have such a short timeout is to facilitate shutdown.
      
      We can do better: Use an infinite timeout, wait for a larger maximum
      number of events. On shutdown, we will invoke io_destroy(), which
      should lead to the io_getevents system call reporting EINVAL.
      
      my_getevents(): Reimplement the libaio io_getevents() by only invoking
      the system call. The library implementation would try to elide the
      system call and return 0 immediately if aio_ring_is_empty() holds.
      Here, we do want a blocking system call, not 100% CPU usage. Neither
      do we want aio_ring_is_empty() to trigger SIGSEGV by
      dereferencing memory that was freed by io_destroy().
  6. 24 Nov, 2020 8 commits