• Vanislavsky's avatar
    MDEV-30182: Optimize open_tables to take O(N) time · d81b581b
    Vanislavsky authored
    ======
    This is an  adopted version of patch provided in the PR #MariaDB/2680
    Original author: Sergey Vanislavskiy
    Adopted by: Nikita Maliavin
    ======
    
    Table opening stage was known to make two list traversals:
    1. In find_fk_prelocked_table, all the query_tables list is traversed for each
    foreign key of a table to open.
    2. MDL_context::find_ticket traverses all mdl tickets, one ticket per table.
    
    Both result in O(tables^2) time total.
    This may dramatically increase the query latencty in the following known cases:
    * updates/deletes on tables with many children
    * DMLs in transactions involving many different tables
    
    Also, it slows down the DROP DATABASE performance, with a big enough amount of
    tables.
    
    So to optimize this out the following is done:
    * A hash table with all FK-prelocked tables is added to THD. A table is filled
    and queried inside find_fk_prelocked_table, and cleaned up at the query end.
    * The find_ticket implementation is replaced with two consecutive hash lookups.
    * A hash table with all tickets for current context (Query_tables_list) is
    added.
    * find_fk_prelocked_table now makes a hash lookup as well. We have to calculate
    a hash value for this lookup, so MDL_key is created earlier. It's then reused
    if a new table list is created.
    
    Given the requirement of no performance degradation for a small table value,
    a new data structure is introduced: Open_address_hash.
    
    Open_address_hash is a generic data structure, that is optimized for usage with
    a few elements, similarly to the "short string optimization" in c++ standard
    libraries: if a number of elements can fit the structure's class body inline,
    then they are placed there. One bit is borrowed to indicate whether the inline
    mode is used.
    
    This also means that this optimization cannot work in 32-bit environments.
    
    The hash table is implemented from scratch using open address hashing to reduce
    memory fragmentation, reduce allocation pressure and memory footprint.
    
    Speaking of the memory footprint, it is expected to be as follows:
    * +4x pointer size for each connection (2 pointers per hash, two hashes total)
    * +4x pointer size for each prepared statement, or cached/exected stored
    procedure.
    * If number of tables opened > 2, then +2x pointer size per table, because the
    hash load factor is kept 50% at this moment
    * If number of FK-prelocked tables opened > 2, then +2x pointer size per
    FK-prelocked table.
    d81b581b
mdl.cc 104 KB