  1. 22 Nov, 2011 1 commit
  2. 26 Oct, 2011 2 commits
  3. 30 Sep, 2011 2 commits
  4. 29 Aug, 2011 1 commit
      Bug#12704861 Corruption after a crash during BLOB update · 41f229cd
      Marko Mäkelä authored
      The fix of Bug#12612184 broke crash recovery. When a record that
      contains off-page columns (BLOBs) is updated, we must first write redo
      log about the BLOB page writes, and only after that write the redo log
      about the B-tree changes. The buggy fix would log the B-tree changes
      first, meaning that after recovery, we could end up having a record
      that contains a null BLOB pointer.
      
      Because we will be redo logging the writes of the off-page columns
      before the B-tree changes, we must make sure that the pages chosen for
      the off-page columns are free both before and after the B-tree
      changes. In this way, the worst thing that can happen in crash
      recovery is that the BLOBs are written to free pages, but the B-tree
      changes are not applied. The BLOB pages would correctly remain free in
      this case. To achieve this, we must allocate the BLOB pages in the
      mini-transaction of the B-tree operation. A further quirk is that BLOB
      pages are allocated from the same file segment as leaf pages. Because
      of this, we must temporarily "hide" any leaf pages that were freed
      during the B-tree operation by "fake allocating" them prior to writing
      the BLOBs, and freeing them again before the mtr_commit() of the
      B-tree operation, in btr_mark_freed_leaves().
      
      btr_cur_mtr_commit_and_start(): Remove this faulty function that was
      introduced in the Bug#12612184 fix. The problem that this function was
      trying to address was that when we did mtr_commit() the BLOB writes
      before the mtr_commit() of the update, the new BLOB pages could have
      overwritten clustered index B-tree leaf pages that were freed during
      the update. If recovery applied the redo log of the BLOB writes but
      did not see the log of the record update, the index tree would be
      corrupted. The correct solution is to make the freed clustered index
      pages unavailable to the BLOB allocation. This function is also a
      likely culprit of InnoDB hangs that were observed when testing the
      Bug#12612184 fix.
      
      btr_mark_freed_leaves(): Mark all freed clustered index leaf pages of
      a mini-transaction allocated (nonfree=TRUE) before storing the BLOBs,
      or freed (nonfree=FALSE) before committing the mini-transaction.
      
      btr_freed_leaves_validate(): A debug function for checking that all
      clustered index leaf pages that have been marked free in the
      mini-transaction are consistent (have not been zeroed out).
      
      btr_page_alloc_low(): Refactored from btr_page_alloc(). Return the
      number of the allocated page, or FIL_NULL if out of space. Add the
      parameter "mtr_t* init_mtr" for specifying the mini-transaction where
      the page should be initialized, or NULL if this is a "fake allocation"
      by btr_mark_freed_leaves(nonfree=TRUE).
      
      btr_page_alloc(): Add the parameter init_mtr, allowing the page to be
      initialized and X-latched in a different mini-transaction than the one
      that is used for the allocation. Invoke btr_page_alloc_low(). If a
      clustered index leaf page was previously freed in mtr, remove it from
      the memo of previously freed pages.
      
      btr_page_free(): Assert that the page is a B-tree page and it has been
      X-latched by the mini-transaction. If the freed page was a leaf page
      of a clustered index, link it by a MTR_MEMO_FREE_CLUST_LEAF marker to
      the mini-transaction.
      
      btr_store_big_rec_extern_fields_func(): Add the parameter alloc_mtr,
      which is NULL in inserts (the old behaviour) and the same as local_mtr in
      updates. If alloc_mtr!=NULL, the BLOB pages will be allocated from it
      instead of the mini-transaction that is used for writing the BLOBs.
      
      fsp_alloc_from_free_frag(): Refactored from
      fsp_alloc_free_page(). Allocate the specified page from a partially
      free extent.
      
      fseg_alloc_free_page_low(), fseg_alloc_free_page_general(): Add the
      parameter "mtr_t* init_mtr" for specifying the mini-transaction where
      the page should be initialized, or NULL if this is a "fake allocation"
      that prevents the reuse of a previously freed B-tree page for BLOB
      storage. If init_mtr==NULL, try harder to reallocate the specified page
      and assert that it succeeded.
      
      fsp_alloc_free_page(): Add the parameter "mtr_t* init_mtr" for
      specifying the mini-transaction where the page should be initialized.
      Do not allow init_mtr == NULL, because this function is never to be
      used for "fake allocations".
      
      mtr_t: Add the operation MTR_MEMO_FREE_CLUST_LEAF and the flag
      mtr->freed_clust_leaf for quickly determining if any
      MTR_MEMO_FREE_CLUST_LEAF operations have been posted.
      
      row_ins_index_entry_low(): When columns are being made off-page in
      insert-by-update, invoke btr_mark_freed_leaves(nonfree=TRUE) and pass
      the mini-transaction as the alloc_mtr to
      btr_store_big_rec_extern_fields(). Finally, invoke
      btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
      
      row_build(): Correct a comment, and add a debug assertion that a
      record that contains NULL BLOB pointers must be a fresh insert.
      
      row_upd_clust_rec(): When columns are being moved off-page, invoke
      btr_mark_freed_leaves(nonfree=TRUE) and pass the mini-transaction as
      the alloc_mtr to btr_store_big_rec_extern_fields(). Finally, invoke
      btr_mark_freed_leaves(nonfree=FALSE) to avoid leaking pages.
      
      buf_reset_check_index_page_at_flush(): Remove. The function
      fsp_init_file_page_low() already sets
      bpage->check_index_page_at_flush=FALSE.
      
      There is a known issue in tablespace extension. If the request to
      allocate a BLOB page leads to the tablespace being extended, crash
      recovery could see BLOB writes to pages that are off the tablespace
      file bounds. This should trigger an assertion failure in fil_io() at
      crash recovery. The safe thing would be to write redo log about the
      tablespace extension to the mini-transaction of the BLOB write, not to
      the mini-transaction of the record update. However, there is no redo
      log record for file extension in the current redo log format.
      
      rb:693 approved by Sunny Bains
  5. 16 Jun, 2011 1 commit
      Bug#12612184 Race condition after btr_cur_pessimistic_update() · 5b4ceba5
      Marko Mäkelä authored
      btr_cur_compress_if_useful(), btr_compress(): Add the parameter ibool
      adjust. If adjust=TRUE, adjust the cursor position after compressing
      the page.
      
      btr_lift_page_up(): Return a pointer to the father page.
      
      BTR_KEEP_POS_FLAG: A new flag for btr_cur_pessimistic_update().
      
      btr_cur_pessimistic_update(): If *big_rec != NULL and flags &
      BTR_KEEP_POS_FLAG, keep the cursor positioned on the updated record.
      Also, do not release the index tree x-lock if *big_rec != NULL.
      
      btr_cur_mtr_commit_and_start(): Commits and restarts a
      mini-transaction so that it will retain an x-lock on index->lock and
      the page of the cursor. This is invoked when
      btr_cur_pessimistic_update() returns *big_rec != NULL.
      
      In all callers of btr_cur_pessimistic_update() that do not pass
      BTR_KEEP_POS_FLAG, assert that *big_rec == NULL.
      
      btr_cur_compress(): Unused function [in the built-in MySQL 5.1], remove.
      
      page_rec_get_nth(): Return the nth record on the page (an inverse
      function of page_rec_get_n_recs_before()). Refactored from
      page_get_middle_rec().
      
      page_get_middle_rec(): Invoke page_rec_get_nth().
      
      page_cur_insert_rec_zip_reorg(): Make use of the page directory
      shortcuts in page_rec_get_nth() instead of scanning the whole list of
      records.
      
      row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG to
      btr_cur_pessimistic_update().
      
      row_ins_index_entry_low(): If row_ins_clust_index_entry_by_modify()
      returns a big_rec, invoke btr_cur_mtr_commit_and_start() in order to
      commit and start the mini-transaction without releasing the x-locks on
      index->lock and the cursor page, and write the big_rec. Releasing the
      page latch in mtr_commit() caused a race condition.
      
      row_upd_clust_rec(): Pass BTR_KEEP_POS_FLAG to
      btr_cur_pessimistic_update(). If it returns a big_rec, invoke
      btr_cur_mtr_commit_and_start() in order to commit and start the
      mini-transaction without releasing the x-locks on index->lock and the
      cursor page, and write the big_rec. Releasing the page latch in
      mtr_commit() caused a race condition.
      
      sync_thread_add_level(): Add the parameter ibool relock. When TRUE,
      bypass the latching order rules.
      
      rw_lock_add_debug_info(): For nested X-lock requests, pass relock=TRUE
      to sync_thread_add_level().
      
      rb:678 approved by Jimmy Yang
  6. 14 Jan, 2011 1 commit
  7. 16 Aug, 2010 2 commits
      Fix Bug#53761 RANGE estimation for matched rows may be 200 times different · 7f62ec7b
      Vasil Dimov authored
      Improve the range estimation algorithm.
      
      Previously:
      For a given level the algo knows the number of pages in the requested range
      and the number of records on the leftmost and the rightmost page. Then it
      assumes all pages in between contain the average between the two border pages
      and multiplies this average number by the number of intermediate pages.
      
      With this change:
      Same idea, but peek a few (10) of the intermediate pages to get a better
      estimate of the average number of records per page. If there are fewer than 10
      intermediate pages then all of them will be scanned and the result will be
      precise, not an estimation.
      
      In the bug report one of the examples has a btree with a snippet of the leaf
      level like this:
      page1(899 records), page2(1 record), page3(1 record), page4(1 record)
      so when trying to estimate, the previous algorithm assumed an average of
      (899+1)/2=450 records per page, which went terribly wrong. With this change
      page2 and page3 will be read and the exact number of records will be returned.
      
      Approved by:	Sunny (rb://401)
      Merge from -c3476 mysql-5.1-security. · 3c4d4e0a
      Sunny Bains authored
           ------------------------------------------------------------
           revno: 3476
           committer: Sunny Bains <Sunny.Bains@Oracle.Com>
           branch nick: 5.1-security
           timestamp: Thu 2010-08-05 19:18:17 +1000
           message:
             Fix bug# 55543 - InnoDB Plugin: Signal 6: Assertion failure in file fil/fil0fil.c line 4306
      
               The bug is due to a double delete of a BLOB, once via:
      
                     rollback -> btr_cur_pessimistic_delete()
      
               and the second time via purge.
      
               The bug is in row_upd_clust_rec_by_insert(). There we relinquish ownership
               of the non-updated BLOB columns in btr_cur_mark_extern_inherited_fields()
               before building the row entry that will be inserted and whose contents will
               be logged in the UNDO log. However, we don't set the BLOB column later to
               INHERITED so that a possible rollback will not free the original row's
               non-updated BLOB entries. This is because the condition that checks for
               that is in :
      
           		    	if (node->upd_ext) {}.
      
               node->upd_ext is non-NULL only if a BLOB column was updated and that column
               is part of some key ordering (see row_upd_replace()). This results in the
         non-updated BLOB columns being deleted during a rollback and subsequently by
               purge again.
      
               rb://413
  8. 29 Jun, 2010 1 commit
      Merge Bug#54358 fix from mysql-5.1-innodb: · 7c718cdb
      Marko Mäkelä authored
      ------------------------------------------------------------
      revno: 3529
      revision-id: marko.makela@oracle.com-20100629125518-m3am4ia1ffjr0d0j
      parent: jimmy.yang@oracle.com-20100629024137-690sacm5sogruzvb
      committer: Marko Mäkelä <marko.makela@oracle.com>
      branch nick: 5.1-innodb
      timestamp: Tue 2010-06-29 15:55:18 +0300
      message:
        Bug#54358: READ UNCOMMITTED access failure of off-page DYNAMIC or COMPRESSED
        columns
      
        When the server crashes after a record stub has been inserted and
        before all its off-page columns have been written, the record will
        contain incomplete off-page columns after crash recovery. Such records
        may only be accessed at the READ UNCOMMITTED isolation level or when
        rolling back a recovered transaction in recv_recovery_rollback_active().
        Skip these records at the READ UNCOMMITTED isolation level.
      
        TODO: Add assertions for checking the above assumptions hold when an
        incomplete BLOB is encountered.
      
        btr_rec_copy_externally_stored_field(): Return NULL if the field is
        incomplete.
      
        row_prebuilt_t::templ_contains_blob: Clarify what "BLOB" means in this
        context. Hint: MySQL BLOBs are not the same as InnoDB BLOBs.
      
        row_sel_store_mysql_rec(): Return FALSE if not all columns could be
        retrieved. Previously this function always returned TRUE.  Assert that
        the record is not delete-marked.
      
        row_sel_push_cache_row_for_mysql(): Return FALSE if not all columns
        could be retrieved.
      
        row_search_for_mysql(): Skip records containing incomplete off-page
        columns. Assert that the transaction isolation level is READ
        UNCOMMITTED.
      
        rb://380 approved by Jimmy Yang
  9. 08 Sep, 2009 1 commit
  10. 07 Aug, 2009 1 commit
      Renamed storage/innodb_plugin to storage/innobase, so that 1) it's the same · 7ceb29ff
      Guilhem Bichot authored
      layout as we always had in trees containing only the builtin
      2) win\configure.js WITH_INNOBASE_STORAGE_ENGINE still works.
      
      storage/innobase/CMakeLists.txt:
        fix to new directory name (and like 5.1)
      storage/innobase/Makefile.am:
        fix to new directory name (and like 5.1)
      storage/innobase/handler/ha_innodb.cc:
        fix to new directory name (and like 5.1)
      storage/innobase/plug.in:
        fix to new directory name (and like 5.1)
  11. 27 May, 2009 1 commit