Commit ae309bd3 authored by Marko Mäkelä's avatar Marko Mäkelä

Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS

This bug was originally filed and fixed as Bug#12612184. The original
fix was buggy, and it was patched by Bug#12704861. Also that patch was
buggy (potentially breaking crash recovery), and both fixes were
reverted.

This fix was not ported to the built-in InnoDB of MySQL 5.1, because
the function signatures of many core functions are different from
InnoDB Plugin and later versions. The block allocation routines and
their callers would have to changed so that they handle block
descriptors instead of page frames.

When a record is updated so that its size grows, non-updated columns
can be selected for external (off-page) storage. The bug is that the
initially inserted updated record contains an all-zero BLOB pointer to
the field that was not updated. Only after the BLOB pages have been
allocated and written, the valid pointer can be written to the record.

Between the release of the page latch in mtr_commit(mtr) after
btr_cur_pessimistic_update() and the re-latching of the page in
btr_pcur_restore_position(), other threads can see the invalid BLOB
pointer consisting of 20 zero bytes. Moreover, if the system crashes
at this point, the situation could persist after crash recovery, and
the contents of the non-updated column would be permanently lost.

The problem is amplified by the ROW_FORMAT=DYNAMIC and
ROW_FORMAT=COMPRESSED that were introduced in
innodb_file_format=barracuda in InnoDB Plugin, but the bug does exist
in all InnoDB versions.

The fix is as follows. After a pessimistic B-tree operation that needs
to write out off-page columns, allocate the pages for these columns in
the mini-transaction that performed the B-tree operation (btr_mtr),
but write the pages in a separate mini-transaction (blob_mtr). Do
mtr_commit(blob_mtr) before mtr_commit(btr_mtr). A quirk: Do not reuse
pages that were previously freed in btr_mtr. Only write the off-page
columns to 'fresh' pages.

In this way, crash recovery will see redo log entries for blob_mtr
before any redo log entry for btr_mtr. It will apply the BLOB page
writes to pages that were marked free at that point. If crash recovery
fails to see all of the btr_mtr redo log, there will be some
unreachable BLOB data in free pages, but the B-tree will be in a
consistent state.

btr_page_alloc_low(): Renamed from btr_page_alloc(). Add the parameter
init_mtr. Return an allocated block, or NULL. If init_mtr!=mtr but
the page was already X-latched in mtr, do not initialize the page.

btr_page_alloc(): Wrapper for btr_page_alloc_for_ibuf() and
btr_page_alloc_low().

btr_page_free(): Add a debug assertion that the page was a B-tree page.

btr_lift_page_up(): Return the father block.

btr_compress(), btr_cur_compress_if_useful(): Add the parameter ibool
adjust, for adjusting the cursor position.

btr_cur_pessimistic_update(): Preserve the cursor position when
big_rec will be written and the new flag BTR_KEEP_POS_FLAG is defined.
Remove a duplicate rec_get_offsets() call. Keep the X-latch on
index->lock when big_rec is needed.

btr_store_big_rec_extern_fields(): Replace update_inplace with
an operation code, and local_mtr with btr_mtr. When not doing a
fresh insert and btr_mtr has freed pages, put aside any pages that
were previously X-latched in btr_mtr, and free the pages after
writing out all data. The data must be written to 'fresh' pages,
because btr_mtr will be committed and written to the redo log after
the BLOB writes have been written to the redo log.

btr_blob_op_is_update(): Check if an operation passed to
btr_store_big_rec_extern_fields() is an update or insert-by-update.

fseg_alloc_free_page_low(), fsp_alloc_free_page(),
fseg_alloc_free_extent(), fseg_alloc_free_page_general(): Add the
parameter init_mtr. Return an allocated block, or NULL. If
init_mtr!=mtr but the page was already X-latched in mtr, do not
initialize the page.

xdes_get_descriptor_with_space_hdr(): Assert that the file space
header is being X-latched.

fsp_alloc_from_free_frag(): Refactored from fsp_alloc_free_page().

fsp_page_create(): New function, for allocating, X-latching and
potentially initializing a page. If init_mtr!=mtr but the page was
already X-latched in mtr, do not initialize the page.

fsp_free_page(): Add ut_ad(0) to the error outcomes.

fsp_free_page(), fseg_free_page_low(): Increment mtr->n_freed_pages.

fsp_alloc_seg_inode_page(), fseg_create_general(): Assert that the
page was not previously X-latched in the mini-transaction. A file
segment or inode page should never be allocated in the middle of an
mini-transaction that frees pages, such as btr_cur_pessimistic_delete().

fseg_alloc_free_page_low(): If the hinted page was allocated, skip the
check if the tablespace should be extended. Return NULL instead of
FIL_NULL on failure. Remove the flag frag_page_allocated. Instead,
return directly, because the page would already have been initialized.

fseg_find_free_frag_page_slot() would return ULINT_UNDEFINED on error,
not FIL_NULL. Correct a bogus assertion.

fseg_alloc_free_page(): Redefine as a wrapper macro around
fseg_alloc_free_page_general().

buf_block_buf_fix_inc(): Move the definition from the buf0buf.ic to
buf0buf.h, so that it can be called from other modules.

mtr_t: Add n_freed_pages (number of pages that have been freed).

page_rec_get_nth_const(), page_rec_get_nth(): The inverse function of
page_rec_get_n_recs_before(), get the nth record of the record
list. This is faster than iterating the linked list. Refactored from
page_get_middle_rec().

trx_undo_rec_copy(): Add a debug assertion for the length.

trx_undo_add_page(): Return a block descriptor or NULL instead of a
page number or FIL_NULL.

trx_undo_report_row_operation(): Add debug assertions.

trx_sys_create_doublewrite_buf(): Assert that each page was not
previously X-latched.

page_cur_insert_rec_zip_reorg(): Make use of page_rec_get_nth().

row_ins_clust_index_entry_by_modify(): Pass BTR_KEEP_POS_FLAG, so that
the repositioning of the cursor can be avoided.

row_ins_index_entry_low(): Add DEBUG_SYNC points before and after
writing off-page columns. If inserting by updating a delete-marked
record, do not reposition the cursor or commit the mini-transaction
before writing the off-page columns.

row_build(): Tighten a debug assertion about null BLOB pointers.

row_upd_clust_rec(): Add DEBUG_SYNC points before and after writing
off-page columns. Do not reposition the cursor or commit the
mini-transaction before writing the off-page columns.

rb:939 approved by Jimmy Yang
parent c2198863
CREATE TABLE t1 (a INT PRIMARY KEY, b TEXT) ENGINE=InnoDB;
CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t3 (a INT PRIMARY KEY, b TEXT, c TEXT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,REPEAT('a',30000)),(2,REPEAT('b',40000));
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR go1';
BEGIN;
UPDATE t1 SET a=a+2;
ROLLBACK;
BEGIN;
UPDATE t1 SET b=CONCAT(b,'foo');
SET DEBUG_SYNC='now WAIT_FOR have_latch';
SELECT a, RIGHT(b,20) FROM t1;
SET DEBUG_SYNC='now SIGNAL go1';
a RIGHT(b,20)
1 aaaaaaaaaaaaaaaaaaaa
2 bbbbbbbbbbbbbbbbbbbb
SET DEBUG='+d,row_ins_extern_checkpoint';
SET DEBUG_SYNC='before_row_ins_extern_latch SIGNAL rec_not_blob WAIT_FOR crash';
ROLLBACK;
BEGIN;
INSERT INTO t1 VALUES (3,REPEAT('c',50000));
SET DEBUG_SYNC='now WAIT_FOR rec_not_blob';
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT @@tx_isolation;
@@tx_isolation
READ-UNCOMMITTED
SELECT a, RIGHT(b,20) FROM t1;
a RIGHT(b,20)
1 aaaaaaaaaaaaaaaaaaaa
2 bbbbbbbbbbbbbbbbbbbb
SELECT a FROM t1;
a
1
2
3
SET DEBUG='+d,crash_commit_before';
INSERT INTO t2 VALUES (42);
ERROR HY000: Lost connection to MySQL server during query
ERROR HY000: Lost connection to MySQL server during query
CHECK TABLE t1;
Table Op Msg_type Msg_text
test.t1 check status OK
INSERT INTO t3 VALUES
(1,REPEAT('d',7000),REPEAT('e',100)),
(2,REPEAT('g',7000),REPEAT('h',100));
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR go';
UPDATE t3 SET c=REPEAT('f',3000) WHERE a=1;
SET DEBUG_SYNC='now WAIT_FOR have_latch';
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT @@tx_isolation;
@@tx_isolation
READ-UNCOMMITTED
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
SET DEBUG_SYNC='now SIGNAL go';
a RIGHT(b,20) RIGHT(c,20)
1 dddddddddddddddddddd ffffffffffffffffffff
2 gggggggggggggggggggg hhhhhhhhhhhhhhhhhhhh
CHECK TABLE t1,t2,t3;
Table Op Msg_type Msg_text
test.t1 check status OK
test.t2 check status OK
test.t3 check status OK
BEGIN;
INSERT INTO t2 VALUES (347);
SET DEBUG='+d,row_upd_extern_checkpoint';
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR crash';
UPDATE t3 SET c=REPEAT('i',3000) WHERE a=2;
SET DEBUG_SYNC='now WAIT_FOR have_latch';
SELECT info FROM information_schema.processlist
WHERE state = 'debug sync point: before_row_upd_extern';
info
UPDATE t3 SET c=REPEAT('i',3000) WHERE a=2
SET DEBUG='+d,crash_commit_before';
COMMIT;
ERROR HY000: Lost connection to MySQL server during query
ERROR HY000: Lost connection to MySQL server during query
CHECK TABLE t1,t2,t3;
Table Op Msg_type Msg_text
test.t1 check status OK
test.t2 check status OK
test.t3 check status OK
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
a RIGHT(b,20) RIGHT(c,20)
1 dddddddddddddddddddd ffffffffffffffffffff
2 gggggggggggggggggggg hhhhhhhhhhhhhhhhhhhh
SELECT a FROM t3;
a
1
2
BEGIN;
INSERT INTO t2 VALUES (33101);
SET DEBUG='+d,row_upd_extern_checkpoint';
SET DEBUG_SYNC='after_row_upd_extern SIGNAL have_latch WAIT_FOR crash';
UPDATE t3 SET c=REPEAT('j',3000) WHERE a=2;
SET DEBUG_SYNC='now WAIT_FOR have_latch';
SELECT info FROM information_schema.processlist
WHERE state = 'debug sync point: after_row_upd_extern';
info
UPDATE t3 SET c=REPEAT('j',3000) WHERE a=2
SET DEBUG='+d,crash_commit_before';
COMMIT;
ERROR HY000: Lost connection to MySQL server during query
ERROR HY000: Lost connection to MySQL server during query
CHECK TABLE t1,t2,t3;
Table Op Msg_type Msg_text
test.t1 check status OK
test.t2 check status OK
test.t3 check status OK
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
a RIGHT(b,20) RIGHT(c,20)
1 dddddddddddddddddddd ffffffffffffffffffff
2 gggggggggggggggggggg hhhhhhhhhhhhhhhhhhhh
SELECT a FROM t3;
a
1
2
SELECT * FROM t2;
a
DROP TABLE t1,t2,t3;
# Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
# Test what happens when a record is inserted or updated so that some
# columns are stored off-page.
--source include/have_innodb_plugin.inc
# DEBUG_SYNC must be compiled in.
--source include/have_debug_sync.inc
# Valgrind would complain about memory leaks when we crash on purpose.
--source include/not_valgrind.inc
# Embedded server does not support crashing
--source include/not_embedded.inc
# Avoid CrashReporter popup on Mac
--source include/not_crashrep.inc
# InnoDB Plugin cannot use DEBUG_SYNC on Windows
--source include/not_windows.inc
CREATE TABLE t1 (a INT PRIMARY KEY, b TEXT) ENGINE=InnoDB;
CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE t3 (a INT PRIMARY KEY, b TEXT, c TEXT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1,REPEAT('a',30000)),(2,REPEAT('b',40000));
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR go1';
BEGIN;
# This will not block, because it will not store new BLOBs.
UPDATE t1 SET a=a+2;
ROLLBACK;
BEGIN;
--send
UPDATE t1 SET b=CONCAT(b,'foo');
connect (con1,localhost,root,,);
SET DEBUG_SYNC='now WAIT_FOR have_latch';
# this one should block due to the clustered index tree and leaf page latches
--send
SELECT a, RIGHT(b,20) FROM t1;
connect (con2,localhost,root,,);
# Check that the above SELECT is blocked
let $wait_condition=
select count(*) = 1 from information_schema.processlist
where state = 'Sending data' and
info = 'SELECT a, RIGHT(b,20) FROM t1';
--source include/wait_condition.inc
SET DEBUG_SYNC='now SIGNAL go1';
connection con1;
reap;
connection default;
reap;
SET DEBUG='+d,row_ins_extern_checkpoint';
SET DEBUG_SYNC='before_row_ins_extern_latch SIGNAL rec_not_blob WAIT_FOR crash';
ROLLBACK;
BEGIN;
--send
INSERT INTO t1 VALUES (3,REPEAT('c',50000));
connection con1;
SET DEBUG_SYNC='now WAIT_FOR rec_not_blob';
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT @@tx_isolation;
# this one should see (3,NULL_BLOB)
SELECT a, RIGHT(b,20) FROM t1;
SELECT a FROM t1;
# Request a crash, and restart the server.
SET DEBUG='+d,crash_commit_before';
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--error 2013
INSERT INTO t2 VALUES (42);
disconnect con1;
disconnect con2;
connection default;
# This connection should notice the crash as well.
--error 2013
reap;
# Write file to make mysql-test-run.pl restart the server
--enable_reconnect
--source include/wait_until_connected_again.inc
--disable_reconnect
CHECK TABLE t1;
INSERT INTO t3 VALUES
(1,REPEAT('d',7000),REPEAT('e',100)),
(2,REPEAT('g',7000),REPEAT('h',100));
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR go';
# This should move column b off-page.
--send
UPDATE t3 SET c=REPEAT('f',3000) WHERE a=1;
connect (con1,localhost,root,,);
SET DEBUG_SYNC='now WAIT_FOR have_latch';
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT @@tx_isolation;
# this one should block
-- send
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
connect (con2,localhost,root,,);
# Check that the above SELECT is blocked
let $wait_condition=
select count(*) = 1 from information_schema.processlist
where state = 'Sending data' and
info = 'SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3';
--source include/wait_condition.inc
SET DEBUG_SYNC='now SIGNAL go';
connection con1;
reap;
disconnect con1;
connection default;
reap;
CHECK TABLE t1,t2,t3;
connection con2;
BEGIN;
INSERT INTO t2 VALUES (347);
connection default;
# The row_upd_extern_checkpoint was removed in Bug#13721257,
# because the mini-transaction of the B-tree modification would
# remain open while we are writing the off-page columns and are
# stuck in the DEBUG_SYNC. A checkpoint involves a flush, which
# would wait for the buffer-fix to cease.
SET DEBUG='+d,row_upd_extern_checkpoint';
SET DEBUG_SYNC='before_row_upd_extern SIGNAL have_latch WAIT_FOR crash';
# This should move column b off-page.
--send
UPDATE t3 SET c=REPEAT('i',3000) WHERE a=2;
connection con2;
SET DEBUG_SYNC='now WAIT_FOR have_latch';
# Check that the above UPDATE is blocked
SELECT info FROM information_schema.processlist
WHERE state = 'debug sync point: before_row_upd_extern';
# Request a crash, and restart the server.
SET DEBUG='+d,crash_commit_before';
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--error 2013
COMMIT;
disconnect con2;
connection default;
# This connection should notice the crash as well.
--error 2013
reap;
# Write file to make mysql-test-run.pl restart the server
--enable_reconnect
--source include/wait_until_connected_again.inc
--disable_reconnect
CHECK TABLE t1,t2,t3;
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
SELECT a FROM t3;
connect (con2,localhost,root,,);
BEGIN;
INSERT INTO t2 VALUES (33101);
connection default;
# The row_upd_extern_checkpoint was removed in Bug#13721257,
# because the mini-transaction of the B-tree modification would
# remain open while we are writing the off-page columns and are
# stuck in the DEBUG_SYNC. A checkpoint involves a flush, which
# would wait for the buffer-fix to cease.
SET DEBUG='+d,row_upd_extern_checkpoint';
SET DEBUG_SYNC='after_row_upd_extern SIGNAL have_latch WAIT_FOR crash';
# This should move column b off-page.
--send
UPDATE t3 SET c=REPEAT('j',3000) WHERE a=2;
connection con2;
SET DEBUG_SYNC='now WAIT_FOR have_latch';
# Check that the above UPDATE is blocked
SELECT info FROM information_schema.processlist
WHERE state = 'debug sync point: after_row_upd_extern';
# Request a crash, and restart the server.
SET DEBUG='+d,crash_commit_before';
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--error 2013
COMMIT;
disconnect con2;
connection default;
# This connection should notice the crash as well.
--error 2013
reap;
# Write file to make mysql-test-run.pl restart the server
--enable_reconnect
--source include/wait_until_connected_again.inc
--disable_reconnect
CHECK TABLE t1,t2,t3;
SELECT a, RIGHT(b,20), RIGHT(c,20) FROM t3;
SELECT a FROM t3;
SELECT * FROM t2;
DROP TABLE t1,t2,t3;
2012-02-15 The InnoDB Team
* btr/btr0btr.c, btr/btr0cur.c, fsp/fsp0fsp.c, ibuf/ibuf0ibuf.c,
include/btr0btr.h, include/btr0cur.h, include/btr0cur.ic,
include/buf0buf.h, include/buf0buf.ic, include/fsp0fsp.h,
include/mtr0mtr.h, include/mtr0mtr.ic, include/page0page.h,
include/page0page.ic, include/trx0rec.ic, include/trx0undo.h,
mtr/mtr0mtr.c, page/page0cur.c, page/page0page.c, row/row0ins.c,
row/row0row.c, row/row0upd.c, trx/trx0rec.c, trx/trx0sys.c,
trx/trx0undo.c:
Fix Bug#13721257 RACE CONDITION IN UPDATES OR INSERTS OF WIDE RECORDS
2012-02-06 The InnoDB Team
* handler/ha_innodb.cc:
Fix Bug #11754376 45976: INNODB LOST FILES FOR TEMPORARY TABLES ON
Fix Bug#11754376 45976: INNODB LOST FILES FOR TEMPORARY TABLES ON
GRACEFUL SHUTDOWN
2012-01-30 The InnoDB Team
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -906,28 +906,31 @@ btr_page_alloc_for_ibuf(
/**************************************************************//**
Allocates a new file page to be used in an index tree. NOTE: we assume
that the caller has made the reservation for free extents!
@return new allocated block, x-latched; NULL if out of space */
UNIV_INTERN
@retval NULL if no page could be allocated
@retval block, rw_lock_x_lock_count(&block->lock) == 1 if allocation succeeded
(init_mtr == mtr, or the page was not previously freed in mtr)
@retval block (not allocated or initialized) otherwise */
static __attribute__((nonnull, warn_unused_result))
buf_block_t*
btr_page_alloc(
/*===========*/
btr_page_alloc_low(
/*===============*/
dict_index_t* index, /*!< in: index */
ulint hint_page_no, /*!< in: hint of a good page */
byte file_direction, /*!< in: direction where a possible
page split is made */
ulint level, /*!< in: level where the page is placed
in the tree */
mtr_t* mtr) /*!< in: mtr */
mtr_t* mtr, /*!< in/out: mini-transaction
for the allocation */
mtr_t* init_mtr) /*!< in/out: mtr or another
mini-transaction in which the
page should be initialized.
If init_mtr!=mtr, but the page
is already X-latched in mtr, do
not initialize the page. */
{
fseg_header_t* seg_header;
page_t* root;
buf_block_t* new_block;
ulint new_page_no;
if (dict_index_is_ibuf(index)) {
return(btr_page_alloc_for_ibuf(index, mtr));
}
root = btr_root_get(index, mtr);
......@@ -941,17 +944,47 @@ btr_page_alloc(
reservation for free extents, and thus we know that a page can
be allocated: */
new_page_no = fseg_alloc_free_page_general(seg_header, hint_page_no,
file_direction, TRUE, mtr);
if (new_page_no == FIL_NULL) {
return(fseg_alloc_free_page_general(
seg_header, hint_page_no, file_direction,
TRUE, mtr, init_mtr));
}
return(NULL);
/**************************************************************//**
Allocates a new file page to be used in an index tree. NOTE: we assume
that the caller has made the reservation for free extents!
@retval NULL if no page could be allocated
@retval block, rw_lock_x_lock_count(&block->lock) == 1 if allocation succeeded
(init_mtr == mtr, or the page was not previously freed in mtr)
@retval block (not allocated or initialized) otherwise */
UNIV_INTERN
buf_block_t*
btr_page_alloc(
/*===========*/
dict_index_t* index, /*!< in: index */
ulint hint_page_no, /*!< in: hint of a good page */
byte file_direction, /*!< in: direction where a possible
page split is made */
ulint level, /*!< in: level where the page is placed
in the tree */
mtr_t* mtr, /*!< in/out: mini-transaction
for the allocation */
mtr_t* init_mtr) /*!< in/out: mini-transaction
for x-latching and initializing
the page */
{
buf_block_t* new_block;
if (dict_index_is_ibuf(index)) {
return(btr_page_alloc_for_ibuf(index, mtr));
}
new_block = buf_page_get(dict_index_get_space(index),
dict_table_zip_size(index->table),
new_page_no, RW_X_LATCH, mtr);
new_block = btr_page_alloc_low(
index, hint_page_no, file_direction, level, mtr, init_mtr);
if (new_block) {
buf_block_dbg_add_level(new_block, SYNC_TREE_NODE_NEW);
}
return(new_block);
}
......@@ -1087,10 +1120,10 @@ btr_page_free(
buf_block_t* block, /*!< in: block to be freed, x-latched */
mtr_t* mtr) /*!< in: mtr */
{
ulint level;
level = btr_page_get_level(buf_block_get_frame(block), mtr);
const page_t* page = buf_block_get_frame(block);
ulint level = btr_page_get_level(page, mtr);
ut_ad(fil_page_get_type(block->frame) == FIL_PAGE_INDEX);
btr_page_free_low(index, block, level, mtr);
}
......@@ -1329,16 +1362,12 @@ btr_create(
/* Allocate then the next page to the segment: it will be the
tree root page */
page_no = fseg_alloc_free_page(buf_block_get_frame(
ibuf_hdr_block)
+ IBUF_HEADER
+ IBUF_TREE_SEG_HEADER,
block = fseg_alloc_free_page(
buf_block_get_frame(ibuf_hdr_block)
+ IBUF_HEADER + IBUF_TREE_SEG_HEADER,
IBUF_TREE_ROOT_PAGE_NO,
FSP_UP, mtr);
ut_ad(page_no == IBUF_TREE_ROOT_PAGE_NO);
block = buf_page_get(space, zip_size, page_no,
RW_X_LATCH, mtr);
ut_ad(buf_block_get_page_no(block) == IBUF_TREE_ROOT_PAGE_NO);
} else {
#ifdef UNIV_BLOB_DEBUG
if ((type & DICT_CLUSTERED) && !index->blobs) {
......@@ -1815,7 +1844,7 @@ btr_root_raise_and_insert(
level = btr_page_get_level(root, mtr);
new_block = btr_page_alloc(index, 0, FSP_NO_DIR, level, mtr);
new_block = btr_page_alloc(index, 0, FSP_NO_DIR, level, mtr, mtr);
new_page = buf_block_get_frame(new_block);
new_page_zip = buf_block_get_page_zip(new_block);
ut_a(!new_page_zip == !root_page_zip);
......@@ -2551,7 +2580,7 @@ func_start:
/* 2. Allocate a new page to the index */
new_block = btr_page_alloc(cursor->index, hint_page_no, direction,
btr_page_get_level(page, mtr), mtr);
btr_page_get_level(page, mtr), mtr, mtr);
new_page = buf_block_get_frame(new_block);
new_page_zip = buf_block_get_page_zip(new_block);
btr_page_create(new_block, new_page_zip, cursor->index,
......@@ -3001,15 +3030,16 @@ btr_node_ptr_delete(
ut_a(err == DB_SUCCESS);
if (!compressed) {
btr_cur_compress_if_useful(&cursor, mtr);
btr_cur_compress_if_useful(&cursor, FALSE, mtr);
}
}
/*************************************************************//**
If page is the only on its level, this function moves its records to the
father page, thus reducing the tree height. */
father page, thus reducing the tree height.
@return father block */
static
void
buf_block_t*
btr_lift_page_up(
/*=============*/
dict_index_t* index, /*!< in: index tree */
......@@ -3126,6 +3156,8 @@ btr_lift_page_up(
}
ut_ad(page_validate(father_page, index));
ut_ad(btr_check_node_ptr(index, father_block, mtr));
return(father_block);
}
/*************************************************************//**
......@@ -3142,11 +3174,13 @@ UNIV_INTERN
ibool
btr_compress(
/*=========*/
btr_cur_t* cursor, /*!< in: cursor on the page to merge or lift;
the page must not be empty: in record delete
use btr_discard_page if the page would become
empty */
mtr_t* mtr) /*!< in: mtr */
btr_cur_t* cursor, /*!< in/out: cursor on the page to merge
or lift; the page must not be empty:
when deleting records, use btr_discard_page()
if the page would become empty */
ibool adjust, /*!< in: TRUE if should adjust the
cursor position even if compression occurs */
mtr_t* mtr) /*!< in/out: mini-transaction */
{
dict_index_t* index;
ulint space;
......@@ -3164,12 +3198,14 @@ btr_compress(
ulint* offsets;
ulint data_size;
ulint n_recs;
ulint nth_rec = 0; /* remove bogus warning */
ulint max_ins_size;
ulint max_ins_size_reorg;
block = btr_cur_get_block(cursor);
page = btr_cur_get_page(cursor);
index = btr_cur_get_index(cursor);
ut_a((ibool) !!page_is_comp(page) == dict_table_is_comp(index->table));
ut_ad(mtr_memo_contains(mtr, dict_index_get_lock(index),
......@@ -3190,6 +3226,10 @@ btr_compress(
offsets = btr_page_get_father_block(NULL, heap, index, block, mtr,
&father_cursor);
if (adjust) {
nth_rec = page_rec_get_n_recs_before(btr_cur_get_rec(cursor));
}
/* Decide the page to which we try to merge and which will inherit
the locks */
......@@ -3216,9 +3256,9 @@ btr_compress(
} else {
/* The page is the only one on the level, lift the records
to the father */
btr_lift_page_up(index, block, mtr);
mem_heap_free(heap);
return(TRUE);
merge_block = btr_lift_page_up(index, block, mtr);
goto func_exit;
}
n_recs = page_get_n_recs(page);
......@@ -3300,6 +3340,10 @@ err_exit:
btr_node_ptr_delete(index, block, mtr);
lock_update_merge_left(merge_block, orig_pred, block);
if (adjust) {
nth_rec += page_rec_get_n_recs_before(orig_pred);
}
} else {
rec_t* orig_succ;
#ifdef UNIV_BTR_DEBUG
......@@ -3364,7 +3408,6 @@ err_exit:
}
btr_blob_dbg_remove(page, index, "btr_compress");
mem_heap_free(heap);
if (!dict_index_is_clust(index) && page_is_leaf(merge_page)) {
/* Update the free bits of the B-tree page in the
......@@ -3416,6 +3459,16 @@ err_exit:
btr_page_free(index, block, mtr);
ut_ad(btr_check_node_ptr(index, merge_block, mtr));
func_exit:
mem_heap_free(heap);
if (adjust) {
btr_cur_position(
index,
page_rec_get_nth(merge_block->frame, nth_rec),
merge_block, cursor);
}
return(TRUE);
}
......
This diff is collapsed.
This diff is collapsed.
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -1726,7 +1726,7 @@ ibuf_add_free_page(void)
page_t* header_page;
ulint flags;
ulint zip_size;
ulint page_no;
buf_block_t* block;
page_t* page;
page_t* root;
page_t* bitmap_page;
......@@ -1750,32 +1750,23 @@ ibuf_add_free_page(void)
of a deadlock. This is the reason why we created a special ibuf
header page apart from the ibuf tree. */
page_no = fseg_alloc_free_page(
block = fseg_alloc_free_page(
header_page + IBUF_HEADER + IBUF_TREE_SEG_HEADER, 0, FSP_UP,
&mtr);
if (page_no == FIL_NULL) {
if (block == NULL) {
mtr_commit(&mtr);
return(DB_STRONG_FAIL);
}
{
buf_block_t* block;
block = buf_page_get(
IBUF_SPACE_ID, 0, page_no, RW_X_LATCH, &mtr);
ut_ad(rw_lock_get_x_lock_count(&block->lock) == 1);
ibuf_enter();
mutex_enter(&ibuf_mutex);
root = ibuf_tree_root_get(&mtr);
buf_block_dbg_add_level(block, SYNC_IBUF_TREE_NODE_NEW);
page = buf_block_get_frame(block);
}
/* Add the page to the free list and update the ibuf size data */
......@@ -1792,10 +1783,11 @@ ibuf_add_free_page(void)
(level 2 page) */
bitmap_page = ibuf_bitmap_get_map_page(
IBUF_SPACE_ID, page_no, zip_size, &mtr);
IBUF_SPACE_ID, buf_block_get_page_no(block), zip_size, &mtr);
ibuf_bitmap_page_set_bits(
bitmap_page, page_no, zip_size, IBUF_BITMAP_IBUF, TRUE, &mtr);
bitmap_page, buf_block_get_page_no(block), zip_size,
IBUF_BITMAP_IBUF, TRUE, &mtr);
mtr_commit(&mtr);
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -488,11 +488,14 @@ UNIV_INTERN
ibool
btr_compress(
/*=========*/
btr_cur_t* cursor, /*!< in: cursor on the page to merge or lift;
the page must not be empty: in record delete
use btr_discard_page if the page would become
empty */
mtr_t* mtr); /*!< in: mtr */
btr_cur_t* cursor, /*!< in/out: cursor on the page to merge
or lift; the page must not be empty:
when deleting records, use btr_discard_page()
if the page would become empty */
ibool adjust, /*!< in: TRUE if should adjust the
cursor position even if compression occurs */
mtr_t* mtr) /*!< in/out: mini-transaction */
__attribute__((nonnull));
/*************************************************************//**
Discards a page from a B-tree. This is used to remove the last record from
a B-tree page: the whole page must be removed at the same time. This cannot
......@@ -543,7 +546,10 @@ btr_get_size(
/**************************************************************//**
Allocates a new file page to be used in an index tree. NOTE: we assume
that the caller has made the reservation for free extents!
@return new allocated block, x-latched; NULL if out of space */
@retval NULL if no page could be allocated
@retval block, rw_lock_x_lock_count(&block->lock) == 1 if allocation succeeded
(init_mtr == mtr, or the page was not previously freed in mtr)
@retval block (not allocated or initialized) otherwise */
UNIV_INTERN
buf_block_t*
btr_page_alloc(
......@@ -554,7 +560,12 @@ btr_page_alloc(
page split is made */
ulint level, /*!< in: level where the page is placed
in the tree */
mtr_t* mtr); /*!< in: mtr */
mtr_t* mtr, /*!< in/out: mini-transaction
for the allocation */
mtr_t* init_mtr) /*!< in/out: mini-transaction
for x-latching and initializing
the page */
__attribute__((nonnull, warn_unused_result));
/**************************************************************//**
Frees a file page used in an index tree. NOTE: cannot free field external
storage pages because the page must contain info on its level. */
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -36,6 +36,9 @@ Created 10/16/1994 Heikki Tuuri
#define BTR_NO_LOCKING_FLAG 2 /* do no record lock checking */
#define BTR_KEEP_SYS_FLAG 4 /* sys fields will be found from the
update vector or inserted entry */
#define BTR_KEEP_POS_FLAG 8 /* btr_cur_pessimistic_update()
must keep cursor position when
moving columns to big_rec */
#ifndef UNIV_HOTBACKUP
#include "que0types.h"
......@@ -309,7 +312,9 @@ btr_cur_pessimistic_update(
/*=======================*/
ulint flags, /*!< in: undo logging, locking, and rollback
flags */
btr_cur_t* cursor, /*!< in: cursor on the record to update */
btr_cur_t* cursor, /*!< in/out: cursor on the record to update;
cursor may become invalid if *big_rec == NULL
|| !(flags & BTR_KEEP_POS_FLAG) */
mem_heap_t** heap, /*!< in/out: pointer to memory heap, or NULL */
big_rec_t** big_rec,/*!< out: big rec vector whose fields have to
be stored externally by the caller, or NULL */
......@@ -376,10 +381,13 @@ UNIV_INTERN
ibool
btr_cur_compress_if_useful(
/*=======================*/
btr_cur_t* cursor, /*!< in: cursor on the page to compress;
btr_cur_t* cursor, /*!< in/out: cursor on the page to compress;
cursor does not stay valid if compression
occurs */
mtr_t* mtr); /*!< in: mtr */
ibool adjust, /*!< in: TRUE if should adjust the
cursor position even if compression occurs */
mtr_t* mtr) /*!< in/out: mini-transaction */
__attribute__((nonnull));
/*******************************************************//**
Removes the record on which the tree cursor is positioned. It is assumed
that the mtr has an x-latch on the page where the cursor is positioned,
......@@ -504,6 +512,27 @@ btr_cur_disown_inherited_fields(
const upd_t* update, /*!< in: update vector */
mtr_t* mtr) /*!< in/out: mini-transaction */
__attribute__((nonnull(2,3,4,5,6)));
/** Operation code for btr_store_big_rec_extern_fields(). */
enum blob_op {
/** Store off-page columns for a freshly inserted record */
BTR_STORE_INSERT = 0,
/** Store off-page columns for an insert by update */
BTR_STORE_INSERT_UPDATE,
/** Store off-page columns for an update */
BTR_STORE_UPDATE
};
/*******************************************************************//**
Determine if an operation on off-page columns is an update.
@return TRUE if op != BTR_STORE_INSERT */
UNIV_INLINE
ibool
btr_blob_op_is_update(
/*==================*/
enum blob_op op) /*!< in: operation */
__attribute__((warn_unused_result));
/*******************************************************************//**
Stores the fields in big_rec_vec to the tablespace and puts pointers to
them in rec. The extern flags in rec will have to be set beforehand.
......@@ -511,52 +540,23 @@ The fields are stored on pages allocated from leaf node
file segment of the index tree.
@return DB_SUCCESS or DB_OUT_OF_FILE_SPACE */
UNIV_INTERN
ulint
btr_store_big_rec_extern_fields_func(
/*=================================*/
enum db_err
btr_store_big_rec_extern_fields(
/*============================*/
dict_index_t* index, /*!< in: index of rec; the index tree
MUST be X-latched */
buf_block_t* rec_block, /*!< in/out: block containing rec */
rec_t* rec, /*!< in: record */
rec_t* rec, /*!< in/out: record */
const ulint* offsets, /*!< in: rec_get_offsets(rec, index);
the "external storage" flags in offsets
will not correspond to rec when
this function returns */
#ifdef UNIV_DEBUG
mtr_t* local_mtr, /*!< in: mtr containing the
latch to rec and to the tree */
#endif /* UNIV_DEBUG */
#if defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG
ibool update_in_place,/*! in: TRUE if the record is updated
in place (not delete+insert) */
#endif /* UNIV_DEBUG || UNIV_BLOB_LIGHT_DEBUG */
const big_rec_t*big_rec_vec) /*!< in: vector containing fields
const big_rec_t*big_rec_vec, /*!< in: vector containing fields
to be stored externally */
__attribute__((nonnull));
/** Stores the fields in big_rec_vec to the tablespace and puts pointers to
them in rec. The extern flags in rec will have to be set beforehand.
The fields are stored on pages allocated from leaf node
file segment of the index tree.
@param index in: clustered index; MUST be X-latched by mtr
@param b in/out: block containing rec; MUST be X-latched by mtr
@param rec in/out: clustered index record
@param offsets in: rec_get_offsets(rec, index);
the "external storage" flags in offsets will not be adjusted
@param mtr in: mini-transaction that holds x-latch on index and b
@param upd in: TRUE if the record is updated in place (not delete+insert)
@param big in: vector containing fields to be stored externally
@return DB_SUCCESS or DB_OUT_OF_FILE_SPACE */
#ifdef UNIV_DEBUG
# define btr_store_big_rec_extern_fields(index,b,rec,offsets,mtr,upd,big) \
btr_store_big_rec_extern_fields_func(index,b,rec,offsets,mtr,upd,big)
#elif defined UNIV_BLOB_LIGHT_DEBUG
# define btr_store_big_rec_extern_fields(index,b,rec,offsets,mtr,upd,big) \
btr_store_big_rec_extern_fields_func(index,b,rec,offsets,upd,big)
#else
# define btr_store_big_rec_extern_fields(index,b,rec,offsets,mtr,upd,big) \
btr_store_big_rec_extern_fields_func(index,b,rec,offsets,big)
#endif
mtr_t* btr_mtr, /*!< in: mtr containing the
latches to the clustered index */
enum blob_op op) /*! in: operation code */
__attribute__((nonnull, warn_unused_result));
/*******************************************************************//**
Frees the space in an externally stored field to the file space
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -139,7 +139,7 @@ btr_cur_compress_recommendation(
btr_cur_t* cursor, /*!< in: btr cursor */
mtr_t* mtr) /*!< in: mtr */
{
page_t* page;
const page_t* page;
ut_ad(mtr_memo_contains(mtr, btr_cur_get_block(cursor),
MTR_MEMO_PAGE_X_FIX));
......@@ -197,4 +197,25 @@ btr_cur_can_delete_without_compress(
return(TRUE);
}
/*******************************************************************//**
Determine if an operation on off-page columns is an update.
@return TRUE if op != BTR_STORE_INSERT */
UNIV_INLINE
ibool
btr_blob_op_is_update(
/*==================*/
enum blob_op op) /*!< in: operation */
{
switch (op) {
case BTR_STORE_INSERT:
return(FALSE);
case BTR_STORE_INSERT_UPDATE:
case BTR_STORE_UPDATE:
return(TRUE);
}
ut_ad(0);
return(FALSE);
}
#endif /* !UNIV_HOTBACKUP */
/*****************************************************************************
Copyright (c) 1995, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1995, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -468,6 +468,31 @@ buf_block_get_modify_clock(
#else /* !UNIV_HOTBACKUP */
# define buf_block_modify_clock_inc(block) ((void) 0)
#endif /* !UNIV_HOTBACKUP */
/*******************************************************************//**
Increments the bufferfix count. */
UNIV_INLINE
void
buf_block_buf_fix_inc_func(
/*=======================*/
#ifdef UNIV_SYNC_DEBUG
const char* file, /*!< in: file name */
ulint line, /*!< in: line */
#endif /* UNIV_SYNC_DEBUG */
buf_block_t* block) /*!< in/out: block to bufferfix */
__attribute__((nonnull));
#ifdef UNIV_SYNC_DEBUG
/** Increments the bufferfix count.
@param b in/out: block to bufferfix
@param f in: file name where requested
@param l in: line number where requested */
# define buf_block_buf_fix_inc(b,f,l) buf_block_buf_fix_inc_func(f,l,b)
#else /* UNIV_SYNC_DEBUG */
/** Increments the bufferfix count.
@param b in/out: block to bufferfix
@param f in: file name where requested
@param l in: line number where requested */
# define buf_block_buf_fix_inc(b,f,l) buf_block_buf_fix_inc_func(b)
#endif /* UNIV_SYNC_DEBUG */
/********************************************************************//**
Calculates a page checksum which is stored to the page when it is written
to a file. Note that we must be careful to calculate the same value
......
/*****************************************************************************
Copyright (c) 1995, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1995, 2012, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 2008, Google Inc.
Portions of this file contain modifications contributed and copyrighted by
......@@ -18,8 +18,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -916,19 +916,6 @@ buf_block_buf_fix_inc_func(
block->page.buf_fix_count++;
}
#ifdef UNIV_SYNC_DEBUG
/** Increments the bufferfix count.
@param b in/out: block to bufferfix
@param f in: file name where requested
@param l in: line number where requested */
# define buf_block_buf_fix_inc(b,f,l) buf_block_buf_fix_inc_func(f,l,b)
#else /* UNIV_SYNC_DEBUG */
/** Increments the bufferfix count.
@param b in/out: block to bufferfix
@param f in: file name where requested
@param l in: line number where requested */
# define buf_block_buf_fix_inc(b,f,l) buf_block_buf_fix_inc_func(b)
#endif /* UNIV_SYNC_DEBUG */
/*******************************************************************//**
Decrements the bufferfix count. */
......
/*****************************************************************************
Copyright (c) 1995, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1995, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -176,30 +176,33 @@ fseg_n_reserved_pages(
Allocates a single free page from a segment. This function implements
the intelligent allocation strategy which tries to minimize
file space fragmentation.
@return the allocated page offset FIL_NULL if no page could be allocated */
UNIV_INTERN
ulint
fseg_alloc_free_page(
/*=================*/
fseg_header_t* seg_header, /*!< in: segment header */
ulint hint, /*!< in: hint of which page would be desirable */
byte direction, /*!< in: if the new page is needed because
@param[in/out] seg_header segment header
@param[in] hint hint of which page would be desirable
@param[in] direction if the new page is needed because
of an index page split, and records are
inserted there in order, into which
direction they go alphabetically: FSP_DOWN,
FSP_UP, FSP_NO_DIR */
mtr_t* mtr); /*!< in: mtr handle */
FSP_UP, FSP_NO_DIR
@param[in/out] mtr mini-transaction
@return X-latched block, or NULL if no page could be allocated */
#define fseg_alloc_free_page(seg_header, hint, direction, mtr) \
fseg_alloc_free_page_general(seg_header, hint, direction, \
FALSE, mtr, mtr)
/**********************************************************************//**
Allocates a single free page from a segment. This function implements
the intelligent allocation strategy which tries to minimize file space
fragmentation.
@return allocated page offset, FIL_NULL if no page could be allocated */
@retval NULL if no page could be allocated
@retval block, rw_lock_x_lock_count(&block->lock) == 1 if allocation succeeded
(init_mtr == mtr, or the page was not previously freed in mtr)
@retval block (not allocated or initialized) otherwise */
UNIV_INTERN
ulint
buf_block_t*
fseg_alloc_free_page_general(
/*=========================*/
fseg_header_t* seg_header,/*!< in/out: segment header */
ulint hint, /*!< in: hint of which page would be desirable */
ulint hint, /*!< in: hint of which page would be
desirable */
byte direction,/*!< in: if the new page is needed because
of an index page split, and records are
inserted there in order, into which
......@@ -210,8 +213,12 @@ fseg_alloc_free_page_general(
with fsp_reserve_free_extents, then there
is no need to do the check for this individual
page */
mtr_t* mtr) /*!< in/out: mini-transaction */
__attribute__((warn_unused_result, nonnull(1,5)));
mtr_t* mtr, /*!< in/out: mini-transaction */
mtr_t* init_mtr)/*!< in/out: mtr or another mini-transaction
in which the page should be initialized.
If init_mtr!=mtr, but the page is already
latched in mtr, do not initialize the page. */
__attribute__((warn_unused_result, nonnull));
/**********************************************************************//**
Reserves free pages from a tablespace. All mini-transactions which may
use several pages from the tablespace should call this function beforehand
......
/*****************************************************************************
Copyright (c) 1995, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1995, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -50,7 +50,9 @@ first 3 values must be RW_S_LATCH, RW_X_LATCH, RW_NO_LATCH */
#define MTR_MEMO_PAGE_S_FIX RW_S_LATCH
#define MTR_MEMO_PAGE_X_FIX RW_X_LATCH
#define MTR_MEMO_BUF_FIX RW_NO_LATCH
#define MTR_MEMO_MODIFY 54
#ifdef UNIV_DEBUG
# define MTR_MEMO_MODIFY 54
#endif /* UNIV_DEBUG */
#define MTR_MEMO_S_LOCK 55
#define MTR_MEMO_X_LOCK 56
......@@ -378,11 +380,14 @@ struct mtr_struct{
dyn_array_t memo; /*!< memo stack for locks etc. */
dyn_array_t log; /*!< mini-transaction log */
ibool modifications;
/* TRUE if the mtr made modifications to
buffer pool pages */
/*!< TRUE if the mini-transaction
modified buffer pool pages */
ulint n_log_recs;
/* count of how many page initial log records
have been written to the mtr log */
ulint n_freed_pages;
/* number of pages that have been freed in
this mini-transaction */
ulint log_mode; /* specifies which operations should be
logged; default value MTR_LOG_ALL */
ib_uint64_t start_lsn;/* start lsn of the possible log entry for
......
/*****************************************************************************
Copyright (c) 1995, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1995, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -45,6 +45,7 @@ mtr_start(
mtr->log_mode = MTR_LOG_ALL;
mtr->modifications = FALSE;
mtr->n_log_recs = 0;
mtr->n_freed_pages = 0;
ut_d(mtr->state = MTR_ACTIVE);
ut_d(mtr->magic_n = MTR_MAGIC_N);
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -281,16 +281,42 @@ page_get_supremum_offset(
const page_t* page); /*!< in: page which must have record(s) */
#define page_get_infimum_rec(page) ((page) + page_get_infimum_offset(page))
#define page_get_supremum_rec(page) ((page) + page_get_supremum_offset(page))
/************************************************************//**
Returns the middle record of record list. If there are an even number
of records in the list, returns the first record of upper half-list.
@return middle record */
Returns the nth record of the record list.
This is the inverse function of page_rec_get_n_recs_before().
@return nth record */
UNIV_INTERN
const rec_t*
page_rec_get_nth_const(
/*===================*/
const page_t* page, /*!< in: page */
ulint nth) /*!< in: nth record */
__attribute__((nonnull, warn_unused_result));
/************************************************************//**
Returns the nth record of the record list.
This is the inverse function of page_rec_get_n_recs_before().
@return nth record */
UNIV_INLINE
rec_t*
page_rec_get_nth(
/*=============*/
page_t* page, /*< in: page */
ulint nth) /*!< in: nth record */
__attribute__((nonnull, warn_unused_result));
#ifndef UNIV_HOTBACKUP
/************************************************************//**
Returns the middle record of the records on the page. If there is an
even number of records in the list, returns the first record of the
upper half-list.
@return middle record */
UNIV_INLINE
rec_t*
page_get_middle_rec(
/*================*/
page_t* page); /*!< in: page */
#ifndef UNIV_HOTBACKUP
page_t* page) /*!< in: page */
__attribute__((nonnull, warn_unused_result));
/*************************************************************//**
Compares a data tuple to a physical record. Differs from the function
cmp_dtuple_rec_with_match in the way that the record must reside on an
......@@ -345,6 +371,7 @@ page_get_n_recs(
/***************************************************************//**
Returns the number of records before the given record in chain.
The number includes infimum and supremum records.
This is the inverse function of page_rec_get_nth().
@return number of records */
UNIV_INTERN
ulint
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -420,7 +420,37 @@ page_rec_is_infimum(
return(page_rec_is_infimum_low(page_offset(rec)));
}
/************************************************************//**
Returns the nth record of the record list.
This is the inverse function of page_rec_get_n_recs_before().
@return nth record */
UNIV_INLINE
rec_t*
page_rec_get_nth(
/*=============*/
page_t* page, /*!< in: page */
ulint nth) /*!< in: nth record */
{
return((rec_t*) page_rec_get_nth_const(page, nth));
}
#ifndef UNIV_HOTBACKUP
/************************************************************//**
Returns the middle record of the records on the page. If there is an
even number of records in the list, returns the first record of the
upper half-list.
@return middle record */
UNIV_INLINE
rec_t*
page_get_middle_rec(
/*================*/
page_t* page) /*!< in: page */
{
ulint middle = (page_get_n_recs(page) + PAGE_HEAP_NO_USER_LOW) / 2;
return(page_rec_get_nth(page, middle));
}
/*************************************************************//**
Compares a data tuple to a physical record. Differs from the function
cmp_dtuple_rec_with_match in the way that the record must reside on an
......
/*****************************************************************************
Copyright (c) 1996, 2009, Innobase Oy. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -107,6 +107,7 @@ trx_undo_rec_copy(
len = mach_read_from_2(undo_rec)
- ut_align_offset(undo_rec, UNIV_PAGE_SIZE);
ut_ad(len < UNIV_PAGE_SIZE);
return(mem_heap_dup(heap, undo_rec, len));
}
#endif /* !UNIV_HOTBACKUP */
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -194,16 +194,17 @@ trx_undo_get_first_rec(
mtr_t* mtr); /*!< in: mtr */
/********************************************************************//**
Tries to add a page to the undo log segment where the undo log is placed.
@return page number if success, else FIL_NULL */
@return X-latched block if success, else NULL */
UNIV_INTERN
ulint
buf_block_t*
trx_undo_add_page(
/*==============*/
trx_t* trx, /*!< in: transaction */
trx_undo_t* undo, /*!< in: undo log memory object */
mtr_t* mtr); /*!< in: mtr which does not have a latch to any
mtr_t* mtr) /*!< in: mtr which does not have a latch to any
undo log page; the caller must have reserved
the rollback segment mutex */
__attribute__((nonnull, warn_unused_result));
/********************************************************************//**
Frees the last undo log page.
The caller must hold the rollback segment mutex. */
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -1180,14 +1180,15 @@ page_cur_insert_rec_zip_reorg(
/* Before trying to reorganize the page,
store the number of preceding records on the page. */
pos = page_rec_get_n_recs_before(rec);
ut_ad(pos > 0);
if (page_zip_reorganize(block, index, mtr)) {
/* The page was reorganized: Find rec by seeking to pos,
and update *current_rec. */
if (pos > 1) {
rec = page_rec_get_nth(page, pos - 1);
} else {
rec = page + PAGE_NEW_INFIMUM;
while (--pos) {
rec = page + rec_get_next_offs(rec, TRUE);
}
*current_rec = rec;
......@@ -1283,6 +1284,12 @@ page_cur_insert_rec_zip(
insert_rec = page_cur_insert_rec_zip_reorg(
current_rec, block, index, insert_rec,
page, page_zip, mtr);
#ifdef UNIV_DEBUG
if (insert_rec) {
rec_offs_make_valid(
insert_rec, index, offsets);
}
#endif /* UNIV_DEBUG */
}
return(insert_rec);
......
/*****************************************************************************
Copyright (c) 1994, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1994, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -1475,55 +1475,54 @@ page_dir_balance_slot(
}
}
#ifndef UNIV_HOTBACKUP
/************************************************************//**
Returns the middle record of the record list. If there are an even number
of records in the list, returns the first record of the upper half-list.
@return middle record */
Returns the nth record of the record list.
This is the inverse function of page_rec_get_n_recs_before().
@return nth record */
UNIV_INTERN
rec_t*
page_get_middle_rec(
/*================*/
page_t* page) /*!< in: page */
const rec_t*
page_rec_get_nth_const(
/*===================*/
const page_t* page, /*!< in: page */
ulint nth) /*!< in: nth record */
{
page_dir_slot_t* slot;
ulint middle;
const page_dir_slot_t* slot;
ulint i;
ulint n_owned;
ulint count;
rec_t* rec;
/* This many records we must leave behind */
middle = (page_get_n_recs(page) + PAGE_HEAP_NO_USER_LOW) / 2;
const rec_t* rec;
count = 0;
ut_ad(nth < UNIV_PAGE_SIZE / (REC_N_NEW_EXTRA_BYTES + 1));
for (i = 0;; i++) {
slot = page_dir_get_nth_slot(page, i);
n_owned = page_dir_slot_get_n_owned(slot);
if (count + n_owned > middle) {
if (n_owned > nth) {
break;
} else {
count += n_owned;
nth -= n_owned;
}
}
ut_ad(i > 0);
slot = page_dir_get_nth_slot(page, i - 1);
rec = (rec_t*) page_dir_slot_get_rec(slot);
rec = page_rec_get_next(rec);
rec = page_dir_slot_get_rec(slot);
/* There are now count records behind rec */
for (i = 0; i < middle - count; i++) {
rec = page_rec_get_next(rec);
if (page_is_comp(page)) {
do {
rec = page_rec_get_next_low(rec, TRUE);
ut_ad(rec);
} while (nth--);
} else {
do {
rec = page_rec_get_next_low(rec, FALSE);
ut_ad(rec);
} while (nth--);
}
return(rec);
}
#endif /* !UNIV_HOTBACKUP */
/***************************************************************//**
Returns the number of records before the given record in chain.
......@@ -1585,6 +1584,7 @@ page_rec_get_n_recs_before(
n--;
ut_ad(n >= 0);
ut_ad(n < UNIV_PAGE_SIZE / (REC_N_NEW_EXTRA_BYTES + 1));
return((ulint) n);
}
......
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -354,8 +354,8 @@ row_ins_clust_index_entry_by_modify(
return(DB_LOCK_TABLE_FULL);
}
err = btr_cur_pessimistic_update(0, cursor,
heap, big_rec, update,
err = btr_cur_pessimistic_update(
BTR_KEEP_POS_FLAG, cursor, heap, big_rec, update,
0, thr, mtr);
}
......@@ -1998,6 +1998,7 @@ row_ins_index_entry_low(
ulint modify = 0; /* remove warning */
rec_t* insert_rec;
rec_t* rec;
ulint* offsets;
ulint err;
ulint n_unique;
big_rec_t* big_rec = NULL;
......@@ -2101,6 +2102,64 @@ row_ins_index_entry_low(
err = row_ins_clust_index_entry_by_modify(
mode, &cursor, &heap, &big_rec, entry,
thr, &mtr);
if (big_rec) {
ut_a(err == DB_SUCCESS);
/* Write out the externally stored
columns while still x-latching
index->lock and block->lock. Allocate
pages for big_rec in the mtr that
modified the B-tree, but be sure to skip
any pages that were freed in mtr. We will
write out the big_rec pages before
committing the B-tree mini-transaction. If
the system crashes so that crash recovery
will not replay the mtr_commit(&mtr), the
big_rec pages will be left orphaned until
the pages are allocated for something else.
TODO: If the allocation extends the
tablespace, it will not be redo
logged, in either mini-transaction.
Tablespace extension should be
redo-logged in the big_rec
mini-transaction, so that recovery
will not fail when the big_rec was
written to the extended portion of the
file, in case the file was somehow
truncated in the crash. */
rec = btr_cur_get_rec(&cursor);
offsets = rec_get_offsets(
rec, index, NULL,
ULINT_UNDEFINED, &heap);
DEBUG_SYNC_C("before_row_ins_upd_extern");
err = btr_store_big_rec_extern_fields(
index, btr_cur_get_block(&cursor),
rec, offsets, big_rec, &mtr,
BTR_STORE_INSERT_UPDATE);
DEBUG_SYNC_C("after_row_ins_upd_extern");
/* If writing big_rec fails (for
example, because of DB_OUT_OF_FILE_SPACE),
the record will be corrupted. Even if
we did not update any externally
stored columns, our update could cause
the record to grow so that a
non-updated column was selected for
external storage. This non-update
would not have been written to the
undo log, and thus the record cannot
be rolled back.
However, because we have not executed
mtr_commit(mtr) yet, the update will
not be replayed in crash recovery, and
the following assertion failure will
effectively "roll back" the operation. */
ut_a(err == DB_SUCCESS);
goto stored_big_rec;
}
} else {
ut_ad(!n_ext);
err = row_ins_sec_index_entry_by_modify(
......@@ -2129,9 +2188,6 @@ function_exit:
mtr_commit(&mtr);
if (UNIV_LIKELY_NULL(big_rec)) {
rec_t* rec;
ulint* offsets;
DBUG_EXECUTE_IF(
"row_ins_extern_checkpoint",
log_make_checkpoint_at(IB_ULONGLONG_MAX, TRUE););
......@@ -2146,12 +2202,13 @@ function_exit:
offsets = rec_get_offsets(rec, index, NULL,
ULINT_UNDEFINED, &heap);
DEBUG_SYNC_C("before_row_ins_upd_extern");
DEBUG_SYNC_C("before_row_ins_extern");
err = btr_store_big_rec_extern_fields(
index, btr_cur_get_block(&cursor),
rec, offsets, &mtr, FALSE, big_rec);
DEBUG_SYNC_C("after_row_ins_upd_extern");
rec, offsets, big_rec, &mtr, BTR_STORE_INSERT);
DEBUG_SYNC_C("after_row_ins_extern");
stored_big_rec:
if (modify) {
dtuple_big_rec_free(big_rec);
} else {
......
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -243,19 +243,20 @@ row_build(
}
#if defined UNIV_DEBUG || defined UNIV_BLOB_LIGHT_DEBUG
/* This condition can occur during crash recovery before
trx_rollback_active() has completed execution.
if (rec_offs_any_null_extern(rec, offsets)) {
/* This condition can occur during crash recovery
before trx_rollback_active() has completed execution.
This condition is possible if the server crashed
during an insert or update before
during an insert or update-by-delete-and-insert before
btr_store_big_rec_extern_fields() did mtr_commit() all
BLOB pointers to the clustered index record.
If the record contains a null BLOB pointer, look up the
transaction that holds the implicit lock on this record, and
assert that it was recovered (and will soon be rolled back). */
ut_a(!rec_offs_any_null_extern(rec, offsets)
|| trx_assert_recovered(row_get_rec_trx_id(rec, index, offsets)));
BLOB pointers to the freshly inserted clustered index
record. */
ut_a(trx_assert_recovered(
row_get_rec_trx_id(rec, index, offsets)));
ut_a(trx_undo_roll_ptr_is_insert(
row_get_rec_roll_ptr(rec, index, offsets)));
}
#endif /* UNIV_DEBUG || UNIV_BLOB_LIGHT_DEBUG */
if (type != ROW_COPY_POINTERS) {
......
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -1978,33 +1978,62 @@ row_upd_clust_rec(
ut_ad(!rec_get_deleted_flag(btr_pcur_get_rec(pcur),
dict_table_is_comp(index->table)));
err = btr_cur_pessimistic_update(BTR_NO_LOCKING_FLAG, btr_cur,
&heap, &big_rec, node->update,
node->cmpl_info, thr, mtr);
mtr_commit(mtr);
if (err == DB_SUCCESS && big_rec) {
err = btr_cur_pessimistic_update(
BTR_NO_LOCKING_FLAG | BTR_KEEP_POS_FLAG, btr_cur,
&heap, &big_rec, node->update, node->cmpl_info, thr, mtr);
if (big_rec) {
ulint offsets_[REC_OFFS_NORMAL_SIZE];
rec_t* rec;
rec_offs_init(offsets_);
DBUG_EXECUTE_IF(
"row_upd_extern_checkpoint",
log_make_checkpoint_at(IB_ULONGLONG_MAX, TRUE););
ut_a(err == DB_SUCCESS);
/* Write out the externally stored
columns while still x-latching
index->lock and block->lock. Allocate
pages for big_rec in the mtr that
modified the B-tree, but be sure to skip
any pages that were freed in mtr. We will
write out the big_rec pages before
committing the B-tree mini-transaction. If
the system crashes so that crash recovery
will not replay the mtr_commit(&mtr), the
big_rec pages will be left orphaned until
the pages are allocated for something else.
TODO: If the allocation extends the tablespace, it
will not be redo logged, in either mini-transaction.
Tablespace extension should be redo-logged in the
big_rec mini-transaction, so that recovery will not
fail when the big_rec was written to the extended
portion of the file, in case the file was somehow
truncated in the crash. */
mtr_start(mtr);
ut_a(btr_pcur_restore_position(BTR_MODIFY_TREE, pcur, mtr));
rec = btr_cur_get_rec(btr_cur);
DEBUG_SYNC_C("before_row_upd_extern");
err = btr_store_big_rec_extern_fields(
index, btr_cur_get_block(btr_cur), rec,
rec_get_offsets(rec, index, offsets_,
ULINT_UNDEFINED, &heap),
mtr, TRUE, big_rec);
big_rec, mtr, BTR_STORE_UPDATE);
DEBUG_SYNC_C("after_row_upd_extern");
mtr_commit(mtr);
/* If writing big_rec fails (for example, because of
DB_OUT_OF_FILE_SPACE), the record will be corrupted.
Even if we did not update any externally stored
columns, our update could cause the record to grow so
that a non-updated column was selected for external
storage. This non-update would not have been written
to the undo log, and thus the record cannot be rolled
back.
However, because we have not executed mtr_commit(mtr)
yet, the update will not be replayed in crash
recovery, and the following assertion failure will
effectively "roll back" the operation. */
ut_a(err == DB_SUCCESS);
}
mtr_commit(mtr);
if (UNIV_LIKELY_NULL(heap)) {
mem_heap_free(heap);
}
......
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -1176,6 +1176,7 @@ trx_undo_report_row_operation(
trx_t* trx;
trx_undo_t* undo;
ulint page_no;
buf_block_t* undo_block;
trx_rseg_t* rseg;
mtr_t mtr;
ulint err = DB_SUCCESS;
......@@ -1218,10 +1219,13 @@ trx_undo_report_row_operation(
if (UNIV_UNLIKELY(!undo)) {
/* Did not succeed */
ut_ad(err != DB_SUCCESS);
mutex_exit(&(trx->undo_mutex));
return(err);
}
ut_ad(err == DB_SUCCESS);
} else {
ut_ad(op_type == TRX_UNDO_MODIFY_OP);
......@@ -1235,30 +1239,30 @@ trx_undo_report_row_operation(
if (UNIV_UNLIKELY(!undo)) {
/* Did not succeed */
ut_ad(err != DB_SUCCESS);
mutex_exit(&(trx->undo_mutex));
return(err);
}
ut_ad(err == DB_SUCCESS);
offsets = rec_get_offsets(rec, index, offsets,
ULINT_UNDEFINED, &heap);
}
page_no = undo->last_page_no;
mtr_start(&mtr);
page_no = undo->last_page_no;
undo_block = buf_page_get_gen(
undo->space, undo->zip_size, page_no, RW_X_LATCH,
undo->guess_block, BUF_GET, __FILE__, __LINE__, &mtr);
buf_block_dbg_add_level(undo_block, SYNC_TRX_UNDO_PAGE);
do {
buf_block_t* undo_block;
page_t* undo_page;
ulint offset;
undo_block = buf_page_get_gen(undo->space, undo->zip_size,
page_no, RW_X_LATCH,
undo->guess_block, BUF_GET,
__FILE__, __LINE__, &mtr);
buf_block_dbg_add_level(undo_block, SYNC_TRX_UNDO_PAGE);
undo_page = buf_block_get_frame(undo_block);
ut_ad(page_no == buf_block_get_page_no(undo_block));
if (op_type == TRX_UNDO_INSERT_OP) {
offset = trx_undo_page_report_insert(
......@@ -1335,12 +1339,12 @@ trx_undo_report_row_operation(
a pessimistic insert in a B-tree, and we must reserve the
counterpart of the tree latch, which is the rseg mutex. */
mutex_enter(&(rseg->mutex));
page_no = trx_undo_add_page(trx, undo, &mtr);
mutex_enter(&rseg->mutex);
undo_block = trx_undo_add_page(trx, undo, &mtr);
mutex_exit(&rseg->mutex);
mutex_exit(&(rseg->mutex));
} while (UNIV_LIKELY(page_no != FIL_NULL));
page_no = undo->last_page_no;
} while (undo_block != NULL);
/* Did not succeed: out of space */
err = DB_OUT_OF_FILE_SPACE;
......
/*****************************************************************************
Copyright (c) 1996, 2010, Innobase Oy. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -247,9 +247,7 @@ trx_sys_create_doublewrite_buf(void)
{
buf_block_t* block;
buf_block_t* block2;
#ifdef UNIV_SYNC_DEBUG
buf_block_t* new_block;
#endif /* UNIV_SYNC_DEBUG */
byte* doublewrite;
byte* fseg_header;
ulint page_no;
......@@ -328,10 +326,9 @@ start_again:
for (i = 0; i < 2 * TRX_SYS_DOUBLEWRITE_BLOCK_SIZE
+ FSP_EXTENT_SIZE / 2; i++) {
page_no = fseg_alloc_free_page(fseg_header,
prev_page_no + 1,
FSP_UP, &mtr);
if (page_no == FIL_NULL) {
new_block = fseg_alloc_free_page(
fseg_header, prev_page_no + 1, FSP_UP, &mtr);
if (new_block == NULL) {
fprintf(stderr,
"InnoDB: Cannot create doublewrite"
" buffer: you must\n"
......@@ -352,13 +349,8 @@ start_again:
the page position in the tablespace, then the page
has not been written to in doublewrite. */
#ifdef UNIV_SYNC_DEBUG
new_block =
#endif /* UNIV_SYNC_DEBUG */
buf_page_get(TRX_SYS_SPACE, 0, page_no,
RW_X_LATCH, &mtr);
buf_block_dbg_add_level(new_block,
SYNC_NO_ORDER_CHECK);
ut_ad(rw_lock_get_x_lock_count(&new_block->lock) == 1);
page_no = buf_block_get_page_no(new_block);
if (i == FSP_EXTENT_SIZE / 2) {
ut_a(page_no == FSP_EXTENT_SIZE);
......
/*****************************************************************************
Copyright (c) 1996, 2011, Oracle and/or its affiliates. All Rights Reserved.
Copyright (c) 1996, 2012, Oracle and/or its affiliates. All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
......@@ -11,8 +11,8 @@ ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA
this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA
*****************************************************************************/
......@@ -870,9 +870,9 @@ trx_undo_discard_latest_update_undo(
#ifndef UNIV_HOTBACKUP
/********************************************************************//**
Tries to add a page to the undo log segment where the undo log is placed.
@return page number if success, else FIL_NULL */
@return X-latched block if success, else NULL */
UNIV_INTERN
ulint
buf_block_t*
trx_undo_add_page(
/*==============*/
trx_t* trx, /*!< in: transaction */
......@@ -882,11 +882,10 @@ trx_undo_add_page(
the rollback segment mutex */
{
page_t* header_page;
buf_block_t* new_block;
page_t* new_page;
trx_rseg_t* rseg;
ulint page_no;
ulint n_reserved;
ibool success;
ut_ad(mutex_own(&(trx->undo_mutex)));
ut_ad(!mutex_own(&kernel_mutex));
......@@ -896,37 +895,37 @@ trx_undo_add_page(
if (rseg->curr_size == rseg->max_size) {
return(FIL_NULL);
return(NULL);
}
header_page = trx_undo_page_get(undo->space, undo->zip_size,
undo->hdr_page_no, mtr);
success = fsp_reserve_free_extents(&n_reserved, undo->space, 1,
FSP_UNDO, mtr);
if (!success) {
if (!fsp_reserve_free_extents(&n_reserved, undo->space, 1,
FSP_UNDO, mtr)) {
return(FIL_NULL);
return(NULL);
}
page_no = fseg_alloc_free_page_general(header_page + TRX_UNDO_SEG_HDR
+ TRX_UNDO_FSEG_HEADER,
undo->top_page_no + 1, FSP_UP,
TRUE, mtr);
new_block = fseg_alloc_free_page_general(
TRX_UNDO_SEG_HDR + TRX_UNDO_FSEG_HEADER
+ header_page,
undo->top_page_no + 1, FSP_UP, TRUE, mtr, mtr);
fil_space_release_free_extents(undo->space, n_reserved);
if (page_no == FIL_NULL) {
if (new_block == NULL) {
/* No space left */
return(FIL_NULL);
return(NULL);
}
undo->last_page_no = page_no;
ut_ad(rw_lock_get_x_lock_count(&new_block->lock) == 1);
buf_block_dbg_add_level(new_block, SYNC_TRX_UNDO_PAGE);
undo->last_page_no = buf_block_get_page_no(new_block);
new_page = trx_undo_page_get(undo->space, undo->zip_size,
page_no, mtr);
new_page = buf_block_get_frame(new_block);
trx_undo_page_init(new_page, undo->type, mtr);
......@@ -935,7 +934,7 @@ trx_undo_add_page(
undo->size++;
rseg->curr_size++;
return(page_no);
return(new_block);
}
/********************************************************************//**
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment