Commit 534bd650 authored by heikki@donna.mysql.fi's avatar heikki@donna.mysql.fi

manual.texi Updated InnoDB section to reflect rel. .41

parent 8c432605
...@@ -36019,6 +36019,10 @@ set-variable = innodb_file_io_threads=4 ...@@ -36019,6 +36019,10 @@ set-variable = innodb_file_io_threads=4
set-variable = innodb_lock_wait_timeout=50 set-variable = innodb_lock_wait_timeout=50
@end example @end example
Note that data files must be < 4G, and < 2G on
some file systems! InnoDB does not create directories:
you have to create them yourself.
Suppose you have a Linux machine with 512 MB RAM and Suppose you have a Linux machine with 512 MB RAM and
three 20 GB hard disks (at directory paths @file{/}, three 20 GB hard disks (at directory paths @file{/},
@file{/dr2} and @file{/dr3}). @file{/dr2} and @file{/dr3}).
...@@ -36129,13 +36133,6 @@ resolve the situation. ...@@ -36129,13 +36133,6 @@ resolve the situation.
(Available from 3.23.40 up.) (Available from 3.23.40 up.)
The default value for this is @code{fdatasync}. The default value for this is @code{fdatasync}.
Another option is @code{O_DSYNC}. Another option is @code{O_DSYNC}.
Options @code{littlesync} and @code{nosync} have the
risk that in an operating system crash or a power outage you may easily
end up with a half-written database page, and you have to do a recovery
from a backup. See the section "InnoDB performance tuning", item 6, below
for tips on how to set this parameter. If you are happy with your database
performance it is wisest not to specify this parameter at all, in which
case it will get the default value.
@end multitable @end multitable
...@@ -36495,6 +36492,7 @@ transaction. ...@@ -36495,6 +36492,7 @@ transaction.
* InnoDB Next-key locking:: * InnoDB Next-key locking::
* InnoDB Locks set:: * InnoDB Locks set::
* InnoDB Deadlock detection:: * InnoDB Deadlock detection::
* InnoDB Consistent read example::
@end menu @end menu
...@@ -36690,7 +36688,7 @@ locks. But that does not put transaction integerity into danger. ...@@ -36690,7 +36688,7 @@ locks. But that does not put transaction integerity into danger.
@end itemize @end itemize
@node InnoDB Deadlock detection, , InnoDB Locks set, InnoDB transaction model @node InnoDB Deadlock detection, InnoDB Consistent read example , InnoDB Locks set, InnoDB transaction model
@subsubsection Deadlock detection and rollback @subsubsection Deadlock detection and rollback
InnoDB automatically detects a deadlock of transactions and rolls InnoDB automatically detects a deadlock of transactions and rolls
...@@ -36709,6 +36707,56 @@ set by the SQL statement may be preserved. This is because InnoDB ...@@ -36709,6 +36707,56 @@ set by the SQL statement may be preserved. This is because InnoDB
stores row locks in a format where it cannot afterwards know which was stores row locks in a format where it cannot afterwards know which was
set by which SQL statement. set by which SQL statement.
@node InnoDB Consistent read example, , InnoDB Deadlock detection, InnoDB transaction model
@subsubsection An example of how the consistent read works in InnoDB
When you issue a consistent read, that is, an ordinary @code{SELECT}
statement, InnoDB will give your transaction a timepoint according
to which your query sees the database. Thus, if transaction B deletes
a row and commits after your timepoint was assigned, then you will
not see the row deleted. Similarly with inserts and updates.
You can advance your timepoint by committing your transaction
and then doing another @code{SELECT}.
This is called multiversioned concurrency control.
@example
User A User B
set autocommit=0; set autocommit=0;
time
| SELECT * FROM t;
| empty set
| INSERT INTO t VALUES (1, 2);
|
v SELECT * FROM t;
empty set
COMMIT;
SELECT * FROM t;
empty set;
COMMIT;
SELECT * FROM t;
----------------------
| 1 | 2 |
----------------------
@end example
Thus user A sees the row inserted by B only when B has committed the
insert, and A has committed his own transaction so that the timepoint
is advanced past the the commit of B.
If you want to see the 'freshest' state of the database, you should use
a locking read:
@example
SELECT * FROM t LOCK IN SHARE MODE;
@end example
@subsection Performance tuning tips @subsection Performance tuning tips
@strong{1.} @strong{1.}
...@@ -36751,15 +36799,6 @@ The default method InnoDB uses is the @code{fdatasync} function. ...@@ -36751,15 +36799,6 @@ The default method InnoDB uses is the @code{fdatasync} function.
If you are not satisfied with the database write performance, you may If you are not satisfied with the database write performance, you may
try setting @code{innodb_flush_method} in @file{my.cnf} try setting @code{innodb_flush_method} in @file{my.cnf}
to @code{O_DSYNC}, though O_DSYNC seems to be slower on most systems. to @code{O_DSYNC}, though O_DSYNC seems to be slower on most systems.
You can also try setting it to @code{littlesync}, which means that
InnoDB does not call the file flush for every write it does to a
file, but only
in log flush at transaction commits and data file flush at a checkpoint.
The drawback in @code{littlesync} is that if the operating system
crashes, you can easily end up with a half-written database page,
and you have to
do a recovery from a backup. With @code{nosync} you have even less safety:
InnoDB will only flush the database files to disk at database shutdown
@strong{7.} In importing data to InnoDB, make sure that MySQL does not have @strong{7.} In importing data to InnoDB, make sure that MySQL does not have
@code{autocommit=1} on. Then every insert requires a log flush to disk. @code{autocommit=1} on. Then every insert requires a log flush to disk.
...@@ -36806,6 +36845,153 @@ INSERT INTO yourtable VALUES (1, 2), (5, 5); ...@@ -36806,6 +36845,153 @@ INSERT INTO yourtable VALUES (1, 2), (5, 5);
This tip is of course valid for inserts into any table type, not just InnoDB. This tip is of course valid for inserts into any table type, not just InnoDB.
@subsubsection The InnoDB Monitor
Starting from version 3.23.41 InnoDB includes the InnoDB
Monitor which prints information on the InnoDB internal state.
When swithed on, InnoDB Monitor
will make the MySQL server to print data to the standard
output about once every 10 seconds. This data is useful in
performance tuning.
The printed information includes data on:
@itemize @bullet
@item
table and record locks held by each active transaction,
@item
lock waits of a transactions,
@item
semaphore waits of threads,
@item
pending file i/o requests,
@item
buffer pool statistics, and
@item
purge and insert buffer merge activity of the main thread
of InnoDB.
@end itemize
You can start InnoDB Monitor through the following
SQL command:
@example
CREATE TABLE innodb_monitor(a int) type = innodb;
@end example
and stop it by
@example
DROP TABLE innodb_monitor;
@end example
The @code{CREATE TABLE} syntax is just a way to pass a command
to the InnoDB engine through the MySQL SQL parser: the created
table is not relevant at all for InnoDB Monitor. If you shut down
the database when the monitor is running, and you want to start
the monitor again, you have to drop the
table before you can issue a new @code{CREATE TABLE}
to start the monitor.
This syntax may change in a future release.
A sample output of the InnoDB Monitor:
@example
================================
010809 18:45:06 INNODB MONITOR OUTPUT
================================
--------------------------
LOCKS HELD BY TRANSACTIONS
--------------------------
LOCK INFO:
Number of locks in the record hash table 1294
LOCKS FOR TRANSACTION ID 0 579342744
TABLE LOCK table test/mytable trx id 0 582333343 lock_mode IX
RECORD LOCKS space id 0 page no 12758 n bits 104 table test/mytable index
PRIMARY trx id 0 582333343 lock_mode X
Record lock, heap no 2 PHYSICAL RECORD: n_fields 74; 1-byte offs FALSE;
info bits 0
0: len 4; hex 0001a801; asc ;; 1: len 6; hex 000022b5b39f; asc ";; 2: len 7;
hex 000002001e03ec; asc ;; 3: len 4; hex 00000001;
...
-----------------------------------------------
CURRENT SEMAPHORES RESERVED AND SEMAPHORE WAITS
-----------------------------------------------
SYNC INFO:
Sorry, cannot give mutex list info in non-debug version!
Sorry, cannot give rw-lock list info in non-debug version!
-----------------------------------------------------
SYNC ARRAY INFO: reservation count 6041054, signal count 2913432
4a239430 waited for by thread 49627477 op. S-LOCK file NOT KNOWN line 0
Mut ex 0 sp 5530989 r 62038708 sys 2155035; rws 0 8257574 8025336; rwx 0 1121090 1848344
-----------------------------------------------------
CURRENT PENDING FILE I/O'S
--------------------------
Pending normal aio reads:
Reserved slot, messages 40157658 4a4a40b8
Reserved slot, messages 40157658 4a477e28
...
Reserved slot, messages 40157658 4a4424a8
Reserved slot, messages 40157658 4a39ea38
Total of 36 reserved aio slots
Pending aio writes:
Total of 0 reserved aio slots
Pending insert buffer aio reads:
Total of 0 reserved aio slots
Pending log writes or reads:
Reserved slot, messages 40158c98 40157f98
Total of 1 reserved aio slots
Pending synchronous reads or writes:
Total of 0 reserved aio slots
-----------
BUFFER POOL
-----------
LRU list length 8034
Free list length 0
Flush list length 999
Buffer pool size in pages 8192
Pending reads 39
Pending writes: LRU 0, flush list 0, single page 0
Pages read 31383918, created 51310, written 2985115
----------------------------
END OF INNODB MONITOR OUTPUT
============================
010809 18:45:22 InnoDB starts purge
010809 18:45:22 InnoDB purged 0 pages
@end example
Some notes on the output:
@itemize @bullet
@item
If the section LOCKS HELD
BY TRANSACTIONS reports lock waits, then your application
may have lock contention. The output can also help to
trace reasons for transaction deadlocks.
@item
Section SYNC INFO will report reserved semaphores
if you compile InnoDB with <code>UNIV_SYNC_DEBUG</code>
defined in <tt>univ.i</tt>.
@item
Section SYNC ARRAY INFO reports threads waiting
for a semaphore and statistics on how many times
threads have needed a spin or a wait on a mutex or
a rw-lock semaphore. A big number of threads waiting
for semaphores may be a result of disk i/o, or
contention problems inside InnoDB. Contention can be
due to heavy parallelism of queries, or problems in
operating system thread scheduling.
@item
Section CURRENT PENDING FILE I/O'S lists pending
file i/o requests. A large number of these indicates
that the workload is disk i/o -bound.
@item
Section BUFFER POOL gives you statistics
on pages read and written. You can calculate from these
numbers how many data file i/o's your queries are
currently doing.
@end itemize
@node Implementation, Table and index, InnoDB transaction model, InnoDB @node Implementation, Table and index, InnoDB transaction model, InnoDB
@subsection Implementation of multiversioning @subsection Implementation of multiversioning
...@@ -37044,12 +37230,42 @@ On Windows NT InnoDB uses non-buffered i/o. That means that the disk ...@@ -37044,12 +37230,42 @@ On Windows NT InnoDB uses non-buffered i/o. That means that the disk
pages InnoDB reads or writes are not buffered in the operating system pages InnoDB reads or writes are not buffered in the operating system
file cache. This saves some memory bandwidth. file cache. This saves some memory bandwidth.
You can also use a raw disk in InnoDB, though this has not been tested yet: Starting from 3.23.41 InnoDB uses a novel file flush technique
just define the raw disk in place of a data file in @file{my.cnf}. called doublewrite.
You must give the exact size in bytes of the raw disk in @file{my.cnf}, It adds safety to crash recovery after an operating system crash
because at startup InnoDB checks that the size of the file or a power outage, and improves performance on most Unix flavors
is the same as specified in the configuration file. Using a raw disk by reducing the need for fsync operations.
you can on some versions of Unix perform non-buffered i/o.
Doublewrite means that InnoDB before writing pages to a data file
first writes them to a contiguous tablespace area called the
doublewrite buffer. Only after the write and the flush to the doublewrite
buffer has completed, InnoDB writes the pages to their proper
positions in the data file. If the operating system crashes in the
middle of a page write, InnoDB will in recovery find a good
copy of the page from the doublewrite buffer.
Starting from 3.23.41
you can also use a raw disk partition as a data file, though this has
not been tested yet. When you create a new data file you have
to put the keyword @code{newraw} immediately after the data
file size in @code{innodb_data_file_path}. The partition must be
>= than you specify as the size. Note that 1M in InnoDB is
1024 x 1024 bytes, while in disk specifications 1 MB usually means
1000 000 bytes.
@example
innodb_data_file_path=hdd1:3Gnewraw;hdd2:2Gnewraw
@end example
When you start the database again you MUST change the keyword
to @code{raw}. Otherwise InnoDB will write over your
partition!
@example
innodb_data_file_path=hdd1:3Graw;hdd2:2Graw
@end example
Using a raw disk you can on some Unixes perform non-buffered i/o.
There are two read-ahead heuristics in InnoDB: sequential read-ahead There are two read-ahead heuristics in InnoDB: sequential read-ahead
and random read-ahead. In sequential read-ahead InnoDB notices that and random read-ahead. In sequential read-ahead InnoDB notices that
...@@ -37212,11 +37428,14 @@ the individual InnoDB tables first. ...@@ -37212,11 +37428,14 @@ the individual InnoDB tables first.
@item @item
The default database page size in InnoDB is 16 kB. By recompiling the The default database page size in InnoDB is 16 kB. By recompiling the
code one can set it from 8 kB to 64 kB. code one can set it from 8 kB to 64 kB.
The maximun row length is slightly less than a half of a database page, The maximun row length is slightly less than half of a database page
the row length also includes @code{BLOB} and @code{TEXT} type in versions <= 3.23.40 of InnoDB. Starting from source
columns. The restriction on the size of @code{BLOB} and release 3.23.41 BLOB and
@code{TEXT} columns will be removed by June 2001 in a future version of TEXT columns are allowed to be < 4 GB, the total row length must also be
InnoDB. < 4 GB. InnoDB does not store fields whose size is <= 30 bytes on separate
pages. After InnoDB has modified the row by storing long fields on
separate pages, the remaining length of the row must be slightly less
than half a database page.
@item @item
The maximum data or log file size is 2 GB or 4 GB depending on how large The maximum data or log file size is 2 GB or 4 GB depending on how large
files your operating system supports. Support for > 4 GB files will files your operating system supports. Support for > 4 GB files will
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment