Commit 7dc28cb3 authored by Bradley C. Kuszmaul's avatar Bradley C. Kuszmaul

Move texi from doc to man. Addresses #52.

git-svn-id: file:///svn/tokudb@1161 c7de825b-a66e-492c-adef-691d508d4ae1
parent b945ee9e
MANPAGES = tdb_create
MANPAGES_TEXI = $(patsubst %,%.texi,$(MANPAGES))
MANPAGES_POD = $(patsubst %,%.pod,$(MANPAGES))
MANPAGES_3 = $(patsubst %,%.3,$(MANPAGES))
SECTIONS = intro
SECTIONS_TEXI = $(patsubst %,%.texi,$(SECTIONS))
default: $(MANPAGES_3) tokudb.dvi
tokudb.dvi: tokudb.texi $(MANPAGES_TEXI) $(SECTIONS_TEXI)
texi2dvi4a2ps tokudb.texi
$(MANPAGES_POD): everyman.texi
%.pod: %.texi
perl texi2pod.pl $< > $@
%.3: %.pod
pod2man --center "TokuDB Programmer's Manual" --section 3 $< > $@
@ifnottex
@c man begin CONFORMINGTO
The TokuDB embedded database provides a subset of the functionality of
the Berkeley DB. Programs that work with TokuDB probably work with
with most versions of Berkeley DB with only recompilation or
relinking. The database files are incompatible, however, so to
convert from one library to the other you would need to dump the
database with one library's tool and load it with the other's.
@c man end
@c man begin COPYRIGHT
Copyright @copyright{} 2007 Tokutek, Inc. All rights reserved.
@c man end
@end ifnottex
@node Introduction to TokuDB
@chapter Introduction to TokuDB
TokuDB is an embedded transactional database that provides very fast
insertions and range queries on dictionaries.
TokUDB provides transactions with full ACID properties.
TokuDB is a library that can be linked directly into applications,
making it an @dfn{embedded} database.
TokuDB provides an API that is very similar to the Berkeley@tie{}DB.
@node Dictionaries and Associative Memories
@section Dictionaries and Associative Memories
Here we describe what we mean by ``dictionary''.
A @dfn{dictionary} is an ordered set of key-data pairs.
@itemize @bullet
@item
An @dfn{insertion} stores a key-data pair into a dictionary.
(One of the API calls that inserts pairs is called @code{DB->put}.)
@item
A @dfn{point query} finds the data associated with a particular key.
(One of the API calls that executes a point query is called @code{DB->get}.)
@item
A @dfn{range query} iterates through all the key-value pairs in a
range. For example, in a phone book, you could step through all the
people whose last names are between @samp{Jones} and @samp{Smith}. In
the API, range queries are supported using cursors.
@end itemize
An @dfn{associative memory} is like a dictionary but it supports only
insertions and point queries, but not range queries. Often hash
tables are used to implement associative memories. (In some
languages, such as Perl, an ``dictionary'' is really an associative
memory.)
@node Asymptotic Performance of Fractal Trees
@section Asymptotic Performance of Fractal Trees
For in-memory dictionaries, trees are often the data structure of
choice. For example, red-black trees or AVL trees implement all of
the above operations in @math{O(\log N)} time for in-memory data
structures, where $math{N} is the number of entries in the dictionary
(assuming constant-size key-data pairs).
For disk-resident dictionaries, most systems (such as
Berkeley@tie{}DB) employ B-trees, which can implement insertions and
point queries with @math{O(\log_B N)} disk operations in the worst
case, where @math{B} is the block size of the B-tree (assuming
constant-size key-data pairs).
The @dfn{effective block size} of a disk is the size at which the
disk-head movement does not dominate the cost of storing or retrieving
data off the disk. The effective block size is difficult to calculate
for actual disk systems, since it depends on how far the head must
move (the disk head moves short distances faster than long distances,
reducing the effective block size), whether the data is near the
center of the disk (where the bandwidth is small and hence the
effective block size tends to be small) or at the outer edge of the
disk (where the bandwidth is larger), and on how much prefetching the
operating system and disk firmware perform.
TokuDB, on the other hand, employs @dfn{fractal trees}, a new class of
data structures that provide the same API as B-trees, but are much
faster. Fractal tree point-queries have the same asymptotic cost as
B-trees (@math{O(\log_B N)}), but insertions require only
@math{O((\log N)/B)} disk operations on average, where @math{B} is the
effective block size of the disk system.
How much faster is that in concrete terms? In 2007, the effective
block size of a disk is about a megabyte. If you use 100-byte
key-data pairs, then you can fit about 10,000 pairs into an effective
block. If you have a 100 Terabyte database then @math{N=2^{40}}, so
on average a fractal tree requires about @math{(\log N)/B = 40/10000 =
1/250} of a disk transfer per insertion. That is, on average, you can
perform 250 insertions per disk I/O. In contrast, a traditional
B-tree equires @math{\log_B N = 2} disk I/O's per insertion. In
practice, because of caching in main memory, traditional B-trees
require 1 disk read and 1 disk write per insertion. Thus TokuDB's
insertions are orders of magnitude faster than dictionaries based on
traditional B-trees.
Range queries are also faster for TokuDB than in a typical B-tree.
Since B-trees usually use blocks sizes that relatively small compared
to the effective block size of the disk they require many disk-head
movements to traverse a range. Fractal trees use block sizes that are
the effective block size of the disk.
@section @code{db_create}
@setfilename tokudb
@settitle db_create
@c man title db_create tokudb
@unnumberedsubsec Synopsis
@c man begin SYNOPSIS
@code{#include <db.h>}
@code{int db_create(DB **}@var{dbp}@code{, DB_ENV *}@var{env}@code{, u_int32_t }@var{flags}@code{);}
@c man end
@unnumberedsubsec Description
@c man begin DESCRIPTION
@code{db_create} creates a @code{DB} handle, allocating memory and initializing its contents.
After creating the handle, you open a database using @code{(*}@var{dbp}@code{)->open(...)}.
To free the memory associated with the handle call @code{(*}@var{dbp}@code{)->close(...)}
(or @code{->remove} or @code{->rename} or @code{->verify\}when those are ready).
The handle contains a field called @code{app_private} which is declared
as type @code{void*}. This field is provided for the use of the
application and is not further used by TokuDB. One typical use of
@code{app_private} is to pass information about the database to a
user-defined sort-order function.
@c man end
@unnumberedsubsec Parameters
@c man begin PARAMETERS
@table @var
@item dbp
A pointer to a @code{DB} variable. The @code{db_create} function does something like @code{*}@var{dbp}@code{=malloc(...)}.
@item env
If @var{env}@code{==NULL} then the database is not part of an environment
(hence it has no transactions, locking, or recovery).
Otherwise the database is created within the specified environment,
and inherits the properties of that environment. (For example if
logging for recovery is enabled by the environment, then the database
will have logging too.)
@item flags
The @var{flags} parameter must be set to 0.
@end table
@c man end
@unnumberedsubsec Return Value
@c man begin RETURNVALUE
Returns zero on success. The following non-zero errors can be returned:
@table @code
@item EINVAL
You passed invalid parameters to this operation.
@end table
@c man end
@unnumberedsubsec Notes
@c man begin NOTES
TokuDB allows at most a few thousand databases per file.
@c man end
@include everyman.texi
@bye
#! /usr/bin/perl -w
# Copyright (C) 1999, 2000, 2001, 2003 Free Software Foundation, Inc.
# This file is part of GCC.
# GCC is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2, or (at your option)
# any later version.
# GCC is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with GCC; see the file COPYING. If not, write to
# the Free Software Foundation, 51 Franklin Street, Fifth Floor,
# Boston MA 02110-1301, USA.
# This does trivial (and I mean _trivial_) conversion of Texinfo
# markup to Perl POD format. It's intended to be used to extract
# something suitable for a manpage from a Texinfo document.
$output = 0;
$skipping = 0;
%sects = ();
$section = "";
@icstack = ();
@endwstack = ();
@skstack = ();
@instack = ();
$shift = "";
%defs = ();
$fnno = 1;
$inf = "";
$ibase = "";
@ipath = ();
while ($_ = shift) {
if (/^-D(.*)$/) {
if ($1 ne "") {
$flag = $1;
} else {
$flag = shift;
}
$value = "";
($flag, $value) = ($flag =~ /^([^=]+)(?:=(.+))?/);
die "no flag specified for -D\n"
unless $flag ne "";
die "flags may only contain letters, digits, hyphens, dashes and underscores\n"
unless $flag =~ /^[a-zA-Z0-9_-]+$/;
$defs{$flag} = $value;
} elsif (/^-I(.*)$/) {
if ($1 ne "") {
$flag = $1;
} else {
$flag = shift;
}
push (@ipath, $flag);
} elsif (/^-/) {
usage();
} else {
$in = $_, next unless defined $in;
$out = $_, next unless defined $out;
usage();
}
}
if (defined $in) {
$inf = gensym();
open($inf, "<$in") or die "opening \"$in\": $!\n";
$ibase = $1 if $in =~ m|^(.+)/[^/]+$|;
} else {
$inf = \*STDIN;
}
if (defined $out) {
open(STDOUT, ">$out") or die "opening \"$out\": $!\n";
}
while(defined $inf) {
while(<$inf>) {
# Certain commands are discarded without further processing.
/^\@(?:
[a-z]+index # @*index: useful only in complete manual
|need # @need: useful only in printed manual
|(?:end\s+)?group # @group .. @end group: ditto
|page # @page: ditto
|node # @node: useful only in .info file
|(?:end\s+)?ifnottex # @ifnottex .. @end ifnottex: use contents
)\b/x and next;
chomp;
# Look for filename and title markers.
/^\@setfilename\s+([^.]+)/ and $fn = $1, next;
/^\@settitle\s+([^.]+)/ and $tl = postprocess($1), next;
# Identify a man title but keep only the one we are interested in.
/^\@c\s+man\s+title\s+([A-Za-z0-9-]+)\s+(.+)/ and do {
if (exists $defs{$1}) {
$fn = $1;
$tl = postprocess($2);
}
next;
};
# Look for blocks surrounded by @c man begin SECTION ... @c man end.
# This really oughta be @ifman ... @end ifman and the like, but such
# would require rev'ing all other Texinfo translators.
/^\@c\s+man\s+begin\s+([A-Z]+)\s+([A-Za-z0-9-]+)/ and do {
$output = 1 if exists $defs{$2};
$sect = $1;
next;
};
/^\@c\s+man\s+begin\s+([A-Z]+)/ and $sect = $1, $output = 1, next;
/^\@c\s+man\s+end/ and do {
$sects{$sect} = "" unless exists $sects{$sect};
$sects{$sect} .= postprocess($section);
$section = "";
$output = 0;
next;
};
# handle variables
/^\@set\s+([a-zA-Z0-9_-]+)\s*(.*)$/ and do {
$defs{$1} = $2;
next;
};
/^\@clear\s+([a-zA-Z0-9_-]+)/ and do {
delete $defs{$1};
next;
};
# I want the @include to be unconditional (not dependent on whether we are skipping. -Bradley
/^\@include\s+(.+)$/ and do {
push @instack, $inf;
$inf = gensym();
$file = postprocess($1);
# Try cwd and $ibase, then explicit -I paths.
$done = 0;
foreach $path ("", $ibase, @ipath) {
$mypath = $file;
$mypath = $path . "/" . $mypath if ($path ne "");
open($inf, "<" . $mypath) and ($done = 1, last);
}
die "cannot find $file" if !$done;
next;
};
next unless $output;
# Discard comments. (Can't do it above, because then we'd never see
# @c man lines.)
/^\@c\b/ and next;
# End-block handler goes up here because it needs to operate even
# if we are skipping.
/^\@end\s+([a-z]+)/ and do {
# Ignore @end foo, where foo is not an operation which may
# cause us to skip, if we are presently skipping.
my $ended = $1;
next if $skipping && $ended !~ /^(?:ifset|ifclear|ignore|menu|iftex|copying)$/;
die "\@end $ended without \@$ended at line $.\n" unless defined $endw;
die "\@$endw ended by \@end $ended at line $.\n" unless $ended eq $endw;
$endw = pop @endwstack;
if ($ended =~ /^(?:ifset|ifclear|ignore|menu|iftex)$/) {
$skipping = pop @skstack;
next;
} elsif ($ended =~ /^(?:example|smallexample|display)$/) {
$shift = "";
$_ = ""; # need a paragraph break
} elsif ($ended =~ /^(?:itemize|enumerate|[fv]?table)$/) {
$_ = "\n=back\n";
$ic = pop @icstack;
} elsif ($ended eq "multitable") {
$_ = "\n=back\n";
} else {
die "unknown command \@end $ended at line $.\n";
}
};
# We must handle commands which can cause skipping even while we
# are skipping, otherwise we will not process nested conditionals
# correctly.
/^\@ifset\s+([a-zA-Z0-9_-]+)/ and do {
push @endwstack, $endw;
push @skstack, $skipping;
$endw = "ifset";
$skipping = 1 unless exists $defs{$1};
next;
};
/^\@ifclear\s+([a-zA-Z0-9_-]+)/ and do {
push @endwstack, $endw;
push @skstack, $skipping;
$endw = "ifclear";
$skipping = 1 if exists $defs{$1};
next;
};
/^\@(ignore|menu|iftex|copying)\b/ and do {
push @endwstack, $endw;
push @skstack, $skipping;
$endw = $1;
$skipping = 1;
next;
};
next if $skipping;
# Character entities. First the ones that can be replaced by raw text
# or discarded outright:
s/\@copyright\{\}/(c)/g;
s/\@dots\{\}/.../g;
s/\@enddots\{\}/..../g;
s/\@([.!? ])/$1/g;
s/\@[:-]//g;
s/\@bullet(?:\{\})?/*/g;
s/\@TeX\{\}/TeX/g;
s/\@pounds\{\}/\#/g;
s/\@minus(?:\{\})?/-/g;
s/\\,/,/g;
# Now the ones that have to be replaced by special escapes
# (which will be turned back into text by unmunge())
s/&/&amp;/g;
s/\@\{/&lbrace;/g;
s/\@\}/&rbrace;/g;
s/\@\@/&at;/g;
# Inside a verbatim block, handle @var specially.
if ($shift ne "") {
s/\@var\{([^\}]*)\}/<$1>/g;
}
# POD doesn't interpret E<> inside a verbatim block.
if ($shift eq "") {
s/</&lt;/g;
s/>/&gt;/g;
} else {
s/</&LT;/g;
s/>/&GT;/g;
}
# Single line command handlers.
/^\@(?:section|unnumbered|unnumberedsec|center)\s+(.+)$/
and $_ = "\n=head2 $1\n";
/^\@subsection\s+(.+)$/
and $_ = "\n=head3 $1\n";
/^\@subsubsection\s+(.+)$/
and $_ = "\n=head4 $1\n";
# Block command handlers:
/^\@itemize(?:\s+(\@[a-z]+|\*|-))?/ and do {
push @endwstack, $endw;
push @icstack, $ic;
if (defined $1) {
$ic = $1;
} else {
$ic = '*';
}
$_ = "\n=over 4\n";
$endw = "itemize";
};
/^\@enumerate(?:\s+([a-zA-Z0-9]+))?/ and do {
push @endwstack, $endw;
push @icstack, $ic;
if (defined $1) {
$ic = $1 . ".";
} else {
$ic = "1.";
}
$_ = "\n=over 4\n";
$endw = "enumerate";
};
/^\@multitable\s.*/ and do {
push @endwstack, $endw;
$endw = "multitable";
$_ = "\n=over 4\n";
};
/^\@([fv]?table)\s+(\@[a-z]+)/ and do {
push @endwstack, $endw;
push @icstack, $ic;
$endw = $1;
$ic = $2;
$ic =~ s/\@(?:code|samp|strong|key|gcctabopt|env)/B/;
$ic =~ s/\@(?:kbd)/C/;
$ic =~ s/\@(?:dfn|var|emph|cite|i)/I/;
$ic =~ s/\@(?:file)/F/;
$_ = "\n=over 4\n";
};
/^\@((?:small)?example|display)/ and do {
push @endwstack, $endw;
$endw = $1;
$shift = "\t";
$_ = ""; # need a paragraph break
};
/^\@item\s+(.*\S)\s*$/ and $endw eq "multitable" and do {
@columns = ();
for $column (split (/\s*\@tab\s*/, $1)) {
# @strong{...} is used a @headitem work-alike
$column =~ s/^\@strong{(.*)}$/$1/;
push @columns, $column;
}
$_ = "\n=item ".join (" : ", @columns)."\n";
};
/^\@itemx?\s*(.+)?$/ and do {
if (defined $1) {
# Entity escapes prevent munging by the <> processing below.
$_ = "\n=item $ic\&LT;$1\&GT;\n";
} else {
$_ = "\n=item $ic\n";
$ic =~ y/A-Ya-y/B-Zb-z/;
$ic =~ s/(\d+)/$1 + 1/eg;
}
};
$section .= $shift.$_."\n";
}
# End of current file.
close($inf);
$inf = pop @instack;
}
die "No filename or title\n" unless defined $fn && defined $tl;
$sects{NAME} = "$fn \- $tl\n";
$sects{FOOTNOTES} .= "=back\n" if exists $sects{FOOTNOTES};
for $sect (qw(NAME SYNOPSIS DESCRIPTION PARAMETERS RETURNVALUE OPTIONS ENVIRONMENT FILES
BUGS CONFORMINGTO NOTES FOOTNOTES SEEALSO AUTHOR COPYRIGHT)) {
if(exists $sects{$sect}) {
$head = $sect;
$head =~ s/SEEALSO/SEE ALSO/;
$head =~ s/RETURNVALUE/RETURN VALUE/;
$head =~ s/CONFORMINGTO/CONFORMING TO/;
print "=head1 $head\n\n";
print scalar unmunge ($sects{$sect});
print "\n";
}
}
sub usage
{
die "usage: $0 [-D toggle...] [infile [outfile]]\n";
}
sub postprocess
{
local $_ = $_[0];
# @value{foo} is replaced by whatever 'foo' is defined as.
while (m/(\@value\{([a-zA-Z0-9_-]+)\})/g) {
if (! exists $defs{$2}) {
print STDERR "Option $2 not defined\n";
s/\Q$1\E//;
} else {
$value = $defs{$2};
s/\Q$1\E/$value/;
}
}
# Formatting commands.
# Temporary escape for @r.
s/\@r\{([^\}]*)\}/R<$1>/g;
s/\@(?:dfn|var|emph|cite|i)\{([^\}]*)\}/I<$1>/g;
s/\@(?:kbd)\{([^\}]*)\}/C<$1>/g;
s/\@(?:code|gccoptlist|samp|strong|key|option|env|command|b)\{([^\}]*)\}/B<$1>/g;
s/\@sc\{([^\}]*)\}/\U$1/g;
s/\@file\{([^\}]*)\}/F<$1>/g;
s/\@w\{([^\}]*)\}/S<$1>/g;
s/\@(?:dmn|math)\{([^\}]*)\}/$1/g;
# keep references of the form @ref{...}, print them bold
s/\@(?:ref)\{([^\}]*)\}/B<$1>/g;
# Change double single quotes to double quotes.
s/''/"/g;
s/``/"/g;
# Cross references are thrown away, as are @noindent and @refill.
# (@noindent is impossible in .pod, and @refill is unnecessary.)
# @* is also impossible in .pod; we discard it and any newline that
# follows it. Similarly, our macro @gol must be discarded.
s/\(?\@xref\{(?:[^\}]*)\}(?:[^.<]|(?:<[^<>]*>))*\.\)?//g;
s/\s+\(\@pxref\{(?:[^\}]*)\}\)//g;
s/;\s+\@pxref\{(?:[^\}]*)\}//g;
s/\@noindent\s*//g;
s/\@refill//g;
s/\@gol//g;
s/\@\*\s*\n?//g;
# Anchors are thrown away
s/\@anchor\{(?:[^\}]*)\}//g;
# @uref can take one, two, or three arguments, with different
# semantics each time. @url and @email are just like @uref with
# one argument, for our purposes.
s/\@(?:uref|url|email)\{([^\},]*)\}/&lt;B<$1>&gt;/g;
s/\@uref\{([^\},]*),([^\},]*)\}/$2 (C<$1>)/g;
s/\@uref\{([^\},]*),([^\},]*),([^\},]*)\}/$3/g;
# Un-escape <> at this point.
s/&LT;/</g;
s/&GT;/>/g;
# Now un-nest all B<>, I<>, R<>. Theoretically we could have
# indefinitely deep nesting; in practice, one level suffices.
1 while s/([BIR])<([^<>]*)([BIR])<([^<>]*)>/$1<$2>$3<$4>$1</g;
# Replace R<...> with bare ...; eliminate empty markup, B<>;
# shift white space at the ends of [BI]<...> expressions outside
# the expression.
s/R<([^<>]*)>/$1/g;
s/[BI]<>//g;
s/([BI])<(\s+)([^>]+)>/$2$1<$3>/g;
s/([BI])<([^>]+?)(\s+)>/$1<$2>$3/g;
# Extract footnotes. This has to be done after all other
# processing because otherwise the regexp will choke on formatting
# inside @footnote.
while (/\@footnote/g) {
s/\@footnote\{([^\}]+)\}/[$fnno]/;
add_footnote($1, $fnno);
$fnno++;
}
return $_;
}
sub unmunge
{
# Replace escaped symbols with their equivalents.
local $_ = $_[0];
s/&lt;/E<lt>/g;
s/&gt;/E<gt>/g;
s/&lbrace;/\{/g;
s/&rbrace;/\}/g;
s/&at;/\@/g;
s/&amp;/&/g;
return $_;
}
sub add_footnote
{
unless (exists $sects{FOOTNOTES}) {
$sects{FOOTNOTES} = "\n=over 4\n\n";
}
$sects{FOOTNOTES} .= "=item $fnno.\n\n"; $fnno++;
$sects{FOOTNOTES} .= $_[0];
$sects{FOOTNOTES} .= "\n\n";
}
# stolen from Symbol.pm
{
my $genseq = 0;
sub gensym
{
my $name = "GEN" . $genseq++;
my $ref = \*{$name};
delete $::{$name};
return $ref;
}
}
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename tokudb.info
@settitle The TokuDB Embedded Database
@c %**end of header
@copying
Copyright @copyright{} 2007, Tokutek, Inc.
@end copying
@titlepage
@title TokuDB Programmmer's Manual
@subtitle for the Tokutek Embedded Database
@author Tokutek, Inc.
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage
@contents
@ifnottex
@node Top
@top TokuDB
@menu tdb_create::
@insertcopying
@end ifnottex
@include intro.texi
@node TokuDB C API
@chapter TokuDB C API
@include tdb_create.texi
@node Index
@unnumbered Index
@printindex
@bye
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment