dbis.html 9.97 KB
Newer Older
unknown's avatar
unknown committed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
<!--$Id: dbis.so,v 10.5 2001/01/19 17:30:29 bostic Exp $-->
<!--Copyright 1997, 1998, 1999, 2000 by Sleepycat Software, Inc.-->
<!--All rights reserved.-->
<html>
<head>
<title>Berkeley DB Reference Guide: What is Berkeley DB?</title>
<meta name="description" content="Berkeley DB: An embedded database programmatic toolkit.">
<meta name="keywords" content="embedded,database,programmatic,toolkit,b+tree,btree,hash,hashing,transaction,transactions,locking,logging,access method,access methods,java,C,C++">
</head>
<body bgcolor=white>
<table><tr valign=top>
<td><h3><dl><dt>Berkeley DB Reference Guide:<dd>Introduction</dl></h3></td>
<td width="1%"><a href="../../ref/intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p>
<h1 align=center>What is Berkeley DB?</h1>
<p>So far, we've discussed database systems in general terms. It's time
now to consider Berkeley DB in particular and see how it fits into the
framework we have introduced. The key question is, what kinds of
applications should use Berkeley DB?
<p>Berkeley DB is an open source embedded database library that provides
scalable, high-performance, transaction-protected data management
services to applications.  Berkeley DB provides a simple function-call API
for data access and management.
<p>By "open source," we mean that Berkeley DB is distributed under a license that
conforms to the <a href="http://www.opensource.org/osd.html">Open
Source Definition</a>. This license guarantees that Berkeley DB is freely
available for use and redistribution in other open source products.
<a href="http://www.sleepycat.com">Sleepycat Software</a> sells
commercial licenses for redistribution in proprietary applications, but
in all cases the complete source code for Berkeley DB is freely available for
download and use.
<p>Berkeley DB is embedded because it links directly into the application.  It
runs in the same address space as the application. As a result, no
inter-process communication, either over the network or between
processes on the same machine, is required for database operations.
Berkeley DB provides a simple function-call API for a number of programming
languages, including C, C++, Java, Perl, Tcl, Python, and PHP. All
database operations happen inside the library. Multiple processes, or
multiple threads in a single process, can all use the database at the
same time as each uses the Berkeley DB library. Low-level services like
locking, transaction logging, shared buffer management, memory
management, and so on are all handled transparently by the library.
<p>The library is extremely portable. It runs under almost all UNIX and
Linux variants, Windows, and a number of embedded real-time operating
systems. It runs on both 32-bit and 64-bit systems.
It has been deployed on high-end
Internet servers, desktop machines, and on palmtop computers, set-top
boxes, in network switches, and elsewhere. Once Berkeley DB is linked into
the application, the end user generally does not know that there's a
database present at all.
<p>Berkeley DB is scalable in a number of respects. The database library itself
is quite compact (under 300 kilobytes of text space on common
architectures), but it can manage databases up to 256 terabytes in size.
It also supports high concurrency, with thousands of users operating on
the same database at the same time. Berkeley DB is small enough to run in
tightly constrained embedded systems, but can take advantage of
gigabytes of memory and terabytes of disk on high-end server machines.
<p>Berkeley DB generally outperforms relational and object-oriented database
systems in embedded applications for a couple of reasons. First, because
the library runs in the same address space, no inter-process
communication is required for database operations. The cost of
communicating between processes on a single machine, or among machines
on a network, is much higher than the cost of making a function call.
Second, because Berkeley DB uses a simple function-call interface for all
operations, there is no query language to parse, and no execution plan
to produce.
<h3>Data Access Services</h3>
<p>Berkeley DB applications can choose the storage structure that best suits the
application. Berkeley DB supports hash tables, B-trees, simple
record-number-based storage, and persistent queues. Programmers can
create tables using any of these storage structures, and can mix
operations on different kinds of tables in a single application.
<p>Hash tables are generally good for very large databases that need
predictable search and update times for random-access records.  Hash
tables allow users to ask, "Does this key exist?" or to fetch a record
with a known key.  Hash tables do not allow users to ask for records
with keys that are close to a known key.
<p>B-trees are better for range-based searches, as when the application
needs to find all records with keys between some starting and ending
value.  B-trees also do a better job of exploiting <i>locality
of reference</i>.  If the application is likely to touch keys near each
other at the same time, the B-trees work well. The tree structure keeps
keys that are close together near one another in storage, so fetching
nearby values usually doesn't require a disk access.
<p>Record-number-based storage is natural for applications that need
to store and fetch records, but that do not have a simple way to
generate keys of their own. In a record number table, the record
number is the key for the record. Berkeley DB will can generate these
record numbers automatically.
<p>Queues are well-suited for applications that create records, and then
must deal with those records in creation order. A good example is
on-line purchasing systems. Orders can enter the system at any time,
but should generally be filled in the order in which they were placed.
<h3>Data management services</h3>
<p>Berkeley DB offers important data management services, including concurrency,
transactions, and recovery. All of these services work on all of the
storage structures.
<p>Many users can work on the same database concurrently. Berkeley DB handles
locking transparently, ensuring that two users working on the same
record do not interfere with one another.
<p>The library provides strict ACID transaction semantics. Some systems
allow the user to relax, for example, the isolation guarantees that the
database system makes. Berkeley DB ensures that all applications can see only
committed updates.
<p>Multiple operations can be grouped into a single transaction, and can
be committed or rolled back atomically. Berkeley DB uses a technique called
<i>two-phase locking</i> to be sure that concurrent transactions
are isolated from one another, and a technique called
<i>write-ahead logging</i> to guarantee that committed changes
survive application, system, or hardware failures.
<p>When an application starts up, it can ask Berkeley DB to run recovery.
Recovery restores the database to a clean state, with all committed
changes present, even after a crash. The database is guaranteed to be
consistent and all committed changes are guaranteed to be present when
recovery completes.
<p>An application can specify, when it starts up, which data management
services it will use. Some applications need fast,
single-user, non-transactional B-tree data storage. In that case, the
application can disable the locking and transaction systems, and will
not incur the overhead of locking or logging. If an application needs
to support multiple concurrent users, but doesn't need transactions, it
can turn on locking without transactions. Applications that need
concurrent, transaction-protected database access can enable all of the
subsystems.
<p>In all these cases, the application uses the same function-call API to
fetch and update records.
<h3>Design</h3>
<p>Berkeley DB was designed to provide industrial-strength database services to
application developers, without requiring them to become database
experts.  It is a classic C-library style <i>toolkit</i>, providing
a broad base of functionality to application writers.  Berkeley DB was designed
by programmers, for programmers: its modular design surfaces simple,
orthogonal interfaces to core services, and it provides mechanism (for
example, good thread support) without imposing policy (for example, the
use of threads is not required).  Just as importantly, Berkeley DB allows
developers to balance performance against the need for crash recovery
and concurrent use.  An application can use the storage structure that
provides the fastest access to its data and can request only the degree
of logging and locking that it needs.
<p>Because of the tool-based approach and separate interfaces for each
Berkeley DB subsystem, you can support a complete transaction environment for
other system operations. Berkeley DB even allows you to wrap transactions
around the standard UNIX file read and write operations!  Further, Berkeley DB
was designed to interact correctly with the native system's toolset, a
feature no other database package offers.  For example, Berkeley DB supports
hot backups (database backups while the database is in use), using
standard UNIX system utilities, e.g., dump, tar, cpio, pax or even cp.
<p>Finally, because scripting language interfaces are available for Berkeley DB
(notably Tcl and Perl), application writers can build incredibly powerful
database engines with little effort.  You can build transaction-protected
database applications using your favorite scripting languages, an
increasingly important feature in a world using CGI scripts to deliver
HTML.
<table><tr><td><br></td><td width="1%"><a href="../../ref/intro/terrain.html"><img src="../../images/prev.gif" alt="Prev"></a><a href="../../ref/toc.html"><img src="../../images/ref.gif" alt="Ref"></a><a href="../../ref/intro/dbisnot.html"><img src="../../images/next.gif" alt="Next"></a>
</td></tr></table>
<p><font size=1><a href="http://www.sleepycat.com">Copyright Sleepycat Software</a></font>
</body>
</html>