Design 9.95 KB
Newer Older
unknown's avatar
unknown committed
1
# $Id: Design,v 11.5 2002/02/01 19:07:18 bostic Exp $
unknown's avatar
unknown committed
2 3 4

Synchronization in the Locking Subsystem

unknown's avatar
unknown committed
5 6 7 8 9 10 11 12
This is a document that describes how we implemented fine-grain locking
in the lock manager (that is, locking on a hash bucket level instead of
locking the entire region).  We found that the increase in concurrency
was not sufficient to warrant the increase in complexity or the additional
cost of performing each lock operation.  Therefore, we don't use this
any more.  Should we have to do fine-grain locking in a future release,
this would be a reasonable starting point.

unknown's avatar
unknown committed
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1. Data structures
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The lock manager maintains 3 different structures:

Objects (__db_lockobj):
	Describes an object that is locked.  When used with DB, this consists
	of a __db_ilock (a file identifier and a page number).

Lockers (__db_locker):
	Identifies a specific locker ID and maintains the head of a list of
	locks held by a locker (for using during transaction commit/abort).

Locks (__db_lock):
	Describes a particular object lock held on behalf of a particular
	locker id.

Objects and Lockers reference Locks.

These structures are organized via two synchronized hash tables.  Each
hash table consists of two physical arrays: the array of actual hash
buckets and an array of mutexes so we can lock individual buckets, rather
than the whole table.

One hash table contains Objects and the other hash table contains Lockers.
Objects contain two lists of locks, waiters and holders: holders currently
hold a lock on the Object, waiters are lock waiting to be granted.
Lockers are a single linked list that connects the Locks held on behalf
of the specific locker ID.

In the diagram below:

Locker ID #1 holds a lock on Object #1 (L1) and Object #2 (L5), and is
waiting on a lock on Object #1 (L3).

Locker ID #2 holds a lock on Object #1 (L2) and is waiting on a lock for
Object #2 (L7).

Locker ID #3 is waiting for a lock on Object #2 (L6).

	OBJECT                   -----------------------
	HASH                     |                     |
                             ----|-------------        |
	________    _______  |   |   ________ |        |
	|	|-->| O1  |--|---|-->|  O2  | |        |
	|_______|   |_____|  |   |   |______| V        |
	|	|    W  H--->L1->L2   W  H--->L5       |	holders
	|_______|    |       |   |    |                V
	|	|    ------->L3  \    ------->L6------>L7	waiters
	|_______|           /     \            \
	.	.          /       \            \
	.	.          |        \            \
	.	.          |         \            -----------
	|_______|          |          --------------        |
	|	|      ____|____                ___|_____  _|______
	|_______|      |       |                |       |  |      |
	|	|      | LID1  |                |  LID2 |  | LID3 |
	|_______|      |_______|                |_______|  |______|
			   ^                        ^        ^
			   |                        |        |
			___|________________________|________|___
	       LOCKER	|    |    |    |    |    |    |    |    |
	       HASH	|    |    |    |    |    |    |    |    |
			|    |    |    |    |    |    |    |    |
			|____|____|____|____|____|____|____|____|

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
2. Synchronization
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There are four types of mutexes in the subsystem.

Object mutexes;
	These map one-to-one to each bucket in the Object hash table.
	Holding a mutex on an Object bucket secures all the Objects in
	that bucket as well as the Lock structures linked from those
	Objects.  All fields in the Locks EXCEPT the Locker links (the
	links that attach Locks by Locker ID) are protected by these
	mutexes.

Locker mutexes:
	These map one-to-one to each bucket in the Locker hash table.
	Holding a mutex on a Locker bucket secures the Locker structures
	and the Locker links in the Locks.

Memory mutex:
	This mutex allows calls to allocate/free memory, i.e. calls to
	__db_shalloc and __db_shalloc_free, as well as manipulation of
	the Object, Locker and Lock free lists.

Region mutex:
	This mutex is currently only used to protect the locker ids.
	It may also be needed later to provide exclusive access to
	the region for deadlock detection.

Creating or removing a Lock requires locking both the Object lock and the
Locker lock (and eventually the shalloc lock to return the item to the
free list).

The locking hierarchy is as follows:

	The Region mutex may never be acquired after any other mutex.

	The Object mutex may be acquired after the Region mutex.

	The Locker mutex may be acquired after the Region and Object
	mutexes.

	The Memory mutex may be acquired after any mutex.

So, if both and Object mutex and a Locker mutex are going to be acquired,
the Object mutex must be acquired first.

The Memory mutex may be acquired after any other mutex, but no other mutexes
can be acquired once the Memory mutex is held.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
3. The algorithms:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
The locking subsystem supports four basic operations:
	Get a Lock (lock_get)

	Release a Lock (lock_put)

	Release all the Locks on a specific Object (lock_vec)

	Release all the Locks for a specific Locker (lock_vec)

Get a lock:
	Acquire Object bucket mutex.
	Acquire Locker bucket mutex.

	Acquire Memory mutex.
	If the Object does not exist
		Take an Object off the freelist.
	If the Locker doesn't exist
		Take a Locker off the freelist.
	Take a Lock off the free list.
	Release Memory mutex.

	Add Lock to the Object list.
	Add Lock to the Locker list.
	Release Locker bucket mutex

	If the lock cannot be granted
		Release Object bucket mutex
		Acquire lock mutex (blocks)

		Acquire Object bucket mutex
		If lock acquisition did not succeed (e.g, deadlock)
			Acquire Locker bucket mutex
			If locker should be destroyed
				Remove locker from hash table
				Acquire Memory mutex
				Return locker to free list
				Release Memory mutex
			Release Locker bucket mutex

			If object should be released
				Acquire Memory mutex
				Return object to free list
				Release Memory mutex

	Release Object bucket mutex

Release a lock:
	Acquire Object bucket mutex.
		(Requires that we be able to find the Object hash bucket
		without looking inside the Lock itself.)

	If releasing a single lock and the user provided generation number
	doesn't match the Lock's generation number, the Lock has been reused
	and we return failure.

	Enter lock_put_internal:
		if the Lock is still on the Object's lists:
			Increment Lock's generation number.
			Remove Lock from the Object's list (NULL link fields).
			Promote locks for the Object.

		Enter locker_list_removal
			Acquire Locker bucket mutex.
			If Locker doesn't exist:
				Release Locker bucket mutex
				Release Object bucket mutex
				Return error.
			Else if Locker marked as deleted:
				dont_release = TRUE
			Else
				Remove Lock from Locker list.
				If Locker has no more locks
					Remove Locker from table.
					Acquire Memory mutex.
					Return Locker to free list
					Release Memory mutex
			Release Locker bucket mutex.
		Exit locker_list_removal

		If (!dont_release)
			Acquire Memory mutex
			Return Lock to free list
			Release Memory mutex

	Exit lock_put_internal

	Release Object bucket mutex

Release all the Locks on a specific Object (lock_vec, DB_PUT_ALL_OBJ):

	Acquire Object bucket mutex.

	For each lock on the waiter list:
		lock_put_internal
	For each lock on the holder list:
		lock_put_internal

	Release Object bucket mutex.

Release all the Locks for a specific Locker (lock_vec, DB_PUT_ALL):

	Acquire Locker bucket mutex.
	Mark Locker deleted.
	Release Locker mutex.

	For each lock on the Locker's list:
		Remove from locker's list
			(The lock could get put back on the free list in
			lock_put and then could get reallocated and the
			act of setting its locker links could clobber us.)
		Perform "Release a Lock" above: skip locker_list_removal.

	Acquire Locker bucket mutex.
	Remove Locker
	Release Locker mutex.

	Acquire Memory mutex
	Return Locker to free list
	Release Memory mutex

Deadlock detection (lock_detect):

	For each bucket in Object table
		Acquire the Object bucket mutex.
		create waitsfor

	For each bucket in Object table
		Release the Object mutex.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
FAQ:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Q: Why do you need generation numbers?
A: If a lock has been released due to a transaction abort (potentially in a
   different process), and then lock is released by a thread of control
   unaware of the abort, the lock might have potentially been re-allocated
   to a different object.  The generation numbers detect this problem.

   Note, we assume that reads/writes of lock generation numbers are atomic,
   if they are not, it is theoretically possible that a re-allocated lock
   could be mistaken for another lock.

Q: Why is is safe to walk the Locker list without holding any mutexes at
   all?
A: Locks are created with both the Object and Locker bucket mutexes held.
   Once created, they removed in two ways:

   a) when a specific Lock is released, in which case, the Object and
   Locker bucket mutexes are again held, and

   b) when all Locks for a specific Locker Id is released.

   In case b), the Locker bucket mutex is held while the Locker chain is
   marked as "destroyed", which blocks any further access to the Locker
   chain.  Then, each individual Object bucket mutex is acquired when each
   individual Lock is removed.

Q: What are the implications of doing fine grain locking?

A: Since we no longer globally lock the entire region, lock_vec will no
   longer be atomic.  We still execute the items in a lock_vec in order,
   so things like lock-coupling still work, but you can't make any
   guarantees about atomicity.

Q: How do I configure for FINE_GRAIN locking?

A: We currently do not support any automatic configuration for FINE_GRAIN
   locking.  When we do, will need to document that atomicity discussion
   listed above (it is bug-report #553).