Commit 6413dc84 authored by Tim Peters

For speed, and to squash delicate code duplication, introduced a
micro-optimized BUCKET_SEARCH macro.  Change _bucket_get() to use
it (more later).
parent 2bdd3d57
@@ -12,7 +12,41 @@
 ****************************************************************************/
-#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.35 2002/06/08 16:16:32 tim_one Exp $\n"
+#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.36 2002/06/08 19:46:18 tim_one Exp $\n"
+/* Use BUCKET_SEARCH to find the index at which a key belongs.
+ * INDEX    An int lvalue to hold the index i such that KEY belongs at
+ *          SELF->keys[i].  Note that this will equal SELF->len if KEY
+ *          is larger than the bucket's largest key.  Else it's the
+ *          smallest i such that SELF->keys[i] >= KEY.
+ * ABSENT   An int lvalue to hold a Boolean result, true (!= 0) if the
+ *          key is absent, false (== 0) if the key is at INDEX.
+ * SELF     A pointer to a Bucket node.
+ * KEY      The key you're looking for, of type KEY_TYPE.
+ * ONERROR  What to do if key comparison raises an exception; for example,
+ *          perhaps 'return NULL'.
+ *
+ * See Maintainer.txt for discussion:  this is optimized in subtle ways.
+ * It's recommended that you call this at the start of a routine, waiting
+ * to check for self->len == 0 after (if an empty bucket is special in
+ * context; INDEX becomes 0 and ABSENT becomes true if this macro is run
+ * with an empty SELF, and that may be all the invoker needs to know).
+ */
+#define BUCKET_SEARCH(INDEX, ABSENT, SELF, KEY, ONERROR) {  \
+    int _lo = 0;                                            \
+    int _hi = (SELF)->len;                                  \
+    int _i;                                                 \
+    int _cmp = 1;                                           \
+    for (_i = _hi >> 1; _lo < _hi; _i = (_lo + _hi) >> 1) { \
+        TEST_KEY_SET_OR(_cmp, (SELF)->keys[_i], (KEY))      \
+            ONERROR;                                        \
+        if (_cmp < 0)       _lo = _i + 1;                   \
+        else if (_cmp == 0) break;                          \
+        else                _hi = _i;                       \
+    }                                                       \
+    (INDEX) = _i;                                           \
+    (ABSENT) = _cmp;                                        \
+}
 /*
 ** _bucket_get
@@ -41,45 +75,31 @@
 static PyObject *
 _bucket_get(Bucket *self, PyObject *keyarg, int has_key)
 {
-  int min, max, i, l, cmp, copied=1;
-  PyObject *r;
-  KEY_TYPE key;
+    int i, cmp;
+    KEY_TYPE key;
+    PyObject *r = NULL;
+    int copied = 1;
-  COPY_KEY_FROM_ARG(key, keyarg, copied);
-  UNLESS (copied) return NULL;
+    COPY_KEY_FROM_ARG(key, keyarg, copied);
+    UNLESS (copied) return NULL;
-  PER_USE_OR_RETURN(self, NULL);
+    PER_USE_OR_RETURN(self, NULL);
-  for (min=0, max=self->len, i=max/2, l=max; i != l; l=i, i=(min+max)/2)
-    {
-      TEST_KEY_SET_OR(cmp, self->keys[i], key) goto err;
-      if (PyErr_Occurred()) goto err;
-      if (cmp < 0) min=i;
-      else if (cmp == 0)
-        {
-          if (has_key) r=PyInt_FromLong(has_key);
-          else
-            {
-              COPY_VALUE_TO_OBJECT(r, self->values[i]);
-            }
-          PER_ALLOW_DEACTIVATION(self);
-          PER_ACCESSED(self);
-          return r;
-        }
-      else max=i;
-    }
+    BUCKET_SEARCH(i, cmp, self, key, goto Done);
+    if (has_key)
+        r = PyInt_FromLong(cmp ? 0 : has_key);
+    else {
+        if (cmp == 0) {
+            COPY_VALUE_TO_OBJECT(r, self->values[i]);
+        }
+        else
+            PyErr_SetObject(PyExc_KeyError, keyarg);
+    }
+Done:
   PER_ALLOW_DEACTIVATION(self);
   PER_ACCESSED(self);
-  if (has_key) return PyInt_FromLong(0);
-  PyErr_SetObject(PyExc_KeyError, keyarg);
-  return NULL;
-err:
-  PER_ALLOW_DEACTIVATION(self);
-  PER_ACCESSED(self);
-  return NULL;
+  return r;
 }
 static PyObject *
......
@@ -266,3 +266,90 @@ Optimization points:
immediate test-and-branch, and for a very unlikely case (BTree
nodes are rarely empty). It's better to get into the loop right
away so the normal case makes progress ASAP.
The BUCKET_SEARCH Macro
=======================
This has a different job than BTREE_SEARCH: the key 0 slot is
legitimate in a bucket, and we want to find the index at which the
key belongs. If the key is larger than the bucket's largest key, a
new slot at index len is where it belongs, else it belongs at the
smallest i with keys[i] >= the key we're looking for. We also need
to know whether or not the key is present (BTREE_SEARCH didn't care;
it only wanted to find the next node to search).
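As a rough illustration of that job, here is the same loop written as an
ordinary function over plain int keys. IntBucket and int_bucket_search are
hypothetical names invented for this sketch; the real macro works on
KEY_TYPE keys through TEST_KEY_SET_OR and can bail out via ONERROR.

```c
#include <assert.h>

/* Hypothetical stand-in for a Bucket: sorted int keys, no Python objects. */
typedef struct {
    int len;
    const int *keys;
} IntBucket;

/* Returns the index at which key belongs (smallest i with keys[i] >= key,
 * or len if key exceeds every key).  *absent is set nonzero iff key is
 * not in the bucket. */
static int int_bucket_search(const IntBucket *self, int key, int *absent)
{
    int lo = 0;
    int hi = self->len;
    int i;
    int cmp = 1;    /* stays 1 if the loop body never runs (empty bucket) */

    for (i = hi >> 1; lo < hi; i = (lo + hi) >> 1) {
        /* three-way compare of keys[i] against key: -1, 0, or 1 */
        cmp = (self->keys[i] > key) - (self->keys[i] < key);
        if (cmp < 0) lo = i + 1;
        else if (cmp == 0) break;
        else hi = i;
    }
    *absent = cmp;
    return i;
}
```

With keys {10, 20, 30}, searching for 20 gives index 1 with absent false,
while searching for 25 gives index 2 with absent true, and searching for 35
gives index 3 (== len) with absent true.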
The mechanics of the search are quite similar, though. The primary
loop invariant changes to (say we're searching for key k):
K(lo-1) < k < K(hi)
where K(i) means keys[i], and we pretend K(-1) is minus infinity and
K(len) is plus infinity.
If the bucket is empty, lo=hi=i=0 at the start, the loop body is never
entered, and the macro sets INDEX to 0 and ABSENT to true. That's why
_cmp is initialized to 1 (_cmp becomes ABSENT).
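The empty-bucket behavior can be exercised with the macro body itself. In
this sketch DemoBucket, demo_index, and the int-only TEST_KEY_SET_OR are
assumptions made for the demo (the real TEST_KEY_SET_OR depends on the key
type being compiled); only the BUCKET_SEARCH body is taken from the commit.

```c
#include <assert.h>

/* Assumed int-key stand-in for TEST_KEY_SET_OR: stores a three-way
 * comparison in V.  The statement following it runs only on error,
 * which cannot happen for plain ints, so the condition is always 0. */
#define TEST_KEY_SET_OR(V, K, T) \
    if (((V) = (((K) < (T)) ? -1 : (((K) > (T)) ? 1 : 0))), 0)

#define BUCKET_SEARCH(INDEX, ABSENT, SELF, KEY, ONERROR) {  \
    int _lo = 0;                                            \
    int _hi = (SELF)->len;                                  \
    int _i;                                                 \
    int _cmp = 1;                                           \
    for (_i = _hi >> 1; _lo < _hi; _i = (_lo + _hi) >> 1) { \
        TEST_KEY_SET_OR(_cmp, (SELF)->keys[_i], (KEY))      \
            ONERROR;                                        \
        if (_cmp < 0)       _lo = _i + 1;                   \
        else if (_cmp == 0) break;                          \
        else                _hi = _i;                       \
    }                                                       \
    (INDEX) = _i;                                           \
    (ABSENT) = _cmp;                                        \
}

/* Hypothetical bucket of sorted ints. */
typedef struct {
    int len;
    const int *keys;
} DemoBucket;

/* Returns the insertion index; -1 would signal a comparison error,
 * which is unreachable with the int-only comparison above. */
static int demo_index(const DemoBucket *self, int key, int *absent)
{
    int i;
    BUCKET_SEARCH(i, *absent, self, key, return -1);
    return i;
}
```

Calling demo_index on a bucket with len 0 never touches keys, returns 0,
and leaves *absent nonzero, because _cmp keeps its initial value of 1.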
Else the bucket is not empty, lo<hi at the start, and the loop body
is entered. The invariant is obviously satisfied then, as lo=0 and
hi=len.
If K(i)<k, lo is set to i+1, preserving that K(lo-1) = K(i) < k.
If K(i)>k, hi is set to i, preserving that K(hi) = K(i) > k.
If the loop exits after either of those, _cmp != 0, so ABSENT becomes
true.
If K(i)=k, the loop breaks, so that INDEX becomes i, and ABSENT
becomes false (_cmp=0 in this case).
The same case analysis for BTREE_SEARCH on lo and hi holds here:
a. (lo == i == hi) if and only if (lo == hi).
b. (lo == i < hi) if and only if (lo+1 == hi).
c. (lo < i < hi) if and only if (lo+1 < hi).
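The three equivalences depend only on i = (lo+hi) >> 1, so they can be
checked by brute force over small ranges. The helper names below are made
up for this sketch, and the check assumes 0 <= lo <= hi, as in the macro.

```c
#include <assert.h>

/* Classifies (lo, hi) by where the midpoint i lands:
 * 'a' if lo == i == hi, 'b' if lo == i < hi, 'c' if lo < i < hi,
 * and '?' if none of the three applies. */
static char midpoint_case(int lo, int hi)
{
    int i = (lo + hi) >> 1;
    if (lo == i && i == hi) return 'a';
    if (lo == i && i < hi)  return 'b';
    if (lo < i && i < hi)   return 'c';
    return '?';
}

/* Returns 1 if the claimed equivalences hold for all 0 <= lo <= hi <= limit. */
static int case_analysis_holds(int limit)
{
    int lo, hi;
    for (lo = 0; lo <= limit; lo++) {
        for (hi = lo; hi <= limit; hi++) {
            char expected = (lo == hi) ? 'a' : (lo + 1 == hi) ? 'b' : 'c';
            if (midpoint_case(lo, hi) != expected) return 0;
        }
    }
    return 1;
}
```

For instance, (lo, hi) = (3, 3) is case a, (3, 4) is case b with i == 3, and
(3, 5) is case c with i == 4.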
So long as lo+1 < hi, we're in case #c, and either break with
equality (in which case the right results are obviously computed) or
narrow the range. If equality doesn't obtain, the range eventually
narrows to cases #a or #b.
To go from #c to #a, we must have lo+2==hi at the start, and
K(i)=K(lo+1)<k. Then the new lo gets set to i+1 = lo+2 = hi, and the
loop exits with lo=hi=i and _cmp<0. This is correct, because we
know that k != K(i) (loop invariant! we actually know something
stronger, that k < K(hi); since i=hi, this implies k != K(i)).
Else #c eventually falls into case #b, lo+1==hi and i==lo. The
invariant tells us K(lo-1) < k < K(hi) = K(lo+1), so if the key
is present it must be at K(lo). i==lo in this case, so we test
K(lo) against k. As always, if equality obtains we do the right
thing, else case #b becomes case #a.
When #b becomes #a, the last comparison was non-equal, so _cmp is
non-zero, and the loop exits because lo==hi==i in case #a. The
invariant then tells us K(lo-1) < k < K(lo), so the key is in fact
not present, it's correct to exit with _cmp non-zero, and i==lo is
again the index at which k belongs.
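The argument above can be cross-checked mechanically. The sketch below is
not part of the commit: it restates the loop for int keys (an assumption,
as are all the function names) and compares it with a linear scan for
every bucket length up to MAX_LEN and every probe value in range.

```c
#include <assert.h>

enum { MAX_LEN = 64 };

/* Same loop as the macro, specialized to int keys for the demo. */
static int search(const int *keys, int len, int key, int *absent)
{
    int lo = 0, hi = len, i, cmp = 1;
    for (i = hi >> 1; lo < hi; i = (lo + hi) >> 1) {
        cmp = (keys[i] > key) - (keys[i] < key);
        if (cmp < 0) lo = i + 1;
        else if (cmp == 0) break;
        else hi = i;
    }
    *absent = cmp;
    return i;
}

/* Reference answer: smallest i with keys[i] >= key, else len. */
static int linear_index(const int *keys, int len, int key)
{
    int i = 0;
    while (i < len && keys[i] < key) i++;
    return i;
}

/* Buckets hold the odd numbers 1, 3, 5, ...; probing 0..2*len covers
 * every present key, every gap, and both ends.  Returns 1 on agreement. */
static int check_all(void)
{
    int keys[MAX_LEN];
    int len, k, j;
    for (j = 0; j < MAX_LEN; j++) keys[j] = 2 * j + 1;
    for (len = 0; len <= MAX_LEN; len++) {
        for (k = 0; k <= 2 * len; k++) {
            int absent;
            int i = search(keys, len, k, &absent);
            int want = linear_index(keys, len, k);
            int present = (want < len && keys[want] == k);
            if (i != want) return 0;
            if ((absent == 0) != present) return 0;
        }
    }
    return 1;
}
```

A return value of 1 from check_all() means the index and the present/absent
flag agreed with the linear scan in every case tried, including len == 0.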
Optimization points:
+ As for BTREE_SEARCH, shifting of signed ints is cheaper than
division.
+ Unlike BTREE_SEARCH, there's nothing special about searching
an empty bucket, and the macro computes thoroughly sensible results
in that case.
+ The order of _cmp comparisons differs from BTREE_SEARCH. When
searching a bucket, it's much more likely (than when searching a
BTree node) that the key is present, so testing _cmp==0 isn't a
systematic waste of cycles. At the extreme, if all searches are
successful (key present), on average this saves one comparison per
search, against leaving the determination of _cmp==0 implicit (as
BTREE_SEARCH does). But even on successful searches, _cmp != 0 is
a more popular outcome than _cmp == 0 across iterations (unless
the bucket has only a few keys), so it's important to check one
of the inequality cases first. It turns out it's better on average
to check K(i) < key (than to check K(i) > key), because when it
pays it narrows the range more (we get a little boost from setting
lo=i+1 in this case; the other case sets hi=i, which isn't as much
of a narrowing).
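The last point, that lo=i+1 narrows the range more than hi=i, comes down to
simple arithmetic on the half-open range [lo, hi). The two helpers below
are hypothetical and only compute the size of the range left after each
branch; they say nothing about how often each branch is taken.

```c
#include <assert.h>

/* Size of the remaining range [lo, hi) after K(i) < key sends lo to i+1. */
static int size_after_less(int lo, int hi)
{
    int i = (lo + hi) >> 1;
    return hi - (i + 1);
}

/* Size of the remaining range [lo, hi) after K(i) > key sends hi to i. */
static int size_after_greater(int lo, int hi)
{
    int i = (lo + hi) >> 1;
    return i - lo;
}
```

For a range of 8 slots, [0, 8), the midpoint is 4: the K(i) < key branch
leaves 3 slots and the K(i) > key branch leaves 4. For an odd-sized range
such as [0, 7), both branches leave 3, so the first branch is never worse.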