For speed, and to squash delicate code duplication, introduced a

micro-optimized BUCKET_SEARCH macro. Change _bucket_get() to use it (more later).

For speed, and to squash delicate code duplication, introduced a
micro-optimized BUCKET_SEARCH macro. Change _bucket_get() to use it (more later).
6413dc84 · Tim Peters · 2bdd3d57 · 6413dc84 · 6413dc84
Commit 6413dc84 authored Jun 08, 2002 by Tim Peters
Hide whitespace changes
Inline Side-by-side

Showing with 143 additions and 36 deletions

src/BTrees/BucketTemplate.c src/BTrees/BucketTemplate.c +56 -36

src/BTrees/Maintainer.txt src/BTrees/Maintainer.txt +87 -0

No files found.
--- a/src/BTrees/BucketTemplate.c
+++ b/src/BTrees/BucketTemplate.c
@@ -12,7 +12,41 @@
 ****************************************************************************/
-#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.35 2002/06/08 16:16:32 tim_one Exp $\n"
+#define BUCKETTEMPLATE_C "$Id: BucketTemplate.c,v 1.36 2002/06/08 19:46:18 tim_one Exp $\n"
+/* Use BUCKET_SEARCH to find the index at which a key belongs.
+ * INDEX    An int lvalue to hold the index i such that KEY belongs at
+ *          SELF->keys[i].  Note that this will equal SELF->len if KEY
+ *          is larger than the bucket's largest key.  Else it's the
+ *          smallest i such that SELF->keys[i] >= KEY.
+ * ABSENT   An int lvalue to hold a Boolean result, true (!= 0) if the
+ *          key is absent, false (== 0) if the key is at INDEX.
+ * SELF     A pointer to a Bucket node.
+ * KEY      The key you're looking for, of type KEY_TYPE.
+ * ONERROR  What to do if key comparison raises an exception; for example,
+ *          perhaps 'return NULL'.
+ *
+ * See Maintainer.txt for discussion:  this is optimized in subtle ways.
+ * It's recommended that you call this at the start of a routine, waiting
+ * to check for self->len == 0 after (if an empty bucket is special in
+ * context; INDEX becomes 0 and ABSENT becomes true if this macro is run
+ * with an empty SELF, and that may be all the invoker needs to know).
+ */
+#define BUCKET_SEARCH(INDEX, ABSENT, SELF, KEY, ONERROR) {  \
+    int _lo = 0;                                            \
+    int _hi = (SELF)->len;                                  \
+    int _i;                                                 \
+    int _cmp = 1;                                           \
+    for (_i = _hi >> 1; _lo < _hi; _i = (_lo + _hi) >> 1) { \
+        TEST_KEY_SET_OR(_cmp, (SELF)->keys[_i], (KEY))      \
+            ONERROR;                                        \
+        if      (_cmp < 0)  _lo = _i + 1;                   \
+        else if (_cmp == 0) break;                          \
+        else                _hi = _i;                       \
+    }                                                       \
+    (INDEX) = _i;                                           \
+    (ABSENT) = _cmp;                                        \
+}
 /*
 ** _bucket_get
@@ -41,45 +75,31 @@
 static PyObject *
 _bucket_get(Bucket *self, PyObject *keyarg, int has_key)
 {
-  int min, max, i, l, cmp, copied=1;
+    int i, cmp;
-  PyObject *r;
+    KEY_TYPE key;
-  KEY_TYPE key;
+    PyObject *r = NULL;
+    int copied = 1;
-  COPY_KEY_FROM_ARG(key, keyarg, copied);
-  UNLESS (copied) return NULL;
+    COPY_KEY_FROM_ARG(key, keyarg, copied);
+    UNLESS (copied) return NULL;
-  PER_USE_OR_RETURN(self, NULL);
+    PER_USE_OR_RETURN(self, NULL);
-  for (min=0, max=self->len, i=max/2, l=max; i != l; l=i, i=(min+max)/2)
-    {
+    BUCKET_SEARCH(i, cmp, self, key, goto Done);
-      TEST_KEY_SET_OR(cmp, self->keys[i], key) goto err;
+    if (has_key)
-      if (PyErr_Occurred()) goto err;
+    	r = PyInt_FromLong(cmp ? 0 : has_key);
+    else {
-      if (cmp < 0) min=i;
+        if (cmp == 0) {
-      else if (cmp == 0)
+            COPY_VALUE_TO_OBJECT(r, self->values[i]);
-	{
+        }
-	  if (has_key) r=PyInt_FromLong(has_key);
+        else
-	  else
+            PyErr_SetObject(PyExc_KeyError, keyarg);
-	    {
-              COPY_VALUE_TO_OBJECT(r, self->values[i]);
-	    }
-	  PER_ALLOW_DEACTIVATION(self);
-          PER_ACCESSED(self);
-	  return r;
-	}
-      else max=i;
    }
+Done:
  PER_ALLOW_DEACTIVATION(self);
  PER_ACCESSED(self);
-  if (has_key) return PyInt_FromLong(0);
+  return r;
-  PyErr_SetObject(PyExc_KeyError, keyarg);
-  return NULL;
-err:
-  PER_ALLOW_DEACTIVATION(self);
-  PER_ACCESSED(self);
-  return NULL;
 }
 static PyObject *

--- a/src/BTrees/Maintainer.txt
+++ b/src/BTrees/Maintainer.txt
@@ -266,3 +266,90 @@ Optimization points:
  immediate test-and-branch, and for a very unlikely case (BTree
  nodes are rarely empty).  It's better to get into the loop right
  away so the normal case makes progress ASAP.
+The BUCKET_SEARCH Macro
+=======================
+This has a different job than BTREE_SEARCH:  the key 0 slot is
+legitimate in a bucket, and we want to find the index at which the
+key belongs.  If the key is larger than the bucket's largest key, a
+new slot at index len is where it belongs, else it belongs at the
+smallest i with keys[i] >= the key we're looking for.  We also need
+to know whether or not the key is present (BTREE_SEARCH didn't care;
+it only wanted to find the next node to search).
+The mechanics of the search are quite similar, though.  The primary
+loop invariant changes to (say we're searching for key k):
+    K(lo-1) < k < K(hi)
+where K(i) means keys[i], and we pretend K(-1) is minus infinity and
+K(len) is plus infinity.
+If the bucket is empty, lo=hi=i=0 at the start, the loop body is never
+entered, and the macro sets INDEX to 0 and ABSENT to true.  That's why
+_cmp is initialized to 1 (_cmp becomes ABSENT).
+Else the bucket is not empty, lo<hi at the start, and the loop body
+is entered.  The invariant is obviously satisfied then, as lo=0 and
+hi=len.
+If K[i]<k, lo is set to i+1, preserving that K(lo-1) = K[i] < k.
+If K[i]>k, hi is set to i, preserving that K[hi] = K[i] > k.
+If the loop exits after either of those, _cmp != 0, so ABSENT becomes
+true.
+If K[i]=k, the loop breaks, so that INDEX becomes i, and ABSENT
+becomes false (_cmp=0 in this case).
+The same case analysis for BTREE_SEARCH on lo and hi holds here:
+a. (lo == i == hi) if and only if (lo   == hi).
+b. (lo == i  < hi) if and only if (lo+1 == hi).
+c. (lo  < i  < hi) if and only if (lo+1  < hi).
+So long as lo+1 < hi, we're in case #c, and either break with
+equality (in which case the right results are obviously computed) or
+narrow the range.  If equality doesn't obtain, the range eventually
+narrows to cases #a or #b.
+To go from #c to #a, we must have lo+2==hi at the start, and
+K[i]=K[lo+1]<k.  Then the new lo gets set to i+1 = lo+2 = hi, and the
+loop exits with lo=hi=i and _cmp<0.  This is correct, because we
+know that k != K(i) (loop invariant! we actually know something
+stronger, that k < K(hi); since i=hi, this implies k != K(i)).
+Else #c eventually falls into case #b, lo+1==hi and i==lo.  The
+invariant tells us K(lo-1) < k < K(hi) = K(lo+1), so if the key
+is present it must be at K(lo).  i==lo in this case, so we test
+K(lo) against k.  As always, if equality obtains we do the right
+thing, else case #b becomes case #a.
+When #b becomes #a, the last comparison was non-equal, so _cmp is
+non-zero, and the loop exits because lo==hi==i in case #a.  The
+invariant then tells us K(lo-1) < k < K(lo), so the key is in fact
+not present, it's correct to exit with _cmp non-zero, and i==lo is
+again the index at which k belongs.
+Optimization points:
+ As for BTREE_SEARCH, shifting of signed ints is cheaper than
+  division.
+ Unlike as for BTREE_SEARCH, there's nothing special about searching
+  an empty bucket, and the macro computes thoroughly sensible results
+  in that case.
+ The order of _cmp comparisons differs from BTREE_SEARCH.  When
+  searching a bucket, it's much more likely (than when searching a
+  BTree node) that the key is present, so testing __cmp==0 isn't a
+  systematic waste of cycles.  At the extreme, if all searches are
+  successful (key present), on average this saves one comparison per
+  search, against leaving the determination of _cmp==0 implicit (as
+  BTREE_SEARCH does).  But even on successful searches, __cmp != 0 is
+  a more popular outcome than __cmp == 0 across iterations (unless
+  the bucket has only a few keys), so it's important to check one
+  of the inequality cases first.  It turns out it's better on average
+  to check K(i) < key (than to check K(i) > key), because when it
+  pays it narrows the range more (we get a little boost from setting
+  lo=i+1 in this case; the other case sets hi=i, which isn't as much
+  of a narrowing).