Commit 0e829874 authored by Kirill Smelkov

wcfs: tests: package for inspecting/manipulating internal structure of BTrees

To handle invalidations, WCFS will need to detect changes both to ZBlk
objects and to the ZBigFile.blktab BTree that maps file blocks to ZBlk
objects. Detecting changes to a BTree is much more complex, because
when a BTree changes it might be rebalanced, or keys might migrate from one
tree/bucket node to another tree/bucket node. In other words, a BTree
change might be not only a change to the {}key->value dictionary, but also
a change to the BTree topology.
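
A minimal sketch of this point, using a hypothetical tuple model instead of
real ZODB BTrees (the names `flatten`, `one_bucket` and `split_at_2` are
illustrative assumptions, not part of the package): two trees with different
topologies can hold exactly the same {}key->value state.

```python
# Hypothetical model: a tree is ("T", keys, children), a bucket is ("B", {key: value}).
def flatten(node):
    """Collect the key->value dictionary stored under node."""
    if node[0] == "B":
        return dict(node[1])
    kv = {}
    for child in node[2]:
        kv.update(flatten(child))
    return kv

# Same key->value state, two different topologies:
one_bucket = ("T", [],  [("B", {1: "a", 2: "b", 3: "c"})])
split_at_2 = ("T", [2], [("B", {1: "a"}), ("B", {2: "b", 3: "c"})])

assert one_bucket != split_at_2                    # topology differs
assert flatten(one_bucket) == flatten(split_at_2)  # key->value state is equal
```

A difference algorithm therefore has to compare what the trees contain, not
how their nodes happen to be arranged.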

Because many BTree topologies correspond to the same
{}key->value state, a change from kv₁ to kv₂, even if kv₁ and kv₂ are
close to each other, might be accompanied by a dramatic change to the
topology of the tree. This creates a need to thoroughly test the
BTree difference algorithm, because many BTree topology changes are
tricky: if a simple algorithm works on relatively stable topology
updates, it does not necessarily mean that the same algorithm will
continue to work correctly in the general case.

So, as a preparatory step, here comes a package that can be
used to inspect tree topologies, to create trees with a specified topology,
and to manipulate the topology of an existing tree. This package will be
used in tests for the upcoming ΔBTail.

For debugging, and also because those tests will involve both Go and
Python parts, there is a need to specify and exchange the
topology of a tree via a compact string. This package therefore also defines a
so-called "topology encoding" to do so.

Some preliminary history:

fb56193f    X fix metric to keep Z <- N order stable over key^
809304d1    X "B:" indicates ø bucket with k&b, "B" - ø bucket with only keys
9eca74ec    X Teach AllStructs to emit topologies with values
1b962f03    X Restructure: found bug that it was not marking objects as modified
9181c5d9    X Restructure; verify that it marks as changed only modifed nodes
e9902c4a    X improve `xbtree topoview`

For reference, the package documentation is quoted below.

---- 8< ----

Package xbtree provides utilities for inspecting/manipulating internal
structure of integer-keyed BTrees.

It will be primarily used to help verify ΔBTail in WCFS.

- `Tree` represents a tree node.
- `Bucket` represents a bucket node.
- `StructureOf` returns internal structure of ZODB BTree represented as Tree
  and Bucket nodes.
- `Restructure` reorganizes ZODB BTree instance according to specified topology

- `AllStructs` generates all possible BTree topology structures with given keys.
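
To give a feeling for what enumerating topologies means, here is a simplified
sketch. It is not the package's `AllStructs`: it only lists one-level trees
with non-empty buckets and always uses the first key of the right-hand bucket
as the split key, whereas `AllStructs` also varies tree depth and split-point
values (bounded by `maxdepth` and `maxsplit`). The name `flat_topologies` is
an assumption made for illustration.

```python
from itertools import combinations

def flat_topologies(keys):
    """Yield (split_keys, buckets) for every one-level tree over keys
    whose buckets are all non-empty."""
    keys = sorted(keys)
    for nsplit in range(len(keys)):
        # choose positions in the sorted key list where a new bucket starts
        for cut in combinations(range(1, len(keys)), nsplit):
            bounds  = (0,) + cut + (len(keys),)
            buckets = [keys[i:j] for i, j in zip(bounds[:-1], bounds[1:])]
            splits  = [b[0] for b in buckets[1:]]   # child i holds keys in [s_i, s_{i+1})
            yield splits, buckets

for splits, buckets in flat_topologies([1, 2, 3]):
    print(splits, buckets)
```

Even for three keys this restricted enumeration already yields four distinct
topologies for the same key set.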

Topology encoding

Topology encoding provides a way to represent the structure of a Tree as a path-like string.

TopoEncode converts Tree into its topology-encoded representation, while
TopoDecode decodes topology-encoded string back into Tree.

The following example illustrates topology encoding represented by string
"T3/T-T/B1-T5/B-B7,8,9":

      [ 3 ]             T3/         represents Tree([3])
       / \
     [ ] [ ]            T-T/        represents two empty Tree([])
      ↓   ↓
     |1|[ 5 ]           B1-T5/      represent Bucket([1]) and Tree([5])
         / \
        || |7|8|9|      B-B7,8,9    represents empty Bucket([]) and Bucket([7,8,9])
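
The layer-by-layer reading of the example can be sketched as follows. This is
an illustrative helper for key-only topologies, not the package's `TopoDecode`;
the name `topo_layers` is an assumption, and the sketch does not link children
to parents or handle "<key>:<value>" pairs.

```python
def topo_layers(topo):
    """Split a key-only topology string into layers of (kind, keys) pairs."""
    layers = []
    for level in topo.split("/"):           # "/" delimits layers
        nodes = []
        for tnode in level.split("-"):      # "-" delimits nodes inside a layer
            kind, tkeys = tnode[:1], tnode[1:]
            keys = [int(k) for k in tkeys.split(",")] if tkeys else []
            nodes.append((kind, keys))
        layers.append(nodes)
    return layers

for layer in topo_layers("T3/T-T/B1-T5/B-B7,8,9"):
    print(layer)
```

Each printed layer corresponds to one row of nodes in the diagram above.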

Topology encoding specification:

A Tree is encoded by level-order traversal, delimiting layers with "/".
Inside a layer Tree and Bucket nodes are signalled as

    "T<keys>"           ; Tree
    "B<keys>"           ; Bucket with only keys
    "B<keys+values>"    ; Bucket with keys and values

Keys are represented as ","-delimited list of integers. For example Tree
or Bucket with [1,3,5] keys are represented as

    "T1,3,5"        ; Tree([1,3,5])
    "B1,3,5"        ; Bucket([1,3,5])

Keys+values are represented as ","-delimited list of "<key>:<value>" pairs. For
example Bucket corresponding to {1:1, 2:4, 3:9} is represented as

    "B1:1,2:4,3:9"  ; Bucket([1,2,3], [1,4,9])

Empty keys+values are represented as ":" - an empty Bucket for key->value
mapping is represented as

    "B:"            ; Bucket([], [])

Nodes inside one layer are delimited with "-". For example a layer consisting
of an empty Tree, a Tree with [1,3] keys, and Bucket with [4,5] keys is
represented as

    "T-T1,3-B4,5"   ; layer with Tree([]), Tree([1,3]) and Bucket([4,5])

A layer consists of the nodes reached by following node->node links from the
upper layer, in left-to-right order.
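
The specification above can be sketched as a small encoder. This is a
minimal illustration under stated assumptions, not the package's `TopoEncode`:
`Tree` and `Bucket` here are hypothetical namedtuple stand-ins for the
package's classes, and `encode_node` / `topo_encode` are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-ins for the package's Tree and Bucket classes.
Tree   = namedtuple("Tree",   "keys children")
Bucket = namedtuple("Bucket", "keys values")    # values is None for key-only buckets

def encode_node(node):
    """Encode one node as "T<keys>", "B<keys>" or "B<keys+values>"."""
    if isinstance(node, Tree):
        return "T" + ",".join(str(k) for k in node.keys)
    if node.values is None:
        return "B" + ",".join(str(k) for k in node.keys)
    if not node.keys:
        return "B:"                             # empty bucket of a key->value mapping
    return "B" + ",".join("%d:%s" % kv for kv in zip(node.keys, node.values))

def topo_encode(root):
    """Level-order traversal: nodes joined with "-", layers joined with "/"."""
    layers = []
    level = [root]
    while level:
        layers.append("-".join(encode_node(n) for n in level))
        # next layer: children of the trees on this layer, left to right
        level = [c for n in level if isinstance(n, Tree) for c in n.children]
    return "/".join(layers)

tree = Tree([3], [
    Tree([], [Bucket([1], None)]),
    Tree([], [Tree([5], [Bucket([], None), Bucket([7, 8, 9], None)])]),
])
assert topo_encode(tree) == "T3/T-T/B1-T5/B-B7,8,9"
```

The tree built at the end is the one from the example diagram, so the sketch
reproduces the string given there.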


The following visualization utilities are provided to help understand BTrees

- `topoview` displays BTree structure given its topology-encoded representation.
- `Tree.graphviz` returns Tree graph representation in dot language.
parent d81d2cbb

@@ -346,7 +346,7 @@ setup(
     ],
     extras_require = {
-      'test': ['pytest'],
+      'test': ['pytest', 'scipy'],
     },
     cmdclass = {'build_ext': build_ext,

# -*- coding: utf-8 -*-
# Copyright (C) 2020-2021 Nexedi SA and Contributors.
# Kirill Smelkov <>
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See COPYING file for full licensing terms.
# See for rationale and options.
"""Package xbtree provides utilities for inspecting/manipulating internal
structure of integer-keyed BTrees.
It will be primarily used to help verify ΔBTail in WCFS.
- `Tree` represents a tree node.
- `Bucket` represents a bucket node.
- `StructureOf` returns internal structure of ZODB BTree represented as Tree
and Bucket nodes.
- `Restructure` reorganizes ZODB BTree instance according to specified topology
- `AllStructs` generates all possible BTree topology structures with given keys.
Topology encoding
Topology encoding provides way to represent structure of a Tree as path-like string.
TopoEncode converts Tree into its topology-encoded representation, while
TopoDecode decodes topology-encoded string back into Tree.
The following example illustrates topology encoding represented by string
"T3/T-T/B1-T5/B-B7,8,9":

      [ 3 ]             T3/         represents Tree([3])
       / \
     [ ] [ ]            T-T/        represents two empty Tree([])
      ↓   ↓
     |1|[ 5 ]           B1-T5/      represent Bucket([1]) and Tree([5])
         / \
        || |7|8|9|      B-B7,8,9    represents empty Bucket([]) and Bucket([7,8,9])

Topology encoding specification:
A Tree is encoded by level-order traversal, delimiting layers with "/".
Inside a layer Tree and Bucket nodes are signalled as
"T<keys>" ; Tree
"B<keys>" ; Bucket with only keys
"B<keys+values>" ; Bucket with keys and values
Keys are represented as ","-delimited list of integers. For example Tree
or Bucket with [1,3,5] keys are represented as
"T1,3,5" ; Tree([1,3,5])
"B1,3,5" ; Bucket([1,3,5])
Keys+values are represented as ","-delimited list of "<key>:<value>" pairs. For
example Bucket corresponding to {1:1, 2:4, 3:9} is represented as
"B1:1,2:4,3:9" ; Bucket([1,2,3], [1,4,9])
Empty keys+values are represented as ":" - an empty Bucket for key->value
mapping is represented as
"B:" ; Bucket([], [])
Nodes inside one layer are delimited with "-". For example a layer consisting
of an empty Tree, a Tree with [1,3] keys, and Bucket with [4,5] keys is
represented as
"T-T1,3-B4,5" ; layer with Tree([]), Tree([1,3]) and Bucket([4,5])
A layer consists of nodes that are followed by node-node links from upper layer
in left-to-right order.
The following visualization utilities are provided to help understand BTrees
- `topoview` displays BTree structure given its topology-encoded representation.
- `Tree.graphviz` returns Tree graph representation in dot language.
"""
from __future__ import print_function, absolute_import
from BTrees import check as zbcheck
from golang import func, panic, defer
from golang.gcompat import qq
import itertools
import re
inf = float('inf')
import numpy as np
import scipy.optimize
import copy
# Tree represents a tree node.
class Tree(object):
    # .keyv         () of keys
    # .children     () of children      len(.children) == len(.keyv) + 1
    def __init__(t, keyv, *children):
        keyv = tuple(keyv)
        assert len(children) == len(keyv) + 1, (keyv, children)
        if len(children) > 0:
            # assert all children are of the same type
            childtypes = set([type(_) for _ in children])
            if len(childtypes) != 1:
                panic("children are of distinct types: %s" % (childtypes,))

            # assert type(child) is Tree | Bucket
            childtype = childtypes.pop()
            assert childtype in (Tree, Bucket), childtype

            # assert children keys are consistent
            v = (-inf,) + keyv + (+inf,)
            for i, (klo, khi) in enumerate(zip(v[:-1], v[1:])): # (klo, khi) = [] of (k_i, k_{i+1})
                for k in children[i].keyv:
                    if not (klo <= k < khi):
                        panic("children[%d] key %d is outside key range [%s, %s)" % (i, k, klo, khi))

        t.keyv     = keyv
        t.children = tuple(children)

    def __hash__(t):
        return hash(t.keyv) ^ hash(t.children)
    def __ne__(a, b):
        return not (a == b)
    def __eq__(a, b):
        if not isinstance(b, Tree):
            return False
        return (a.keyv == b.keyv) and (a.children == b.children)

    def __str__(t):
        s = "T([" + ",".join(['%s' % _ for _ in t.keyv]) + "]"
        for ch in t.children:
            s += ",\n"
            s += _indent(' '*4, str(ch))
        s += ")"
        return s
    __repr__ = __str__

    # copy returns a deep copy of the tree.
    # if onlyKeys=Y buckets in returned tree will contain only keys not values.
    def copy(t, onlyKeys=False):
        return Tree(t.keyv, *[_.copy(onlyKeys) for _ in t.children])

    # firstbucket returns Bucket reachable through leftmost child links.
    def firstbucket(t):
        child0 = t.children[0]
        if isinstance(child0, Bucket):
            return child0
        assert isinstance(child0, Tree)
        return child0.firstbucket()

# Bucket represents a bucket node.
class Bucket(object):
    # .keyv     () of keys
    # .valuev   None | () of values     len(.valuev) == len(.keyv)
    def __init__(b, keyv, valuev):
        b.keyv = tuple(keyv)
        if valuev is None:
            b.valuev = None
        else:
            assert len(valuev) == len(keyv)
            b.valuev = tuple(valuev)

    def __hash__(b):
        return hash(b.keyv) ^ hash(b.valuev)
    def __ne__(a, b):
        return not (a == b)
    def __eq__(a, b):
        if not isinstance(b, Bucket):
            return False
        return (a.keyv == b.keyv and a.valuev == b.valuev)

    def __str__(b):
        if b.valuev is None:
            # key-only bucket
            kvv = ['%s' % k for k in b.keyv]
        else:
            assert len(b.keyv) == len(b.valuev)
            if len(b.keyv) == 0:
                kvv = [':']     # "B:" - empty bucket of a key->value mapping
            else:
                kvv = ['%s:%s' % (k,v) for (k,v) in zip(b.keyv, b.valuev)]
        return "B" + ','.join(kvv)
    __repr__ = __str__

    def copy(b, onlyKeys=False):
        if onlyKeys:
            return Bucket(b.keyv, None)
        return Bucket(b.keyv, b.valuev)

# StructureOf returns internal structure of a ZODB BTree.
# The structure is represented as Tree and Bucket nodes.
# If onlyKeys=Y values of the tree are not represented in returned structure.
def StructureOf(znode, onlyKeys=False):
    ztype = _zclassify(znode)

    if ztype.is_zbucket:
        keys, values = zbcheck.crack_bucket(znode, ztype.is_map)
        if (not ztype.is_map) or onlyKeys:
            return Bucket(keys, None)
        else:
            return Bucket(keys, values)

    if ztype.is_ztree:
        kind, keys, children = zbcheck.crack_btree(znode, ztype.is_map)
        if kind == zbcheck.BTREE_EMPTY:
            return Tree([], Bucket([], None if ((not ztype.is_map) or onlyKeys) else []))

        if kind == zbcheck.BTREE_ONE:
            b = znode._bucket_type()
            b.__setstate__(keys)    # it is keys+values for BTREE_ONE case
            return Tree([], StructureOf(b, onlyKeys))

        if kind == zbcheck.BTREE_NORMAL:
            return Tree(keys, *[StructureOf(_, onlyKeys) for _ in children])

        panic("bad tree kind %r" % kind)

# Restructure reorganizes ZODB BTree instance (not Tree) according to specified
# topology structure.
# The new structure should be usually given with key-only buckets.
# If new structure comes with values, values associated with keys must not be changed.
# NOTE ZODB BTree package does not tolerate structures with empty BTree nodes
# except for the sole single case of empty tree.
def Restructure(ztree, newStructure):
_ = _zclassify(ztree)
assert _.is_ztree
assert isinstance(newStructure, Tree)
newStructOnlyKeys = newStructure.copy(onlyKeys=True)
from_ = TopoEncode(StructureOf(ztree, onlyKeys=True))
to_ = TopoEncode(newStructOnlyKeys)
def _():
exc = recover() # FIXME for panic - returns unwrapped arg, not PanicError
if exc is not None:
# FIXME %w creates only .Unwrap link, not with .__cause__
raise fmt.Errorf("Restructure %s -> %s: %w", from_, to_, exc)
def _(): # XXX hack
exc = sys.exc_info()[1]
if exc is not None:
assert len(exc.args) == 1
exc.args = ("Restructure %s -> %s: %r" % (from_, to_, exc.args[0]),)
ztreeType = type(ztree)
zbucketType = ztreeType._bucket_type
zcheck(ztree) # verify ztree before our tweaks
# {} with all k->v from ztree
kv = dict(ztree)
# walk original and new structures level by level.
# push buckets till the end, then
# for each level we have A1...An "old" nodes, and B1...Bm "new" nodes.
# if n == m - map A-B 1-to-1
# if n < m - we have to "insert" (m-n) new nodes
# if n > m - we have to "forget" (n-m) old nodes.
# The process of insert/forget is organized as follows:
# every node from B set will in the end be associated to a node from A set.
# to make this association we:
# - compute D(Ai,Bj) where D is distance in between ranges
# - find solution to linear assignment problem A <- B with the cost given by D
# 2 2
# D(A,B) = (A.lo - B.lo) + (A.hi - B.hi)
# we will modify nodes from new set:
# - node.Z will point to associated znode
# - bucket.next_bucket will point to bucket that is coming with next keys in the tree
tnew = newStructure.copy() # NOTE _with_ values
# assign assigns tree nodes from RNv to ztree nodes from RZv in optimal way.
# Bj ∈ RNv is mapped into Ai ∈ RZv such that that sum_j D(A_i, Bj) is minimal.
# it is used to associate nodes in tnew to ztree nodes.
def assign(RZv, RNv):
#print('assign %s <- %s' % (RZv, RNv))
for rzn in RZv:
assert isinstance(rzn, _NodeInRange)
assert isinstance(rzn.node, (ztreeType, zbucketType))
for rn in RNv:
assert isinstance(rn, _NodeInRange)
assert isinstance(rn.node, (Tree,Bucket))
assert not hasattr(rn.node, 'Z')
rn.node.Z = None
# D(a,b)
def D(a, b):
def fin(v): # TeX hack: inf = 10000
if v == +inf: return +1E4
if v == -inf: return -1E4
assert abs(v) < 1E3
return v
def d2(x,y):
return (fin(x)-fin(y))**2
return d2(a.range.klo, b.range.klo) + \
d2(a.range.khi, b.range.khi)
# cost matrix
C = np.zeros((len(RNv), len(RZv)))
for j in range(len(RNv)): # "workers"
for i in range(len(RZv)): # "jobs"
C[j,i] = D(RZv[i], RNv[j])
# find the assignments.
# TODO try to avoid scipy dependency; we could also probably make
# assignments more efficiently taking into account that Av and Bv are
# key↑ and the property of D function is so that if B2 > B1 (all keys
# in B2 > all keys in B1) and A < B1.hi, then D(B2, A) > D(B1, A).
jv, iv = scipy.optimize.linear_sum_assignment(C)
for (j,i) in zip(jv, iv):
RNv[j].node.Z = RZv[i].node
# if N(old) > N(new) - some old nodes won't be linked to (it is ok)
# if N(old) < N(new) - some new nodes won't be linked from - link them
# to newly created Z nodes.
for j2 in range(len(RNv)):
if j2 not in jv:
n = RNv[j2].node
assert n.Z is None
assert isinstance(n, (Tree,Bucket))
if isinstance(n, Tree):
zn = ztreeType()
zn = zbucketType()
n.Z = zn
if len(RZv) == len(RNv):
# assert the result is 1-1 mapping
for j in range(len(RNv)):
if RNv[j].node.Z is not RZv[j].node:
panic("BUG: assign: not 1-1 mapping:\n RZv: %s\nRNv: %s" % (RZv, RNv))
# XXX assert assignments are in key↑ order ?
zrlevelv = list(__zwalkBFS(ztree)) # [] of _NodeInRange
rlevelv = list( __walkBFS(tnew)) # [] of _NodeInRange
# associate every non-bucket node in tnew to a znode,
# extract bucket nodes.
zrbucketv = [] # of _NodeInRange
rbucketv = [] # of _NodeInRange
rlevelv_orig = copy.copy(rlevelv)
while len(rlevelv) > 0 or len(zrlevelv) > 0:
rlevel = []
zrlevel = []
if len(rlevelv) > 0:
rlevel = rlevelv .pop(0)
if len(zrlevelv) > 0:
zrlevel = zrlevelv.pop(0)
# filter-out buckets to be processed in the end
_, zrlevel = _filter2(zrlevel, lambda zrn: isinstance(zrn.node, zbucketType))
_, rlevel = _filter2(rlevel, lambda rn: isinstance(rn.node, Bucket))
if len(rlevel) == 0:
# associate nodes to znodes
assign(zrlevel, rlevel)
assert zrlevelv == []
assert rlevelv == []
# order queued buckets and zbuckets by key↑
zrbucketv.sort(key = lambda rn: rn.range.klo)
rbucketv .sort(key = lambda rn: rn.range.klo)
# verify that old keys == new keys
zkeys = set()
keys = set()
for _ in zrbucketv: zkeys.update(_.node.keys())
for _ in rbucketv: keys.update(_.node.keyv)
assert set(kv.keys()) == zkeys, (set(kv.keys()), zkeys)
if zkeys != keys:
raise ValueError("new keys != old keys\ndiff: %s" % zkeys.symmetric_difference(keys))
# chain buckets via .next_bucket
assert len(rbucketv) > 0
for i in range(len(rbucketv)-1):
assert rbucketv[i].range.khi <= rbucketv[i+1].range.klo
rbucketv[i].node.next_bucket = rbucketv[i+1].node
rbucketv[-1].node.next_bucket = None
# associate buckets to zbuckets
assign(zrbucketv, rbucketv)
# set znode states according to established tnew->znode association
rlevelv = rlevelv_orig
for rnodev in reversed(rlevelv): # bottom -> up
for rn in rnodev:
node = rn.node
assert isinstance(node, (Tree,Bucket))
if isinstance(node, Tree):
# special case for empty tree and tree with just one bucket without oid.
# we cannot special case e.g. T/T/B because on loading ZODB
# wants to have !NULL T.firstbucket, and so if T/B is
# represented as empty or as T with embedded B, for top-level T
# there won't be a way to link to firstbucket and
# Tree.__setstate__ will raise "TypeError: No firstbucket in
# non-empty BTree".
zstate = unset = object()
if len(node.keyv) == 0:
child = node.children[0]
if len(rlevelv) == 2: # only 2 levels
assert len(rlevelv[0]) == 1 # top-one has empty tree
assert rlevelv[0][0].node is node
assert isinstance(child, Bucket)# bottom-one has 1 bucket
assert len(rlevelv[1]) == 1
assert rlevelv[1][0].node is child
# empty bucket -> empty tree
if len(child.keyv) == 0:
zstate = None
# tree with single bucket without oid -> tree with embedded bucket
elif child.Z._p_oid is None:
zstate = ((child.Z.__getstate__(),),)
# more than 2 levels. For .../T/B B._p_oid must be set - else
# T.__getstate__ will embed B instead of preserving what we
# pass into T.__setstate__
if isinstance(child, Bucket) and child.Z._p_oid is None:
if ztree._p_jar is None:
raise ValueError("Cannot generate .../T/B topology not under ZODB connection")
assert child.Z._p_oid is not None
if zstate is unset:
# normal tree node
zstate = ()
assert len(node.children) == len(node.keyv) + 1
zstate += (node.children[0].Z,)
for (child, k) in zip(node.children[1:], node.keyv):
zstate += (k, child.Z) # (child0, k0, child1, k1, ..., childN, kN, childN+1)
zstate = (zstate,)
# firstbucket
#print(' (firstbucket -> B %x)' % (id(node.firstbucket().Z),))
zstate += (node.firstbucket().Z,)
if isinstance(node, Bucket):
# if Bucket was specified with values - verify k->v do not
# change from what was in original tree.
if node.valuev is not None:
assert len(node.keyv) == len(node.valuev)
for (k,v) in zip(node.keyv, node.valuev):
if kv[k] is not v:
raise ValueError("target bucket changes [%d] %r -> %r" % (k, kv[k], v))
zstate = ()
for k in node.keyv:
zstate += (k, kv.pop(k)) # (k1, v1, k2, v2, ..., kN, vN)
zstate = (zstate,)
if node.next_bucket is not None: # next
zstate += (node.next_bucket.Z,)
#print('%s %x: ZSTATE: %r' % ('T' if _zclassify(node.Z).is_ztree else 'B', id(node.Z), zstate,))
zstate_old = node.Z.__getstate__()
if zstate_old != zstate:
node.Z._p_changed = True
zstate2 = node.Z.__getstate__()
if zstate2 != zstate:
panic("BUG: node.__getstate__ returns not what "
"we passed into node.__setstate__.\nnode: %r\n"
"__setstate__ <- %r\n__getstate__ -> %r" % (node.Z, zstate, zstate2))
assert tnew.Z is ztree
assert len(kv) == 0 # all keys must have been popped
zcheck(ztree) # verify ztree after our tweaks
tstruct = StructureOf(ztree, onlyKeys=True)
if tstruct != newStructOnlyKeys:
panic("BUG: result structure is not what was"
" requested:\n%s\n\nwant:\n%s" % (tstruct, newStructOnlyKeys))
# AllStructs generates subset of all possible BTree structures for BTrees with
# specified keys and btree depth up-to maxdepth. Each tree node is split by
# up-to maxsplit points.
# kv is {} that defines values to be linked from buckets.
# By default kv=None and generated trees contain key-only Buckets.
# If allowEmptyBuckets=y tree structures with empty buckets are also generated.
# By default, except for empty-tree case, only tree structures with non-empty
# buckets are generated, because ZODB BTree package misbehaves if it finds an
# empty bucket in a tree. ZODB.BTree._check also asserts if bucket is empty.
def AllStructs(keys, maxdepth, maxsplit, allowEmptyBuckets=False, kv=None): # -> i[] of Tree
    assert isinstance(maxdepth, int); assert maxdepth >= 0
    assert isinstance(maxsplit, int); assert maxsplit >= 0
    ks = set(keys)
    for k in keys:
        assert isinstance(k, int)
        assert k in ks # no duplicates
    keyv = list(keys)
    keyv.sort()     # keyv[0] / keyv[-1] below must be min / max key

    # initial [lo, hi) covering keys and such that split points will be there within +-1 of min/max key
    if len(keyv) > 0:
        klo = keyv[0]  - 1 - 1
        khi = keyv[-1] + 1 + 1  # hi is ")", not "]"
    else:
        # the only possible case for empty tree is T/B
        if not allowEmptyBuckets:
            yield Tree([], Bucket([], None if kv is None else []))
            return

        # XXX ok? (ideally should be -inf,+inf)
        klo = 0
        khi = 0

    for tree in _allStructs(klo, khi, keyv, maxdepth, maxsplit, allowEmptyBuckets, kv):
        yield tree

def _allStructs(klo, khi, keyv, maxdepth, maxsplit, allowEmptyBuckets, kv):
    assert klo <= khi
    if len(keyv) > 0:
        assert klo <= keyv[0]
        assert keyv[-1] < khi

    #print('_allStructs [%s, %s)  keyv: %r,  maxdepth=%d, maxsplit=%d' %
    #        (klo, khi, keyv, maxdepth, maxsplit))

    for nsplit in range(0, maxsplit+1):
        if not allowEmptyBuckets:
            iksplitv = _iterSplitKeyvByN(klo, khi, keyv, nsplit)
        else:
            iksplitv = _iterSplitByN(klo, khi, nsplit)
        for ksplitv in iksplitv:
            # ksplitv = [klo, s1, s2, ..., sN, khi]
            #print('ksplitv: %r' % ksplitv)

            # emit Tree -> Buckets
            children = []
            for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
                bkeyv = _keyvSliceBy(keyv, xlo, xhi)
                if not allowEmptyBuckets:
                    assert len(bkeyv) > 0
                valuev = None
                if kv is not None:
                    valuev = [kv[k] for k in bkeyv]
                children.append(Bucket(bkeyv, valuev))
            yield Tree(ksplitv[1:-1], *children) # (s1, s2, ..., sN)

            # emit Tree -> Trees -> ...
            if maxdepth == 0:
                continue
            ichildrenv = [] # of _allStructs for each child link
            for (xlo, xhi) in zip(ksplitv[:-1], ksplitv[1:]): # (klo, s1), (s1, s2), ..., (sN, khi)
                ckeyv = _keyvSliceBy(keyv, xlo, xhi)
                if not allowEmptyBuckets:
                    assert len(ckeyv) > 0
                ichildrenv.append( _allStructs(
                    xlo, xhi, ckeyv, maxdepth - 1, maxsplit, allowEmptyBuckets, kv))

            for children in itertools.product(*ichildrenv):
                yield Tree(ksplitv[1:-1], *children) # (s1, s2, ..., sN)

# _keyvSliceBy returns [] of keys from keyv : k ∈ [klo, khi)
def _keyvSliceBy(keyv, klo, khi):
assert klo <= khi
return list([k for k in keyv if (klo <= k < khi)])
# _iterSplitByN iterates through all nsplit splittings of [lo, hi) range.
#   lo < si < s_{i+1} < hi
def _iterSplitByN(lo, hi, nsplit): # -> i[] of [lo, s1, s2, ..., sn, hi)
    assert lo <= hi
    assert nsplit >= 0

    if nsplit == 0:
        yield [lo, hi]
        return

    for s in range(lo+1, hi): # [lo+1, hi-1]
        for tail in _iterSplitByN(s, hi, nsplit-1):
            yield [lo] + tail

# _iterSplitKeyvByN is similar to _iterSplitByN but makes sure that every
# split range contains at least one key from keyv.
def _iterSplitKeyvByN(lo, hi, keyv, nsplit): # -> i[] of [lo, s1, s2, ..., sn, hi)
    #print('_iterSplitKeyvByN [%s, %s) keyv=%r  nsplit=%r' % (lo, hi, keyv, nsplit))
    assert lo <= hi
    assert 0 <= nsplit
    assert lo <= keyv[0]
    assert keyv[-1] < hi

    if nsplit >= len(keyv):
        return # no split exists

    if nsplit == 0:
        yield [lo, hi]
        return

    for i in range(len(keyv)-nsplit):
        for s in range(keyv[i]+1, keyv[i+1]+1): # (ki, k_{i+1}]
            for tail in _iterSplitKeyvByN(s, hi, keyv[i+1:], nsplit-1):
                yield [lo] + tail

# ---- treewalk ----
# _Range represents a range under which a node is placed in its tree.
class _Range:
# .klo
# .khi
def __init__(r, klo, khi):
assert klo <= khi
r.klo = klo
r.khi = khi
def __hash__(r):
return hash(r.klo) ^ hash(r.khi)
def __ne__(a, b):
return not (a == b)
def __eq__(a, b):
if not isinstance(b, _Range):
return False
return (a.klo == b.klo) and (a.khi == b.khi)
def __str__(r):
return "[%s, %s)" % (r.klo, r.khi)
__repr__ = __str__
# _NodeInRange represents a node (node or znode) coming under key range in its tree.
class _NodeInRange:
# .range
# .node
def __init__(nr, r, node):
nr.range = r
nr.node = node
def __str__(nr):
return "%s%s" % (nr.range, nr.node)
__repr__ = __str__
# _walkBFS walks tree in breadth-first order layer by layer.
def _walkBFS(tree): # i[] of [](of nodes on each level)
for level in __walkBFS(tree):
yield tuple(rn.node for rn in level)
# _zwalkBFS, similarly to _walkBFS, walks ZODB BTree in breadth-first order layer by layer.
def _zwalkBFS(ztree): # i[] of [](of nodes on each level)
for zlevel in __zwalkBFS(ztree):
yield tuple(rn.node for rn in zlevel)
def __walkBFS(tree): # i[] of [](of _NodeInRange on each level)
assert isinstance(tree, Tree)
currentq = []
nextq = [_NodeInRange(_Range(-inf,+inf), tree)]
while len(nextq) > 0:
yield tuple(nextq)
currentq = nextq
nextq = []
while len(currentq) > 0:
rn = currentq.pop(0)
assert isinstance(rn.node, (Tree, Bucket))
if isinstance(rn.node, Tree):
v = (rn.range.klo,) + rn.node.keyv + (rn.range.khi,)
rv = zip(v[:-1], v[1:]) # (klo,k1), (k1,k2), ..., (kN,khi)
assert len(rv) == len(rn.node.children)
for i in range(len(rv)):
nextq.append(_NodeInRange(_Range(*rv[i]), rn.node.children[i]))
def __zwalkBFS(ztree): # i[] of [](of _NodeInRange on each level)
_ = _zclassify(ztree)
assert _.is_ztree
ztreeType = type(ztree)
zbucketType = ztreeType._bucket_type
currentq = []
nextq = [_NodeInRange(_Range(-inf,+inf), ztree)]
while len(nextq) > 0:
yield tuple(nextq)
currentq = nextq
nextq = []
while len(currentq) > 0:
rn = currentq.pop(0)
znode = rn.node
ztype = _zclassify(znode)
assert ztype.is_ztree or ztype.is_zbucket
if ztype.is_zbucket:
assert type(znode) is zbucketType
if ztype.is_ztree:
assert type(znode) is ztreeType
kind, keyv, kids = zbcheck.crack_btree(znode, ztype.is_map)
if kind == zbcheck.BTREE_EMPTY:
b = znode._bucket_type()
children = [b]
elif kind == zbcheck.BTREE_ONE:
b = znode._bucket_type()
keyv = []
children = [b]
elif kind == zbcheck.BTREE_NORMAL:
children = kids
panic("bad tree kind %r" % kind)
v = [rn.range.klo] + keyv + [rn.range.khi]
rv = zip(v[:-1], v[1:]) # (klo,k1), (k1,k2), ..., (kN,khi)
assert len(rv) == len(children)
for i in range(len(rv)):
nextq.append(_NodeInRange(_Range(*rv[i]), children[i]))
# ---- topology encoding ----

# TopoEncode returns topology encoding for internal structure of the tree.
# Vencode specifies way to encode values referred-to by buckets.
# See top-level docstring for description of topology encoding.
def TopoEncode(tree, vencode=lambda v: '%d' % v):
    assert isinstance(tree, Tree)
    topo = ''

    # vdecode to be used in the verification at the end
    vencoded = {} # vencode(vobj) -> vobj
    def vdecode(vtxt):
        return vencoded[vtxt]

    # breadth-first traversal of the tree with '/' injected in between layers
    for nodev in _walkBFS(tree):
        if len(topo) != 0:
            topo += '/'
        tnodev = []
        for node in nodev:
            assert isinstance(node, (Tree, Bucket))
            tnode = ('T' if isinstance(node, Tree) else 'B')
            if isinstance(node, Bucket) and node.valuev is not None:
                # bucket with key and values
                assert len(node.keyv) == len(node.valuev)
                if len(node.keyv) == 0:
                    tnode += ':'
                else:
                    vtxtv = []
                    for v in node.valuev:
                        vtxt = vencode(v)
                        # encoded value must not clash with topology delimiters
                        assert ' ' not in vtxt
                        assert ':' not in vtxt
                        assert ',' not in vtxt
                        assert '-' not in vtxt
                        vtxtv.append(vtxt)
                        if vtxt in vencoded:
                            assert vencoded[vtxt] == v
                        else:
                            vencoded[vtxt] = v
                    tnode += ','.join(['%d:%s' % (k,vtxt)
                                    for (k,vtxt) in zip(node.keyv, vtxtv)])
            else:
                # tree or bucket with keys
                tnode += ','.join(['%d' % _ for _ in node.keyv])
            tnodev.append(tnode)
        topo += '-'.join(tnodev)

    if 1:   # make sure that every topology we emit, can be loaded back
        t2 = TopoDecode(topo, vdecode)
        if t2 != tree:
            panic("BUG: TopoEncode: D(E(·)) != identity\n·       = %s\n D(E(·)) = %s" % (tree, t2))
    return topo

# TopoDecode decodes topology-encoded text into Tree structure.
# Vdecode specifies way to decode values referred-to by buckets.
# See top-level docstring for description of topology encoding.
class TopoDecodeError(Exception):
    pass
def TopoDecode(text, vdecode=int):
    levelv = text.split('/')    # T3/T-T/B1:a-T5/B-B7,8,9 -> T3 T-T B1:a-T5 B-B7,8,9
    # NOTE we don't forbid mixing buckets-with-value with buckets-without-value

    # build nodes from bottom-up
    currentv = [] # of nodes on current level (that we are building)
    bottomq  = [] # of nodes below current level that we are building
                  # shrinks as fifo as nodes added to currentv link to bottom
    while len(levelv) > 0:
        level  = levelv.pop()       # e.g. B1:a-T5
        tnodev = level.split('-')   # e.g. B1:a T5
        bottomq  = currentv
        currentv = []
        for tnode in tnodev:
            if   tnode[:1] == 'T':
                typ = Tree
            elif tnode[:1] == 'B':
                typ = Bucket
            else:
                raise TopoDecodeError("incorrect node %s: unknown prefix" % qq(tnode))
            tkeys = tnode[1:]       # e.g. '7,8,9' or '1:a,3:def' or ''
            if tkeys == '':
                tkeyv = []
            else:
                tkeyv = tkeys.split(',')    # e.g. 7 8 9
            withV = (typ is Bucket and ':' in tkeys)
            keyv  = []
            valuev= [] if withV else None
            if tkeys != ':':    # "B:" indicates ø bucket with values
                for tkey in tkeyv:
                    ktxt = tkey
                    if withV:
                        ktxt, vtxt = tkey.split(':')
                        v = vdecode(vtxt)
                        valuev.append(v)
                    k = int(ktxt)
                    keyv.append(k)

            if typ is Bucket:
                node = Bucket(keyv, valuev)
            else:
                # Tree
                nchild = len(keyv) + 1
                if len(bottomq) < nchild:
                    raise TopoDecodeError(
                        "node %s at level %d: next level does not have enough children to link to" %
                        (qq(tnode), len(levelv)+1))
                children = bottomq[:nchild]
                bottomq  = bottomq[nchild:]
                node = Tree(keyv, *children)
            currentv.append(node)

        if len(bottomq) != 0:
            raise TopoDecodeError("level %d does not link to all nodes in the next level" %
                                  (len(levelv)+1,))

    if len(currentv) != 1:
        raise TopoDecodeError("first level has %d entries; must be 1" % len(currentv))

    root = currentv[0]
    return root

# ---- misc ----
# _indent returns text with each line of it indented with prefix.
def _indent(prefix, text): # -> text
textv = text.split('\n')
textv = [prefix+_ for _ in textv]
text = '\n'.join(textv)
return text
# _filter2(l,pred) = filter(l,pred), filter(l,!pred)
def _filter2(l, pred):
    t, f = [], []
    for _ in l:
        if pred(_):
            t.append(_)
        else:
            f.append(_)
    return t,f
# _assertIncv asserts that values of vector v are strictly ↑
def _assertIncv(v):
prev = -inf
for i in range(len(v)):
if not (v[i] > prev):
panic("assert incv: [%d] not ↑: %s -> %s" % (i, v[i], prev))
prev = v[i]
# _zclassify returns kind of btree node znode is.
# raises TypeError if znode is not a ZODB btree node.
class _ZNodeType:
    # .is_ztree     znode is a BTree node
    # .is_zbucket   znode is a Bucket node
    # .is_map       whether znode is k->v or just set(k)
    pass
def _zclassify(znode): # -> _ZNodeType
    # XXX-> use zbcheck.classify ?
    typ = type(znode)
    is_ztree   = ("Tree" in typ.__name__)
    is_zset    = ("Set"  in typ.__name__)
    is_zbucket = (("Bucket" in typ.__name__) or re.match("..Set", typ.__name__))
    is_map     = (not is_zset)

    if not (is_ztree or is_zbucket):
        raise TypeError("type %r is not a ZODB BTree node" % typ)

    _ = _ZNodeType()
    _.is_ztree   = is_ztree
    _.is_zbucket = is_zbucket
    _.is_map     = is_map
    return _

# zcheck performs full consistency checks on ztree provided by ZODB.
# The checks are what is provided by BTree.check and node._check().
def zcheck(ztree):
    # verify internal C-level pointers consistency.
    # Only valid to be called on root node and verifies whole tree.
    # If called on a non-root node will lead to assert "Bucket next pointer is
    # damaged" since it calls BTree_check_inner(ztree, next_bucket=nil)
    # assuming ztree is a root tree node, and its rightmost bucket should
    # indeed have ->next=nil. If ztree is not root, there can be right-adjacent
    # part of the tree, to which ztree's rightmost bucket must be linking to
    # via its ->next.
    ztree._check()

    # verify nodes values consistency
    zbcheck.check(ztree)

# ----------------------------------------
import sys, tempfile, shutil, subprocess
# graphviz returns tree graph representation in dot language.
def graphviz(t, clustername=''):
assert isinstance(t, Tree)
symtab = {} # id(node) -> name
valtab = {} # value -> name
outv = []
def emit(text):
    outv.append(text)
#emit('subgraph %s {' % clustername) FIXME kills arrows
emit(' splines=false')
for (level, nodev) in enumerate(_walkBFS(t)):
for (i, node) in enumerate(nodev):
assert isinstance(node, (Tree,Bucket))
# register node in symtab
assert id(node) not in symtab
kind = ('T' if isinstance(node, Tree) else 'B')
symtab[id(node)] = '%s%s%d:%d' % (clustername+'_', kind, level, i)
# emit node itself
# approach based on
emit(' %s' % qq(symtab[id(node)]))
emit(' [')
emit(' shape = box')
emit(' margin = 0')
emit(' height = 0')
emit(' width = 0')
if kind == 'T':
emit(' style = rounded')
emit(' label = <<table border="0" cellborder="0" cellspacing="0">')
emit(' <tr>')
for (j,key) in enumerate(node.keyv):
if kind == 'T':
emit(' <td port="con%d"></td>' % j)
if 1:
emit(' <td port="key%d">%d</td>' % (j,key))
emit(' <td port="con%d">%s</td>' % (len(node.keyv), \
(' ' if len(node.keyv) == 0 else '')))
emit(' </tr>')
emit(' </table>>')
emit(' ]')
# emit values
if kind == 'B' and node.valuev is not None:
assert len(node.keyv) == len(node.valuev)
for (j,key) in enumerate(node.keyv):
v = node.valuev[j]
valtab[v] = '%sV%s' % (clustername+'_', v)
# second pass: emit links + node ranks + links to values
for nodev in _walkBFS(t):
# same rank for nodes on the same level
emit(' {rank=same; %s}' % ' '.join([qq(symtab[id(_)]) for _ in nodev]))
# links
for node in nodev:
assert isinstance(node, (Tree,Bucket))
if isinstance(node, Tree):
for (j,child) in enumerate(node.children):
emit(' %s:"con%d" -> %s' % (qq(symtab[id(node)]), j, qq(symtab[id(child)])))
elif node.valuev is not None:
assert len(node.valuev) == len(node.keyv)
for (j,key) in enumerate(node.keyv):
emit(' %s:"key%d" -> %s' % (qq(symtab[id(node)]), j, valtab[node.valuev[j]]))
# third pass: emit values
for v in sorted(valtab):
emit(' %s' % qq(valtab[v]))
emit(' [')
emit(' shape = plain')
emit(' label = "%s"' % v)
emit(' margin = 0')
emit(' height = 0')
emit(' width = 0')
emit(' ]')
return '\n'.join(outv)
# topoview displays topologies provided in argv.
def topoview(argv):
if len(argv) == 0:
raise RuntimeError('E: empty argument')
treev = [TopoDecode(_, vdecode=lambda v: v) for _ in argv]
outv = []
def emit(text):
    outv.append(text)
emit('digraph {')
emit(' label=%s' % qq(' '.join(argv)))
emit(' labelloc="t"')
for (i, tree) in enumerate(treev):
emit(tree.graphviz(clustername='c%d' % i))
import graphviz as gv
g = gv.Source('\n'.join(outv))
tmpd = tempfile.mkdtemp('', 'xbtree')
def _():
# set filename so that it shows in window title.
filename = ' '.join([_.replace('/', '\\') for _ in argv]) # no / in filename
g.render(filename, tmpd, format='svg')
# XXX g.view spawns viewer, but does not wait for it to stop
subprocess.check_call(["inkview", "%s/%s.svg" % (tmpd, filename)])
if __name__ == '__main__':
# -*- coding: utf-8 -*-
# Copyright (C) 2020-2021 Nexedi SA and Contributors.
# Kirill Smelkov <>
# This program is free software: you can Use, Study, Modify and Redistribute
# it under the terms of the GNU General Public License version 3, or (at your
# option) any later version, as published by the Free Software Foundation.
# You can also Link and Combine this program with other software covered by
# the terms of any of the Free Software licenses or any of the Open Source
# Initiative approved licenses and Convey the resulting work. Corresponding
# source of such a combination shall include the source code for all other
# software used.
# This program is distributed WITHOUT ANY WARRANTY; without even the implied
# warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See COPYING file for full licensing terms.
# See for rationale and options.
from __future__ import print_function, absolute_import
from wendelin.wcfs.internal import xbtree
from BTrees.LOBTree import LOBTree
from BTrees.IIBTree import IITreeSet, IISet
from BTrees.tests import testBTrees
from BTrees import check as zbcheck
from ZODB.MappingStorage import MappingStorage
from ZODB import DB
from persistent import Persistent
import transaction
from pytest import raises
inf = float('inf')
# T/B are shorthands for Tree and Bucket without values.
# Bv is shorthand for Bucket with values.
T = xbtree.Tree
B = lambda *keyv: xbtree.Bucket(keyv, None)
Bv = lambda keyv, *valuev: xbtree.Bucket(keyv, valuev)
# buildDegenerateZTree builds ztree with known degenerate topology, see:
def buildDegenerateZTree():
ztree, keys = testBTrees.DegenerateBTree("testBasicOps")._build_degenerate_tree()
assert keys == [1, 3, 5, 7, 11]
assert xbtree.StructureOf(ztree) == T([4],
    T([2],
        T([], B(1)), T([], B(3))),
    T([],
        T([6, 10],
            T([], T([], B(5))),
            T([], B(7)),
            T([], B(11))) ))
return ztree
def test_structureOf():
# empty tree
t = LOBTree()
assert xbtree.StructureOf(t) == T([], Bv([]))
# tree with 1 k->v
t = LOBTree()
t[10] = 'hello'
assert xbtree.StructureOf(t) == T([], Bv([10], 'hello'))
# known degenerate topology
t = buildDegenerateZTree()
assert xbtree.StructureOf(t) == T([4],
    T([2],
        T([], B(1)), T([], B(3))),
    T([],
        T([6, 10],
            T([], T([], B(5))),
            T([], B(7)),
            T([], B(11))) ))
def test_topoEncoding():
def X(tree):
topo = xbtree.TopoEncode(tree)
t2 = xbtree.TopoDecode(topo)
assert t2 == tree
return topo
assert X(T([], B())) == 'T/B'
assert X(T([], B(1))) == 'T/B1'
assert X(T([], B(1,3))) == 'T/B1,3'
assert X(T([], T([], B()))) == 'T/T/B'
assert X(T([3],
    T([], B(1)),
    T([],
        T([5], B(), B(7,8,9))))) == "T3/T-T/B1-T5/B-B7,8,9"
# degenerate btree from ZODB
assert X(T([4],
    T([2],
        T([], B(1)), T([], B(3))),
    T([],
        T([6, 10],
            T([], T([], B(5))),
            T([], B(7)),
            T([], B(11))) ))) == "T4/T2-T/T-T-T6,10/B1-B3-T-T-T/T-B7-B11/B5"
# tree with key->value
assert X(T([], Bv([]))) == 'T/B:'
assert X(T([], Bv([1], 4))) == 'T/B1:4'
assert X(T([], Bv([1,2], 4,5))) == 'T/B1:4,2:5'
assert X(T([3],
Bv([1], 10),
Bv([4,5], 11,12))) == 'T3/B1:10-B4:11,5:12'
# TopoEncode/TopoDecode on autogenerated topologies.
for tree in xbtree.AllStructs([1,3,7,8], 1,2, allowEmptyBuckets=True):
t2 = xbtree.TopoDecode(xbtree.TopoEncode(tree))
assert t2 == tree
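
The strings asserted in test_topoEncoding suggest how the keys-only encoding is laid out: the tree is written level by level, levels are separated by `/`, nodes within a level by `-`, and each node is `T` or `B` followed by its comma-separated keys. The following is a minimal sketch of that layout only. `Tree`, `Bucket` and `topo_encode` here are hypothetical stand-ins, not the `xbtree` API, and values (`B1:4`) are not handled.

```python
from collections import namedtuple

# Hypothetical stand-ins for xbtree.Tree / xbtree.Bucket (keys only, no values).
Tree = namedtuple('Tree', ['keyv', 'children'])
Bucket = namedtuple('Bucket', ['keyv'])

def topo_encode(tree):
    """Encode tree level by level: nodes joined by '-', levels joined by '/'."""
    levelv = []
    level = [tree]
    while level:
        textv = []
        nextlevel = []
        for node in level:
            kind = 'T' if isinstance(node, Tree) else 'B'
            textv.append(kind + ','.join(str(k) for k in node.keyv))
            if isinstance(node, Tree):
                # children of all nodes of this level form the next level
                nextlevel.extend(node.children)
        levelv.append('-'.join(textv))
        level = nextlevel
    return '/'.join(levelv)

t = Tree([3], [
        Tree([], [Bucket([1])]),
        Tree([], [Tree([5], [Bucket([]), Bucket([7, 8, 9])])])])
print(topo_encode(t))   # T3/T-T/B1-T5/B-B7,8,9
```

Because nodes are emitted in breadth-first order, a decoder can rebuild the tree by handing each `T` node `len(keyv)+1` children from the next level.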
def test_allStructs():
# X = AllStructs(..., allowEmptyBuckets=True)
# Y = AllStructs(..., allowEmptyBuckets=False)
# XY = X = Y + assert X == Y
def X(keys, maxdepth, maxsplit, allowEmptyBuckets=True, kv=None):
return list(xbtree.AllStructs(keys, maxdepth, maxsplit, allowEmptyBuckets, kv))
def Y(keys, maxdepth, maxsplit, kv=None):
return X(keys, maxdepth, maxsplit, allowEmptyBuckets=False, kv=kv)
def XY(keys, maxdepth, maxsplit, kv=None):
x = X(keys, maxdepth, maxsplit, kv=kv)
y = Y(keys, maxdepth, maxsplit, kv=kv)
assert x == y
return x
assert XY([], 0, 0) == [ T([], B()) ]
assert XY([1], 0, 0) == [ T([], B(1)) ]
assert XY([1,3], 0, 0) == [ T([], B(1,3)) ]
assert XY([], 0, 1) == [ T([], B()) ] # nothing to split
assert X([], 1, 0) == [ T([], B()),
                        T([],
                          T([], B())) ]
assert Y([], 1, 0) == [ T([], B()) ]
assert X([], 2, 0) == [ T([], B()),
                        T([],
                          T([], B())),
                        T([],
                          T([],
                            T([], B()))) ]
assert Y([], 2, 0) == [ T([], B()) ]
assert XY([1,3], 0, 0) == [ T([], B(1,3)) ]
assert X([1,3], 0, 1) == [ T([], B(1,3)),
    # nsplit=1
    T([0], B(), B(1,3)),
    T([1], B(), B(1,3)),
    T([2], B(1), B(3)),
    T([3], B(1), B(3)),
    T([4], B(1,3), B()),
]
assert Y([1,3], 0, 1) == [ T([], B(1,3)),
    # nsplit=1
    T([2], B(1), B(3)),
    T([3], B(1), B(3)),
]
assert XY([1,3], 1, 0) == [ T([], B(1,3)),
    # depth=1
    T([],
        T([], B(1,3)))
]
assert X([1,3], 1, 1) == [
    # T/
    T([], B(1,3)), # nsplit=0
    T([], # nsplit=0,0
        T([], B(1,3))),
    T([], # nsplit=0,1
        T([0], B(), B(1,3))),
    T([],
        T([1], B(), B(1,3))),
    T([],
        T([2], B(1), B(3))),
    T([],
        T([3], B(1), B(3))),
    T([],
        T([4], B(1,3), B())),
    # T0/
    T([0], B(), B(1,3)), # nsplit=1
    T([0], # nsplit=1,(0,0)
        T([], B()),
        T([], B(1,3))),
    T([0], # nsplit=1,(0,1)
        T([], B()),
        T([1], B(), B(1,3))),
    T([0],
        T([], B()),
        T([2], B(1), B(3))),
    T([0],
        T([], B()),
        T([3], B(1), B(3))),
    T([0],
        T([], B()),
        T([4], B(1,3), B())),
    # nsplit=1,(1,*) -> ø
    # T1/
    T([1], B(), B(1,3)), # nsplit=1
    T([1], # nsplit=1,(0,0)
        T([], B()),
        T([], B(1,3))),
    T([1],
        T([], B()), # nsplit=1,(0,1)
        T([2], B(1), B(3))),
    T([1],
        T([], B()),
        T([3], B(1), B(3))),
    T([1],
        T([], B()),
        T([4], B(1,3), B())),
    T([1], # nsplit=1,(1,0)
        T([0], B(), B()),
        T([], B(1,3))),
    T([1], # nsplit=1,(1,1)
        T([0], B(), B()),
        T([2], B(1), B(3))),
    T([1],
        T([0], B(), B()),
        T([3], B(1), B(3))),
    T([1],
        T([0], B(), B()),
        T([4], B(1,3), B())),
    # T2/
    T([2], B(1), B(3)), # nsplit=1
    T([2], # nsplit=1,(0,0)
        T([], B(1)),
        T([], B(3))),
    T([2], # nsplit=1,(0,1)
        T([], B(1)),
        T([3], B(), B(3))),
    T([2],
        T([], B(1)),
        T([4], B(3), B())),
    T([2], # nsplit=1,(1,0)
        T([0], B(), B(1)),
        T([], B(3))),
    T([2], # nsplit=1,(1,1)
        T([0], B(), B(1)),
        T([3], B(), B(3))),
    T([2],
        T([0], B(), B(1)),
        T([4], B(3), B())),
    T([2], # nsplit=1,(1,0)
        T([1], B(), B(1)),
        T([], B(3))),
    T([2], # nsplit=1,(1,1)
        T([1], B(), B(1)),
        T([3], B(), B(3))),
    T([2],
        T([1], B(), B(1)),
        T([4], B(3), B())),
# T3/
T([3], B(1), B(3)), # nsplit=1
T([3], # nsplit=1,(0,0)
T([], B(1)),
T([], B(3))),
T([3], # nsplit=1,(0,1)
T([], B(1)),
T([4], B(3), B())),
T([3], # nsplit=1,(1,0)
T([0], B(), B(1)),
T([], B(3))),
T([3], # nsplit=1,(1,1)
T([0], B(), B(1)),
T([4], B(3), B())),
T([3], # nsplit=1,(1,0)
T([1], B(), B(1)),
T([], B(3))),
T([3], # nsplit=1,(1,1)
T([1], B(), B(1)),
T([4], B(3), B())),
T([3], # nsplit=1,(1,0)
T([2], B(1), B()),
T([], B(3))),
T([3], # nsplit=1,(1,1)
T([2], B(1), B()),
T([4], B(3), B())),
    # T4/
    T([4], B(1,3), B()), # nsplit=1
    T([4], # nsplit=1,(0,0)
        T([], B(1,3)),
        T([], B())),
    # nsplit=1,(0,1) -> ø
    T([4], # nsplit=1,(1,0)
        T([0], B(), B(1,3)),
        T([], B())),
    T([4],
        T([1], B(), B(1,3)),
        T([], B())),
    T([4],
        T([2], B(1), B(3)),
        T([], B())),
    T([4],
        T([3], B(1), B(3)),
        T([], B())),
    # nsplit=1,(1,1) -> ø
]
assert Y([1,3], 1, 1) == [
    # T/
    T([], B(1,3)), # nsplit=0
    T([], # nsplit=0,0
        T([], B(1,3))),
    T([], # nsplit=0,1
        T([2], B(1), B(3))),
    T([],
        T([3], B(1), B(3))),
    # T0/
    # nothing - leftmost bucket is always empty
    # T1/
    # nothing - leftmost bucket is always empty
    # T2/
    T([2], B(1), B(3)), # nsplit=1
    T([2], # nsplit=1,(0,0)
        T([], B(1)),
        T([], B(3))),
    # T3/
    T([3], B(1), B(3)), # nsplit=1
    T([3], # nsplit=1,(0,0)
        T([], B(1)),
        T([], B(3))),
    # T4/
    # nothing - rightmost bucket is always empty
]
# TODO test for maxsplit=2 / maxdepth=2 vvv
def TY(keys, maxdepth, maxsplit, kv=None):
yv = Y(keys, maxdepth, maxsplit, kv=kv)
return list([xbtree.TopoEncode(_, vencode=lambda v: v) for _ in yv])
assert TY([1,3], 1, 1) == [
# with values
assert TY([1,3], 1,1, kv={1:'a',3:'c'}) == [
# XBlk simulates ZBlk without depending on
class XBlk(Persistent):
    def __init__(self, data):
        self.data = data
    def __str__(self):
        return 'X%s' % self.data
    __repr__ = __str__
# XLOTree is like LOBTree but with small max tree and bucket node sizes.
# Its tree and bucket nodes are split often on regular tree updates.
class XLOTree(LOBTree):
#_bucket_type = XLOBucket
max_leaf_size = 2
max_internal_size = 2
zbcheck._type2kind[XLOTree] = (zbcheck.TYPE_BTREE, True)
zbcheck._btree2bucket[XLOTree] = XLOTree._bucket_type
def crack_btree(ztree):
assert xbtree._zclassify(ztree).is_ztree, ztree
return zbcheck.crack_btree(ztree, is_mapping=True)
def crack_bucket(zbucket):
assert xbtree._zclassify(zbucket).is_zbucket, zbucket
return zbcheck.crack_bucket(zbucket, is_mapping=True)
# assertT asserts that znode is normal tree node + has specified keys and children.
# by default children are checked exactly via "is"
# if a child is represented as 'T' or 'B' - it is only verified to be of tree
# or bucket type correspondingly.
def assertT(znode, keyv, *children): # -> [] of children marked with 'T'/'B'
_ = xbtree._zclassify(znode)
assert _.is_ztree
kind, keys, kids = zbcheck.crack_btree(znode, _.is_map)
assert kind == BTREE_NORMAL
assert keys == keyv
assert len(kids) == len(children)
retv = []
for (child, childOK) in zip(kids, children):
    if childOK == 'T':
        assert type(child) is type(znode)
        retv.append(child)
    elif childOK == 'B':
        assert type(child) is znode._bucket_type
        retv.append(child)
    else:
        assert child is childOK
return retv
# assertB asserts that znode is bucket node with specified keys and values
def assertB(znode, *kvv):
_ = xbtree._zclassify(znode)
assert _.is_zbucket
keys, values = zbcheck.crack_bucket(znode, _.is_map)
if not _.is_map:
    assert values == []
    assert keys == kvv
    return
assert len(keys) == len(values)
assert len(keys) == len(kvv)
for (i,(k,v)) in enumerate(zip(keys, values)):
kok, vok = kvv[i]
assert k == kok
assert v is vok
def test_restructure():
# do restructure tests under ZODB because without ZODB connection it is not
# always possible to __setstate__ for e.g. .../T/B. We also want to make
# sure Restructure correctly marks modified nodes as changed so that the
# changes are actually persisted to storage on commit.
zstor = MappingStorage()
db = DB(zstor)
zconn = db.open()
X = [] # X[i] -> XBlk corresponding to block #i
xv = 'abcdefghijkl'
for i in range(len(xv)):
def xdecode(v):
assert len(v) == 1
assert v in xv
return X[xv.index(v)]
def xencode(x):
assert isinstance(x, XBlk)
# assertB wraps global assertB to automatically fill in X[k] values for specified keys.
def assertB(znode, *keyv):
globals()['assertB'](znode, *[(k,X[k]) for k in keyv])
# Z prepares XLOTree ztree with given keys via usual way.
# the tree is setup as {} k -> X[k].
def Z(*keys):
ztree = XLOTree()
for k in keys:
ztree[k] = X[k]
# check all keys via iterating (this verifies firstbucket and B->next pointers)
keys2 = set(ztree.keys())
assert keys2 == set(keys)
# check all keys by [] access
for k in keys:
assert ztree[k] is X[k]
return ztree
# R restructures ztree to have specified new topology.
# The result is committed unless dontcommit=True is specified.
def R(ztree, newtopo, dontcommit=False):
# verify ztree consistency
items = list(ztree.items())
for (k,v) in items:
assert ztree[k] == v
if isinstance(newtopo, str):
    newStructure = xbtree.TopoDecode(newtopo, xdecode)
else:
    assert isinstance(newtopo, xbtree.Tree)
    newStructure = newtopo
xbtree.Restructure(ztree, newStructure)
if not dontcommit:
    transaction.commit()
# force objects state to be reloaded from storage.
# this leads further checks to also verify if Restructure modified a
# node, but did not mark it as changed. If this bug is indeed there -
# then the modifications will be lost after live cache clearance.
assert xbtree.StructureOf(ztree, onlyKeys=True) == \
# verify iteration produces the same [] of (key, v)
assert list(ztree.items()) == items
# verify [k] gives the same v (for all k)
for (k,v) in items:
assert ztree[k] == v
# S returns topo-encoded keys-only structure of ztree.
# Sv returns topo-encoded structure of ztree with values.
def S(ztree):
return xbtree.TopoEncode(xbtree.StructureOf(ztree, onlyKeys=True))
def Sv(ztree):
return xbtree.TopoEncode(xbtree.StructureOf(ztree), xencode)
# Z0 creates new empty tree
def Z0():
z = Z()
assert crack_btree(z) == (BTREE_EMPTY, [], [])
return z
# ---- tests with manual verification of resulting topology and nodes ----
# ø -> T/B
z = Z0()
R(z, 'T/B')
assert crack_btree(z) == (BTREE_EMPTY, [], [])
with raises(ValueError, match="new keys != old keys"):
R(z, 'T/B1')
# ø -> T/T/B (don't - we don't emit topologies with empty buckets for
# tests since ZODB breaks on them)
z = Z0()
R(z, 'T/T/B')
t, = assertT(z, [], 'T')
b, = assertT(t, [], 'B')
# ø -> T/T-T/B-B (don't - see ^^^)
z = Z0()
R(z, 'T0/T-T/B-B')
Tl, Tr = assertT(z, [0], 'T','T')
bl, = assertT(Tl, [], 'B')
br, = assertT(Tr, [], 'B')
# tree with 1 k->v (not yet committed bucket)
z = Z(1)
assert crack_btree(z) == (BTREE_ONE, ((1, X[1]),), None)
R(z, 'T/B1', dontcommit=True)
assert crack_btree(z) == (BTREE_ONE, ((1, X[1]),), None)
R(z, 'T/T/B1', dontcommit=True)
t, = assertT(z, [], 'T')
b1, = assertT(t, [], 'B')
assertB(b1, 1)
assert b1._p_oid is not None
R(z, 'T/B1', dontcommit=True)
assertT(z, [], b1)
assertB(b1, 1)
# tree with 2 k->v (not-yet committed bucket)
z = Z(1,3)
assert crack_btree(z) == (BTREE_ONE, ((1, X[1], 3, X[3]),), None)
R(z, 'T2/B1-B3', dontcommit=True)
b1, b3 = assertT(z, [2], 'B','B')
assert b1._p_oid is None
assert b3._p_oid is None
assertB(b1, 1)
assertB(b3, 3)
R(z, 'T/B1,3')
# buckets were not yet assigned oid -> collapsed back into T
assert crack_btree(z) == (BTREE_ONE, ((1, X[1], 3, X[3]),), None)
R(z, 'T3/B1-B3', dontcommit=True)
b1, b3 = assertT(z, [3], 'B','B')
assert b1._p_oid is None
assert b3._p_oid is None
assertB(b1, 1)
assertB(b3, 3)
transaction.commit() # force buckets to be assigned oid
assert b1._p_oid is not None
assert b3._p_oid is not None
# restructure back - buckets not collapsed back into T
R(z, 'T/B1,3')
b13, = assertT(z, [], 'B')
assertB(b13, 1,3)
# add 1 key -> B splits -> B + B
assert S(z) == 'T/B1,3'
z[5] = X[5]
assert S(z) == 'T3/B1-B3,5'
b1, b35 = assertT(z, [3], 'B','B')
assertB(b1, 1)
assertB(b35, 3,5)
# -> T2/T-T/B1-B3,5 (add intermediate T-T level)
R(z, 'T2/T-T/B1-B3,5')
tl, tr = assertT(z, [2], 'T','T')
assertT(tl, [], b1)
assertT(tr, [], b35)
assertB(b1, 1)
assertB(b35, 3,5)
# -> T2/T-T/B1-T/B3,5 (add intermediate T level in right arm)
R(z, 'T2/T-T/B1-T/B3,5')
assertT(z, [2], tl, tr)
assertT(tl, [], b1)
trr, = assertT(tr, [], 'T')
assert isinstance(trr, XLOTree)
assertT(trr, [], b35)
assertB(b1, 1)
assertB(b35, 3,5)
# -> T2,4/B1-B3-B5 (kill intermediate trees, split B35->B3+B5)
R(z, 'T2,4/B1-B3-B5')
b3, = assertT(z, [2,4], b1,'B',b35)
b5 = b35; del b35
assertB(b1, 1)
assertB(b3, 3)
assertB(b5, 5)
# -> T2/T-T4/B1-B3-B5 (add intermediate T-T4 level)
R(z, 'T2/T-T4/B1-B3-B5')
tl, tr = assertT(z, [2], 'T','T')
assertT(tl, [], b1)
assertT(tr, [4], b3,b5)
assertB(b1, 1)
assertB(b3, 3)
assertB(b5, 5)
# -> T2/T-T/B1-T4/B3-B5 (add intermediate level in right arm)
R(z, 'T2/T-T/B1-T4/B3-B5')
tr, = assertT(z, [2], tl,'T')
assertT(tl, [], b1)
trr, = assertT(tr, [], 'T')
assertT(trr, [4], b3,b5)
assertB(b1, 1)
assertB(b3, 3)
assertB(b5, 5)
# -> T/B1,3,5 (collapse into T/B)
R(z, 'T/B1,3,5')
assertT(z, [], b1)
b135 = b1
assertB(b135, 1,3,5)
# grow the tree with four more keys (6,7,8,9) till top-level tree node splits
assert S(z) == 'T/B1,3,5'
z[6] = X[6]
assert S(z) == 'T5/B1,3-B5,6'
z[7] = X[7]
assert S(z) == 'T5,6/B1,3-B5-B6,7'
z[8] = X[8]
assert S(z) == 'T6/T5-T7/B1,3-B5-B6-B7,8'
# rotate keys in T and reflow B to the left
tl, tr = assertT(z, [6], 'T','T')
b13, b5 = assertT(tl, [5], 'B','B')
b6, b78 = assertT(tr, [7], 'B','B')
assertB(b13, 1,3)
assertB(b5, 5)
assertB(b6, 6)
assertB(b78, 7,8)
R(z, 'T7/T4,6-T/B1,3-B5-B6-B7,8')
assertT(z, [7], tl,tr)
assertT(tl, [4,6], b13,b5,b6)
assertT(tr, [], b78)
assertB(b13, 1,3)
assertB(b5, 5)
assertB(b6, 6)
assertB(b78, 7,8)
# migrate keys in between buckets
R(z, 'T6/T3-T8/B1-B3,5-B6,7-B8')
assertT(z, [6], tl,tr)
assertT(tl, [3], b13,b5)
assertT(tr, [8], b6,b78)
b1 = b13; del b13
b35 = b5; del b5
b67 = b6; del b6
b8 = b78; del b78
assertB(b1, 1)
assertB(b35, 3,5)
assertB(b67, 6,7)
assertB(b8, 8)
# ---- new structure given with values ----
z = Z(0,2)
R(z, T([1], Bv([0],X[0]), Bv([2],X[2])))
b0, b2 = assertT(z, [1], 'B','B')
assertB(b0, 0)
assertB(b2, 2)
assert b0[0] is X[0]
assert b2[2] is X[2]
# [2] changes value from X[2] to X[3]
with raises(ValueError, match=r"target bucket changes \[2\]"):
R(z, T([1], Bv([0],X[0]), Bv([2],X[3])))
# ---- tricky cases
z = Z(0,1,2,3)
R(z, 'T2/T1-T3/B0-B1-T-T/B2-B3')
R(z, 'T2/T1-T/T-T-B2,3/B0-B1')
# degenerate topology from ZODB example
z = Z(1,3,5,7,11)
R(z, 'T4/T2-T/T-T-T6,10/B1-B3-T-T-T/T-B7-B11/B5')
R(z, 'T/B1,3,5,7,11')
# verify that changed objects are marked as such and so included into commit
# (just R also partly verifies this on every call)
z = Z(0,2,3)
def Rz(newtopo):
R(z, newtopo, dontcommit=True)
assert Sv(z) == newtopo
zconn.cacheMinimize() # force z state to be reloaded from storage
assert Sv(z) == newtopo # will fail if T or B is not marked as changed
# make sure that only modified nodes are marked as changed.
z = Z(0,1,2,3)
R(z, 'T1/T-T2/B0-B1-B2,3')
tl, tr = assertT(z, [1], 'T','T')
b0, = assertT(tl, [], 'B')
b1, b23 = assertT(tr, [2], 'B','B')
assertB(b0, 0)
assertB(b1, 1)
assertB(b23, 2,3)
assert z._p_changed == False
assert tl._p_changed == False
assert tr._p_changed == False
assert b0._p_changed == False
assert b1._p_changed == False
assert b23._p_changed == False
R(z, 'T1/T-T3/B0-B1,2-B3', dontcommit=True) # reflow right arm
assertT(z, [1], tl, tr)
assertT(tl, [], b0)
assertT(tr, [3], b1, b23) # changed
assertB(b0, 0)
assertB(b1, 1,2) # changed
assertB(b23, 3) # changed
assert z._p_changed == False
assert tl._p_changed == False
assert tr._p_changed == True
assert b0._p_changed == False
assert b1._p_changed == True
assert b23._p_changed == True
# ---- tests on automatically generated topologies ----
# ( we make sure that Restructure can perform the restructuring and that
# after restructure a tree remains valid without any error introduced )
for nkeys in range(5): # XXX !slow -> ↑
for xkeyv in xbtree._iterSplitByN(-1, 5+1, nkeys):
keyv = xkeyv[1:-1] # -1, ..., N -> ...
z = Z(*keyv)
# d s Nvariants Ttest
# 3 2 35·10³ 40s
# 3 1 18·10³ 20s
# 2 2 8·10³ 8s
# 2 1 3·10³ 4s
# 1 1 1·10³ 1s
for tree in xbtree.AllStructs(keyv, 2, 1): # XXX !slow -> d=3, s=2
#print('\t%s' % xbtree.TopoEncode(tree))
R(z, tree)
def test_walkBFS():
R = xbtree._Range
# T/B
b = B()
t = T([], b)
walkv = list(xbtree.__walkBFS(t))
assert len(walkv) == 2 # (t) (b)
_ = walkv[0]
assert len(_) == 1
assert _[0].range == R(-inf, inf)
assert _[0].node is t
_ = walkv[1]
assert len(_) == 1
assert _[0].range == R(-inf, inf)
assert _[0].node is b
# T0/T-T/B-B
bl = B(); br = B()
tl = T([], bl)
tr = T([], br)
t = T([0], tl, tr)
walkv = list(xbtree.__walkBFS(t))
assert len(walkv) == 3 # (t) (tl, tr), (bl, br)
_ = walkv[0]
assert len(_) == 1
assert _[0].range == R(-inf, inf)
assert _[0].node is t
_ = walkv[1]
assert len(_) == 2
assert _[0].range == R(-inf, 0)
assert _[0].node is tl
assert _[1].range == R(0, inf)
assert _[1].node is tr
_ = walkv[2]
assert len(_) == 2
assert _[0].range == R(-inf, 0)
assert _[0].node is bl
assert _[1].range == R(0, inf)
assert _[1].node is br
# XXX more tests?
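
The assertions in test_walkBFS imply that each level of the walk carries nodes together with the key range they cover: the root covers [-∞,∞) and the keys of a tree node partition its range among its children. A minimal sketch of that idea follows. `Tree`, `Bucket`, `NodeInRange` and `walk_bfs` are hypothetical names for illustration, and ranges are plain `(lo, hi)` tuples rather than `xbtree._Range`.

```python
from collections import namedtuple
from math import inf

# Hypothetical stand-ins, not the xbtree types.
Tree = namedtuple('Tree', ['keyv', 'children'])
Bucket = namedtuple('Bucket', ['keyv'])
NodeInRange = namedtuple('NodeInRange', ['range', 'node'])

def walk_bfs(tree):
    """Yield levels top-down; each level is a list of NodeInRange."""
    level = [NodeInRange((-inf, inf), tree)]
    while level:
        yield level
        nextlevel = []
        for (klo, khi), node in level:
            if isinstance(node, Tree):
                # keys split [klo,khi) into len(keyv)+1 child ranges
                bounds = [klo] + list(node.keyv) + [khi]
                for child, lo, hi in zip(node.children, bounds, bounds[1:]):
                    nextlevel.append(NodeInRange((lo, hi), child))
        level = nextlevel

# T0/T-T/B-B
bl, br = Bucket([]), Bucket([])
tl, tr = Tree([], [bl]), Tree([], [br])
walkv = list(walk_bfs(Tree([0], [tl, tr])))
```

In this sketch `walkv[1]` holds `tl` with range `(-inf, 0)` and `tr` with range `(0, inf)`, mirroring the checks in the test above.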
def test_zwalkBFS():
zt = buildDegenerateZTree()
# assign oid to created objects to force btrees not to embed bucket state
zstor = MappingStorage()
db = DB(zstor)
zconn = db.open()
zroot = zconn.root()
zroot['x'] = zt
def assertT(znode, keyv, *children):
assert isinstance(znode, IITreeSet)
return globals()['assertT'](znode, keyv, *children)
# assertB asserts that znode is bucket + has specified keys
def assertB(znode, *keyv):
assert isinstance(znode, IISet)
globals()['assertB'](znode, *keyv)
R = xbtree._Range
zwalkv = list(xbtree.__zwalkBFS(zt))
assert len(zwalkv) == 6 # [-∞,∞)T4,
# [-∞,4)T2, [4,∞)T
# [-∞,2)T, [2,4)T, [4,∞)T6,10
# [-∞,2)B1, [2,4)B3, [4,6)T, [6,10)T, [10,∞)T
# [4,6)T, [6,10)B7, [10,∞)B11
# [4,6)B5
_ = zwalkv[5] # [4,6)B5
assert len(_) == 1
assert _[0].range == R(4,6)
b5 = _[0].node; assertB(b5, 5)
_ = zwalkv[4] # [4,6)T, [6,10)B7, [10,∞)B11
assert len(_) == 3
assert _[0].range == R(4,6)
assert _[1].range == R(6,10)
assert _[2].range == R(10,inf)
t4_b5= _[0].node; assertT(t4_b5, [], b5)
b7 = _[1].node; assertB(b7, 7)
b11 = _[2].node; assertB(b11, 11)
_ = zwalkv[3] # [-∞,2)B1, [2,4)B3, [4,6)T, [6,10)T, [10,∞)T
assert len(_) == 5
assert _[0].range == R(-inf,2)
assert _[1].range == R(2,4)
assert _[2].range == R(4,6)
assert _[3].range == R(6,10)
assert _[4].range == R(10,inf)
b1 = _[0].node; assertB(b1, 1)
b3 = _[1].node; assertB(b3, 3)
t3_t4_b5 = _[2].node; assertT(t3_t4_b5, [], t4_b5)
t3_b7 = _[3].node; assertT(t3_b7, [], b7)
t3_b11 = _[4].node; assertT(t3_b11, [], b11)
_ = zwalkv[2] # [-∞,2)T, [2,4)T, [4,∞)T6,10
assert len(_) == 3
assert _[0].range == R(-inf,2)
assert _[1].range == R(2,4)
assert _[2].range == R(4,inf)
t2_b1 = _[0].node; assertT(t2_b1, [], b1)
t2_b3 = _[1].node; assertT(t2_b3, [], b3)
t2_610= _[2].node; assertT(t2_610, [6,10], t3_t4_b5, t3_b7, t3_b11)
_ = zwalkv[1] # [-∞,4)T2, [4,∞)T
assert len(_) == 2
assert _[0].range == R(-inf, 4)
assert _[1].range == R(4, inf)
t1_2 = _[0].node; assertT(t1_2, [2], t2_b1, t2_b3)
t1_t2_610 = _[1].node; assertT(t1_t2_610, [], t2_610)
_ = zwalkv[0] # [-∞,∞)T4,
assert len(_) == 1
assert _[0].range == R(-inf, inf)
assertT(_[0].node, [4], t1_2, t1_t2_610)
def test_keyvSliceBy():
X = xbtree._keyvSliceBy
assert X([], 0,0) == []
assert X([1], 0,0) == []
assert X([1], 0,1) == []
assert X([1], 1,1) == []
assert X([1], 1,2) == [1]
assert X([1,3,5,10,17], 3,10) == [3,5]
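
The cases above are consistent with `_keyvSliceBy(keyv, klo, khi)` selecting keys from the half-open range `klo <= k < khi`. A minimal sketch under that assumption, with the hypothetical name `keyv_slice_by` and assuming `keyv` is sorted, could be:

```python
from bisect import bisect_left

def keyv_slice_by(keyv, klo, khi):
    """Return keys k of sorted keyv with klo <= k < khi."""
    i = bisect_left(keyv, klo)   # first index with keyv[i] >= klo
    j = bisect_left(keyv, khi)   # first index with keyv[j] >= khi
    return keyv[i:j]
```

The half-open convention matches how a tree node with key `k` sends keys `< k` to its left child and keys `>= k` to its right child.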
def test_iterSplitByN():
def X(lo, hi, nsplit):
return tuple(xbtree._iterSplitByN(lo, hi, nsplit))
assert X(0,0, 0) == ( [0,0], )
assert X(0,0, 1) == ()
assert X(0,1, 0) == ( [0,1], )
assert X(0,1, 1) == ()
assert X(0,2, 0) == ( [0,2], )
assert X(0,2, 1) == ( [0,1,2], )
assert X(0,2, 2) == ()
assert X(0,3, 0) == ( [0,3], )
assert X(0,3, 1) == ( [0,1,3], [0,2,3] )
assert X(0,3, 2) == ( [0,1,2,3], )
assert X(0,3, 3) == ()
assert X(0,4, 0) == ( [0,4], )
assert X(0,4, 1) == ( [0,1,4], [0,2,4], [0,3,4] )
assert X(0,4, 2) == ( [0,1,2,4], [0,1,3,4], [0,2,3,4] )
assert X(0,4, 3) == ( [0,1,2,3,4], )
assert X(0,4, 4) == ()
assert X(0,5, 0) == ( [0,5], )
assert X(0,5, 1) == ( [0,1,5], [0,2,5], [0,3,5], [0,4,5] )
assert X(0,5, 2) == ( [0,1,2,5], [0,1,3,5], [0,1,4,5], [0,2,3,5], [0,2,4,5], [0,3,4,5] )
assert X(0,5, 3) == ( [0,1,2,3,5], [0,1,2,4,5], [0,1,3,4,5], [0,2,3,4,5] )
assert X(0,5, 4) == ( [0,1,2,3,4,5], )
assert X(0,5, 5) == ()
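
The expected values above match choosing `nsplit` split points strictly between `lo` and `hi` in every possible way, in lexicographic order. The sketch below reproduces those results with `itertools.combinations`. The name `iter_split_by_n` is hypothetical and this is only one way to obtain the same sequences, not necessarily how `xbtree._iterSplitByN` is written.

```python
from itertools import combinations

def iter_split_by_n(lo, hi, nsplit):
    """Yield [lo, s1, ..., sn, hi] for every choice of lo < s1 < ... < sn < hi."""
    for pointv in combinations(range(lo + 1, hi), nsplit):
        yield [lo] + list(pointv) + [hi]
```

With `nsplit=0` there is exactly one result, `[lo, hi]`, and when fewer than `nsplit` integers lie strictly between `lo` and `hi` nothing is yielded.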
def test_iterSplitKeyvByN():
keyv = [1,3,4]
def X(lo, hi, nsplit):
return tuple(xbtree._iterSplitKeyvByN(lo, hi, keyv, nsplit))
assert X(0,7, 0) == ( [0,7], )
assert X(0,7, 1) == ( [0,2,7], [0,3,7], [0,4,7] )
assert X(0,7, 2) == ( [0,2,4,7], [0,3,4,7] )
assert X(0,7, 3) == ()
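
For `keyv = [1,3,4]` the results above are what one gets by keeping only those splits in which every resulting range `[a, b)` contains at least one key, so that no bucket ends up empty. The following sketch encodes that reading. `iter_split_keyv_by_n` is a hypothetical name, and the non-empty-range rule is an assumption inferred from the four assertions, not taken from the `xbtree` source.

```python
from itertools import combinations

def iter_split_keyv_by_n(lo, hi, keyv, nsplit):
    """Yield splits of [lo, hi) by nsplit points where each part holds >= 1 key."""
    for pointv in combinations(range(lo + 1, hi), nsplit):
        splitv = [lo] + list(pointv) + [hi]
        # every adjacent pair (a, b) must cover at least one key: a <= k < b
        if all(any(a <= k < b for k in keyv)
               for a, b in zip(splitv, splitv[1:])):
            yield splitv
```

This brute-force filter is quadratic in the number of candidate points, which is acceptable for the small ranges used in tests.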