• Kirill Smelkov's avatar
    wcfs: tests: xbtree.py package for inspecting/manipulating internal structure of BTrees · 0e829874
    Kirill Smelkov authored
    To handle invalidations, WCFS will need to detect changes to both ZBlk
    objects and to ZBigFile.blktab BTree that is mapping file blocks to ZBlk
    objects. And with BTree detecting changes is much more complex, because
    when a BTree changes, it might be rebalanced, or keys migrated from one
    tree/bucket node to another tree/bucket node. In other words a BTree
    change might be not only a change to a {}key->value dictionary, but also
    a change to BTree topology.
    
    Because there are many BTree topologies that correspond to the same
    {}key->value state, a change from kv₁ to kv₂, even if kv₁ and kv₂ are
    close to each other, might be accompanied by a dramatic change to
    topology of the tree. This creates a need for thoroughly testing the
    BTree difference algorithm because many of BTree topologies changes are
    tricky, and if a simple algorithm works on relatively stable topology
    updates, it does not necessarily mean that that same algorithm will
    continue to work correctly in the general case.
    
    So, as a preparatory step, here comes xbtree.py package, that can be
    used to inspect tree topologies, to create trees with specified topology
    and to manipulate topology of an existing tree. This package will be
    used in tests for upcoming ΔBtail.
    
    For debugging, and also since those tests will involve both Go and
    Python parts, it creates the need to be able to specify and exchange
    topology of a tree via compact string. This package also defines so
    called "topology encoding" to do so.
    
    Some preliminar history:
    
    kirr/wendelin.core@fb56193f    X fix metric to keep Z <- N order stable over key^
    kirr/wendelin.core@809304d1    X "B:" indicates ø bucket with k&b, "B" - ø bucket with only keys
    kirr/wendelin.core@9eca74ec    X Teach AllStructs to emit topologies with values
    kirr/wendelin.core@1b962f03    X Restructure: found bug that it was not marking objects as modified
    kirr/wendelin.core@9181c5d9    X Restructure; verify that it marks as changed only modifed nodes
    kirr/wendelin.core@e9902c4a    X improve `xbtree topoview`
    
    For the reference xbtree.py package documentation is quoted below.
    
    ---- 8< ----
    
    Package xbtree provides utilities for inspecting/manipulating internal
    structure of integer-keyed BTrees.
    
    It will be primarily used to help verify ΔBTail in WCFS.
    
    - `Tree` represents a tree node.
    - `Bucket` represents a bucket node.
    - `StructureOf` returns internal structure of ZODB BTree represented as Tree
      and Bucket nodes.
    - `Restructure` reorganizes ZODB BTree instance according to specified topology
      structure.
    
    - `AllStructs` generates all possible BTree topology structures with given keys.
    
    Topology encoding
    -----------------
    
    Topology encoding provides way to represent structure of a Tree as path-like string.
    
    TopoEncode converts Tree into its topology-encoded representation, while
    TopoDecode decodes topology-encoded string back into Tree.
    
    The following example illustrates topology encoding represented by string
    "T3/T-T/B1-T5/B-B7,8,9":
    
          [ 3 ]             T3/         represents Tree([3])
           / \
         [ ] [ ]            T-T/        represents two empty Tree([])
          ↓   ↓
         |1|[ 5 ]           B1-T5/      represent Bucket([1]) and Tree([5])
             / \
            || |7|8|9|      B-B7,8,9    represents empty Bucket([]) and Bucket([7,8,9])
    
    Topology encoding specification:
    
    A Tree is encoded by level-order traversal, delimiting layers with "/".
    Inside a layer Tree and Bucket nodes are signalled as
    
        "T<keys>"           ; Tree
        "B<keys>"           ; Bucket with only keys
        "B<keys+values>"    ; Bucket with keys and values
    
    Keys are represented as ","-delimited list of integers. For example Tree
    or Bucket with [1,3,5] keys are represented as
    
        "T1,3,5"        ; Tree([1,3,5])
        "B1,3,5"        ; Bucket([1,3,5])
    
    Keys+values are represented as ","-delimited list of "<key>:<value>" pairs. For
    example Bucket corresponding to {1:1, 2:4, 3:9} is represented as
    
        "B1:1,2:4,3:9"  ; Bucket([1,2,3], [1,4,9])
    
    Empty keys+values are represented as ":" - an empty Bucket for key->value
    mapping is represented as
    
        "B:"            ; Bucket([], [])
    
    Nodes inside one layer are delimited with "-". For example a layer consisting
    of an empty Tree, a Tree with [1,3] keys, and Bucket with [4,5] keys is
    represented as
    
        "T-T1,3-B4,5"   ; layer with Tree([]), Tree([1,3]) and Bucket([4,5])
    
    A layer consists of nodes that are followed by node-node links from upper layer
    in left-to-right order.
    
    Visualization
    -------------
    
    The following visualization utilities are provided to help understand BTrees
    better:
    
    - `topoview` displays BTree structure given its topology-encoded representation.
    - `Tree.graphviz` returns Tree graph representation in dot language.
    0e829874
xbtree.py 39.1 KB