    bigfile/zodb/ZBlk1: Don't miss to deactivate/free internal .chunktab buckets in loadblkdata() · 542917d1
    Kirill Smelkov authored
    13c0c17c (bigfile/zodb: Format #1 which is optimized for small changes)
    used a BTree to organize a ZBlk1 block's chunks and, for loadblkdata(),
    left "TODO we are missing to free internal BTree structures on data load".
    
    Among other things, nexedi/wendelin.core#3 showed that even when we
    deactivate ZData objects, they stay in memory as ghosts, and the same
    holds for IOBucket objects.
    
    This happens because there is no proper way to deactivate a whole
    btree, including its internal bucket objects. Since the internal
    buckets are not deactivated, they stay in the picklecache and keep
    holding references to ZData objects, so ZData objects stay in memory
    even when explicitly deactivated.
    
    We can fix all of this by implementing a whole-btree deactivation
    procedure.
    
    To do so we need to iterate over all of the btree's buckets
    recursively, but unfortunately there is no BTree API to access or
    iterate over them. We can, however, still get references to the first
    top-level buckets via gc.get_referents(btree) and then scan the
    remaining buckets without hacks.
    
    gc.get_referents(btree) is a hack, but
    
    - it works in O(1) (we only get pointers from the btree, instead of
      scanning all gc-able objects and deducing them)
    - it works reliably if we filter out non-interesting objects.
    
    So in the end it works.
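
    The procedure described above can be sketched roughly as follows. This
    is only an illustration using the standard library: Bucket, Tree and
    deactivate_tree() are hypothetical stand-ins, not the BTrees or
    wendelin.core code, and the stand-ins use __slots__ so that
    gc.get_referents() reports their attribute values directly.

```python
import gc


class Bucket:
    """Hypothetical stand-in for a leaf bucket such as IOBucket."""
    __slots__ = ("items", "next", "ghost")

    def __init__(self, items, next=None):
        self.items = items
        self.next = next
        self.ghost = False

    def deactivate(self):
        # Mimic ghostification: drop the loaded state, keep only the shell.
        self.items = None
        self.next = None
        self.ghost = True


class Tree:
    """Hypothetical stand-in for a btree that exposes no bucket accessor."""
    __slots__ = ("_first", "_size")

    def __init__(self, first, size):
        self._first = first
        self._size = size


def deactivate_tree(tree):
    """Deactivate every bucket reachable from tree; return how many."""
    # Ask the gc only for what `tree` points to directly, then keep
    # just the objects of the type we care about (the filtering step).
    heads = [o for o in gc.get_referents(tree) if isinstance(o, Bucket)]
    count = 0
    for bucket in heads:
        while bucket is not None and not bucket.ghost:
            following = bucket.next  # read the link before state is dropped
            bucket.deactivate()
            count += 1
            bucket = following
    return count


b3 = Bucket({4: "d"})
b2 = Bucket({2: "b", 3: "c"}, next=b3)
b1 = Bucket({0: "a", 1: "a2"}, next=b2)
tree = Tree(b1, 5)
deactivated = deactivate_tree(tree)
```

    The isinstance() filter is what keeps the hack reliable in this sketch:
    gc.get_referents() also returns unrelated referents (here the integer
    size and the class itself), and only bucket-typed objects are followed.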
    
    Before the patch, loading more and more ZBlk1 data under objgraph
    instrumentation looked like
    
        #                                    Nobj        δ
        wendelin.bigfile.file_zodb.ZData     7168      +512
        BTrees.IOBTree.IOBucket               238       +17
        BTrees.IOBTree.IOBTree                 14        +1
    
    and after this patch we now have
    
        BTrees.IOBTree.IOBTree                 14        +1
    
    We cannot remove that "IOBTree +1", since ZBlk1 holds a direct
    reference to it (via .chunktab) and we have to keep ZBlk1 live with
    ._v_zfile and ._v_zblk set for invalidation to work. "IOBTree +1" is
    however small, 144 bytes per 2M (about 0.007%), so we can neglect it
    the same way we neglect keeping a ZBlk1 live for each block.
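
    The overhead figure can be checked with a couple of lines. This assumes
    "2M" means 2 MiB of block data; the constant names are made up for the
    illustration. The ratio comes out a little under 0.007%.

```python
# Assumption: "2M" is read as 2 MiB (2 * 1024 * 1024 bytes) of block data.
IOBTREE_BYTES = 144
BLOCK_BYTES = 2 * 1024 * 1024

# Memory kept per block by the remaining IOBTree, as a percentage.
overhead_percent = 100.0 * IOBTREE_BYTES / BLOCK_BYTES
print(f"{overhead_percent:.4f}%")
```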