bigfile/zodb/ZBlk1: Don't miss to deactivate/free internal .chunktab buckets in loadblkdata()
13c0c17c (bigfile/zodb: Format #1 which is optimized for small changes) used BTree to organize ZBlk1 block's chunks and for loadblkdata() added "TODO we are missing to free internal BTree structures on data load". nexedi/wendelin.core#3 besides other things showed that even when we deactivate ZData objects, we are still keeping them as ghosts occupying memory and the same for IOBucket objects. This all happens because there is no proper way to deactivate whole btree - including internal buckets objects. And since internal buckets are not deactivated, they stay in picklecache and thus hold a reference to ZData objects and ZData objects in turn, even if explicitly deactivated, stay in memory. We can fix this all via implementing whole-btree deactivation procedure. To do so we need to iterate over all btree buckets recursively, but unfortunately there is no BTree API to access/iterate btree's buckets. We can however still get reference to first top-level buckets via gc.get_referents(btree) and then scan buckets further without hacks. gc.get_referents(btree) is a hack, but - it works in O(1) (we only get pointers from btree, not scanning all gcable objects and deducing them) - it works reliable if we filter out non-interesting objects. So in the end it works. Before the patch loading more and more ZBlk1 data with objgraph instrumentation was showing itself like # Nobj δ wendelin.bigfile.file_zodb.ZData 7168 +512 BTrees.IOBTree.IOBucket 238 +17 BTrees.IOBTree.IOBTree 14 +1 and after this patch we now have BTrees.IOBTree.IOBTree 14 +1 we cannot remove that "IOBTree + 1", since ZBlk1 is holding direct reference on it (via .chunktab) and we have to keep ZBlk1 live with ._v_zfile and ._v_zblk set for invalidation to work. "+1 IOBtree" is however small - 144 bytes per 2M (= 0.006%) so we can neglect that the same way we neglect keeping ZBlk1 staying live for each block.
Showing
lib/tests/test_zodb.py
0 → 100644