    ZBigFile: Add ZBlk format option 'h' (heuristic) · ac2dd458
    Levin Zimmermann authored
    There are two formats for storing data in a ZBigFile: ZBlk0 and ZBlk1.
    They trade access time against disk-space growth: ZBlk1 uses less disk
    space, while ZBlk0 has better access time. Wendelin.core users may not
    always know, or care, which format fits their data better. In that case
    it may be easier to let the program select the ZBlk format
    automatically. With this patch and the new 'h' (for heuristic) option of
    the 'ZBlk' argument of ZBigFile, this is now possible. The 'h' option is
    not a new ZBlk format in itself; it tries to automatically select the
    best ZBlk format according to the characteristics of the changes that
    the user applies to the ZBigFile.
    
    In its current implementation, the heuristic targets the use case of
    large arrays with many small append-only changes. In this case 'h'
    takes less space than ZBlk0 but is faster to read than ZBlk1. It does
    so by initially using ZBlk1 until a blk is filled up. Once a blk is
    full, it switches to ZBlk0, as recommended by @kirr in
    nexedi/wendelin.core!20 (comment 196084).
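
    The selection rule can be illustrated with a minimal sketch. It is not
    the wendelin.core implementation: the names BLKSIZE, choose_zblk_fmt
    and simulate_appends are hypothetical, and "full" is assumed here to
    mean that the last byte of the blk is non-zero. A blk that is still
    filling up is stored as ZBlk1, a full blk as ZBlk0.

```python
BLKSIZE = 8  # hypothetical, tiny block size used only for illustration


def choose_zblk_fmt(blkdata: bytes) -> str:
    """Return the format name to use for one block.

    Assumption (not taken from wendelin.core): a block counts as full
    when it has no trailing zero bytes, i.e. appended data reached its end.
    """
    used = len(blkdata.rstrip(b"\0"))  # bytes up to the last non-zero byte
    return "ZBlk0" if used == len(blkdata) else "ZBlk1"


def simulate_appends(chunks):
    """Append chunks into one block and record the format chosen each time."""
    blk = bytearray(BLKSIZE)
    pos = 0
    chosen = []
    for chunk in chunks:
        blk[pos:pos + len(chunk)] = chunk  # small append-only change
        pos += len(chunk)
        chosen.append(choose_zblk_fmt(bytes(blk)))
    return chosen


if __name__ == "__main__":
    # Four 2-byte appends fill the 8-byte block on the last write.
    print(simulate_appends([b"ab", b"cd", b"ef", b"gh"]))
```

    In this sketch the first three appends leave the blk partly empty, so
    the cheaper-to-update ZBlk1 is chosen; the fourth append fills the blk
    and the choice becomes ZBlk0, which is faster to read.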
    
    With this patch comes a test (bigfile/tests/test-zblk-fmt) that runs
    benchmarks for different parameter combinations and ZBlk formats. The
    test checks how the 'h' (heuristic) format performs compared to 'ZBlk0'
    and 'ZBlk1':
    
    ---
    
    Run append tests
    ---------------------------------------------
    ---------------------------------------------
    Set change_percentage_set to 0.15
    Set change_count to 500
    Set arrsize to 500000
    Set change_type to append
    
    Run tests with format h:
    
    	ZODB storage size: 318.565101 MB
    	Access time: 0.747 ms / blk  (initially cold; might get warmer during benchmark)
    
    Run tests with format ZBlk0:
    
    	ZODB storage size: 704.347196 MB
    	Access time: 0.737 ms / blk  (initially cold; might get warmer during benchmark)
    
    Run tests with format ZBlk1:
    
    	ZODB storage size: 163.367072 MB
    	Access time: 74.628 ms / blk  (initially cold; might get warmer during benchmark)
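
    To make the trade-off in the figures above easier to read, the
    following sketch computes the ratios between formats. The numbers are
    copied from the append run above; the helper name ratio is only
    illustrative.

```python
# Figures copied from the append benchmark output above.
results = {
    "h":     {"size_mb": 318.565101, "ms_per_blk": 0.747},
    "ZBlk0": {"size_mb": 704.347196, "ms_per_blk": 0.737},
    "ZBlk1": {"size_mb": 163.367072, "ms_per_blk": 74.628},
}


def ratio(metric: str, fmt: str, baseline: str) -> float:
    """Return results[fmt][metric] divided by results[baseline][metric]."""
    return results[fmt][metric] / results[baseline][metric]


if __name__ == "__main__":
    for baseline in ("ZBlk0", "ZBlk1"):
        size = ratio("size_mb", "h", baseline)
        time = ratio("ms_per_blk", "h", baseline)
        print(f"h vs {baseline}: size x{size:.2f}, access time x{time:.3f}")
```

    For this run, 'h' needs roughly 45% of the space of ZBlk0 and about
    twice the space of ZBlk1, while its access time stays within about 2%
    of ZBlk0 and is roughly 100 times lower than that of ZBlk1.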