Kirill Smelkov authored
Our current approach is that each file block is represented by 1 ZODB
object, with the block size being 2M. Even with trailing \0 trimming,
which halves the overhead on average, DB size grows very fast if we do
a lot of small appends or changes. So another format needs to be
introduced which has lower overhead for storing small changes:

In general, to represent a BigFile as ZODB objects, each file block
could be represented separately either as

1) one ZODB object, or          (ZBlk0 - this is what we have already)
2) a group of ZODB objects      (ZBlk1 - this is what we introduce)

with a top-level BTree directory #blk -> objects representing the block.

For "1" we have

- low overhead in terms of access time (only 1 object loaded from DB), but
- high overhead in terms of ZODB size (with FileStorage / ZEO, every
  change to a block causes it to be written into the DB in full again).

For "2" we have

- low overhead in terms of ZODB size (only part of a block is
  overwritten in the DB on a single change), but
- high overhead in terms of access time (several objects need to be
  loaded for 1 block).

In general it is not possible to have low overhead for both i) access
time and ii) DB size with an approach where block object representation
/ management is done on the *client* side.

On the other hand, if object management is moved to the DB *server*
side, it is possible to deduplicate objects there and this way have low
overhead for both access time and DB size, with the client storing just
1 object per file block. This will be our future approach after we
teach NEO about object deduplication.

~~~~

As shown in the last paragraph above, it is not possible to perform
optimally on the client side. Thus ZBlk1 should only be an intermediate
solution until we move data management to the DB server side, with the
main criterion for ZBlk1 being to keep it simple.

In this patch a simple scheme is used, where every block is divided
into chunks organized via a BTree. When a part of a block changes, only
the corresponding chunk is updated. The chunk size is chosen to be 4K,
which creates ~512 fanout for a 2M block. (A sketch of the idea is
given at the end of this message.)

DB size after the tests changes as follows:

            bigfile     bigarray

    ZBlk0     24K         6200K
    ZBlk1     36K           36K

( the slight size increase for the bigfile tests is because of BTree
  structure overhead )

Time to run the tests stays approximately the same.

/cc @Tyagov, @klaus
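
For illustration, here is a minimal Python sketch of the chunked-block
scheme described above. It is not the actual ZBlk1 code: the names
ChunkedBlk, Chunk, chunktab and CHUNK_SIZE are illustrative only. The
idea is that a block is kept as an LOBTree keyed by chunk start offset,
so a small write dirties only the chunk objects it touches instead of
rewriting the whole 2M block.

    # sketch only - not the actual ZBlk1 implementation
    from BTrees.LOBTree import LOBTree
    from persistent import Persistent

    CHUNK_SIZE = 4 * 1024          # 4K chunks -> ~512 fanout for a 2M block

    class Chunk(Persistent):
        # one chunk of block data; changing it rewrites only this object in the DB
        def __init__(self, data):
            self.data = data

    class ChunkedBlk(Persistent):
        # block = BTree: chunk-start-offset -> Chunk
        def __init__(self, blksize):
            self.blksize  = blksize
            self.chunktab = LOBTree()

        def loadblkdata(self):
            # reassemble the full block, filling holes with \0
            blk = bytearray(self.blksize)
            for start, chunk in self.chunktab.items():
                blk[start:start + len(chunk.data)] = chunk.data
            return bytes(blk)

        def setblkdata(self, offset, data):
            # write `data` at `offset` inside the block, touching only the
            # chunks that overlap the written range
            assert offset + len(data) <= self.blksize
            while data:
                start = offset - (offset % CHUNK_SIZE)      # chunk boundary
                inoff = offset - start                      # offset inside chunk
                n     = min(len(data), CHUNK_SIZE - inoff)  # bytes for this chunk
                chunk = self.chunktab.get(start)
                if chunk is None:
                    chunk = Chunk(b'\0' * CHUNK_SIZE)
                    self.chunktab[start] = chunk
                buf = bytearray(chunk.data)
                buf[inoff:inoff + n] = data[:n]
                chunk.data = bytes(buf)                     # marks only this chunk dirty
                offset += n
                data    = data[n:]

With such a layout, appending a few bytes re-commits only one 4K Chunk
object (plus the affected BTree bucket) rather than the full 2M block,
which is where the DB-size saving in the table above comes from.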
13c0c17c