    wcfs: zdata: ΔFtail · f980471f
    Kirill Smelkov authored
    ΔFtail builds on ΔBtail and provides ZBigFile-level history that WCFS
    will use to compute which blocks of a ZBigFile need to be invalidated in
    the OS file cache, given raw ZODB changes from a ZODB invalidation message.
    
    It will also be used by WCFS to implement the isolation protocol, where on
    every FUSE READ request WCFS will query ΔFtail to find out the revision of
    the corresponding file block.
    
    Quoting ΔFtail documentation:
    
    ---- 8< ----
    
    ΔFtail provides ZBigFile-level history tail.
    
    It translates ZODB object-level changes to information about which blocks of
    which ZBigFile were modified, and provides service to query that information.
    
    ΔFtail class documentation
    ~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    ΔFtail represents tail of revisional changes to files.
    
    It semantically consists of
    
        []δF			; rev ∈ (tail, head]
    
    where δF represents a change in files space
    
        δF:
        	.rev↑
        	{} file ->  {}blk | EPOCH
    
    Only files and blocks explicitly requested to be tracked are guaranteed to
    be present. In particular a block that was not explicitly requested to be
    tracked, even if it was changed in δZ, is not guaranteed to be present in δF.
    
    After a file epoch (file creation, deletion, or any other change to the file
    object) previous track requests for that file are forgotten and have no
    further effect.
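
    As an illustration only, the []δF tail described above could be modelled
    along the following lines. All names here (Tid, Oid, DeltaFile, DeltaF,
    Tail, Append, ForgetPast) are hypothetical stand-ins chosen for this
    sketch and are not the actual ΔFtail types. The sketch assumes revCut
    never exceeds head.

```go
package main

// Tid and Oid stand in for ZODB transaction and object identifiers.
// Both are assumptions made to keep the sketch self-contained.
type Tid uint64
type Oid uint64

// DeltaFile is a hypothetical rendering of one file's entry in δF:
// either a set of changed blocks, or an epoch.
type DeltaFile struct {
	Epoch  bool
	Blocks map[int64]bool
}

// DeltaF is a hypothetical rendering of δF: a revision plus per-file changes.
type DeltaF struct {
	Rev    Tid
	ByFile map[Oid]*DeltaFile
}

// Tail keeps []δF with rev ∈ (TailRev, Head], ordered by increasing Rev.
type Tail struct {
	TailRev Tid
	Head    Tid
	VDeltaF []DeltaF
}

// Append adds δF to the tail and enforces .rev↑.
// It reports false and changes nothing if δF.Rev does not advance Head.
func (tl *Tail) Append(dF DeltaF) bool {
	if dF.Rev <= tl.Head {
		return false
	}
	tl.VDeltaF = append(tl.VDeltaF, dF)
	tl.Head = dF.Rev
	return true
}

// ForgetPast drops all entries with rev ≤ revCut and moves TailRev forward.
// The sketch assumes revCut ≤ Head.
func (tl *Tail) ForgetPast(revCut Tid) {
	i := 0
	for i < len(tl.VDeltaF) && tl.VDeltaF[i].Rev <= revCut {
		i++
	}
	tl.VDeltaF = tl.VDeltaF[i:]
	if revCut > tl.TailRev {
		tl.TailRev = revCut
	}
}
```

    In this model an out-of-order revision is rejected by Append, and
    ForgetPast(10) on a tail holding revisions 10 and 20 leaves only the
    entry for revision 20.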
    
    ΔFtail provides the following operations:
    
      .Track(file, blk, path, zblk)	- add file and block reached via BTree path to tracked set.
    
      .Update(δZ) -> δF				- update files δ tail given raw ZODB changes
      .ForgetPast(revCut)			- forget changes ≤ revCut
      .SliceByRev(lo, hi) -> []δF		- query for all files changes with rev ∈ (lo, hi]
      .SliceByFileRev(file, lo, hi) -> []δfile	- query for changes of a file with rev ∈ (lo, hi]
      .BlkRevAt(file, #blk, at) -> blkrev	- query for the last revision that changed
        					  file[#blk] as of @at database state.
    
    where δfile represents a change to one file
    
        δfile:
        	.rev↑
        	{}blk | EPOCH
    
    See also zodb.ΔTail and xbtree.ΔBtail
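
    To make the query semantics above more concrete, here is a minimal
    sketch over an in-memory per-file history. The names DeltaFile,
    sliceByFileRev and blkRevAt are hypothetical. Treating an epoch as a
    change to every block, and returning an "exact" flag when nothing is
    found in the covered history, are assumptions of this sketch rather
    than documented behaviour.

```go
package main

// Tid stands in for a ZODB transaction identifier (assumption).
type Tid uint64

// DeltaFile is a hypothetical rendering of δfile.
type DeltaFile struct {
	Rev    Tid
	Epoch  bool
	Blocks map[int64]bool
}

// sliceByFileRev returns the entries of vdf with rev ∈ (lo, hi].
// vdf is expected to be ordered by increasing Rev.
func sliceByFileRev(vdf []DeltaFile, lo, hi Tid) []DeltaFile {
	var out []DeltaFile
	for _, df := range vdf {
		if lo < df.Rev && df.Rev <= hi {
			out = append(out, df)
		}
	}
	return out
}

// blkRevAt returns the last revision ≤ at that changed blk.
// An epoch is treated as changing every block of the file.
// If no such entry exists, it returns tail with exact=false, meaning
// "the block was last changed at or before tail".
func blkRevAt(vdf []DeltaFile, tail Tid, blk int64, at Tid) (rev Tid, exact bool) {
	for i := len(vdf) - 1; i >= 0; i-- {
		df := vdf[i]
		if df.Rev > at {
			continue
		}
		if df.Epoch || df.Blocks[blk] {
			return df.Rev, true
		}
	}
	return tail, false
}
```

    With a history of {rev 10: blocks 1,2}, {rev 20: EPOCH}, {rev 30: block 2},
    this sketch answers 30 for block 2 as of 30, 20 for block 1 as of 30
    (because of the epoch), and 10 for block 1 as of 15.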
    
    Concurrency
    
    ΔFtail is safe to use in single-writer / multiple-readers mode. That is, at
    any time there should be either a single writer or, potentially, several
    simultaneous readers. The table below classifies operations:
    
        Writers:  Update, ForgetPast
        Readers:  Track + all queries (SliceByRev, SliceByFileRev, BlkRevAt)
    
    Note that, in particular, it is correct to run multiple Track and query
    requests simultaneously.
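
    One way a caller might honour the single-writer / multiple-readers
    contract is with a sync.RWMutex, as in the sketch below. The
    guardedTail type and its methods are hypothetical and only model
    queries as readers. The sketch does not show how WCFS itself
    serializes access.

```go
package main

import "sync"

// Tid stands in for a ZODB transaction identifier (assumption).
type Tid uint64

// guardedTail is a hypothetical wrapper: writers take mu exclusively,
// readers share it.
type guardedTail struct {
	mu   sync.RWMutex
	revs []Tid // revisions of recorded changes, ascending
}

// Update is a writer: it must not run concurrently with anything else.
func (g *guardedTail) Update(rev Tid) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.revs = append(g.revs, rev)
}

// ForgetPast is a writer: it drops revisions ≤ revCut.
func (g *guardedTail) ForgetPast(revCut Tid) {
	g.mu.Lock()
	defer g.mu.Unlock()
	i := 0
	for i < len(g.revs) && g.revs[i] <= revCut {
		i++
	}
	g.revs = g.revs[i:]
}

// CountByRev is a reader: it counts revisions in (lo, hi] and may run
// concurrently with other readers.
func (g *guardedTail) CountByRev(lo, hi Tid) int {
	g.mu.RLock()
	defer g.mu.RUnlock()
	n := 0
	for _, rev := range g.revs {
		if lo < rev && rev <= hi {
			n++
		}
	}
	return n
}

// queryInParallel runs n CountByRev queries concurrently and returns
// their results.
func queryInParallel(g *guardedTail, n int, lo, hi Tid) []int {
	res := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			res[i] = g.CountByRev(lo, hi)
		}(i)
	}
	wg.Wait()
	return res
}
```

    Several readers hold the lock in shared mode at once, while Update
    and ForgetPast wait until all readers are done.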
    
    ΔFtail organization
    ~~~~~~~~~~~~~~~~~~~
    
    ΔFtail leverages:
    
        - ΔBtail to track changes to ZBigFile.blktab BTree, and
        - ΔZtail to track changes to ZBlk objects and to ZBigFile object itself.
    
    then every query merges ΔBtail and ΔZtail data on the fly to provide
    ZBigFile-level result.
    
    Merging on the fly, contrary to computing and maintaining vδF data, is done
    to avoid complexity of recomputing vδF when tracking set changes. Most of
    ΔFtail complexity is, thus, located in ΔBtail, which implements BTree diff
    and handles complexity of recomputing vδB when set of tracked blocks
    changes after new track requests.
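
    The idea of merging two histories on the fly can be sketched as a
    plain merge of two revision-ordered lists. The names DeltaBlocks,
    union and mergeHistories are hypothetical. The sketch assumes that
    changes coming from ΔZtail have already been translated from ZBlk
    objects to block numbers, which the real merge has to do itself.

```go
package main

// Tid stands in for a ZODB transaction identifier (assumption).
type Tid uint64

// DeltaBlocks is a hypothetical entry: blocks changed at one revision.
type DeltaBlocks struct {
	Rev    Tid
	Blocks map[int64]bool
}

// union returns a new set holding the keys of both a and b.
func union(a, b map[int64]bool) map[int64]bool {
	out := make(map[int64]bool, len(a)+len(b))
	for k := range a {
		out[k] = true
	}
	for k := range b {
		out[k] = true
	}
	return out
}

// mergeHistories merges two histories ordered by increasing Rev into
// one ordered history. Entries with the same Rev are combined.
func mergeHistories(vT, vZ []DeltaBlocks) []DeltaBlocks {
	var out []DeltaBlocks
	i, j := 0, 0
	for i < len(vT) || j < len(vZ) {
		switch {
		case j == len(vZ) || (i < len(vT) && vT[i].Rev < vZ[j].Rev):
			// only vT has an entry at this revision
			out = append(out, DeltaBlocks{Rev: vT[i].Rev, Blocks: union(vT[i].Blocks, nil)})
			i++
		case i == len(vT) || vZ[j].Rev < vT[i].Rev:
			// only vZ has an entry at this revision
			out = append(out, DeltaBlocks{Rev: vZ[j].Rev, Blocks: union(vZ[j].Blocks, nil)})
			j++
		default:
			// both histories changed at the same revision
			out = append(out, DeltaBlocks{Rev: vT[i].Rev, Blocks: union(vT[i].Blocks, vZ[j].Blocks)})
			i++
			j++
		}
	}
	return out
}
```

    Because nothing merged is stored, adding more tracked blocks later
    only changes the inputs of the next query and leaves no derived data
    to recompute.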
    
    Changes to ZBigFile object indicate epochs. Epochs could be:
    
        - file creation or deletion,
        - change of ZBigFile.blksize,
        - change of ZBigFile.blktab to point to another BTree.
    
    Epochs represent major changes to file history, where the file is assumed to
    change so dramatically that, in practice, the change can be considered to
    affect the "whole" file. In particular, WCFS, upon seeing a ZBigFile epoch,
    invalidates all data in the corresponding OS-level cache for the file.
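
    How a consumer might act on an epoch versus an ordinary change can be
    sketched as follows. DeltaFile and blocksToInvalidate are
    hypothetical names, and the list of cached blocks is an assumed
    input. This is not the WCFS invalidation code.

```go
package main

// DeltaFile is a hypothetical rendering of δfile: changed blocks or an epoch.
type DeltaFile struct {
	Epoch  bool
	Blocks map[int64]bool
}

// blocksToInvalidate returns which of the cached blocks must be dropped
// for the given change: all of them on an epoch, otherwise only those
// listed in the change.
func blocksToInvalidate(df DeltaFile, cached []int64) []int64 {
	if df.Epoch {
		// an epoch is treated as a change to the whole file
		return append([]int64(nil), cached...)
	}
	var out []int64
	for _, blk := range cached {
		if df.Blocks[blk] {
			out = append(out, blk)
		}
	}
	return out
}
```

    With blocks 1, 2 and 3 cached, a change to blocks {2, 7} invalidates
    only block 2, while an epoch invalidates all three.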
    
    The only historical data that ΔFtail maintains by itself is the history of
    epochs. That history does not need to be recomputed when more blocks become
    tracked and is thus easy to maintain. It also can be maintained only in
    ΔFtail because ΔBtail and ΔZtail do not "know" anything about ZBigFile.
    
    Concurrency
    
    In order to allow multiple Track and query requests to be served in
    parallel, ΔFtail bases its concurrency promise on ΔBtail guarantees +
    snapshot-style access for vδE and ztrackInBlk in queries:
    
    1. Track calls ΔBtail.Track and quickly updates .byFile, .byRoot and
       _RootTrack indices under a lock.
    
    2. BlkRevAt queries ΔBtail.GetAt and then combines retrieved information
       about zblk with vδE and δZ.
    
    3. SliceByFileRev queries ΔBtail.SliceByRootRev and then merges retrieved
       vδT data with vδZ, vδE and ztrackInBlk.
    
    4. In queries vδE is retrieved/built in snapshot style similarly to how vδT
       is built in ΔBtail. Note that vδE needs to be built only the first time,
       and does not need to be further rebuilt, so the logic in ΔFtail is simpler
       compared to ΔBtail.
    
    5. for ztrackInBlk - that is used by SliceByFileRev query - an atomic
       snapshot is retrieved for objects of interest. This allows holding the
       δFtail.mu lock only briefly, without blocking other parallel
       Track/query requests for long.
    
    Combined, this organization allows non-overlapping queries/track-requests
    to run simultaneously. (This property is essential to WCFS because otherwise
    WCFS would not be able to serve several non-overlapping READ requests to one
    file in parallel.)
    
    See also "Concurrency" in ΔBtail organization for more details.
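
    The snapshot-style access from item 5 can be illustrated with a
    small sketch: the lock is held only while copying the entries of
    interest, and all further processing works on the copy. The index
    type and its methods are hypothetical, and the layout of
    ztrackInBlk as zblk -> set of blocks is an assumption of this sketch.

```go
package main

import "sync"

// Oid stands in for a ZODB object identifier (assumption).
type Oid uint64

// index is a hypothetical holder of the zblk -> blocks mapping.
type index struct {
	mu          sync.Mutex
	ztrackInBlk map[Oid]map[int64]bool
}

// newIndex returns an empty index.
func newIndex() *index {
	return &index{ztrackInBlk: map[Oid]map[int64]bool{}}
}

// track records, under the lock, that zblk backs block blk.
func (x *index) track(zblk Oid, blk int64) {
	x.mu.Lock()
	defer x.mu.Unlock()
	if x.ztrackInBlk[zblk] == nil {
		x.ztrackInBlk[zblk] = map[int64]bool{}
	}
	x.ztrackInBlk[zblk][blk] = true
}

// snapshot copies the entries for the requested zblks while holding the
// lock, so that the caller can process them afterwards without it.
// The order of blocks in each returned slice is unspecified.
func (x *index) snapshot(zblks []Oid) map[Oid][]int64 {
	x.mu.Lock()
	defer x.mu.Unlock()
	snap := make(map[Oid][]int64, len(zblks))
	for _, z := range zblks {
		for blk := range x.ztrackInBlk[z] {
			snap[z] = append(snap[z], blk)
		}
	}
	return snap
}
```

    A track call made after snapshot returns does not affect the copy
    already handed out, which is what lets a long-running query proceed
    without holding the lock.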
    
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    Some preliminary history:
    
    ef74aebc    X ΔFtail: Keep reference to ZBigFile via Oid, not via *ZBigFile
    bf9a7405    X No longer rely on ZODB cache invariant for invalidations
    46340069    X found by Random
    e7b598c6    X start of ΔFtail.SliceByFileRev rework to function via merging δB and δZ histories on the fly
    59c83009    X ΔFtail.SliceByFileRoot tests started to work draftly after "on-the-fly" rework
    210e9b07    X Fix ΔBtail.SliceByRootRev (lo,hi] handling
    bf3ace66    X ΔFtail: Rebuild vδE after first track
    46624787    X ΔFtail: `go test -failfast -short -v -run Random -randseed=1626793016249041295` discovered problems
    786dd336    X Size no longer tracks [0,∞) since we start tracking when zfile is non-empty
    4f707117    X test that shows problem of SliceByRootRev where untracked blocks are not added uniformly into whole history
    c0b7e4c3    X ΔFtail.SliceByFileRev: Fix untracked entries to be present uniformly in result
    aac37c11    X zdata: Introduce T to start removing duplication in tests
    bf411aa9    X zdata: Deduplicate zfile loading
    b74dda09    X Start switching Track from Track(key) to Track(keycov)
    aa0288ce    X Switch SliceByRootRev to vδTSnapForTracked
    588a512a    X zdata: Switch SliceByFileRev not to clone Zinblk
    8b5d8523    X Move tracking of which blocks were accessed from wcfs to ΔFtail
    30f5ddc7    ΔFtail += .Epoch in δf
    22f5f096    X Rework ΔFtail so that BlkRevAt works with ZBigFile checkout from any at ∈ (tail, head]
    0853cc9f    X ΔFtail + tests
    124688f9    X ΔFtail fixes
    d85bb82c    ΔFtail concurrency
    f980471f
δftail_test.go 22.1 KB