    Hook in git2go (cgo bindings to libgit2) · 624393db
    Kirill Smelkov authored
    Currently, for every file -> blob and blob -> file conversion we invoke
    a git subprocess (cat-file or hash-object). We also invoke a git
    subprocess for every tag and every commit read/write. This
    one-subprocess-per-object scheme has very high overhead.
    
    The ways to avoid such overhead could be:
    
    1) for every kind of operation spawn a git service process, e.g.
       `git cat-file --batch` for reading files, and only do a
       request/reply exchange with it per object.
    
    2) use some go library to work with git repository ourselves.
    
    "1" can work but:
    
        - at present there is no counterpart of `cat-file --batch` for
          e.g. `hash-object` - i.e. we cannot write objects without quirks
          or patching git.
    
        - even if we add support for hashing via request/reply, as all
          requests are processed sequentially on git side by e.g. `git
          cat-file --batch`, we won't be able to leverage parallelism.
    
        - request/reply also adds latency.
    
    For "2" we have roughly the following choices:
    
        - use cgo bindings to libgit2   (git2go)
    
        - use some pure-go git library
    
    The pure-go approach has the advantage that, by design, it avoids the
    problems related to the tricky CGo rules for passing pointers between
    C and Go. The fact that those rules were sorted out by the go team
    itself only during the 1.6 cycle

        https://github.com/golang/go/issues/12416

    tells a lot. The net is full of examples where they were hard to get
    right, and git2go in particular has a history of e.g. heap corruption
    (the bug was on the golang side and was fixed only in 1.5)
    
        https://github.com/libgit2/git2go/issues/223
        https://groups.google.com/forum/#!topic/golang-nuts/Vi1HD-54BTA/discussion
    
    However, to my knowledge there is no good pure-go git library, and the
    family of forks around github.com/speedata/gogit either:

        - works 3x slower compared to git2go

          ( or the same 3x slower in serial mode compared to e.g. `git cat-file --batch`,
            as in serial mode a git subservice and git2go have roughly similar performance )

        - or does not work at all (e.g. barfing on REF_DELTA pack
          entries, etc)

    So because of the 3x slowdown, the pure-go way is currently a non-starter.
    
    Since one person from the golang team took care to update git2go to
    properly follow the CGo rules

        https://github.com/libgit2/git2go/pull/282

    we can be relatively confident in the quality of the git2go bindings
    and try to use them.
    
    This commit only hooks git2go into the build, into the subcommands, and
    into Sha1 for to/from Oid conversion. We'll be switching individual
    places over to git2go incrementally in upcoming patches.
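
    As an illustration of what a Sha1 to/from Oid conversion can look like,
    here is a minimal sketch. It is not the code from this commit: `Oid` is
    a local stand-in (git2go's Oid is, to my understanding, also a 20-byte
    array) so that the sketch builds without cgo and libgit2, and the `Sha1`
    layout and all function names are assumptions.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Oid is a local stand-in for git2go's object id type (assumed layout).
type Oid [20]byte

// Sha1 is a hypothetical wrapper around the raw 20 bytes of an object id.
type Sha1 struct {
	raw [20]byte
}

// Sha1Parse decodes a 40-character hex string into a Sha1.
func Sha1Parse(s string) (Sha1, error) {
	var sha1 Sha1
	if len(s) != 2*len(sha1.raw) {
		return Sha1{}, fmt.Errorf("sha1 %q: want %d hex digits", s, 2*len(sha1.raw))
	}
	if _, err := hex.Decode(sha1.raw[:], []byte(s)); err != nil {
		return Sha1{}, err
	}
	return sha1, nil
}

// String returns the hex form of the id.
func (s Sha1) String() string {
	return hex.EncodeToString(s.raw[:])
}

// AsOid reinterprets the same 20 bytes as an Oid without copying.
func (s *Sha1) AsOid() *Oid {
	return (*Oid)(&s.raw)
}

// Sha1FromOid copies the bytes of an Oid into a Sha1.
func Sha1FromOid(oid *Oid) Sha1 {
	return Sha1{raw: *oid}
}

func main() {
	s, err := Sha1Parse("ce013625030ba8dba906f756967f9e9ca394464a")
	if err != nil {
		panic(err)
	}
	back := Sha1FromOid(s.AsOid())
	fmt.Println(back.String(), back == s)
}
```

    Because both types share the same underlying array, the Sha1 -> Oid
    direction in this sketch is a pointer conversion and needs no copy.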
    
    NOTE: for now we need git2go from its `next` branch, for
    
        https://github.com/libgit2/git2go/commit/cf7553e7
    
    The plan is to eventually switch to
    
        gopkg.in/libgit2/git2go.v25
    
    once it is out.
git-backup.go 33.1 KB