Commits · 6af054b0ee5631b101b633f4895ac883a30e2f1b · Jérome Perrin / git-backup

13 Jan, 2020 1 commit

restore: Rework extraction pipeline to use xsync.WorkGroup · 6af054b0

Kirill Smelkov authored Jan 13, 2020

The pattern where multiple workers are spawned to work on a common task
and where whole work needs to be canceled on first error is now well
understood, with the functionality to broadcast cancel and propagate
errors being wrapped into libraries such as

	https://godoc.org/golang.org/x/sync/errgroup			and
	https://godoc.org/lab.nexedi.com/kirr/go123/xsync#WorkGroup
	(go123@515a6d14)

Let's streamline the code by using xsync.WorkGroup (it is in our hands,
a bit more well designed (imho), has analog in Pygolang, and can be
changed/enhanced as needed).

The other reason to rework the code is that the workgroup is created
under context (currently always background) and can be canceled by that
context cancel. In the next patch we'll teach all git-backup
subcommands, including restore, to work under context, and by using
xsync.WorkGroup we will automatically handle cancellation from outside,
while without reworking extraction pipeline we would need to
additionally glue ctx cancel to signal to workers to stop.

Compared to the previous state, both xsync.WorkGroup and errgroup return
only the first error; however, this should likely not cause problems in
practice, as the first error is usually the most informative one.

6af054b0

05 Jan, 2020 2 commits

gitlab-backup: pull|restore: Cleanup $tmpd in defer-style · 00f58d0b

Kirill Smelkov authored Jan 03, 2020

Similarly to the previous patch, let's always clean up the gitlab-backup
temporary folder, even in the presence of errors. Keeping $tmpd on error
did not prevent further gitlab-backup runs from proceeding, but it can
quickly eat up disk space if there are many such runs. If debugging is
needed one can comment out the cleanup, but by default let's be
production friendly out of the box.

Based on patch by @alain.takoudjou:
kirr/git-backup!4

Original description from Alain:

---- 8< ----
When script exit, remove tmp backup folder which are not longuer needed.
Keep this folder when backup is failing will contribute to fill the disk
of server. backup.locked is also removed, because we want to
automatically retry gitlab-backup if previous backup failed, without
human action. If the file is not removed automatically, backup is
blocked until someone remove it.

00f58d0b

pull: Don't leave backup repository locked on error · 2cc61da3

Kirill Smelkov authored Jan 03, 2020

On pull, git-backup locks the backup repository to make sure another
concurrent `git-backup pull` process is not running. However, until now,
if a pull failed, the lock was left unreleased, which made follow-up
pull attempts fail while acquiring the lock until the lock was
manually removed with `git update-ref -d ...`. Probably originally I
made it like this in 6f237f22 (git-backup: Initial draft) to make sure
that if there is a problem it does not go unnoticed and forces me to
investigate. But in general we do _not_ need to keep the lock on error
return after `git-backup pull` completes, even abnormally.

This "lock left unreleased" is causing operational issues on
lab.nexedi.com from time to time: if a pull attempt fails for some, even
temporary, reason, all subsequent pull attempts will fail until a human
intervenes and removes the lock ref.

Fix it.

See also: kirr/git-backup!4

2cc61da3

29 Aug, 2018 1 commit

Fix build with Go1.11 · 9791c04e

Kirill Smelkov authored Aug 29, 2018

	# lab.nexedi.com/kirr/git-backup
	./git.go:177: Raisef call needs 1 arg but has 2 args

The bug was there from day 1 after rewrite in Go in 28986e0e.

9791c04e

20 Jun, 2018 1 commit

fixup! *: Minimal fixes so that program documentation renders under godoc properly · 8e1e69b6

Kirill Smelkov authored Jun 20, 2018

Without trailing dot the following sentence was included into TOC.

8e1e69b6

13 Jun, 2018 1 commit

~gofmt · f3f694b9

Kirill Smelkov authored Jun 13, 2018

f3f694b9

12 Jun, 2018 4 commits

pull: TODO on how to avoid O(n^2) on every `git fetch` for references · cc6ac54f

Kirill Smelkov authored Jun 12, 2018

cc6ac54f

pull: Speedup fetching by prebuilding index of objects we already have at start · 3efed898

Kirill Smelkov authored Jun 12, 2018

As was already said in 899103bf (pull: Switch from porcelain `git
fetch` to plumbing `git fetch-pack` + friends), `git-backup pull` on
lab.nexedi.com has become slow, and most of the slowness was tracked down
to the fact that `git fetch` for every pulled repository does a linear
scan of the whole backup repository history just to find out there is
usually nothing to fetch. Quoting 899103bf:

"""
`git fetch`, before fetching data from remote repository, first checks
whether it already locally has all the objects remote advertises. This
boils down to running

echo $remote_tips | git rev-list --quiet --objects --stdin --not --all

and checking whether it succeeds or not:

https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671
https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925
https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8

The "--not --all" in the query means that objects should be not
reachable from all locally existing refs and is implemented by linearly
scanning from tip of those existing refs and marking objects reachable
from there as "do not print".

In case of git-backup, where we have mostly master which is super commit
merging from whole histories of all projects and from backup history,
linearly scanning from such a tip goes through lots of commits. Up to
the point where fetching a small, outdated repository, which was already
pulled into backup and has not changed for a long time, takes more than 30
seconds with almost 100% of that time being spent in quickfetch() only.
"""

The solution is that we can build an index of objects we already have ourselves
only once at startup, and then in fetch, after checking lsremote output, consult
that index, and if we see we already have everything for an advertised
reference - just avoid giving it to fetch-pack to process. It turns out that for
many pulled repositories no references have changed at all, and this way
fetch-pack can be skipped completely. This leads to a dramatic speedup: before,
`gitlab-backup pull` was taking ~ 2 hours, and now it takes under ~ 5 minutes.
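
A minimal sketch of the lookup step, assuming the index is simply an in-memory set of sha1 strings. The `Ref` type and `needFetch` function are hypothetical names for illustration and are not taken from the git-backup code:

```go
package main

import "fmt"

// Ref is a hypothetical pair of reference name and the sha1 it points to,
// as could be parsed from `git ls-remote` output.
type Ref struct {
	Name string
	Sha1 string
}

// needFetch returns the advertised refs whose sha1 is not yet in have.
// have plays the role of the index built once at startup.
func needFetch(have map[string]struct{}, advertised []Ref) []Ref {
	var missing []Ref
	for _, r := range advertised {
		if _, ok := have[r.Sha1]; !ok {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// shortened, made-up object ids
	have := map[string]struct{}{"aaaa": {}, "bbbb": {}}
	advertised := []Ref{{"heads/master", "aaaa"}, {"heads/dev", "cccc"}}
	for _, r := range needFetch(have, advertised) {
		fmt.Println(r.Sha1, r.Name)
	}
}
```

When `needFetch` returns an empty slice for a repository, there is nothing to hand to fetch-pack and the fetch can be skipped entirely.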

The index building itself takes ~ 30 seconds - the time which we were
previously spending to fetch just from 1 unchanged repository. The index size
is small and so it all can be kept in RAM - please see details in the code
comments on this.

I initially wanted to speedup fetching by teaching `git fetch-objects` to
consult backup repo bitmap reachability index (if, for a commit, we can see
that there is an entry in this index -> we know we already have all reachable
objects for this commit and can skip fetching). This won't however work
fully for all our refs - 40% of them are mostly tags, and since in the backup
repository we don't keep tag objects - we keep tags/tree/blobs encoded as
commits - sha1 of those 40% references to tags won't be in bitmap index.

So just do the indexing ourselves.

3efed898

Factor out backup.refs loading code from restore · 1be6aaaa

Kirill Smelkov authored Jun 12, 2018

In the next patch we will need to load backup.refs in the beginning of
pull too. Factored function changed to return regular error instead of
raising exception (which will be the general plan from now on).

1be6aaaa

pull: Switch from porcelain `git fetch` to plumbing `git fetch-pack` + friends · 899103bf

Kirill Smelkov authored Jun 12, 2018

On lab.nexedi.com `git-backup pull` became slow, and most of the slowness
was tracked down to the following:

`git fetch`, before fetching data from remote repository, first checks
whether it already locally has all the objects remote advertises. This
boils down to running

echo $remote_tips | git rev-list --quiet --objects --stdin --not --all

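and checking whether it succeeds or not:

https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671
https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925
https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8

The "--not --all" in the query means that objects should be not
reachable from all locally existing refs and is implemented by linearly
scanning from tip of those existing refs and marking objects reachable
from there as "do not print".

In case of git-backup, where we have mostly master which is super commit
merging from whole histories of all projects and from backup history,
linearly scanning from such a tip goes through lots of commits. Up to
the point where fetching a small, outdated repository, which was already
pulled into backup and has not changed for a long time, takes more than 30
seconds with almost 100% of that time being spent in quickfetch() only.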

The solution will be to optimize checking whether we already have all the
remote objects and to not repeat whole backup-repo scanning for every
pulled repository. This will be done via first querying through `git
ls-remote` what tips remote repository has, then checking on
git-backup specific index which tips we already have and then fetching
only the rest. This way we are essentially moving most of quickfetch
phase of git into git-backup.

Since we'll be telling git to fetch only some of the remote refs, we
will either have to amend the refs `git fetch` creates after fetching
ourselves, or not rely on `git fetch` creating any refs at all. Since
we already have a long-standing issue that the many, many refs that
come to life after `git fetch` slow down further git fetches

https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L551

the longer term plan will be not to create unneeded references.
Since 2 forks could have references covering the same commits, we would
either have to compare references created after git-fetch and deduplicate
them or manage references creation ourselves.

It is also generally better to split `git fetch` into steps at plumbing
layer, because after doing so, we can have the chance to optimize or
tweak any of the steps at our side with knowing full git-backup context
and indices.

This commit only switches from using `git fetch` to its plumbing
counterpart `git fetch-pack` + friends + manually creating fetched refs
exactly the way `git fetch` used to. There should be neither a change
in functionality nor any speedup.

Further commits will start to take advantage of the switch and optimize
`git-backup pull`.

899103bf

11 Jun, 2018 2 commits

Clarify git Ref* types a bit · 350a01f9

Kirill Smelkov authored Jun 11, 2018

- tell that reference name always goes without "refs/" prefix
- use .name for reference name, not .ref: this way

	ref.name

  is more readable than

	ref.ref

  and so there is less need to use for __ in range loops.

350a01f9

restore: Show details when extracted repo refs were found corrupt · 23e07d70

Kirill Smelkov authored Jun 11, 2018

Noticed this while changing how pull works and accidentally introducing
an error there that left an extra "refs/" prefix. With that error, before
this patch the tests show:

        git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors:
			- E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt:
			- E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt:

with the patch tests report:

        git-backup_test.go:91: git-backup_test.go:204: lab.nexedi.com/kirr/git-backup.cmd_restore: 2 errors:
                        - E: extracted /tmp/t-git-backup981909377/1/dir 2 + β/repo with+fragile name %αβγ.git refs corrupt:

                want:
                cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/refs/heads/master

                have:
                cbb6d3f205749888f77fb1a88fbac3b8a0b8000f refs/heads/master
                        - E: extracted /tmp/t-git-backup981909377/1/dir/hello.git refs corrupt:

                want:
                647e137fd3b31939b36889eba854a298ef97b6ff refs/refs/heads/branch2
                feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/refs/heads/master
                11e67095628aa17b03436850e690faea3006c25d refs/refs/tags/tag-to-blob
                f735011c9fcece41219729a33f7876cd8791f659 refs/refs/tags/tag-to-commit
                7124713e403925bc772cd252b0dec099f3ced9c5 refs/refs/tags/tag-to-tag
                ba899e5639273a6fa4d50d684af8db1ae070351e refs/refs/tags/tag-to-tree
                7a3343f584218e973165d943d7c0af47a52ca477 refs/refs/test/ref-to-blob
                61882eb85774ed4401681d800bb9c638031375e2 refs/refs/test/ref-to-tree

                have:
                647e137fd3b31939b36889eba854a298ef97b6ff refs/heads/branch2
                feeed96ca75fcf8dcf183008f61dbf72e91ab4de refs/heads/master
                11e67095628aa17b03436850e690faea3006c25d refs/tags/tag-to-blob
                f735011c9fcece41219729a33f7876cd8791f659 refs/tags/tag-to-commit
                7124713e403925bc772cd252b0dec099f3ced9c5 refs/tags/tag-to-tag
                ba899e5639273a6fa4d50d684af8db1ae070351e refs/tags/tag-to-tree
                7a3343f584218e973165d943d7c0af47a52ca477 refs/test/ref-to-blob
                61882eb85774ed4401681d800bb9c638031375e2 refs/test/ref-to-tree

It should be good to have these details if something really breaks after restore.

23e07d70

08 Jun, 2018 2 commits

restore: Use bitmap index from backup repo, if present · 0ab7bbb6

Kirill Smelkov authored Jun 08, 2018

This way, if backup repository was freshly repacked with bitmap index
generation turned on, we can get ~ 30% - 50% speedup for a typical
erp5.git pack extraction.

"--use-bitmap-index" option was added to git in v2.0, but was only
active for to-stdout packs generation. It was enabled for to-file packs
generation in git v2.11.

Since git v2.0 was released in 2014 - 4 years ago - I'm not adding
runtime detection of "--use-bitmap-index" availability.

See https://git.kernel.org/pub/scm/git/git.git/commit/?h=645c432d61 for
details.

0ab7bbb6

*: Handle Git object types as git.ObjectType instead of string · cbfa78d2

Kirill Smelkov authored Jun 08, 2018

cbfa78d2

05 Jun, 2018 1 commit

*: Minimal fixes so that program documentation renders under godoc properly · 7f349cd9

Kirill Smelkov authored Jun 05, 2018

- remove blank line between main description and package clause, so that
  the main description is understood as such;
- move notes describing what a file does after package clause, so that
  those notes do not get mixed into program description under godoc.

7f349cd9

25 Apr, 2018 1 commit

gitlab-backup: don't keep backup_gitlab.pulled files · 0b8d834b

Alain Takoudjou authored Apr 18, 2018

add option to remove or keep pulled backup data

[ kirr: The .pulled files with gitlab backup data (SQL and the like)
  were originally not removed "just in case" in the early days of
  git/gitlab-backup. They are clearly not needed to be kept since their
  content is entered into git backup database by gitlab-backup, and
  leaving those .pulled files just wastes disk space.

  So default to not keep them around and for now add an option to
  forcibly preserve the raw gitlab backup if we'll need it just in case or
  for the debugging.

  However if it turns out we won't really need -keep in practice, it
  might go away in some time. ]

/reviewed-on kirr/git-backup!3

0b8d834b

07 Mar, 2018 1 commit

pull: skip repository disapear during gitlab backup pull · c4d4e857

Alain Takoudjou authored Mar 05, 2018

If a repository is removed when git-backup is running, print a warning
message and continue pulling instead of exiting with error.

/reviewed-on kirr/git-backup!2

c4d4e857

24 Oct, 2017 1 commit

Relicense to GPLv3+ with wide exception for all Free Software / Open Source... · e37d99b4

Kirill Smelkov authored Oct 24, 2017

Relicense to GPLv3+ with wide exception for all Free Software / Open Source projects + Business options.

Nexedi stack is licensed under Free Software licenses with various exceptions
that cover three business cases:

- Free Software
- Proprietary Software
- Rebranding

As long as one intends to develop Free Software based on Nexedi stack, no
license cost is involved. Developing proprietary software based on Nexedi stack
may require a proprietary exception license. Rebranding Nexedi stack is
prohibited unless rebranding license is acquired.

Through this licensing approach, Nexedi expects to encourage Free Software
development without restrictions and at the same time create a framework for
proprietary software to contribute to the long term sustainability of the
Nexedi stack.

Please see https://www.nexedi.com/licensing for details, rationale and options.

e37d99b4

19 Apr, 2017 1 commit

Adjust to recent go123 myname & xruntime changes: · 78887b76

Kirill Smelkov authored Apr 19, 2017

- myname moved -> my
  kirr/go123@98249b24

- Traceback now returns []runtime.Frame
  kirr/go123@7deb28a5

78887b76

13 Dec, 2016 4 commits

countFlag moved to lab.nexedi.com/kirr/go123/xflag · b4dd16c6

Kirill Smelkov authored Dec 13, 2016

to

	xflag.Count

b4dd16c6

Move some string-related utilities to https://lab.nexedi.com/kirr/go123/xstrings/ · 48b3ab43

Kirill Smelkov authored Dec 13, 2016

	xstrings.SplitLines
	xstrings.Split2
	xstrings.HeadTail

Other string-related routines stay in git-backup for now as I don't
feel they are general enough or interface chosen is really ok.

48b3ab43

Move string <-> []byte zero-copy conversion routines to https://lab.nexedi.com/kirr/go123/mem/ · 0be1f647

Kirill Smelkov authored Dec 13, 2016

It is now

	mem.String(),	and
	mem.Bytes()

0be1f647

Move error-handling routines & co to lab.nexedi.com/kirr/go123 · 3aedc246

Kirill Smelkov authored Dec 13, 2016

error.go is completely being moved to that shared place for handy Go
utilities into several subpackages:

lab.nexedi.com/kirr/go123/exc -- exception-style error handling for Go
lab.nexedi.com/kirr/go123/myname -- easy way to determine current function's name and package
lab.nexedi.com/kirr/go123/xerr -- addons for error-handling
lab.nexedi.com/kirr/go123/xruntime -- addons to standard package runtime

3aedc246

03 Nov, 2016 1 commit

Don't be fooled by strings.Split(..., "\n") result always having empty "" last element · 3ba6cf73

Kirill Smelkov authored Nov 03, 2016

By definition of strings.Split(..., sep) it "slices s into all substrings
separated by sep and returns a slice of the substrings between those
separators". That means that

    strings.Split("hello\nworld\n", "\n") -> ["hello", "world", ""]     # NOTE the last ""

When parsing a file by lines, though, it is handy not to get the last
empty "" after the last "\n". #6 shows how we failed to do that filtering
for the case of an empty backup.refs file and errored out because of that.

To fix let's introduce a helper - splitlines(), which does the job of
filtering-out last empty entry after last separator. By using this
helper everywhere we can hopefully avoid problems while pulling only
empty repositories (#6 case), and also similar ones.

Fixes #6
/reported-by @iv

3ba6cf73

01 Aug, 2016 3 commits

pull: Don't let a lot of empty directories stay under refs/backup/... work prefix after end of pull · 7535343c

Kirill Smelkov authored Aug 01, 2016

Continuing 62374038 (pull: Turns unused refs are removed not 100% and a
lot of empty directories are accumulated) we just make sure to remove
them in the end of pull.

But NOTE: there could be O(n^2) behaviour still hidden, so it makes
sense to eventually revisit it and cleanup empty dirs earlier.

For now we just care not to degrade future pull performance. The
appropriate time for revisiting could be when reworking pull to do
fetches in parallel.

Updates: https://lab.nexedi.com/lab.nexedi.com/lab.nexedi.com/issues/4

7535343c

restore: Extract packs in multiple workers · ff2f0b67

Kirill Smelkov authored Aug 01, 2016

This way it allows us to leverage multiple CPUs on a system for pack
extractions, which are computation-heavy operations.

The way to do is more-or-less classical:

    - main worker prepares requests for pack extraction jobs

    - there are multiple pack-extraction workers, which read requests
      from jobs queue and perform them

    - at the end we wait for everything to stop and collect errors,
      optionally signalling the whole thing to cancel if we see an error
      coming (it is only a signal and we still have to wait for
      everything to stop)
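
The scheme above can be sketched roughly as follows. All names here (`extractAll`, `fakeExtract`, the repository names) are hypothetical, and the real code spawns `git pack-objects ...` where this sketch calls a stub:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// extractAll runs extract for every job on njobs workers (njobs >= 1) and
// returns all errors seen. After the first error the remaining jobs are skipped.
func extractAll(jobs []string, njobs int, extract func(string) error) []error {
	jobq := make(chan string)
	stop := make(chan struct{}) // closed to signal cancellation
	var stopOnce sync.Once
	var mu sync.Mutex
	var errv []error
	var wg sync.WaitGroup

	for i := 0; i < njobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobq {
				select {
				case <-stop:
					continue // canceled: drain the queue without working
				default:
				}
				if err := extract(job); err != nil {
					mu.Lock()
					errv = append(errv, err)
					mu.Unlock()
					stopOnce.Do(func() { close(stop) })
				}
			}
		}()
	}

	// main worker: prepare requests until done or canceled
feed:
	for _, job := range jobs {
		select {
		case jobq <- job:
		case <-stop:
			break feed
		}
	}
	close(jobq)
	wg.Wait() // cancel is only a signal; still wait for everything to stop
	return errv
}

// fakeExtract is a stub standing in for one pack extraction.
func fakeExtract(repo string) error {
	if repo == "bad.git" {
		return fmt.Errorf("extract %s: failed", repo)
	}
	return nil
}

func main() {
	repos := []string{"a.git", "bad.git", "c.git", "d.git"}
	for _, err := range extractAll(repos, runtime.NumCPU(), fakeExtract) {
		fmt.Println("E:", err)
	}
}
```

Several workers may fail before they notice the stop signal, which is why this variant returns a slice of errors rather than a single one.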

The default number of workers is N(CPU) on the system - because we spawn
separate `git pack-objects ...` for every request.

We also now explicitly limit the number of CPUs each `git pack-objects ...`
can use to 1. This way, control over how many resources to use is in
git-backup's hands, and git also packs better this way (when using only
1 thread), because when deltifying, all objects are considered against
each other, not only the objects inside one thread's object pool.
Moreover, even when pack.threads is not 1, the first "objects counting"
phase of packing is serial - wasting all but 1 core.

On lab.nexedi.com we already use pack.threads=1 by default in global
gitconfig, but the above change is for code to be universal.

Time to restore nexedi/ from lab.nexedi.com backup:

2CPU laptop:

    before (pack.threads=1)     10m11s
    before (pack.threads=NCPU)   9m13s
    after  -j1                  10m11s
    after                        6m17s

8CPU system (with other load present, noisy) :

    before (pack.threads=1)     ~5m
    after                       ~1m30s

ff2f0b67

raisef: Fix it wrt erraddcallingcontext() · 6c2abbbf

Kirill Smelkov authored Aug 01, 2016

like in 302aaaea (raiseif: Fix it wrt erraddcallingcontext()) now fix
raisef, which I originally overlooked.

6c2abbbf

31 Jul, 2016 3 commits

xcommit_tree: Teach it to create commit without spawning `git commit-tree ...` · 3a7b390c

Kirill Smelkov authored Jul 31, 2016

Because spawning separate process per 1 commit is slow.

Libgit2 does not allow creating commits knowing only tree & parentv
sha1s, but we can create commit objects by hand pretty easily - their format is

    tree <sha1>
    parent <parent1-sha1>
    parent <parent2-sha1>
    ...
    author user <email> date +offset
    committer user <email> date +offset
    LF
    message
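
As a rough illustration of building such an object by hand: a git object id is the SHA-1 of the header `<type> <size>\0` followed by the raw content. The sketch below assembles commit content in the layout above and computes its id. The `rawCommit` and `objectSha1` names and the identity and timestamp values are made up for the example, and actually storing the object in the repository is left out:

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// objectSha1 returns the git object id for raw object data of the given
// type: SHA-1 over the header "<type> <size>\x00" followed by the data.
func objectSha1(typ string, data []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s %d\x00", typ, len(data))
	h.Write(data)
	return fmt.Sprintf("%x", h.Sum(nil))
}

// rawCommit assembles commit content in the layout listed above.
// author and committer are passed preformatted ("user <email> date +offset").
func rawCommit(tree string, parents []string, author, committer, msg string) []byte {
	var b strings.Builder
	fmt.Fprintf(&b, "tree %s\n", tree)
	for _, p := range parents {
		fmt.Fprintf(&b, "parent %s\n", p)
	}
	fmt.Fprintf(&b, "author %s\ncommitter %s\n\n%s", author, committer, msg)
	return []byte(b.String())
}

func main() {
	// identity and timestamp are made-up values for illustration
	who := "user <user@example.com> 1469923200 +0000"
	tree := "4b825dc642cb6eb9a060e54bf8d69288fbee4904" // the empty tree
	raw := rawCommit(tree, nil, who, who, "hello\n")
	fmt.Println(objectSha1("commit", raw))
}
```

Nothing here needs a loaded tree or parent object, only their sha1s, which is the property the commit message is after.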

Time for pulling-in kirr/slapos.git

before: 2.5s
after:  0.9s

NOTE AuthorInfo is changed to inherit from git.Signature (same fields
    and semantic)

NOTE Since libgit2 default ident can fail, and does not look beyond
    user.name and user.email we do backup identity detection
    (user/hostname) - in similar way Git does - ourselves.

3a7b390c

Move xcommit_tree() & friends to gitobjects.go · cc450765

Kirill Smelkov authored Jul 31, 2016

We are going to rework this function, but before adding changes let's
move it to more appropriate place. Since xcommit_tree() creates commit
object from tree and parents and is pretty standard git function - the
appropriate place is gitobjects.

NOTE we cannot just replace xcommit_tree() with g.CreateCommit() as the
latter works with already loaded tree and parent objects, but we
want to be able to make commits only knowing tree and parents sha1.

cc450765

Verify tag/tree/blob encoding is consistent and always the same · 5aac4734

Kirill Smelkov authored Jul 31, 2016

In upcoming patch we are going to switch xcommit_tree() to our own
implementation, and since this can potentially change how commits are
represented, for backward compatibility reason we need to make sure
objects encoded as commits stay the same.

So for all kind of objects (they are present in testdata/ repositories)
add checks that:

    - encode/decode is idempotent
    - encoding and decoding produces exactly expected sha1

One nice side effect of this is that we can now remove the runtime
consistency check from the tail of decoding. That check was there from
the beginning - from 6f237f22 (git-backup: Initial draft) - mainly
because there was no testsuite at that time. The place of that check is,
however, not even completely right: if we somehow pulled an object
wrongly, it has to be detected at pull time, not at restore time. So the
check was covering only 1/2 of the implementation - and not the main
one - namely that decoding does not mess things up.

Since we now have a proper testsuite and add encode/decode tests in this
patch, we can remove that partial runtime check. And even if decoding
messes something up despite being covered by tests, it will be 100%
caught by the restore process, because if some object which needs to be
present in an extracted repository is missing, pack generation
for that repository will fail. So we can be safe with the removal.

Time for restoring kirr/slapos.git from lab.nexedi.com backup

before: 5.5s
after:  3.5s

( so much because there are ~ 500 tags in slapos.git and currently tag
  encoding is done with spawning separate subprocess per tag )

5aac4734

30 Jul, 2016 1 commit

pull: Add blobs to index in batch · dbf86b19

Kirill Smelkov authored Jul 30, 2016

Do not waste resources adding every file converted to blob with spawning
`git update-index ...` per file - we can queue the info and add all
entries to index in one go.
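
One way to batch such updates is to feed all queued entries to a single `git update-index --index-info` process on stdin. Whether git-backup does exactly this is an assumption here; the `blobEntry` type and `indexInfo` function are hypothetical, and the sketch only renders the input text:

```go
package main

import (
	"fmt"
	"strings"
)

// blobEntry is a hypothetical queued record for one file converted to blob.
type blobEntry struct {
	Mode uint32 // file mode, e.g. 0100644
	Sha1 string // id of the blob already written to the object database
	Path string // path of the entry inside the index
}

// indexInfo renders queued entries as "<mode> SP <sha1> TAB <path>" lines,
// one of the input formats accepted by `git update-index --index-info`.
func indexInfo(entries []blobEntry) string {
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "%o %s\t%s\n", e.Mode, e.Sha1, e.Path)
	}
	return b.String()
}

func main() {
	queue := []blobEntry{
		{0100644, "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391", "etc/empty.conf"},
		{0100755, "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391", "bin/empty.sh"},
	}
	fmt.Print(indexInfo(queue))
}
```

The newline-terminated form shown here cannot carry paths that themselves contain newlines, so a real implementation would need to handle such paths separately.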

Time to pull files part for lab.nexedi.com

before: ~110s
after:    ~3s

dbf86b19

29 Jul, 2016 6 commits

obj_recreate_from_commit: Re-create tag without spawning hash-object · c33dc392

Kirill Smelkov authored Jul 30, 2016

Time for restoring kirr/slapos.git from lab.nexedi.com backup

before: 7.4s
after:  5.6s

c33dc392

Switch xload_tag() to work without spawning Git subprocess · 5b1cdca3

Kirill Smelkov authored Jul 30, 2016

We can reuse ReadObject() like for blob_to_file().

We cannot drop xload_tag() in favor of Repository.LookupTag() because
upon tag loading we need to have not only parsed tag, but also its raw
content for encoding in another object.

Time for restoring kirr/slapos.git from lab.nexedi.com backup

before: 8.9s
after:  7.4s

( it goes down because on restore restored tags are reencoded again to
  verify restoration was ok. Pulling time should go down appropriately
  as well )

5b1cdca3

Switch file_to_blob() and blob_to_file() to work without spawning Git subprocesses · fbd72c02

Kirill Smelkov authored Jul 29, 2016

Substituting Odb.Read() for `git cat-file` and Odb.Write() for
`git hash-object -w`.

Timing for restoring only files from lab.nexedi.com backup:

before: ~95s
after:   ~8s

Timings for making backup in file part should have similar effect.

fbd72c02

Drop xload_commit() in favor of git2go's Repository.LookupCommit() · 87283e4b

Kirill Smelkov authored Jul 29, 2016

This saves us one `git cat-file` call per recreated tag.

Time for restoring kirr/slapos.git from lab.nexedi.com backup

before: 10.3s
after:   8.9s

87283e4b

Hook in git2go (cgo bindings to libgit2) · 624393db

Kirill Smelkov authored Jul 29, 2016

Currently for every file -> blob, and blob -> file we invoke git
subprocess (cat-file or hash-object). We also invoke git subprocess for
every tag read/write and the same for commits and this 1-subprocess per
1 object has very high overhead.

The ways to avoid such overhead could be:

1) for every kind of operation spawn git service process, like e.g.
   `git cat-file --batch` for reading files, and only do request/reply
   per object with it.

2) use some go library to work with git repository ourselves.

"1" can work but:

    - at present there is no counterpart of `cat-file --batch` for
      e.g. `hash-object` - i.e. we cannot write objects without quirks
      or patching git.

    - even if we add support for hashing via request/reply, as all
      requests are processed sequentially on git side by e.g. `git
      cat-file --batch`, we won't be able to leverage parallelism.

    - request/reply has also latency attached.

For "2" we have roughly the following choices:

    - use cgo bindings to libgit2   (git2go)

    - use some pure-go git library

Pure-go approach has pros that it by design avoids problems related to
tricky CGo pointer C <-> Go passing rules. The fact that this was sorted
out by go team itself only during 1.6 cycle

    https://github.com/golang/go/issues/12416

tells a lot. The net is full of examples where those were hard to get,
and git2go in particular has a story of e.g. heap corruption (the bug
was on golang itself side and fixed only for 1.5)

    https://github.com/libgit2/git2go/issues/223
    https://groups.google.com/forum/#!topic/golang-nuts/Vi1HD-54BTA/discussion

However there is no good (to my knowledge) pure-go git library, and the
family of forks around github.com/speedata/gogit either:

    - works 3x slower compared to git2go

      ( or the same 3x in serial mode compared to e.g. `git cat-file --batch`
        as in serial mode git subservice and git2go has roughly similar performance )

    - or does not work at all (e.g. barfing out on REF_DELTA pack
      entries, etc)

So because of 3x slowdown, pure-go way is currently a no-runner.

Since one person from golang team cared to update git2go to properly
follow the CGo rules

    https://github.com/libgit2/git2go/pull/282

we can be relatively confident about git2go bindings quality and try to
use it.

This commit only hooks git2go into the build, subcommands and to Sha1
for to/from Oid conversion. We'll be switching places to git2go
incrementally in upcoming patches.

NOTE for now we need git2go from next branch for

    https://github.com/libgit2/git2go/commit/cf7553e7

The plan is to eventually switch to

    gopkg.in/libgit2/git2go.v25

once it is out.

624393db

Rename git() -> ggit() · fdaa4a19

Kirill Smelkov authored Jul 29, 2016

We are going to use git2go (see next patch) for which canonical import
path is git (import "github.com/libgit2/git2go" results in package name
being autotruncated to just "git") so free up the "git" name for that
package.

Reason is: git() - as function - is used not often, while the package
will be used often.

Regarding naming: not sure it is good choice but ggit() is something
like xgit(), only g is for "GitError".

fdaa4a19

27 Jul, 2016 1 commit

NOTES.restore: Clarify heuristic to limit search · ad6c6853

Kirill Smelkov authored Jul 27, 2016

We can do similar to what git does for blobs - searching in a window of
repositories sorted by repo basename.

ad6c6853

25 Jul, 2016 1 commit

error/mypkgname: Fix for a package living under dotted prefix · 36da74e6

Kirill Smelkov authored Jul 25, 2016

In 28986e0e (Rewrite in Go) I've added mypkgname() with comment that go
escapes all '.' in function name with %2e. That turned out to be not
true: Go escapes only dots in last component after last slash, e.g.

    lab.nexedi.com/kirr/git-backup/package%2ename.Function
    lab.nexedi.com/kirr/git-backup/pkg2.qqq/name%2ezzz.Function

Correct mypkgname() accordingly.
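
A sketch of the corrected rule, assuming the input is a fully qualified function name as the Go runtime reports it. The `pkgName` function is a hypothetical illustration, not the mypkgname() from the patch:

```go
package main

import (
	"fmt"
	"strings"
)

// pkgName extracts the package import path from a fully qualified
// function name. Only the last path component has its dots escaped as
// "%2e", so only that component is unescaped.
func pkgName(funcName string) string {
	slash := strings.LastIndexByte(funcName, '/') // -1 if there is no slash
	dot := strings.IndexByte(funcName[slash+1:], '.')
	if dot < 0 {
		return funcName // no function part found; return input unchanged
	}
	prefix := funcName[:slash+1]
	last := funcName[slash+1 : slash+1+dot]
	return prefix + strings.ReplaceAll(last, "%2e", ".")
}

func main() {
	fmt.Println(pkgName("lab.nexedi.com/kirr/git-backup/package%2ename.Function"))
	fmt.Println(pkgName("lab.nexedi.com/kirr/git-backup/pkg2.qqq/name%2ezzz.Function"))
}
```

The first real dot after the last slash separates the package from the function name, which is why dots in earlier path components can stay unescaped.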

Noted while trying to run git-backup in a GOPATH root, not as
standalone.

36da74e6

07 Jul, 2016 1 commit

raiseif: Fix it wrt erraddcallingcontext() · 302aaaea

Kirill Smelkov authored Jul 07, 2016

erraddcallingcontext() already tries not to go beyond raise, but since
raiseif was calling raise, it was omitting raise but not raiseif itself.
So an error could look like this

    cmd_restore: raiseif: mkdir ../R/1: file exists

while it should be

    cmd_restore: mkdir ../R/1: file exists

Fix it.

302aaaea