pull: Switch from porcelain `git fetch` to plumbing `git fetch-pack` + friends
On lab.nexedi.com `git-backup pull` became slow, and most of the slowness was tracked down to the following: `git fetch`, before fetching data from remote repository, first checks whether it already locally has all the objects remote advertises. This boils down to running echo $remote_tips | git rev-list --quiet --objects --stdin --not --all and checking whether it succeeds or not: https://git.kernel.org/pub/scm/git/git.git/commit/?h=4191c35671 https://git.kernel.org/pub/scm/git/git.git/tree/builtin/fetch.c?h=v2.18.0-rc1-1-g6f333ff2fb#n925 https://git.kernel.org/pub/scm/git/git.git/tree/connected.c?h=v2.18.0-rc1-1-g6f333ff2fb#n8 The "--not --all" in the query means that objects should be not reachable from all locally existing refs and is implemented by linearly scanning from tip of those existing refs and marking objects reachable from there as "do not print". In case of git-backup, where we have mostly master which is super commit merging from whole histories of all projects and from backup history, linearly scanning from such a tip goes through lots of commits. Up to the point where fetching a small, outdated repository, which was already pulled into backup and did not changed since long, takes more than 30 seconds with almost 100% of that time being spent in quickfetch() only. The solution will be to optimize checking whether we already have all the remote objects and to not repeat whole backup-repo scanning for every pulled repository. This will be done via first querying through `git ls-remote` what tips remote repository has, then checking on git-backup specific index which tips we already have and then fetching only the rest. This way we are essentially moving most of quickfetch phase of git into git-backup. Since we'll be tailing to git to fetch only some of the remote refs, we will either have to amend ourselves the refs `git fetch` creates after fetching, or to not rely on `git fetch` creating any refs at all. Since we already have a long standing issue that many many refs that are coming live after `git fetch` slow down further git fetches https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L551 the longer term plan will be not to create unneeded references. Since 2 forks could have references covering the same commits, we would either have to compare references created after git-fetch and deduplicate them or manage references creation ourselves. It is also generally better to split `git fetch` into steps at plumbing layer, because after doing so, we can have the chance to optimize or tweak any of the steps at our side with knowing full git-backup context and indices. This commit only switches from using `git fetch` to its plumbing counterpart `git fetch-pack` + friends + manually creating fetched refs the way `git fetch` used to do exactly. There should be neither functionality changed nor any speedup. Further commits will start to take advantage of the switch and optimize `git-backup pull`.
Showing
File added
File added
File added