restore: Extract packs in multiple workers
This allows us to leverage multiple CPUs on a system for pack
extractions, which are computation-heavy operations.

The way to do it is more-or-less classical:

- the main worker prepares requests for pack extraction jobs

- there are multiple pack-extraction workers, which read requests from
  the jobs queue and perform them

- at the end we wait for everything to stop, collect errors and
  optionally signal the whole thing to cancel if we see an error
  coming. (it is only a signal and we still have to wait for
  everything to stop)

The default number of workers is N(CPU) on the system - because we spawn
a separate `git pack-objects ...` for every request.

We also now explicitly limit N(CPU) each `git pack-objects ...` can use
to 1. This way control over how many resources to use is in git-backup's
hands, and git also packs better this way (when only using 1 thread),
because when deltifying, all objects are considered against each other,
not only the objects inside 1 thread's object pool. And even when
pack.threads is not 1, the first "objects counting" phase of pack is
serial - wasting all but 1 core.

On lab.nexedi.com we already use pack.threads=1 by default in the global
gitconfig, but the above change is for the code to be universal.

Time to restore nexedi/ from lab.nexedi.com backup:

2CPU laptop:

    before (pack.threads=1)     10m11s
    before (pack.threads=NCPU)   9m13s
    after -j1                   10m11s
    after                        6m17s

8CPU system (with other load present, noisy):

    before (pack.threads=1)     ~5m
    after                       ~1m30s