gitlab-backup: Dump DB ourselves · 6fa6df4b

Kirill Smelkov authored Feb 08, 2016

The reason to do this is that we want more control over the DB dump
process. The current problems that led to this decision are:

1. The DB dump is one large file whose size grows over time. This is not
friendly to git;

2. The DB dump is currently not git/rsync friendly: when PostgreSQL
does a dump, it just copies rows to output in the order of its internal
data pages, and that internal ordering changes every time a row is updated.

http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/bin/pg_dump/pg_dump.c;h=aa01d6a6;hb=HEAD#l1590
http://stackoverflow.com/questions/24622579/does-or-can-the-postgresql-copy-to-command-guarantee-a-particular-row-order

Both 1 and 2 currently bring our backup tool to its knees. We'll be
handling those issues in the following patches.
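To see why problem 2 hurts a line-based tool like git, here is a deliberately simplified toy model (an assumption for illustration, not PostgreSQL's real storage code): a table is a list of slots, an UPDATE marks the old row version dead and appends the new version at the end, and a COPY without ORDER BY emits live rows in slot order. Real PostgreSQL may keep the new version on the same page (HOT updates), so the reordering is not guaranteed for every update, but over time it accumulates.

```python
# Toy model only (assumption): a "heap" is a list of slots; None marks a dead tuple.

def update(heap, key, new_value):
    """Simulate an MVCC-style UPDATE: kill the old version, append a new one."""
    for i, row in enumerate(heap):
        if row is not None and row[0] == key:
            heap[i] = None          # old version becomes a dead tuple
            break
    heap.append((key, new_value))   # new version lands at the end of the heap

def copy_out(heap):
    """Emit live rows in physical (slot) order, like COPY without ORDER BY."""
    return [f"{k}\t{v}" for k, v in (r for r in heap if r is not None)]

heap = [(1, "alice"), (2, "bob"), (3, "carol")]
before = copy_out(heap)
update(heap, 1, "alice2")
after = copy_out(heap)
print(before)  # ['1\talice', '2\tbob', '3\tcarol']
print(after)   # ['2\tbob', '3\tcarol', '1\talice2']
```

A single changed row shows up in a diff as a line deleted at the top and a line added at the bottom, rather than one modified line in place.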

For now we perform the dump manually and switch from dumping in
plain-text SQL to dumping in PostgreSQL's native "directory" format, where
there is a small table of contents with the schema (toc.dat) and the output
of `COPY <table> TO stdout` for each table in a separate file.
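A minimal sketch of how such a dump could be invoked, assuming Python as the driver. `--format=directory`, `--file` and `--compress` are real pg_dump options; the database name `gitlabhq_production`, the output directory `backup/db`, and the choice of `--compress=0` are assumptions for illustration. Compression matters here because the directory format gzip-compresses its per-table data files by default, which would defeat git's delta compression.

```python
import shlex

def pg_dump_argv(dbname, outdir, compress=0):
    """Build a pg_dump command line for a directory-format dump.

    --format=directory writes toc.dat plus one data file per table into outdir;
    --compress=0 (an assumed choice) keeps those data files as plain text.
    """
    return [
        "pg_dump",
        "--format=directory",
        f"--file={outdir}",        # for directory format this names the target directory
        f"--compress={compress}",
        dbname,
    ]

# Hypothetical database name and output path.
argv = pg_dump_argv("gitlabhq_production", "backup/db")
print(shlex.join(argv))
# pg_dump --format=directory --file=backup/db --compress=0 gitlabhq_production
```

The resulting `argv` list can be passed directly to `subprocess.run(argv, check=True)`.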

http://www.postgresql.org/docs/9.5/static/app-pgdump.html

On restore, we regenerate plain-text SQL from the dump with pg_restore and
give this plain-text SQL back to GitLab, so it thinks it restores it the
usual way.
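A minimal sketch of that conversion step, assuming Python and hypothetical paths `db.dump` (the directory-format dump) and `db/database.sql` (the file handed to GitLab). When pg_restore is given no `--dbname`, it does not connect to any server; it writes the SQL script that would recreate the database, and `--file` names where that script goes.

```python
import subprocess

def pg_restore_to_sql_argv(dumpdir, sqlfile):
    """Build a pg_restore command line that turns a directory-format dump
    back into a plain-text SQL script (no --dbname, so nothing is restored)."""
    return ["pg_restore", "--format=directory", f"--file={sqlfile}", dumpdir]

def restore_to_sql(dumpdir, sqlfile):
    # Requires pg_restore on PATH; raises CalledProcessError if it fails.
    subprocess.run(pg_restore_to_sql_argv(dumpdir, sqlfile), check=True)

print(pg_restore_to_sql_argv("db.dump", "db/database.sql"))
```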

NOTE: backward compatibility is preserved: if the restore part sees a
backup made by an older version of gitlab-backup, which dumps
database.sql in plain text, it restores it correctly.
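One way such a check could look, as a sketch assuming Python and a hypothetical layout in which both the old `database.sql` and the new directory dump's `toc.dat` would sit in the same backup directory. The function name `detect_db_backup_format` is hypothetical; the only real convention it relies on is that a pg_dump directory-format dump always contains `toc.dat`.

```python
import os
import tempfile

def detect_db_backup_format(dbdir):
    """Return "directory" for a pg_dump directory-format dump (has toc.dat),
    "plain" for an old-style backup with a plain-text database.sql,
    and raise ValueError otherwise."""
    if os.path.isfile(os.path.join(dbdir, "toc.dat")):
        return "directory"
    if os.path.isfile(os.path.join(dbdir, "database.sql")):
        return "plain"
    raise ValueError(f"unrecognized database backup in {dbdir!r}")

# Usage: an old-style backup only has database.sql.
with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "database.sql"), "w").close()
    print(detect_db_backup_format(d))  # plain
```

Checking for `toc.dat` first means a new-style backup is never mistaken for an old one, even if a regenerated `database.sql` happens to be present next to it.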

NOTE2: gitlab-backup now supports only PostgreSQL (i.e. not MySQL).
Adding support for other databases is possible, but requires a custom
handler for every DB (or maybe just a fallback to the usual plain-text dump).

NOTE3: even though we split the DB into separate tables, this does not
currently help with problem #1, as in GitLab it is mostly just one table
that occupies the whole space.

/cc @kazuhiko
