Commit f1924d2b authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'docs/geo-split-omnibus-source' into 'master'

split Geo Omnibus/source info

Closes #1908 and #1917

See merge request !1431
parents 080311e7 65349269
# GitLab Geo
> **Note:**
This feature was introduced in GitLab 8.5 EE as Alpha.
We recommend you use with at least GitLab 8.6 EE.
> **Notes:**
- GitLab Geo is part of [GitLab Enterprise Edition Premium][ee].
- Introduced in GitLab Enterprise Edition 8.9.
We recommend you use it with at least GitLab Enterprise Edition 8.14.
- You should make sure that all nodes run the same GitLab version.
GitLab Geo allows you to replicate your GitLab instance to other geographical
locations as a read-only fully operational version.
- [Overview](#overview)
- [Setup instructions](#setup-instructions)
- [Database Replication](database.md)
- [Configuration](configuration.md)
- [Current limitations](#current-limitations)
- [Disaster Recovery](disaster-recovery.md)
- [Frequently Asked Questions](#frequently-asked-questions)
- [Can I use Geo in a disaster recovery situation?](#can-i-use-geo-in-a-disaster-recovery-situation)
- [What data is replicated to a secondary node?](#what-data-is-replicated-to-a-secondary-node)
- [Can I git push to a secondary node?](#can-i-git-push-to-a-secondary-node)
- [How long does it take to have a commit replicated to a secondary node?](#how-long-does-it-take-to-have-a-commit-replicated-to-a-secondary-node)
- [What happens if the SSH server runs at a different port?](#what-happens-if-the-ssh-server-runs-at-a-different-port)
## Overview
If you have two or more teams geographically spread out, but your GitLab
......@@ -44,36 +33,39 @@ Keep in mind that:
## Setup instructions
In order to set up one or more GitLab Geo instances, follow the steps below in
this **exact order**:
the **exact order** they appear. **Make sure the GitLab version is the same on
all nodes.**
### Using Omnibus GitLab
If you installed GitLab using the Omnibus packages (highly recommended):
1. Follow the first 3 steps to [install GitLab Enterprise Edition][install-ee]
on the server that will serve as the secondary Geo node. Do not login or
set up anything else in the secondary node for the moment.
1. [Setup the database replication](database.md) (`primary <-> secondary (read-only)` topology)
1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve
as the **secondary** Geo node. Do not login or set up anything else in the
secondary node for the moment.
1. [Setup the database replication](database.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Configure GitLab](configuration.md) to set the primary and secondary nodes.
1. [Follow the after setup steps](after_setup.md).
## After setup
[install-ee]: https://about.gitlab.com/downloads-ee/ "GitLab Enterprise Edition Omnibus packages downloads page"
After you set up the database replication and configure the GitLab Geo nodes,
there are a few things to consider:
### Using GitLab installed from source
1. When you create a new project in the primary node, the Git repository will
appear in the secondary only _after_ the first `git push`.
1. You need an extra step to be able to fetch code from the `secondary` and push
to `primary`:
If you installed GitLab from source:
1. Clone your repository as you would normally do from the `secondary` node
1. Change the remote push URL following this example:
1. [Install GitLab Enterprise Edition][install-ee-source] on the server that
will serve as the **secondary** Geo node. Do not login or set up anything
else in the secondary node for the moment.
1. [Setup the database replication](database_source.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Configure GitLab](configuration_source.md) to set the primary and secondary
nodes.
1. [Follow the after setup steps](after_setup.md).
```bash
git remote set-url --push origin git@primary.gitlab.example.com:user/repo.git
```
[install-ee-source]: https://docs.gitlab.com/ee/install/installation.html "GitLab Enterprise Edition installation from source"
>**Important**:
The initialization of a new Geo secondary node on versions older than 8.14
requires data to be copied from the primary, as there is no backfill
feature bundled with those versions.
See more details in the [Configure GitLab](configuration.md) step.
## Updating the Geo nodes
Read how to [update your Geo nodes to the latest GitLab version](updating_the_geo_nodes.md).
## Current limitations
......@@ -83,41 +75,12 @@ See more details in the [Configure GitLab](configuration.md) step.
## Frequently Asked Questions
### Can I use Geo in a disaster recovery situation?
There are limitations to what we replicate (see
[What data is replicated to a secondary node?](#what-data-is-replicated-to-a-secondary-node)).
In an extreme data-loss situation you can make a secondary Geo into your
primary, but this is not officially supported yet.
If you still want to proceed, see our step-by-step instructions on how to
manually [promote a secondary node](disaster-recovery.md) into primary.
### What data is replicated to a secondary node?
We currently replicate project repositories and the whole database. This
means user accounts, issues, merge requests, groups, project data, etc.,
will be available for query.
We currently don't replicate user generated attachments / avatars or any
other file in `public/upload`. We also don't replicate LFS or
artifacts data (`shared/folder`).
### Can I git push to a secondary node?
No. All writing operations (this includes `git push`) must be done in your
primary node.
### How long does it take to have a commit replicated to a secondary node?
All replication operations are asynchronous and are queued to be dispatched in
a batched request every 10 seconds. Besides that, it depends on a lot of other
factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc.
Read more in the [Geo FAQ](faq.md).
### What happens if the SSH server runs at a different port?
## Troubleshooting
We send the clone url from the primary server to any secondaries, so it
doesn't matter. If primary is running on port `2200` clone url will reflect
that.
Read the [troubleshooting document](troubleshooting.md).
[install-ee]: https://about.gitlab.com/downloads-ee/
[ee]: https://about.gitlab.com/gitlab-ee/ "GitLab Enterprise Edition landing page"
[install-ee]: https://about.gitlab.com/downloads-ee/ "GitLab Enterprise Edition Omnibus packages downloads page"
[install-ee-source]: https://docs.gitlab.com/ee/install/installation.html "GitLab Enterprise Edition installation from source"
# After setup
After you set up the [database replication and configure the GitLab Geo nodes][req],
there are a few things to consider:
1. When you create a new project in the primary node, the Git repository will
appear in the secondary only _after_ the first `git push`.
1. You need an extra step to be able to fetch code from the `secondary` and push
to `primary`:
1. Clone your repository as you would normally do from the `secondary` node
1. Change the remote push URL following this example:
```bash
git remote set-url --push origin git@primary.gitlab.example.com:user/repo.git
```
[req]: README.md#setup-instructions
# GitLab Geo configuration
This is the final step you need to follow in order to setup a Geo node.
>**Note:**
This is the documentation for the Omnibus GitLab packages. For installations
from source, follow the [**GitLab Geo nodes configuration for installations
from source**](configuration_source.md) guide.
---
1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve
as the secondary Geo node. Do not login or set up anything else in the
secondary node for the moment.
1. [Setup the database replication](database.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. **Configure GitLab to set the primary and secondary nodes.**
1. [Follow the after setup steps](after_setup.md).
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**
- [Setting up GitLab](#setting-up-gitlab)
- [Prerequisites](#prerequisites)
- [Step 1. Adding the primary GitLab node](#step-1-adding-the-primary-gitlab-node)
- [Step 2. Updating the `known_hosts` file of the secondary nodes](#step-2-updating-the-known_hosts-file-of-the-secondary-nodes)
- [Step 3. Copying the database encryption key](#step-3-copying-the-database-encryption-key)
- [Step 4. Enabling the secondary GitLab node](#step-4-enabling-the-secondary-gitlab-node)
- [Step 5. Replicating the repositories data](#step-5-replicating-the-repositories-data)
- [Step 6. Regenerating the authorized keys in the secondary node](#step-6-regenerating-the-authorized-keys-in-the-secondary-node)
- [Next steps](#next-steps)
- [Adding another secondary Geo node](#adding-another-secondary-geo-node)
- [Additional information for the SSH key pairs](#additional-information-for-the-ssh-key-pairs)
- [Troubleshooting](#troubleshooting)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
[install-ee]: https://about.gitlab.com/downloads-ee/ "GitLab Enterprise Edition Omnibus packages downloads page"
This is the final step you need to follow in order to setup a Geo node.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
## Setting up GitLab
>**Notes:**
- Don't setup any custom authentication in the secondary nodes, this will be
- **Do not** setup any custom authentication in the secondary nodes, this will be
handled by the primary node.
- Do not add anything in the secondaries Geo nodes admin area
(**Admin Area ➔ Geo Nodes**). This is handled solely by the primary node.
---
- **Do not** add anything in the secondaries Geo nodes admin area
(**Admin Area ➔ Geo Nodes**). This is handled solely by the primary node.
After having installed GitLab Enterprise Edition in the instance that will serve
as a Geo node and set up the [database replication](database.md), the next steps can be summed
up to:
as a Geo node and set up the [database replication](database.md), the next steps
can be summed up to:
1. Configure the primary node
1. Replicate some required configurations between the primary and the secondaries
......@@ -65,25 +59,15 @@ logins opened on all nodes as we will be moving back and forth.
sudo -i
```
1. (Source install only): Create a new SSH key pair for the primary node. Choose the default location
and leave the password blank by hitting 'Enter' three times:
```bash
sudo -u git -H ssh-keygen -b 4096 -C 'Primary GitLab Geo node'
```
Read more in [additional info for SSH key pairs](#additional-information-for-the-ssh-key-pairs).
1. Get the contents of `id_rsa.pub` for the git user:
```
# Omnibus GitLab installations
sudo -u git cat /var/opt/gitlab/.ssh/id_rsa.pub
# Installations from source
sudo -u git cat /home/git/.ssh/id_rsa.pub
```
Read more in [additional info for SSH key pairs](#additional-information-for-the-ssh-key-pairs).
1. Visit the primary node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`) in
your browser.
......@@ -118,9 +102,6 @@ logins opened on all nodes as we will be moving back and forth.
```
# Omnibus GitLab installations
cat /var/opt/gitlab/.ssh/known_hosts
# Installations from source
cat /home/git/.ssh/known_hosts
```
### Step 3. Copying the database encryption key
......@@ -140,9 +121,6 @@ sensitive data in the database. Any secondary node must have the
```
# Omnibus GitLab installations
cat /etc/gitlab/gitlab-secrets.json | grep db_key_base
# Installations from source
cat /home/git/gitlab/config/secrets.yml | grep db_key_base
```
1. SSH into the **secondary** node and login as root:
......@@ -157,9 +135,6 @@ sensitive data in the database. Any secondary node must have the
```
# Omnibus GitLab installations
editor /etc/gitlab/gitlab-secrets.json
# Installations from source
editor /home/git/gitlab/config/secrets.yml
```
1. Save and close the file.
......@@ -186,9 +161,6 @@ sensitive data in the database. Any secondary node must have the
```
# Omnibus installations
sudo -u git cat /var/opt/gitlab/.ssh/id_rsa.pub
# Installations from source
sudo -u git cat /home/git/.ssh/id_rsa.pub
```
1. Visit the **primary** node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`)
......@@ -239,10 +211,6 @@ connection between the servers.
# For Omnibus installations
rsync -guavrP root@1.2.3.4:/var/opt/gitlab/git-data/repositories/ /var/opt/gitlab/git-data/repositories/
gitlab-ctl reconfigure # to fix directory permissions
# For installations from source
rsync -guavrP root@1.2.3.4:/home/git/repositories/ /home/git/repositories/
chmod ug+rwX,o-rwx /home/git/repositories
```
If this step is not followed, the secondary node will eventually clone and
......@@ -264,15 +232,12 @@ run:
```
# For Omnibus installations
gitlab-rake gitlab:shell:setup
# For source installations
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
```
This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically.
### Next steps
## Next steps
Your nodes should now be ready to use. You can login to the secondary node
with the same credentials as used in the primary. Visit the secondary node's
......@@ -282,6 +247,8 @@ correctly identified as a secondary Geo node and if Geo is enabled.
If your installation isn't working properly, check the
[troubleshooting](#troubleshooting) section.
Point your users to the [after setup steps](after_setup.md).
## Adding another secondary Geo node
To add another Geo node in an already Geo configured infrastructure, just follow
......@@ -316,62 +283,4 @@ Host example.com # The FQDN of the primary Geo node
## Troubleshooting
Setting up Geo requires careful attention to details and sometimes it's easy to
miss a step. Here is a checklist of questions you should ask to try to detect
where you have to fix (all commands and path locations are for Omnibus installs):
- Is Postgres replication working?
- Are my nodes pointing to the correct database instance?
- You should make sure your primary Geo node points to the instance with
writing permissions.
- Any secondary nodes should point only to read-only instances.
- Can Geo detect my current node correctly?
- Geo uses your defined node from `Admin ➔ Geo` screen, and tries to match
with the value defined in `/etc/gitlab/gitlab.rb` configuration file.
The relevant line looks like: `external_url "http://gitlab.example.com"`.
- To check if node on current machine is correctly detected type:
```
sudo gitlab-rails runner "puts Gitlab::Geo.current_node.inspect"
```
and expect something like:
```
#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", port: 443, relative_url_root: "", primary: false, ...>
```
- By running the command above, `primary` should be `true` when executed in
the primary node, and `false` on any secondary
- Did I define the correct SSH Key for the node?
- You must create an SSH Key for `git` user
- This key is the one you have to inform at `Admin > Geo`
- Can I SSH from secondary to primary node using `git` user account?
- This is the most obvious cause of problems with repository replication issues.
If you haven't added the primary node's key to `known_hosts`, you will end up with
a lot of failed sidekiq jobs with an error similar to:
```
Gitlab::Shell::Error: Host key verification failed. fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
```
An easy way to fix is by logging in as the `git` user in the secondary node and run:
```
# remove old entries to your primary gitlab in known_hosts
ssh-keyscan -R your-primary-gitlab.example.com
# add a new entry in known_hosts
ssh-keyscan -t rsa your-primary-gitlab.example.com >> ~/.ssh/known_hosts
```
- Can primary node communicate with secondary node by HTTP/HTTPS ports?
- Can secondary nodes communicate with primary node by HTTP/HTTPS/SSH ports?
- Can secondary nodes execute a successful git clone using git user's own
SSH Key to primary node repository?
>**Note:**
This list is an attempt to document all the moving parts that can go wrong.
We are working into getting all this steps verified automatically in a
rake task in the future.
[ssh-pair]: #create-ssh-key-pairs-for-new-geo-nodes
See the [troubleshooting document](troubleshooting.md).
# GitLab Geo configuration
>**Note:**
This is the documentation for installations from source. For installations
using the Omnibus GitLab packages, follow the
[**Omnibus GitLab Geo nodes configuration**](configuration.md) guide.
1. [Install GitLab Enterprise Edition][install-ee-source] on the server that
will serve as the secondary Geo node. Do not login or set up anything else
in the secondary node for the moment.
1. [Setup the database replication](database_source.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. **Configure GitLab to set the primary and secondary nodes.**
1. [Follow the after setup steps](after_setup.md).
[install-ee-source]: https://docs.gitlab.com/ee/install/installation.html "GitLab Enterprise Edition installation from source"
This is the final step you need to follow in order to setup a Geo node.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
## Setting up GitLab
>**Notes:**
- Don't setup any custom authentication in the secondary nodes, this will be
handled by the primary node.
- Do not add anything in the secondaries Geo nodes admin area
(**Admin Area ➔ Geo Nodes**). This is handled solely by the primary node.
After having installed GitLab Enterprise Edition in the instance that will serve
as a Geo node and set up the [database replication](database_source.md), the
next steps can be summed up to:
1. Configure the primary node
1. Replicate some required configurations between the primary and the secondaries
1. Start GitLab in the secondary node's machine
1. Configure every secondary node in the primary's Admin screen
### Prerequisites
This is the last step of configuring a Geo node. Make sure you have followed the
first two steps of the [Setup instructions](README.md#setup-instructions):
1. You have already installed on the secondary server the same version of
GitLab Enterprise Edition that is present on the primary server.
1. You have set up the database replication.
1. Your secondary node is allowed to communicate via HTTP/HTTPS and SSH with
your primary node (make sure your firewall is not blocking that).
Some of the following steps require to configure the primary and secondary
nodes almost at the same time. For your convenience make sure you have SSH
logins opened on all nodes as we will be moving back and forth.
### Step 1. Adding the primary GitLab node
1. SSH into the **primary** node and login as root:
```
sudo -i
```
1. (Source install only): Create a new SSH key pair for the primary node. Choose the default location
and leave the password blank by hitting 'Enter' three times:
```bash
sudo -u git -H ssh-keygen -b 4096 -C 'Primary GitLab Geo node'
```
Read more in [additional info for SSH key pairs](#additional-information-for-the-ssh-key-pairs).
1. Get the contents of `id_rsa.pub` for the git user:
```
# Installations from source
sudo -u git cat /home/git/.ssh/id_rsa.pub
```
1. Visit the primary node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`) in
your browser.
1. Add the primary node by providing its full URL and the public SSH key
you created previously. Make sure to check the box 'This is a primary node'
when adding it.
![Add new primary Geo node](img/geo_nodes_add_new.png)
1. Click the **Add node** button.
### Step 2. Updating the `known_hosts` file of the secondary nodes
1. SSH into the **secondary** node and login as root:
```
sudo -i
```
1. The secondary nodes need to know the SSH fingerprint of the primary node that
will be used for the Git clone/fetch operations. In order to add it to the
`known_hosts` file, run the following command and type `yes` when asked:
```
sudo -u git -H ssh git@<primary-node-url>
```
Replace `<primary-node-url>` with the FQDN of the primary node.
1. Verify that the fingerprint was added by checking `known_hosts`:
```
# Installations from source
cat /home/git/.ssh/known_hosts
```
### Step 3. Copying the database encryption key
GitLab stores a unique encryption key in disk that we use to safely store
sensitive data in the database. Any secondary node must have the
**exact same value** for `db_key_base` as defined in the primary one.
1. SSH into the **primary** node and login as root:
```
sudo -i
```
1. Find the value of `db_key_base` and copy it:
```
# Installations from source
cat /home/git/gitlab/config/secrets.yml | grep db_key_base
```
1. SSH into the **secondary** node and login as root:
```
sudo -i
```
1. Open the secrets file and paste the value of `db_key_base` you copied in the
previous step:
```
# Installations from source
editor /home/git/gitlab/config/secrets.yml
```
1. Save and close the file.
### Step 4. Enabling the secondary GitLab node
1. SSH into the **secondary** node and login as root:
```
sudo -i
```
1. Create a new SSH key pair for the secondary node. Choose the default location
and leave the password blank by hitting 'Enter' three times:
```bash
sudo -u git -H ssh-keygen -b 4096 -C 'Secondary GitLab Geo node'
```
Read more in [additional info for SSH key pairs](#additional-information-for-the-ssh-key-pairs).
1. Get the contents of `id_rsa.pub` the was just created:
```
# Installations from source
sudo -u git cat /home/git/.ssh/id_rsa.pub
```
1. Visit the **primary** node's **Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`)
in your browser.
1. Add the secondary node by providing its full URL and the public SSH key
you created previously. **Do NOT** check the box 'This is a primary node'.
1. Click the **Add node** button.
---
After the **Add Node** button is pressed, the primary node will start to notify
changes to the secondary. Make sure the secondary instance is running and
accessible.
The two most obvious issues that replication can have here are:
1. Database replication not working well
1. Instance to instance notification not working. In that case, it can be
something of the following:
- You are using a custom certificate or custom CA (see the
[Troubleshooting](configuration.md#troubleshooting) section)
- Instance is firewalled (check your firewall rules)
### Step 5. Replicating the repositories data
Getting a new secondary Geo node up and running, will also require the
repositories directory to be synced from the primary node.
With GitLab **8.14** you can start the syncing process by clicking the
"Backfill all repositories" button on `Admin > Geo Nodes` screen.
On previous versions, you can use `rsync` for that:
Make sure `rsync` is installed in both primary and secondary servers and root
SSH access with a password is enabled. Otherwise, you can set up an SSH key-based
connection between the servers.
1. SSH into the **secondary** node and login as root:
```
sudo -i
```
1. Assuming `1.2.3.4` is the IP of the primary node, run the following command
to start the sync:
```bash
# Installations from source
rsync -guavrP root@1.2.3.4:/home/git/repositories/ /home/git/repositories/
chmod ug+rwX,o-rwx /home/git/repositories
```
If this step is not followed, the secondary node will eventually clone and
fetch every missing repository as they are updated with new commits on the
primary node, so syncing the repositories beforehand will buy you some time.
While active repositories will be eventually replicated, if you don't rsync,
the files, any archived/inactive repositories will not get in the secondary node
as Geo doesn't run any routine task to look for missing repositories.
### Step 6. Regenerating the authorized keys in the secondary node
The final step is to regenerate the keys for `~/.ssh/authorized_keys`
(HTTPS clone will still work without this extra step).
On the **secondary** node where the database is [already replicated](./database.md),
run:
```
# Installations from source
sudo -u git -H bundle exec rake gitlab:shell:setup RAILS_ENV=production
```
This will enable `git` operations to authorize against your existing users.
New users and SSH keys updated after this step, will be replicated automatically.
## Next steps
Your nodes should now be ready to use. You can login to the secondary node
with the same credentials as used in the primary. Visit the secondary node's
**Admin Area ➔ Geo Nodes** (`/admin/geo_nodes`) in your browser to check if it's
correctly identified as a secondary Geo node and if Geo is enabled.
If your installation isn't working properly, check the
[troubleshooting](configuration.md#troubleshooting) section.
Point your users to the [after setup steps](after_setup.md).
## Adding another secondary Geo node
To add another Geo node in an already Geo configured infrastructure, just follow
[the steps starting form step 2](#step-2-updating-the-known_hosts-file-of-the-secondary-nodes).
Just omit the first step that sets up the primary node.
## Additional information for the SSH key pairs
Read [Additional information for the SSH key pairs](configuration.md#additional-information-for-the-ssh-key-pairs).
## Troubleshooting
Read the [troubleshooting document](troubleshooting.md).
# GitLab Geo database replication
>**Note:**
This is the documentation for the Omnibus GitLab packages. For installations
from source, follow the
[**database replication for installations from source**](database_source.md) guide.
1. [Install GitLab Enterprise Edition][install-ee] on the server that will serve
as the secondary Geo node. Do not login or set up anything else in the
secondary node for the moment.
1. **Setup the database replication (`primary (read-write) <-> secondary (read-only)` topology).**
1. [Configure GitLab](configuration.md) to set the primary and secondary nodes.
1. [Follow the after setup steps](after_setup.md).
[install-ee]: https://about.gitlab.com/downloads-ee/ "GitLab Enterprise Edition Omnibus packages downloads page"
This document describes the minimal steps you have to take in order to
replicate your GitLab database into another server. You may have to change
some values according to your database setup, how big it is, etc.
<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
**Table of Contents**
- [PostgreSQL replication](#postgresql-replication)
- [Prerequisites](#prerequisites)
- [Step 1. Configure the primary server](#step-1-configure-the-primary-server)
- [Step 2. Configure the secondary server](#step-2-configure-the-secondary-server)
- [Step 3. Initiate the replication process](#step-3-initiate-the-replication-process)
- [Next steps](#next-steps)
- [MySQL replication](#mysql-replication)
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
## PostgreSQL replication
......@@ -44,8 +47,6 @@ The following guide assumes that:
### Step 1. Configure the primary server
**For Omnibus installations**
1. SSH into your GitLab **primary** server and login as root:
```
......@@ -118,70 +119,8 @@ The following guide assumes that:
public IP.
1. Continue to [set up the secondary server](#step-2-configure-the-secondary-server).
---
**For installations from source**
1. SSH into your database **primary** server and login as root:
```
sudo -i
```
1. Create a replication user named `gitlab_replicator`:
```bash
sudo -u postgres psql -c "CREATE USER gitlab_replicator REPLICATION ENCRYPTED PASSWORD 'thepassword';"
```
1. Edit `postgresql.conf` to configure the primary server for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
```bash
listen_address = '1.2.3.4'
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
See the Omnibus notes above for more details of `listen_address`.
Edit the `wal` values as you see fit.
1. Set the access control on the primary to allow TCP connections using the
server's public IP and set the connection from the secondary to require a
password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
`/etc/postgresql/9.x/main/pg_hba.conf`):
```bash
host all all 127.0.0.1/32 trust
host all all 1.2.3.4/32 trust
host replication gitlab_replicator 5.6.7.8/32 md5
```
Where `1.2.3.4` is the public IP address of the primary server, and `5.6.7.8`
the public IP address of the secondary one. If you want to add another
secondary, add one more row like the replication one and change the IP
address:
```bash
host all all 127.0.0.1/32 trust
host all all 1.2.3.4/32 trust
host replication gitlab_replicator 5.6.7.8/32 md5
host replication gitlab_replicator 11.22.33.44/32 md5
```
1. Restart PostgreSQL for the changes to take effect.
1. Now that the PostgreSQL server is set up to accept remote connections, run
`netstat -plnt` to make sure that PostgreSQL is listening to the server's
public IP.
### Step 2. Configure the secondary server
**For Omnibus installations**
1. SSH into your GitLab **secondary** server and login as root:
```
......@@ -217,46 +156,6 @@ The following guide assumes that:
1. [Reconfigure GitLab][] for the changes to take effect.
1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process).
---
**For installations from source**
1. SSH into your database **secondary** server and login as root:
```
sudo -i
```
1. Test that the remote connection to the primary server works:
```
sudo -u postgres psql -h 1.2.3.4 -U gitlab_replicator -d gitlabhq_production -W
```
When prompted enter the password you set in the first step for the
`gitlab_replicator` user. If all worked correctly, you should see the
database prompt.
1. Exit the PostgreSQL console:
```
\q
```
1. Edit `postgresql.conf` to configure the secondary for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
```bash
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
1. Restart PostgreSQL for the changes to take effect.
1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process).
### Step 3. Initiate the replication process
Below we provide a script that connects to the primary server, replicates the
......@@ -342,5 +241,9 @@ Now that the database replication is done, the next step is to configure GitLab.
We don't support MySQL replication for GitLab Geo.
## Troubleshooting
Read the [troubleshooting document](troubleshooting.md).
[pgback]: http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
[reconfigure GitLab]: ../administration/restart_gitlab.md#omnibus-gitlab-reconfigure
# GitLab Geo database replication
>**Note:**
This is the documentation for installations from source. For installations
using the Omnibus GitLab packages, follow the
[**database replication for Omnibus GitLab**](database.md) guide.
1. [Install GitLab Enterprise Edition][install-ee-source] on the server that
will serve as the secondary Geo node. Do not login or set up anything else
in the secondary node for the moment.
1. **Setup the database replication (`primary (read-write) <-> secondary (read-only)` topology).**
1. [Configure GitLab](configuration_source.md) to set the primary and secondary
nodes.
1. [Follow the after setup steps](after_setup.md).
[install-ee-source]: https://docs.gitlab.com/ee/install/installation.html "GitLab Enterprise Edition installation from source"
This document describes the minimal steps you have to take in order to
replicate your GitLab database into another server. You may have to change
some values according to your database setup, how big it is, etc.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
## PostgreSQL replication
The GitLab primary node where the write operations happen will connect to
`primary` database server, and the secondary ones which are read-only will
connect to `secondary` database servers (which are read-only too).
>**Note:**
In many databases documentation you will see `primary` being references as `master`
and `secondary` as either `slave` or `standby` server (read-only).
### Prerequisites
The following guide assumes that:
- You are using PostgreSQL 9.2 or later which includes the
[`pg_basebackup` tool][pgback]. If you are using Omnibus it includes the required
PostgreSQL version for Geo.
- You have a primary server already set up (the GitLab server you are
replicating from), and you have a new secondary server set up on the same OS
and PostgreSQL version.
- The IP of the primary server for our examples will be `1.2.3.4`, whereas the
secondary's IP will be `5.6.7.8`.
### Step 1. Configure the primary server
1. SSH into your database **primary** server and login as root:
```
sudo -i
```
1. Create a replication user named `gitlab_replicator`:
```bash
sudo -u postgres psql -c "CREATE USER gitlab_replicator REPLICATION ENCRYPTED PASSWORD 'thepassword';"
```
1. Edit `postgresql.conf` to configure the primary server for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
```bash
listen_address = '1.2.3.4'
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
See the Omnibus notes above for more details of `listen_address`.
Edit the `wal` values as you see fit.
1. Set the access control on the primary to allow TCP connections using the
server's public IP and set the connection from the secondary to require a
password. Edit `pg_hba.conf` (for Debian/Ubuntu that would be
`/etc/postgresql/9.x/main/pg_hba.conf`):
```bash
host all all 127.0.0.1/32 trust
host all all 1.2.3.4/32 trust
host replication gitlab_replicator 5.6.7.8/32 md5
```
Where `1.2.3.4` is the public IP address of the primary server, and `5.6.7.8`
the public IP address of the secondary one. If you want to add another
secondary, add one more row like the replication one and change the IP
address:
```bash
host all all 127.0.0.1/32 trust
host all all 1.2.3.4/32 trust
host replication gitlab_replicator 5.6.7.8/32 md5
host replication gitlab_replicator 11.22.33.44/32 md5
```
1. Restart PostgreSQL for the changes to take effect.
1. Now that the PostgreSQL server is set up to accept remote connections, run
`netstat -plnt` to make sure that PostgreSQL is listening to the server's
public IP.
### Step 2. Configure the secondary server
1. SSH into your database **secondary** server and login as root:
```
sudo -i
```
1. Test that the remote connection to the primary server works:
```
sudo -u postgres psql -h 1.2.3.4 -U gitlab_replicator -d gitlabhq_production -W
```
When prompted enter the password you set in the first step for the
`gitlab_replicator` user. If all worked correctly, you should see the
database prompt.
1. Exit the PostgreSQL console:
```
\q
```
1. Edit `postgresql.conf` to configure the secondary for streaming replication
(for Debian/Ubuntu that would be `/etc/postgresql/9.x/main/postgresql.conf`):
```bash
wal_level = hot_standby
max_wal_senders = 5
checkpoint_segments = 10
wal_keep_segments = 10
hot_standby = on
```
1. Restart PostgreSQL for the changes to take effect.
1. Continue to [initiate the replication process](#step-3-initiate-the-replication-process).
### Step 3. Initiate the replication process
Below we provide a script that connects to the primary server, replicates the
database and creates the needed files for replication.
The directories used are the defaults for Debian/Ubuntu. If you have changed
any defaults, configure it as you see fit replacing the directories and paths.
>**Warning:**
Make sure to run this on the **secondary** server as it removes all PostgreSQL's
data before running `pg_basebackup`.
1. SSH into your GitLab **secondary** server and login as root:
```
sudo -i
```
1. Save the snippet below in a file, let's say `/tmp/replica.sh`:
```bash
#!/bin/bash
PORT="5432"
USER="gitlab_replicator"
echo ---------------------------------------------------------------
echo WARNING: Make sure this scirpt is run from the secondary server
echo ---------------------------------------------------------------
echo
echo Enter the IP of the primary PostgreSQL server
read HOST
echo Enter the password for $USER@$HOST
read -s PASSWORD
echo Stopping PostgreSQL and all GitLab services
gitlab-ctl stop
echo Backing up postgresql.conf
sudo -u gitlab-psql mv /var/opt/gitlab/postgresql/data/postgresql.conf /var/opt/gitlab/postgresql/
echo Cleaning up old cluster directory
sudo -u gitlab-psql rm -rf /var/opt/gitlab/postgresql/data
rm -f /tmp/postgresql.trigger
echo Starting base backup as the replicator user
echo Enter the password for $USER@$HOST
sudo -u gitlab-psql /opt/gitlab/embedded/bin/pg_basebackup -h $HOST -D /var/opt/gitlab/postgresql/data -U gitlab_replicator -v -x -P
echo Writing recovery.conf file
sudo -u gitlab-psql bash -c "cat > /var/opt/gitlab/postgresql/data/recovery.conf <<- _EOF1_
standby_mode = 'on'
primary_conninfo = 'host=$HOST port=$PORT user=$USER password=$PASSWORD'
trigger_file = '/tmp/postgresql.trigger'
_EOF1_
"
echo Restoring postgresql.conf
sudo -u gitlab-psql mv /var/opt/gitlab/postgresql/postgresql.conf /var/opt/gitlab/postgresql/data/
echo Starting PostgreSQL and all GitLab services
gitlab-ctl start
```
1. Run it with:
```
bash /tmp/replica.sh
```
When prompted, enter the password you set up for the `gitlab_replicator`
user in the first step.
The replication process is now over.
### Next steps
Now that the database replication is done, the next step is to configure GitLab.
[➤ GitLab Geo configuration](configuration_source.md)
## MySQL replication
We don't support MySQL replication for GitLab Geo.
## Troubleshooting
Read the [troubleshooting document](troubleshooting.md).
[pgback]: http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
[reconfigure GitLab]: ../administration/restart_gitlab.md#omnibus-gitlab-reconfigure
# Geo Frequently Asked Questions
## Can I use Geo in a disaster recovery situation?
There are limitations to what we replicate (see
[What data is replicated to a secondary node?](#what-data-is-replicated-to-a-secondary-node)).
In an extreme data-loss situation you can make a secondary Geo into your
primary, but this is not officially supported yet.
If you still want to proceed, see our step-by-step instructions on how to
manually [promote a secondary node](disaster-recovery.md) into primary.
## What data is replicated to a secondary node?
We currently replicate project repositories and the whole database. This
means user accounts, issues, merge requests, groups, project data, etc.,
will be available for query.
We currently don't replicate user generated attachments / avatars or any
other file in `public/upload`. We also don't replicate LFS or
artifacts data (`shared/folder`).
## Can I git push to a secondary node?
No. All writing operations (this includes `git push`) must be done in your
primary node.
## How long does it take to have a commit replicated to a secondary node?
All replication operations are asynchronous and are queued to be dispatched in
a batched request every 10 seconds. Besides that, it depends on a lot of other
factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc.
## What happens if the SSH server runs at a different port?
We send the clone url from the primary server to any secondaries, so it
doesn't matter. If primary is running on port `2200` clone url will reflect
that.
# GitLab Geo troubleshooting
>**Note:**
This list is an attempt to document all the moving parts that can go wrong.
We are working into getting all this steps verified automatically in a
rake task in the future.
Setting up Geo requires careful attention to details and sometimes it's easy to
miss a step. Here is a checklist of questions you should ask to try to detect
where you have to fix (all commands and path locations are for Omnibus installs):
- Is Postgres replication working?
- Are my nodes pointing to the correct database instance?
- You should make sure your primary Geo node points to the instance with
writing permissions.
- Any secondary nodes should point only to read-only instances.
- Can Geo detect my current node correctly?
- Geo uses your defined node from `Admin ➔ Geo` screen, and tries to match
with the value defined in `/etc/gitlab/gitlab.rb` configuration file.
The relevant line looks like: `external_url "http://gitlab.example.com"`.
- To check if node on current machine is correctly detected type:
```
sudo gitlab-rails runner "puts Gitlab::Geo.current_node.inspect"
```
and expect something like:
```
#<GeoNode id: 2, schema: "https", host: "gitlab.example.com", port: 443, relative_url_root: "", primary: false, ...>
```
- By running the command above, `primary` should be `true` when executed in
the primary node, and `false` on any secondary
- Did I define the correct SSH Key for the node?
- You must create an SSH Key for `git` user
- This key is the one you have to inform at `Admin > Geo`
- Can I SSH from secondary to primary node using `git` user account?
- This is the most obvious cause of problems with repository replication issues.
If you haven't added the primary node's key to `known_hosts`, you will end up with
a lot of failed sidekiq jobs with an error similar to:
```
Gitlab::Shell::Error: Host key verification failed. fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists.
```
An easy way to fix is by logging in as the `git` user in the secondary node and run:
```
# remove old entries to your primary gitlab in known_hosts
ssh-keyscan -R your-primary-gitlab.example.com
# Updating the Geo nodes
In order to update the GitLab Geo nodes when a new GitLab version is released,
all you need to do is update GitLab itself:
1. Log into each node (primary and secondaries)
1. Upgrade GitLab
1. Test primary and secondary nodes, and check version in each.
---
For Omnibus GitLab installations it's a matter of updating the package:
```
# Debian/Ubuntu
sudo apt-get update
sudo apt-get install gitlab-ee
# Centos/RHEL
sudo yum install gitlab-ee
```
For installations from source, [follow the instructions for your GitLab version]
(https://gitlab.com/gitlab-org/gitlab-ee/tree/master/doc/update).
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment