# Automatic background verification **[PREMIUM ONLY]**
NOTE: **Note:**
Automatic background verification of repositories and wikis was added in
GitLab EE 10.6 but is enabled by default only in GitLab EE 11.1 and later. You can
disable or enable this feature manually by following
[these instructions](#disabling-or-enabling-the-automatic-background-verification).
Automatic background verification ensures that the transferred data matches a
calculated checksum. If the checksum of the data on the **primary** node matches the checksum of the
data on the **secondary** node, the data transferred successfully. Following a planned failover,
any corrupted data may be **lost**, depending on the extent of the corruption.
If verification fails on the **primary** node, this indicates that Geo is
successfully replicating a corrupted object; restore it from backup or remove it
from the **primary** node to resolve the issue.
If verification succeeds on the **primary** node but fails on the **secondary** node,
this indicates that the object was corrupted during the replication process.
Geo actively tries to correct verification failures by marking the repository to
be resynced with a backoff period. If you want to reset the verification for
these failures, follow [these instructions][reset-verification].
If verification is lagging significantly behind replication, consider giving
the node more time before scheduling a planned failover.
## Disabling or enabling the automatic background verification
Run the following commands in a Rails console on the **primary** node:
```sh
# Omnibus GitLab
gitlab-rails console
# Installation from source
cd /home/git/gitlab
sudo -u git -H bin/rails console RAILS_ENV=production
```
To check if automatic background verification is enabled:
```ruby
Gitlab::Geo.repository_verification_enabled?
```
To disable automatic background verification:
```ruby
Feature.disable('geo_repository_verification')
```
To enable automatic background verification:
```ruby
Feature.enable('geo_repository_verification')
```
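If you prefer not to open an interactive console, the same check can be run non-interactively. A minimal sketch, assuming an Omnibus installation:
```sh
# Print the current state of automatic background verification without an interactive console
sudo gitlab-rails runner "puts Gitlab::Geo.repository_verification_enabled?"
```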
## Repository verification
Navigate to the **Admin Area > Geo** dashboard on the **primary** node and expand
the **Verification information** tab for that node to view automatic checksumming
status for repositories and wikis. Successes are shown in green, pending work
in grey, and failures in red.
![Verification status](img/verification-status-primary.png)
Navigate to the **Admin Area > Geo** dashboard on the **secondary** node and expand
the **Verification information** tab for that node to view automatic verification
status for repositories and wikis. As with checksumming, successes are shown in
green, pending work in grey, and failures in red.
![Verification status](img/verification-status-secondary.png)
## Using checksums to compare Geo nodes
To check the health of Geo **secondary** nodes, we use a checksum over the list of
Git references and their values. The checksum includes `HEAD`, `heads`, `tags`,
`notes`, and GitLab-specific references to ensure true consistency. If two nodes
have the same checksum, then they definitely hold the same references. We compute
the checksum for every node after every update to make sure that they are all
in sync.
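GitLab computes this checksum internally, but as an illustration of the idea, a comparable fingerprint of a repository's references can be produced manually and compared between nodes. This is a sketch only; the exact set of references and hashing scheme Geo uses may differ:
```sh
# List every reference and the object it points to, then hash the sorted list.
# Running this against the same repository on two nodes and comparing the digests
# gives a rough equivalent of the reference checksum described above.
git --git-dir=/path/to/repository.git for-each-ref --format='%(refname) %(objectname)' | sort | sha256sum
```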
## Repository re-verification
> [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8550) in GitLab Enterprise Edition 11.6. Available in [GitLab Premium](https://about.gitlab.com/pricing/).
Due to bugs or transient infrastructure failures, it is possible for Git
repositories to change unexpectedly without being marked for verification.
Geo constantly reverifies the repositories to ensure the integrity of the
data. The default and recommended re-verification interval is 7 days, though
an interval as short as 1 day can be set. Shorter intervals reduce risk but
increase load and vice versa.
Navigate to the **Admin Area > Geo** dashboard on the **primary** node, and
click the **Edit** button for the **primary** node to customize the minimum
re-verification interval:
![Re-verification interval](img/reverification-interval.png)
The automatic background re-verification is enabled by default, but you can
disable it if needed. Run the following commands in a Rails console on the
**primary** node:
```sh
# Omnibus GitLab
gitlab-rails console
# Installation from source
cd /home/git/gitlab
sudo -u git -H bin/rails console RAILS_ENV=production
```
To disable automatic background re-verification:
```ruby
Feature.disable('geo_repository_reverification')
```
To enable automatic background re-verification:
```ruby
Feature.enable('geo_repository_reverification')
```
## Reset verification for projects where verification has failed
Geo actively tries to correct verification failures by marking the repository to
be resynced with a backoff period. If you want to reset them manually, this
Rake task marks projects where verification has failed or the checksum does not match
to be resynced without the backoff period:
For repositories:
- Omnibus Installation
```sh
sudo gitlab-rake geo:verification:repository:reset
```
- Source Installation
```sh
sudo -u git -H bundle exec rake geo:verification:repository:reset RAILS_ENV=production
```
For wikis:
- Omnibus Installation
```sh
sudo gitlab-rake geo:verification:wiki:reset
```
- Source Installation
```sh
sudo -u git -H bundle exec rake geo:verification:wiki:reset RAILS_ENV=production
```
## Current limitations
Until [issue #5064][ee-5064] is completed, background verification doesn't cover
CI job artifacts and traces, LFS objects, or user uploads in file storage.
Verify their integrity manually by following [these instructions][foreground-verification]
on both nodes, and comparing the output between them.
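For reference, the foreground verification described there is performed with Rake tasks along the lines of the following on an Omnibus installation; confirm the exact task names in the linked document for your GitLab version:
```sh
# Verify the integrity of user uploads, CI job artifacts, and LFS objects on this node
sudo gitlab-rake gitlab:uploads:check
sudo gitlab-rake gitlab:artifacts:check
sudo gitlab-rake gitlab:lfs:check
```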
Data in object storage is **not verified**, as the object store is responsible
for ensuring the integrity of the data.
[reset-verification]: background_verification.md#reset-verification-for-projects-where-verification-has-failed
[foreground-verification]: ../../raketasks/check.md
[ee-5064]: https://gitlab.com/gitlab-org/gitlab-ee/issues/5064
# Bring a demoted primary node back online **[PREMIUM ONLY]**
After a failover, it is possible to fail back to the demoted **primary** node to
restore your original configuration. This process consists of two steps:
1. Making the old **primary** node a **secondary** node.
1. Promoting a **secondary** node to a **primary** node.
CAUTION: **Caution:**
If you have any doubts about the consistency of the data on this node, we recommend setting it up from scratch.
## Configure the former **primary** node to be a **secondary** node
Since the former **primary** node will be out of sync with the current **primary** node, the first step is to bring the former **primary** node up to date. Note that deletions of data stored on disk, such as
repositories and uploads, will not be replayed when bringing the former **primary** node back
into sync, which may result in increased disk usage.
Alternatively, you can [set up a new **secondary** GitLab instance][setup-geo] to avoid this.
To bring the former **primary** node up to date:
1. SSH into the former **primary** node that has fallen behind.
1. Make sure all the services are up:
```sh
sudo gitlab-ctl start
```
> **Note 1:** If you [disabled the **primary** node permanently][disaster-recovery-disable-primary],
> you need to undo those steps now. For Debian/Ubuntu you just need to run
> `sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
> the GitLab instance from scratch and set it up as a **secondary** node by
> following [Setup instructions][setup-geo]. In this case, you don't need to follow the next step.
>
> **Note 2:** If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domain-dns-record)
> for this node during disaster recovery procedure you may need to [block
> all the writes to this node](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/gitlab-geo/planned-failover.md#block-primary-traffic)
> during this procedure.
1. [Set up database replication][database-replication]. Note that in this
case, **primary** node refers to the current **primary** node, and **secondary** node refers to the
former **primary** node.
If you have lost your original **primary** node, follow the
[setup instructions][setup-geo] to set up a new **secondary** node.
## Promote the **secondary** node to **primary** node
When the initial replication is complete and the **primary** node and **secondary** node are
closely in sync, you can do a [planned failover].
## Restore the **secondary** node
If your objective is to have two nodes again, you need to bring your **secondary**
node back online as well by repeating the first step
([configure the former **primary** node to be a **secondary** node](#configure-the-former-primary-node-to-be-a-secondary-node))
for the **secondary** node.
[setup-geo]: ../replication/index.md#setup-instructions
[database-replication]: ../replication/database.md
[disaster-recovery-disable-primary]: index.md#step-2-permanently-disable-the-primary-node
[planned failover]: planned_failover.md
# Geo configuration (source) **[PREMIUM ONLY]**
NOTE: **Note:**
This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release.
Please consider [migrating to a GitLab Omnibus installation](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations
using the Omnibus GitLab packages, follow the
[**Omnibus Geo nodes configuration**][configuration] guide.
## Configuring a new **secondary** node
NOTE: **Note:**
This is the final step in setting up a **secondary** node. Stages of the setup
process must be completed in the documented order. Before attempting the steps
in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated).
The basic steps of configuring a **secondary** node are to:
- Replicate required configurations between the **primary** and **secondary** nodes.
- Configure a tracking database on each **secondary** node.
- Start GitLab on the **secondary** node.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
NOTE: **Note:**
**Do not** set up any custom authentication on **secondary** nodes; this will be handled by the **primary** node.
NOTE: **Note:**
**Do not** add anything in the **secondary** node's admin area (**Admin Area > Geo**). This is handled solely by the **primary** node.
### Step 1. Manually replicate secret GitLab values
GitLab stores a number of secret values in the `/home/git/gitlab/config/secrets.yml`
file which *must* match between the **primary** and **secondary** nodes. Until there is
a means of automatically replicating these between nodes (see [gitlab-org/gitlab-ee#3789]), they must
be manually replicated to **secondary** nodes.
1. SSH into the **primary** node, and execute the command below:
```sh
sudo cat /home/git/gitlab/config/secrets.yml
```
This will display the secrets that need to be replicated, in YAML format.
1. SSH into the **secondary** node and log in as the `git` user:
```sh
sudo -i -u git
```
1. Make a backup of any existing secrets:
```sh
mv /home/git/gitlab/config/secrets.yml /home/git/gitlab/config/secrets.yml.`date +%F`
```
1. Copy `/home/git/gitlab/config/secrets.yml` from the **primary** node to the **secondary** node, or
copy-and-paste the file contents between nodes (an `scp` alternative is sketched after this list):
```sh
sudo editor /home/git/gitlab/config/secrets.yml
# paste the output of the `cat` command you ran on the primary
# save and exit
```
1. Ensure the file permissions are correct:
```sh
chown git:git /home/git/gitlab/config/secrets.yml
chmod 0600 /home/git/gitlab/config/secrets.yml
```
1. Restart GitLab:
```sh
service gitlab restart
```
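As an alternative to copying and pasting the file contents manually, you can transfer the file directly over SSH. A sketch, assuming the **primary** node can reach the **secondary** node over SSH as the `git` user (`secondary.example.com` is a placeholder hostname):
```sh
# Run on the primary node; copies the secrets file to the same path on the secondary node
scp /home/git/gitlab/config/secrets.yml git@secondary.example.com:/home/git/gitlab/config/secrets.yml
```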
Once restarted, the **secondary** node will automatically start replicating missing data
from the **primary** node in a process known as backfill. Meanwhile, the **primary** node
will start to notify the **secondary** node of any changes, so that the **secondary** node can
act on those notifications immediately.
Make sure the **secondary** node is running and accessible. You can log in to
the **secondary** node with the same credentials as used for the **primary** node.
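A quick way to confirm the **secondary** node is reachable is to request its sign-in page from any machine; `secondary.example.com` below is a placeholder for the node's URL:
```sh
# Expect an HTTP 200 response if the secondary node is up and serving requests
curl -s -o /dev/null -w "%{http_code}\n" https://secondary.example.com/users/sign_in
```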
### Step 2. Manually replicate the **primary** node's SSH host keys
Read [Manually replicate the **primary** node's SSH host keys](configuration.md#step-2-manually-replicate-the-primary-nodes-ssh-host-keys)
### Step 3. Add the **secondary** GitLab node
1. Navigate to the **primary** node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser.
1. Add the **secondary** node by providing its full URL. **Do NOT** check the
**This is a primary node** checkbox.
1. Optionally, choose which namespaces should be replicated by the
**secondary** node. Leave blank to replicate all. Read more in
[selective synchronization](#selective-synchronization).
1. Click the **Add node** button.
1. SSH into your GitLab **secondary** server and restart the services:
```sh
service gitlab restart
```
Check if there are any common issues with your Geo setup by running:
```sh
bundle exec rake gitlab:geo:check
```
1. SSH into your GitLab **primary** server and log in as root to verify the
**secondary** node is reachable and check for any common issues with your Geo setup:
```sh
bundle exec rake gitlab:geo:check
```
Once reconfigured, the **secondary** node will automatically start
replicating missing data from the **primary** node in a process known as backfill.
Meanwhile, the **primary** node will start to notify the **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.
Make sure the **secondary** node is running and accessible.
You can log in to the **secondary** node with the same credentials as used for the **primary** node.
### Step 4. Enabling Hashed Storage
Read [Enabling Hashed Storage](configuration.md#step-4-enabling-hashed-storage).
### Step 5. (Optional) Configuring the secondary to trust the primary
You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.
If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow your distribution's instructions for
adding it to the **secondary** node's trust store. In Debian/Ubuntu, you would follow these steps:
```sh
sudo -i
cp <primary_node_certification_file> /usr/local/share/ca-certificates
update-ca-certificates
```
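To confirm the **secondary** node now trusts the certificate, you can make a request with `curl`, which uses the system trust store on Debian/Ubuntu. A sketch, with `primary.example.com` as a placeholder for the **primary** node's URL:
```sh
# Run on the secondary node; fails with a certificate error if the primary's certificate is not trusted
curl --silent --show-error --fail https://primary.example.com/users/sign_in > /dev/null && echo "certificate trusted"
```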
### Step 6. Enable Git access over HTTP/HTTPS
Geo synchronizes repositories over HTTP/HTTPS, and therefore requires this clone
method to be enabled. Navigate to **Admin Area > Settings**
(`/admin/application_settings`) on the **primary** node, and set
`Enabled Git access protocols` to `Both SSH and HTTP(S)` or `Only HTTP(S)`.
### Step 7. Verify proper functioning of the secondary node
Read [Verify proper functioning of the secondary node][configuration-verify-node].
## Selective synchronization
Read [Selective synchronization][configuration-selective-replication].
## Troubleshooting
Read the [troubleshooting document][troubleshooting].
[gitlab-org/gitlab-ee#3789]: https://gitlab.com/gitlab-org/gitlab-ee/issues/3789
[configuration]: configuration.md
[configuration-selective-replication]: configuration.md#selective-synchronization
[configuration-verify-node]: configuration.md#step-7-verify-proper-functioning-of-the-secondary-node
[troubleshooting]: troubleshooting.md
# Docker Registry for a secondary node **[PREMIUM ONLY]**
You can set up a [Docker Registry] on your
**secondary** Geo node that mirrors the one on the **primary** Geo node.
## Storage support
CAUTION: **Warning:**
If you use [local storage][registry-storage]
for the Container Registry you **cannot** replicate it to a **secondary** node.
Docker Registry currently supports a few types of storage. If you choose a
distributed storage (`azure`, `gcs`, `s3`, `swift`, or `oss`) for your Docker
Registry on the **primary** node, you can use the same storage for a **secondary**
Docker Registry as well. For more information, read the
[Load balancing considerations][registry-load-balancing]
when deploying the Registry, and how to set up the storage driver for GitLab's
integrated [Container Registry][registry-storage].
[ee]: https://about.gitlab.com/pricing/
[Docker Registry]: https://docs.docker.com/registry/
[registry-storage]: ../../container_registry.md#container-registry-storage-driver
[registry-load-balancing]: https://docs.docker.com/registry/deploying/#load-balancing-considerations
# Geo with external PostgreSQL instances **[PREMIUM ONLY]**
This document is relevant if you are using a PostgreSQL instance that is *not
managed by Omnibus*. This includes cloud-managed instances like AWS RDS, or
manually installed and configured PostgreSQL instances.
NOTE: **Note**:
We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external
(not managed by Omnibus) databases but we do not guarantee compatibility.
## **Primary** node
1. SSH into a GitLab **primary** application server and log in as root:
```sh
sudo -i
```
1. Execute the command below to define the node as the **primary** node:
```sh
gitlab-ctl set-geo-primary-node
```
This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.
### Configure the external database to be replicated
To set up an external database, you can either:
- Set up streaming replication yourself (for example, in AWS RDS).
- Perform the Omnibus configuration manually as follows.
#### Leverage your cloud provider's tools to replicate the primary database
If you have a **primary** node set up on AWS EC2 that uses RDS,
you can create a read-only replica in a different region and the
replication process will be managed by AWS. Make sure you've set the Network ACL, Subnet, and
Security Group according to your needs, so the **secondary** application node can access the database.
Skip to the [Configure secondary application node](#configure-secondary-application-nodes-to-use-the-external-read-replica) section below.
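For illustration only, creating such a cross-region read replica with the AWS CLI might look like the sketch below. All identifiers and regions are placeholders; consult the AWS documentation for the options that apply to your setup:
```sh
# Create a read-only replica of the primary RDS instance in another region
aws rds create-db-instance-read-replica \
  --db-instance-identifier gitlab-geo-read-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:gitlab-primary \
  --region eu-west-1
```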
#### Manually configure the primary database for replication
The [geo_primary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
configures the **primary** node's database to be replicated by making changes to
`pg_hba.conf` and `postgresql.conf`. Make the following configuration changes
manually to your external database configuration:
```
##
## Geo Primary Role
## - pg_hba.conf
##
host replication gitlab_replicator <trusted secondary IP>/32 md5
```
```
##
## Geo Primary Role
## - postgresql.conf
##
sql_replication_user = gitlab_replicator
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 50
max_replication_slots = 1 # number of secondary instances
hot_standby = on
```
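After applying these changes and reloading PostgreSQL, you can confirm the replication settings took effect. A sketch, run from any host that can reach the external database (host and credentials are placeholders):
```sh
# Confirm the replication-related settings on the external primary database
psql -h <primary_database_host> -U gitlab -d gitlabhq_production -c "SHOW wal_level;"
psql -h <primary_database_host> -U gitlab -d gitlabhq_production -c "SHOW max_wal_senders;"
```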
## **Secondary** nodes
### Manually configure the replica database
Make the following configuration changes manually to the `postgresql.conf`
of your external replica database:
```
##
## Geo Secondary Role
## - postgresql.conf
##
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 10
hot_standby = on
```
### Configure **secondary** application nodes to use the external read-replica
With Omnibus, the
[geo_secondary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
has three main functions:
1. Configure the replica database.
1. Configure the tracking database.
1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).
To configure the connection to the external read-replica database and enable Log Cursor:
1. SSH into a GitLab **secondary** application server and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
##
## Geo Secondary role
## - configure dependent flags automatically to enable Geo
##
roles ['geo_secondary_role']
# note this is shared between both databases,
# make sure you define the same password in both
gitlab_rails['db_password'] = '<your_password_here>'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_host'] = '<database_read_replica_host>'
```
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
### Configure the tracking database
**Secondary** nodes use a separate PostgreSQL installation as a tracking
database to keep track of replication status and automatically recover from
potential replication issues. Omnibus automatically configures a tracking database
when `roles ['geo_secondary_role']` is set. For high availability,
refer to [Geo High Availability](https://docs.gitlab.com/ee/administration/high_availability).
If you want to run this database external to Omnibus, please follow the instructions below.
The tracking database requires an [FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
connection with the **secondary** replica database for improved performance.
If you have an external database ready to be used as the tracking database,
follow the instructions below to use it:
NOTE: **Note:**
If you want to use AWS RDS as a tracking database, make sure it has access to
the secondary database. Unfortunately, just assigning the same security group is not enough as
outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
rule to the read-replica's security group allowing any TCP traffic from
the tracking database on port 5432.
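For illustration only, adding such an inbound rule with the AWS CLI might look like the following; the security group ID and CIDR are placeholders:
```sh
# Allow PostgreSQL traffic from the tracking database host to the read-replica's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5432 \
  --cidr 10.0.1.10/32
```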
1. SSH into a GitLab **secondary** server and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` with the connection params and credentials for
the machine with the PostgreSQL instance:
```ruby
geo_secondary['db_username'] = 'gitlab_geo'
geo_secondary['db_password'] = '<your_password_here>'
geo_secondary['db_host'] = '<tracking_database_host>'
geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
geo_secondary['db_fdw'] = true # enable FDW
geo_postgresql['enable'] = false # don't use internal managed instance
```
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
1. Run the tracking database migrations:
```bash
gitlab-rake geo:db:create
gitlab-rake geo:db:migrate
```
1. Configure the
[PostgreSQL FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
connection and credentials:
Save the script below in a file, for example `/tmp/geo_fdw.sh`, and modify the connection
parameters to match your environment. Execute it to set up the FDW connection.
```bash
#!/bin/bash
# Secondary Database connection params:
DB_HOST="<public_ip_or_vpc_private_ip>"
DB_NAME="gitlabhq_production"
DB_USER="gitlab"
DB_PASS="<your_password_here>"
DB_PORT="5432"
# Tracking Database connection params:
GEO_DB_HOST="<public_ip_or_vpc_private_ip>"
GEO_DB_NAME="gitlabhq_geo_production"
GEO_DB_USER="gitlab_geo"
GEO_DB_PORT="5432"
query_exec () {
gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
}
query_exec "CREATE EXTENSION postgres_fdw;"
query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
query_exec "CREATE SCHEMA gitlab_secondary;"
query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
```
NOTE: **Note:** The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
`psql` for AWS RDS.
1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart)
1. Populate the FDW tables:
```bash
gitlab-rake geo:db:refresh_foreign_tables
```
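To confirm the foreign tables were created, you can query the tracking database. A sketch using the same `gitlab-psql` invocation style as the script above (the host is a placeholder):
```sh
# List a few of the foreign tables created in the tracking database by the refresh task
gitlab-psql -h <tracking_database_host> -d gitlabhq_geo_production -c "SELECT foreign_table_name FROM information_schema.foreign_tables LIMIT 5;"
```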
# Geo Frequently Asked Questions **[PREMIUM ONLY]**
## What are the minimum requirements to run Geo?
The requirements are listed [on the index page](index.md#requirements-for-running-geo).
## How does Geo know which projects to sync?
On each **secondary** node, there is a read-only replicated copy of the GitLab database.
A **secondary** node also has a tracking database where it stores which projects have been synced.
Geo compares the two databases to find projects that are not yet tracked.
At the start, this tracking database is empty, so Geo will start trying to update every project that it can see in the GitLab database.
For each project to sync:
1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** node.
If there are no changes, the sync will be fast and end quickly. Otherwise, it will pull the latest commits.
1. The **secondary** node will update the tracking database to store the fact that it has synced projects A, B, C, etc.
1. Repeat until all projects are synced.
When someone pushes a commit to the **primary** node, it generates an event in the GitLab database that the repository has changed.
The **secondary** node sees this event, marks the project in question as dirty, and schedules the project to be resynced.
To ensure that problems with pipelines (for example, syncs failing too many times or jobs being lost) don't permanently stop projects syncing, Geo also periodically checks the tracking database for projects that are marked as dirty. This check happens when
the number of concurrent syncs falls below `repos_max_capacity` and there are no new projects waiting to be synced.
Geo also has a checksum feature which computes a SHA256 checksum across all the Git references and the SHA values they point to.
If the refs don't match between the **primary** node and the **secondary** node, then the **secondary** node will mark that project as dirty and try to resync it.
So even if we have an outdated tracking database, the validation should activate and find discrepancies in the repository state and resync.
## Can I use Geo in a disaster recovery situation?
Yes, but there are limitations to what we replicate (see
[What data is replicated to a **secondary** node?](#what-data-is-replicated-to-a-secondary-node)).
Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).
## What data is replicated to a **secondary** node?
We currently replicate project repositories, LFS objects, generated
attachments / avatars and the whole database. This means user accounts,
issues, merge requests, groups, project data, etc., will be available for
query.
## Can I git push to a **secondary** node?
Yes! Pushing directly to a **secondary** node (for both HTTP and SSH, including git-lfs) was [introduced](https://about.gitlab.com/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
## How long does it take to have a commit replicated to a **secondary** node?
All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of
factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc.
## What if the SSH server runs at a different port?
That's totally fine. We use HTTP(S) to fetch repository changes from the **primary** node to all **secondary** nodes.
## Is it possible to set up a Docker Registry for a **secondary** node that mirrors the one on the **primary** node?
Yes. See [Docker Registry for a **secondary** node](docker_registry.md).
# Geo High Availability **[PREMIUM ONLY]**
This document describes a minimal reference architecture for running Geo
in a high availability configuration. If your HA setup differs from the one
described, it is possible to adapt these instructions to your needs.
## Architecture overview
![Geo HA Diagram](https://docs.gitlab.com/ee/administration/img/high_availability/geo-ha-diagram.png)
_[diagram source - gitlab employees only][diagram-source]_
The topology above assumes that the **primary** and **secondary** Geo clusters
are located in two separate locations, on their own virtual network
with private IP addresses. The network is configured such that all machines within
one geographic location can communicate with each other using their private IP addresses.
The IP addresses given are examples and may be different depending on the
network topology of your deployment.
The only external way to access the two Geo deployments is by HTTPS at
`gitlab.us.example.com` and `gitlab.eu.example.com` in the example above.
NOTE: **Note:**
The **primary** and **secondary** Geo deployments must be able to communicate to each other over HTTPS.
## Redis and PostgreSQL High Availability
The **primary** and **secondary** Redis and PostgreSQL should be configured
for high availability. Because of the additional complexity involved
in setting up this configuration for PostgreSQL and Redis,
it is not covered by this Geo HA documentation.
For more information about setting up a highly available PostgreSQL cluster and Redis cluster using the Omnibus package, see the high availability documentation for
[PostgreSQL](../../high_availability/database.md) and
[Redis](../../high_availability/redis.md), respectively.
NOTE: **Note:**
It is possible to use cloud hosted services for PostgreSQL and Redis, but this is beyond the scope of this document.
## Prerequisites: A working GitLab HA cluster
This cluster will serve as the **primary** node. Use the
[GitLab HA documentation](../../high_availability/README.md) to set this up.
## Configure the GitLab cluster to be the **primary** node
The following steps enable a GitLab cluster to serve as the **primary** node.
### Step 1: Configure the **primary** frontend servers
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
##
## Enable the Geo primary role
##
roles ['geo_primary_role']
##
## Disable automatic migrations
##
gitlab_rails['auto_migrate'] = false
```
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
NOTE: **Note:** PostgreSQL and Redis should have already been disabled on the
application servers, and connections from the application servers to those
services on the backend servers configured, during normal GitLab HA set up. See
high availability configuration documentation for
[PostgreSQL](https://docs.gitlab.com/ee/administration/high_availability/database.html#configuring-the-application-nodes)
and [Redis](../../high_availability/redis.md#example-configuration-for-the-gitlab-application).
The **primary** database will require modification later, as part of
[step 2](#step-2-configure-the-main-read-only-replica-postgresql-database-on-the-secondary-node).
## Configure a **secondary** node
A **secondary** cluster is similar to any other GitLab HA cluster, with two
major differences:
* The main PostgreSQL database is a read-only replica of the **primary** node's
PostgreSQL database.
* There is also a single PostgreSQL database for the **secondary** cluster,
called the "tracking database", which tracks the synchronization state of
various resources.
Therefore, we will set up the HA components one-by-one, and include deviations
from the normal HA setup.
### Step 1: Configure the Redis and NFS services on the **secondary** node
Configure the following services, again using the non-Geo high availability
documentation:
* [Configuring Redis for GitLab HA](../../high_availability/redis.md) for high
availability.
* [NFS](../../high_availability/nfs.md) which will store data that is
synchronized from the **primary** node.
### Step 2: Configure the main read-only replica PostgreSQL database on the **secondary** node
NOTE: **Note:** The following documentation assumes the database will be run on
only a single machine, rather than as a PostgreSQL cluster.
Configure the [**secondary** database](database.md) as a read-only replica of
the **primary** database.
If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](external_database.md).
### Step 3: Configure the tracking database on the **secondary** node
NOTE: **Note:** This documentation assumes the tracking database will be run on
only a single machine, rather than as a PostgreSQL cluster.
Configure the tracking database.
1. Edit `/etc/gitlab/gitlab.rb` in the tracking database machine, and add the
following:
```ruby
##
## Enable the Geo secondary tracking database
##
geo_postgresql['enable'] = true
geo_postgresql['ha'] = true
```
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](external_database.md).
### Step 4: Configure the frontend application servers on the **secondary** node
In the architecture overview, there are two machines running the GitLab
application services. These services are enabled selectively in the
configuration.
Configure the application servers following
[Configuring GitLab for HA](../../high_availability/gitlab.md), then make the
following modifications:
1. Edit `/etc/gitlab/gitlab.rb` on each application server in the **secondary**
cluster, and add the following:
```ruby
##
## Enable the Geo secondary role
##
roles ['geo_secondary_role', 'application_role']
##
## Disable automatic migrations
##
gitlab_rails['auto_migrate'] = false
##
## Configure the connection to the tracking DB. And disable application
## servers from running tracking databases.
##
geo_secondary['db_host'] = '<geo_tracking_db_host>'
geo_secondary['db_password'] = '<geo_tracking_db_password>'
geo_postgresql['enable'] = false
##
## Configure connection to the streaming replica database, if you haven't
## already
##
gitlab_rails['db_host'] = '<replica_database_host>'
gitlab_rails['db_password'] = '<replica_database_password>'
##
## Configure connection to Redis, if you haven't already
##
gitlab_rails['redis_host'] = '<redis_host>'
gitlab_rails['redis_password'] = '<redis_password>'
##
## If you are using custom users not managed by Omnibus, you need to specify
## UIDs and GIDs like below, and ensure they match between servers in a
## cluster to avoid permissions issues
##
user['uid'] = 9000
user['gid'] = 9000
web_server['uid'] = 9001
web_server['gid'] = 9001
registry['uid'] = 9002
registry['gid'] = 9002
```
NOTE: **Note:**
If you set up the PostgreSQL cluster using the Omnibus package and used the
`postgresql['sql_user_password'] = 'md5 digest of secret'` setting, keep in
mind that `gitlab_rails['db_password']` and `geo_secondary['db_password']`
mentioned above contain the plaintext passwords. This is used to let the Rails
servers connect to the databases.
NOTE: **Note:**
Make sure that the current node's IP is listed in the `postgresql['md5_auth_cidr_addresses']` setting of your remote database.
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
On the secondary, the following GitLab frontend services will be enabled:
* geo-logcursor
* gitlab-pages
* gitlab-workhorse
* logrotate
* nginx
* registry
* remote-syslog
* sidekiq
* unicorn
Verify these services by running `sudo gitlab-ctl status` on the frontend
application servers.
### Step 5: Set up the LoadBalancer for the **secondary** node
In this topology, a load balancer is required at each geographic location to
route traffic to the application servers.
See [Load Balancer for GitLab HA](../../high_availability/load_balancer.md) for
more information.
[diagram-source]: https://docs.google.com/drawings/d/1z0VlizKiLNXVVVaERFwgsIOuEgjcUqDTWPdQYsE7Z4c/edit
[gitlab-reconfigure]: ../../restart_gitlab.md#omnibus-gitlab-reconfigure
# Geo with Object storage **[PREMIUM ONLY]**
Geo can be used in combination with Object Storage (AWS S3, or
other compatible object storage).
## Configuration
At this time, if object storage is enabled on the
**primary** node, it must also be enabled on each **secondary** node.
**Secondary** nodes can use the same storage bucket as the **primary** node, or
they can use a replicated storage bucket. At this time GitLab does not
take care of content replication in object storage.
For LFS, follow the documentation to
[set up LFS object storage](../../../workflow/lfs/lfs_administration.md#storing-lfs-objects-in-remote-object-storage).
For CI job artifacts, there is similar documentation to configure
[job artifacts object storage](../../job_artifacts.md#using-object-storage).
For user uploads, there is similar documentation to configure [upload object storage](../../uploads.md#using-object-storage-core-only).
You should enable and configure object storage on both **primary** and **secondary**
nodes. Migrating existing data to object storage should be performed on the
**primary** node only. **Secondary** nodes will automatically notice that the migrated
files are now in object storage.
## Replication
When using Amazon S3, you can use
[CRR](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) to
have automatic replication between the bucket used by the **primary** node and
the bucket used by **secondary** nodes.
If you are using Google Cloud Storage, consider using
[Multi-Regional Storage](https://cloud.google.com/storage/docs/storage-classes#multi-regional).
Or you can use the [Storage Transfer Service](https://cloud.google.com/storage/transfer/),
although this only supports daily synchronization.
For manual synchronization, or scheduled by `cron`, please have a look at:
- [`s3cmd sync`](http://s3tools.org/s3cmd-sync)
- [`gsutil rsync`](https://cloud.google.com/storage/docs/gsutil/commands/rsync)
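For example, a one-way synchronization from the **primary** node's bucket to the **secondary** node's bucket could be run (or scheduled with `cron`) with either tool; the bucket names are placeholders:
```sh
# Amazon S3: mirror the primary bucket into the secondary bucket
s3cmd sync s3://gitlab-primary-bucket/ s3://gitlab-secondary-bucket/

# Google Cloud Storage: recursively synchronize the two buckets
gsutil rsync -r gs://gitlab-primary-bucket gs://gitlab-secondary-bucket
```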
# Removing secondary Geo nodes **[PREMIUM ONLY]**
**Secondary** nodes can be removed from the Geo cluster using the Geo admin page of the **primary** node. To remove a **secondary** node:
1. Navigate to **Admin Area > Geo** (`/admin/geo/nodes`).
1. Click the **Remove** button for the **secondary** node you want to remove.
1. Confirm by clicking **Remove** when the prompt appears.
Once removed from the Geo admin page, you must stop and uninstall the **secondary** node:
1. On the **secondary** node, stop GitLab:
```bash
sudo gitlab-ctl stop
```
1. On the **secondary** node, uninstall GitLab:
```bash
# Stop gitlab and remove its supervision process
sudo gitlab-ctl uninstall
# Debian/Ubuntu
sudo dpkg --remove gitlab-ee
# Redhat/Centos
sudo rpm --erase gitlab-ee
```
Once GitLab has been uninstalled from the **secondary** node, the replication slot must be dropped from the **primary** node's database as follows:
1. On the **primary** node, start a PostgreSQL console session:
```bash
sudo gitlab-psql
```
NOTE: **Note:**
Using `gitlab-rails dbconsole` will not work, because managing replication slots requires superuser permissions.
1. Find the name of the relevant replication slot. This is the slot that is specified with `--slot-name` when running the replicate command: `gitlab-ctl replicate-geo-database`.
```sql
SELECT * FROM pg_replication_slots;
```
1. Remove the replication slot for the **secondary** node:
```sql
SELECT pg_drop_replication_slot('<name_of_slot>');
```
# Tuning Geo **[PREMIUM ONLY]**
## Changing the sync capacity values
In the Geo admin page (`/admin/geo/nodes`), there are several variables that
can be tuned to improve performance of Geo:
- Repository sync capacity.
- File sync capacity.
Increasing these values will increase the number of jobs that are scheduled.
However, this may not lead to more downloads in parallel unless the number of
available Sidekiq threads is also increased. For example, if repository sync
capacity is increased from 25 to 50, you may also want to increase the number
of Sidekiq threads from 25 to 50. See the
[Sidekiq concurrency documentation](https://docs.gitlab.com/ee/administration/operations/extra_sidekiq_processes.html#number-of-threads)
for more details.
[//]: # (Please update EE::GitLab::GeoGitAccess::GEO_SERVER_DOCS_URL if this file is moved)
# Using a Geo Server **[PREMIUM ONLY]**
After you set up the [database replication and configure the Geo nodes][req], use your closest GitLab node as you would a normal standalone GitLab instance.
Pushing directly to a **secondary** node (for both HTTP and SSH, including git-lfs) was [introduced](https://about.gitlab.com/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
Example of the output you will see when pushing to a **secondary** node:
```bash
$ git push
> GitLab: You're pushing to a Geo secondary.
> GitLab: We'll help you by proxying this request to the primary: ssh://git@primary.geo/user/repo.git
Everything up-to-date
```
[req]: index.md#setup-instructions