Commit 20687d4c authored by Achilleas Pipinellis's avatar Achilleas Pipinellis

Merge branch 'geo-docs-overview-refactor' into 'master'

Moved main Geo page and setup instructions

Closes #243513

See merge request gitlab-org/gitlab!40964
parents 85e13edd dc6e9202
......@@ -21,7 +21,7 @@ If you have any doubts about the consistency of the data on this node, we recomm
Since the former **primary** node will be out of sync with the current **primary** node, the first step is to bring the former **primary** node up to date. Note, deletion of data stored on disk like
repositories and uploads will not be replayed when bringing the former **primary** node back
into sync, which may result in increased disk usage.
Alternatively, you can [set up a new **secondary** GitLab instance](../replication/index.md#setup-instructions) to avoid this.
Alternatively, you can [set up a new **secondary** GitLab instance](../setup/index.md) to avoid this.
To bring the former **primary** node up to date:
......@@ -37,7 +37,7 @@ To bring the former **primary** node up to date:
you need to undo those steps now. For Debian/Ubuntu you just need to run
`sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
the GitLab instance from scratch and set it up as a **secondary** node by
following [Setup instructions](../replication/index.md#setup-instructions). In this case, you don't need to follow the next step.
following [Setup instructions](../setup/index.md). In this case, you don't need to follow the next step.
NOTE: **Note:**
If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domain-dns-record)
......@@ -50,7 +50,7 @@ To bring the former **primary** node up to date:
former **primary** node.
If you have lost your original **primary** node, follow the
[setup instructions](../replication/index.md#setup-instructions) to set up a new **secondary** node.
[setup instructions](../setup/index.md) to set up a new **secondary** node.
## Promote the **secondary** node to **primary** node
......
......@@ -11,7 +11,7 @@ Geo replicates your database, your Git repositories, and few other assets.
We will support and replicate more data in the future, that will enable you to
failover with minimal effort, in a disaster situation.
See [Geo current limitations](../replication/index.md#current-limitations) for more information.
See [Geo current limitations](../index.md#current-limitations) for more information.
CAUTION: **Warning:**
Disaster recovery for multi-secondary configurations is in **Alpha**.
......@@ -330,7 +330,7 @@ secondary domain, like changing Git remotes and API URLs.
Promoting a **secondary** node to **primary** node using the process above does not enable
Geo on the new **primary** node.
To bring a new **secondary** node online, follow the [Geo setup instructions](../replication/index.md#setup-instructions).
To bring a new **secondary** node online, follow the [Geo setup instructions](../index.md#setup-instructions).
### Step 6. (Optional) Removing the secondary's tracking database
......@@ -437,7 +437,7 @@ section to resolve the error. Otherwise, the secret is lost and you'll need to
### While Promoting the secondary, I got an error `ActiveRecord::RecordInvalid`
If you disabled a secondary node, either with the [replication pause task](../replication/index.md#pausing-and-resuming-replication)
If you disabled a secondary node, either with the [replication pause task](../index.md#pausing-and-resuming-replication)
(13.2) or via the UI (13.1 and earlier), you must first re-enable the
node before you can continue. This is fixed in 13.4.
......
......@@ -27,7 +27,7 @@ have a high degree of confidence in being able to perform them accurately.
## Not all data is automatically replicated
If you are using any GitLab features that Geo [doesn't support](../replication/index.md#current-limitations),
If you are using any GitLab features that Geo [doesn't support](../index.md#current-limitations),
you must make separate provisions to ensure that the **secondary** node has an
up-to-date copy of any data associated with that feature. This may extend the
required scheduled maintenance period significantly.
......
---
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
---
# Geo **(PREMIUM ONLY)**
> - Introduced in GitLab Enterprise Edition 8.9.
> - Using Geo in combination with
> [multi-node architectures](../reference_architectures/index.md)
> is considered **Generally Available** (GA) in
> [GitLab Premium](https://about.gitlab.com/pricing/) 10.4.
Geo is the solution for widely distributed development teams and for providing a warm-standby as part of a disaster recovery strategy.
## Overview
CAUTION: **Caution:**
Geo undergoes significant changes from release to release. Upgrades **are** supported and [documented](#updating-geo), but you should ensure that you're using the right version of the documentation for your installation.
Fetching large repositories can take a long time for teams located far from a single GitLab instance.
Geo provides local, read-only instances of your GitLab instances. This can reduce the time it takes
to clone and fetch large repositories, speeding up development.
For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w).
To make sure you're using the right version of the documentation, navigate to [the source version of this page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v11.2.3-ee`](https://gitlab.com/gitlab-org/gitlab/blob/v11.2.3-ee/doc/administration/geo/index.md).
## Use cases
Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the read-only load between your **primary** and **secondary** nodes.
In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [current limitations](#current-limitations)).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reducing the loading time for automated tasks, custom integrations, and internal workflows.
- Can quickly fail over to a **secondary** node in a [disaster recovery](disaster_recovery/index.md) scenario.
- Allows [planned failover](disaster_recovery/planned_failover.md) to a **secondary** node.
Geo provides:
- Read-only **secondary** nodes: Maintain one **primary** GitLab node while still enabling read-only **secondary** nodes for each of your distributed teams.
- Authentication system hooks: **Secondary** nodes receives all authentication data (like user accounts and logins) from the **primary** instance.
- An intuitive UI: **Secondary** nodes utilize the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** node.
## How it works
Your Geo instance can be used for cloning and fetching projects, in addition to reading any data. This will make working with large repositories over large distances much faster.
![Geo overview](replication/img/geo_overview.png)
When Geo is enabled, the:
- Original instance is known as the **primary** node.
- Replicated read-only nodes are known as **secondary** nodes.
Keep in mind that:
- **Secondary** nodes talk to the **primary** node to:
- Get user data for logins (API).
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- Since GitLab Premium 10.0, the **primary** node no longer talks to **secondary** nodes to notify for changes (API).
- Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
- There are [limitations](#current-limitations) in the current implementation.
### Architecture
The following diagram illustrates the underlying architecture of Geo.
![Geo architecture](replication/img/geo_architecture.png)
In this diagram:
- There is the **primary** node and the details of one **secondary** node.
- Writes to the database can only be performed on the **primary** node. A **secondary** node receives database
updates via PostgreSQL streaming replication.
- If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](disaster_recovery/index.md) scenarios.
- A **secondary** node performs different type of synchronizations against the **primary** node, using a special
authorization protected by JWT:
- Repositories are cloned/updated via Git over HTTPS.
- Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.
From the perspective of a user performing Git operations:
- The **primary** node behaves as a full read-write GitLab instance.
- **Secondary** nodes are read-only but proxy Git push operations to the **primary** node. This makes **secondary** nodes appear to support push operations themselves.
To simplify the diagram, some necessary components are omitted. Note that:
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS required [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
Note that a **secondary** node needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database.
- [Another database instance](#geo-tracking-database) used internally by the **secondary** node to record what data has been replicated.
In **secondary** nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
## Requirements for running Geo
The following are required to run Geo:
- An operating system that supports OpenSSH 6.9+ (needed for
[fast lookup of authorized SSH keys in the database](../operations/fast_ssh_key_lookup.md))
The following operating systems are known to ship with a current version of OpenSSH:
- [CentOS](https://www.centos.org) 7.4+
- [Ubuntu](https://ubuntu.com) 16.04+
- PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- Git 2.9+
- All nodes must run the same GitLab version.
Additionally, check GitLab's [minimum requirements](../../install/requirements.md),
and we recommend you use:
- At least GitLab Enterprise Edition 10.0 for basic Geo features.
- The latest version for a better experience.
### Firewall rules
The following table lists basic ports that must be open between the **primary** and **secondary** nodes for Geo.
| **Primary** node | **Secondary** node | Protocol |
|:-----------------|:-------------------|:-------------|
| 80 | 80 | HTTP |
| 443 | 443 | TCP or HTTPS |
| 22 | 22 | TCP |
| 5432 | | PostgreSQL |
See the full list of ports used by GitLab in [Package defaults](https://docs.gitlab.com/omnibus/package-information/defaults.html)
NOTE: **Note:**
[Web terminal](../../ci/environments/index.md#web-terminals) support requires your load balancer to correctly handle WebSocket connections.
When using HTTP or HTTPS proxying, your load balancer must be configured to pass through the `Connection` and `Upgrade` hop-by-hop headers. See the [web terminal](../integration/terminal.md) integration guide for more details.
NOTE: **Note:**
When using HTTPS protocol for port 443, you will need to add an SSL certificate to the load balancers.
If you wish to terminate SSL at the GitLab application server instead, use TCP protocol.
### LDAP
We recommend that if you use LDAP on your **primary** node, you also set up secondary LDAP servers on each **secondary** node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
NOTE: **Note:**
It is possible for all **secondary** nodes to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](disaster_recovery/index.md) scenario if a **secondary** node is promoted to be a **primary** node.
Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).
### Geo Tracking Database
The tracking database instance is used as metadata to control what needs to be updated on the disk of the local instance. For example:
- Download new assets.
- Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated.
Because the replicated database instance is read-only, we need this additional database instance for each **secondary** node.
### Geo Log Cursor
This daemon:
- Reads a log of events replicated by the **primary** node to the **secondary** database instance.
- Updates the Geo Tracking Database instance with changes that need to be executed.
When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** node will execute the required operations and update the state.
This new architecture allows GitLab to be resilient to connectivity issues between the nodes. It doesn't matter how long the **secondary** node is disconnected from the **primary** node as it will be able to replay all the events in the correct order and become synchronized with the **primary** node again.
## Setup instructions
For setup instructions, see [Setting up Geo](setup/index.md).
## Post-installation documentation
After installing GitLab on the **secondary** nodes and performing the initial configuration, see the following documentation for post-installation information.
### Configuring Geo
For information on configuring Geo, see [Geo configuration](replication/configuration.md).
### Updating Geo
For information on how to update your Geo nodes to the latest GitLab version, see [Updating the Geo nodes](replication/updating_the_geo_nodes.md).
### Pausing and resuming replication
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/35913) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2.
In some circumstances, like during [upgrades](replication/updating_the_geo_nodes.md) or a [planned failover](disaster_recovery/planned_failover.md), it is desirable to pause replication between the primary and secondary.
Pausing and resuming replication is done via a command line tool from the secondary node.
**To Pause: (from secondary)**
```shell
gitlab-ctl geo-replication-pause
```
**To Resume: (from secondary)**
```shell
gitlab-ctl geo-replication-resume
```
### Configuring Geo for multiple nodes
For information on configuring Geo for multiple nodes, see [Geo for multiple servers](replication/multiple_servers.md).
### Configuring Geo with Object Storage
For information on configuring Geo with object storage, see [Geo with Object storage](replication/object_storage.md).
### Disaster Recovery
For information on using Geo in disaster recovery situations to mitigate data-loss and restore services, see [Disaster Recovery](disaster_recovery/index.md).
### Replicating the Container Registry
For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** node](replication/docker_registry.md).
### Security Review
For more information on Geo security, see [Geo security review](replication/security_review.md).
### Tuning Geo
For more information on tuning Geo, see [Tuning Geo](replication/tuning.md).
### Set up a location-aware Git URL
For an example of how to set up a location-aware Git remote URL with AWS Route53, see [Location-aware Git remote URL with AWS Route53](replication/location_aware_git_url.md).
## Remove Geo node
For more information on removing a Geo node, see [Removing **secondary** Geo nodes](replication/remove_geo_node.md).
## Disable Geo
To find out how to disable Geo, see [Disabling Geo](replication/disable_geo.md).
## Current limitations
CAUTION: **Caution:**
This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
- Pushing directly to a **secondary** node redirects (for HTTP) or proxies (for SSH) the request to the **primary** node instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`.
- Cloning, pulling, or pushing repositories that exist on the **primary** node but not on the **secondary** nodes where [selective synchronization](replication/configuration.md#selective-synchronization) does not include the project is not supported over SSH [but support is planned](https://gitlab.com/groups/gitlab-org/-/epics/2562). HTTP(S) is supported.
- The **primary** node has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** node to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465).
- The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details.
- Real-time updates of issues/merge requests (for example, via long polling) doesn't work on the **secondary** node.
- [Selective synchronization](replication/configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** node in full, making it inappropriate for use as an access control mechanism.
- Object pools for forked project deduplication work only on the **primary** node, and are duplicated on the **secondary** node.
- [External merge request diffs](../merge_request_diffs.md) will not be replicated if they are on-disk, and viewing merge requests will fail. However, external MR diffs in object storage **are** supported. The default configuration (in-database) does work.
- GitLab Runners cannot register with a **secondary** node. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294).
### Limitations on replication/verification
You can keep track of the progress to implement the missing items in
these epics/issues:
- [Unreplicated Data Types](https://gitlab.com/groups/gitlab-org/-/epics/893)
- [Verify all replicated data](https://gitlab.com/groups/gitlab-org/-/epics/1430)
There is a complete list of all GitLab [data types](replication/datatypes.md) and [existing support for replication and verification](replication/datatypes.md#limitations-on-replicationverification).
## Frequently Asked Questions
For answers to common questions, see the [Geo FAQ](replication/faq.md).
## Log files
Since GitLab 9.5, Geo stores structured log messages in a `geo.log` file. For Omnibus installations, this file is at `/var/log/gitlab/gitlab-rails/geo.log`.
This file contains information about when Geo attempts to sync repositories and files. Each line in the file contains a separate JSON entry that can be ingested into. For example, Elasticsearch or Splunk.
For example:
```json
{"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update","project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}
```
This message shows that Geo detected that a repository update was needed for project `1`.
## Troubleshooting
For troubleshooting steps, see [Geo Troubleshooting](replication/troubleshooting.md).
......@@ -12,7 +12,7 @@ type: howto
NOTE: **Note:**
This is the final step in setting up a **secondary** Geo node. Stages of the
setup process must be completed in the documented order.
Before attempting the steps in this stage, [complete all prior stages](index.md#using-omnibus-gitlab).
Before attempting the steps in this stage, [complete all prior stages](../setup/index.md#using-omnibus-gitlab).
The basic steps of configuring a **secondary** node are to:
......
......@@ -15,7 +15,7 @@ configuration steps. In this case,
NOTE: **Note:**
The stages of the setup process must be completed in the documented order.
Before attempting the steps in this stage, [complete all prior stages](index.md#using-omnibus-gitlab).
Before attempting the steps in this stage, [complete all prior stages](../setup/index.md#using-omnibus-gitlab).
This document describes the minimal steps you have to take in order to
replicate your **primary** GitLab database to a **secondary** node's database. You may
......
......@@ -135,7 +135,7 @@ has three main functions:
1. Configure the replica database.
1. Configure the tracking database.
1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).
1. Enable the [Geo Log Cursor](../index.md#geo-log-cursor) (not covered in this section).
To configure the connection to the external read-replica database and enable Log Cursor:
......
......@@ -9,7 +9,7 @@ type: howto
## What are the minimum requirements to run Geo?
The requirements are listed [on the index page](index.md#requirements-for-running-geo)
The requirements are listed [on the index page](../index.md#requirements-for-running-geo)
## How does Geo know which projects to sync?
......
---
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
redirect_to: '../index.md'
---
# Replication (Geo) **(PREMIUM ONLY)**
> - Introduced in GitLab Enterprise Edition 8.9.
> - Using Geo in combination with
> [multi-node architectures](../../reference_architectures/index.md)
> is considered **Generally Available** (GA) in
> [GitLab Premium](https://about.gitlab.com/pricing/) 10.4.
Replication with Geo is the solution for widely distributed development teams.
## Overview
Fetching large repositories can take a long time for teams located far from a single GitLab instance.
Geo provides local, read-only instances of your GitLab instances. This can reduce the time it takes
to clone and fetch large repositories, speeding up development.
NOTE: **Note:**
Check the [requirements](#requirements-for-running-geo) carefully before setting up Geo.
For a video introduction to Geo, see [Introduction to GitLab Geo - GitLab Features](https://www.youtube.com/watch?v=-HDLxSjEh6w).
CAUTION: **Caution:**
Geo undergoes significant changes from release to release. Upgrades **are** supported and [documented](#updating-geo), but you should ensure that you're using the right version of the documentation for your installation.
To make sure you're using the right version of the documentation, navigate to [the source version of this page on GitLab.com](https://gitlab.com/gitlab-org/gitlab/blob/master/doc/administration/geo/replication/index.md) and choose the appropriate release from the **Switch branch/tag** dropdown. For example, [`v11.2.3-ee`](https://gitlab.com/gitlab-org/gitlab/blob/v11.2.3-ee/doc/administration/geo/replication/index.md).
## Use cases
Implementing Geo provides the following benefits:
- Reduce from minutes to seconds the time taken for your distributed developers to clone and fetch large repositories and projects.
- Enable all of your developers to contribute ideas and work in parallel, no matter where they are.
- Balance the read-only load between your **primary** and **secondary** nodes.
In addition, it:
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see [current limitations](#current-limitations)).
- Overcomes slow connections between distant offices, saving time by improving speed for distributed teams.
- Helps reducing the loading time for automated tasks, custom integrations, and internal workflows.
- Can quickly fail over to a **secondary** node in a [disaster recovery](../disaster_recovery/index.md) scenario.
- Allows [planned failover](../disaster_recovery/planned_failover.md) to a **secondary** node.
Geo provides:
- Read-only **secondary** nodes: Maintain one **primary** GitLab node while still enabling read-only **secondary** nodes for each of your distributed teams.
- Authentication system hooks: **Secondary** nodes receives all authentication data (like user accounts and logins) from the **primary** instance.
- An intuitive UI: **Secondary** nodes utilize the same web interface your team has grown accustomed to. In addition, there are visual notifications that block write operations and make it clear that a user is on a **secondary** node.
## How it works
Your Geo instance can be used for cloning and fetching projects, in addition to reading any data. This will make working with large repositories over large distances much faster.
![Geo overview](img/geo_overview.png)
When Geo is enabled, the:
- Original instance is known as the **primary** node.
- Replicated read-only nodes are known as **secondary** nodes.
Keep in mind that:
- **Secondary** nodes talk to the **primary** node to:
- Get user data for logins (API).
- Replicate repositories, LFS Objects, and Attachments (HTTPS + JWT).
- Since GitLab Premium 10.0, the **primary** node no longer talks to **secondary** nodes to notify for changes (API).
- Pushing directly to a **secondary** node (for both HTTP and SSH, including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
- There are [limitations](#current-limitations) in the current implementation.
### Architecture
The following diagram illustrates the underlying architecture of Geo.
![Geo architecture](img/geo_architecture.png)
In this diagram:
- There is the **primary** node and the details of one **secondary** node.
- Writes to the database can only be performed on the **primary** node. A **secondary** node receives database
updates via PostgreSQL streaming replication.
- If present, the [LDAP server](#ldap) should be configured to replicate for [Disaster Recovery](../disaster_recovery/index.md) scenarios.
- A **secondary** node performs different type of synchronizations against the **primary** node, using a special
authorization protected by JWT:
- Repositories are cloned/updated via Git over HTTPS.
- Attachments, LFS objects, and other files are downloaded via HTTPS using a private API endpoint.
From the perspective of a user performing Git operations:
- The **primary** node behaves as a full read-write GitLab instance.
- **Secondary** nodes are read-only but proxy Git push operations to the **primary** node. This makes **secondary** nodes appear to support push operations themselves.
To simplify the diagram, some necessary components are omitted. Note that:
- Git over SSH requires [`gitlab-shell`](https://gitlab.com/gitlab-org/gitlab-shell) and OpenSSH.
- Git over HTTPS required [`gitlab-workhorse`](https://gitlab.com/gitlab-org/gitlab-workhorse).
Note that a **secondary** node needs two different PostgreSQL databases:
- A read-only database instance that streams data from the main GitLab database.
- [Another database instance](#geo-tracking-database) used internally by the **secondary** node to record what data has been replicated.
In **secondary** nodes, there is an additional daemon: [Geo Log Cursor](#geo-log-cursor).
## Requirements for running Geo
The following are required to run Geo:
- An operating system that supports OpenSSH 6.9+ (needed for
[fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md))
The following operating systems are known to ship with a current version of OpenSSH:
- [CentOS](https://www.centos.org) 7.4+
- [Ubuntu](https://ubuntu.com) 16.04+
- PostgreSQL 11+ with [Streaming Replication](https://wiki.postgresql.org/wiki/Streaming_Replication)
- All nodes must run the same GitLab version.
Additionally, check GitLab's [minimum requirements](../../../install/requirements.md),
and we recommend you use:
- At least GitLab Enterprise Edition 10.0 for basic Geo features.
- The latest version for a better experience.
### Firewall rules
The following table lists basic ports that must be open between the **primary** and **secondary** nodes for Geo.
| **Primary** node | **Secondary** node | Protocol |
|:-----------------|:-------------------|:-------------|
| 80 | 80 | HTTP |
| 443 | 443 | TCP or HTTPS |
| 22 | 22 | TCP |
| 5432 | | PostgreSQL |
See the full list of ports used by GitLab in [Package defaults](https://docs.gitlab.com/omnibus/package-information/defaults.html)
NOTE: **Note:**
[Web terminal](../../../ci/environments/index.md#web-terminals) support requires your load balancer to correctly handle WebSocket connections.
When using HTTP or HTTPS proxying, your load balancer must be configured to pass through the `Connection` and `Upgrade` hop-by-hop headers. See the [web terminal](../../integration/terminal.md) integration guide for more details.
NOTE: **Note:**
When using HTTPS protocol for port 443, you will need to add an SSL certificate to the load balancers.
If you wish to terminate SSL at the GitLab application server instead, use TCP protocol.
### LDAP
We recommend that if you use LDAP on your **primary** node, you also set up secondary LDAP servers on each **secondary** node. Otherwise, users will not be able to perform Git operations over HTTP(s) on the **secondary** node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
NOTE: **Note:**
It is possible for all **secondary** nodes to share an LDAP server, but additional latency can be an issue. Also, consider what LDAP server will be available in a [disaster recovery](../disaster_recovery/index.md) scenario if a **secondary** node is promoted to be a **primary** node.
Check for instructions on how to set up replication in your LDAP service. Instructions will be different depending on the software or service used. For example, OpenLDAP provides [these instructions](https://www.openldap.org/doc/admin24/replication.html).
### Geo Tracking Database
The tracking database instance is used as metadata to control what needs to be updated on the disk of the local instance. For example:
- Download new assets.
- Fetch new LFS Objects.
- Fetch changes from a repository that has recently been updated.
Because the replicated database instance is read-only, we need this additional database instance for each **secondary** node.
### Geo Log Cursor
This daemon:
- Reads a log of events replicated by the **primary** node to the **secondary** database instance.
- Updates the Geo Tracking Database instance with changes that need to be executed.
When something is marked to be updated in the tracking database instance, asynchronous jobs running on the **secondary** node will execute the required operations and update the state.
This new architecture allows GitLab to be resilient to connectivity issues between the nodes. It doesn't matter how long the **secondary** node is disconnected from the **primary** node as it will be able to replay all the events in the correct order and become synchronized with the **primary** node again.
## Setup instructions
These instructions assume you have a working instance of GitLab. They guide you through:
1. Making your existing instance the **primary** node.
1. Adding **secondary** nodes.
CAUTION: **Caution:**
The steps below should be followed in the order they appear. **Make sure the GitLab version is the same on all nodes.**
### Using Omnibus GitLab
If you installed GitLab using the Omnibus packages (highly recommended):
1. [Install GitLab Enterprise Edition](https://about.gitlab.com/install/) on the server that will serve as the **secondary** node. Do not create an account or log in to the new **secondary** node.
1. [Upload the GitLab License](../../../user/admin_area/license.md) on the **primary** node to unlock Geo. The license must be for [GitLab Premium](https://about.gitlab.com/pricing/) or higher.
1. [Set up the database replication](database.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Configure fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md). This step is required and needs to be done on **both** the **primary** and **secondary** nodes.
1. [Configure GitLab](configuration.md) to set the **primary** and **secondary** nodes.
1. Optional: [Configure a secondary LDAP server](../../auth/ldap/index.md) for the **secondary** node. See [notes on LDAP](#ldap).
1. [Follow the "Using a Geo Server" guide](using_a_geo_server.md).
## Post-installation documentation
After installing GitLab on the **secondary** nodes and performing the initial configuration, see the following documentation for post-installation information.
### Configuring Geo
For information on configuring Geo, see [Geo configuration](configuration.md).
### Updating Geo
For information on how to update your Geo nodes to the latest GitLab version, see [Updating the Geo nodes](updating_the_geo_nodes.md).
### Pausing and resuming replication
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/35913) in [GitLab Premium](https://about.gitlab.com/pricing/) 13.2.
In some circumstances, like during [upgrades](updating_the_geo_nodes.md) or a [planned failover](../disaster_recovery/planned_failover.md), it is desirable to pause replication between the primary and secondary.
Pausing and resuming replication is done via a command line tool from the secondary node.
**To Pause: (from secondary)**
```shell
gitlab-ctl geo-replication-pause
```
**To Resume: (from secondary)**
```shell
gitlab-ctl geo-replication-resume
```
### Configuring Geo for multiple nodes
For information on configuring Geo for multiple nodes, see [Geo for multiple servers](multiple_servers.md).
### Configuring Geo with Object Storage
For information on configuring Geo with object storage, see [Geo with Object storage](object_storage.md).
### Disaster Recovery
For information on using Geo in disaster recovery situations to mitigate data-loss and restore services, see [Disaster Recovery](../disaster_recovery/index.md).
### Replicating the Container Registry
For more information on how to replicate the Container Registry, see [Docker Registry for a **secondary** node](docker_registry.md).
### Security Review
For more information on Geo security, see [Geo security review](security_review.md).
### Tuning Geo
For more information on tuning Geo, see [Tuning Geo](tuning.md).
### Set up a location-aware Git URL
For an example of how to set up a location-aware Git remote URL with AWS Route53, see [Location-aware Git remote URL with AWS Route53](location_aware_git_url.md).
## Remove Geo node
For more information on removing a Geo node, see [Removing **secondary** Geo nodes](remove_geo_node.md).
## Disable Geo
To find out how to disable Geo, see [Disabling Geo](disable_geo.md).
## Current limitations
CAUTION: **Caution:**
This list of limitations only reflects the latest version of GitLab. If you are using an older version, extra limitations may be in place.
- Pushing directly to a **secondary** node redirects (for HTTP) or proxies (for SSH) the request to the **primary** node instead of [handling it directly](https://gitlab.com/gitlab-org/gitlab/-/issues/1381), except when using Git over HTTP with credentials embedded within the URI. For example, `https://user:password@secondary.tld`.
- Cloning, pulling, or pushing repositories that exist on the **primary** node but not on the **secondary** nodes where [selective synchronization](configuration.md#selective-synchronization) does not include the project is not supported over SSH [but support is planned](https://gitlab.com/groups/gitlab-org/-/epics/2562). HTTP(S) is supported.
- The **primary** node has to be online for OAuth login to happen. Existing sessions and Git are not affected. Support for the **secondary** node to use an OAuth provider independent from the primary is [being planned](https://gitlab.com/gitlab-org/gitlab/-/issues/208465).
- The installation takes multiple manual steps that together can take about an hour depending on circumstances. We are working on improving this experience. See [Omnibus GitLab issue #2978](https://gitlab.com/gitlab-org/omnibus-gitlab/-/issues/2978) for details.
- Real-time updates of issues/merge requests (for example, via long polling) doesn't work on the **secondary** node.
- [Selective synchronization](configuration.md#selective-synchronization) applies only to files and repositories. Other datasets are replicated to the **secondary** node in full, making it inappropriate for use as an access control mechanism.
- Object pools for forked project deduplication work only on the **primary** node, and are duplicated on the **secondary** node.
- [External merge request diffs](../../merge_request_diffs.md) will not be replicated if they are on-disk, and viewing merge requests will fail. However, external MR diffs in object storage **are** supported. The default configuration (in-database) does work.
- Runners cannot register with a **secondary** node. Support for this is [planned for the future](https://gitlab.com/gitlab-org/gitlab/-/issues/3294).
### Limitations on replication/verification
You can keep track of the progress to implement the missing items in
these epics/issues:
- [Unreplicated Data Types](https://gitlab.com/groups/gitlab-org/-/epics/893)
- [Verify all replicated data](https://gitlab.com/groups/gitlab-org/-/epics/1430)
There is a complete list of all GitLab [data types](datatypes.md) and [existing support for replication and verification](datatypes.md#limitations-on-replicationverification).
## Frequently Asked Questions
For answers to common questions, see the [Geo FAQ](faq.md).
## Log files
Since GitLab 9.5, Geo stores structured log messages in a `geo.log` file. For Omnibus installations, this file is at `/var/log/gitlab/gitlab-rails/geo.log`.
This file contains information about when Geo attempts to sync repositories and files. Each line in the file contains a separate JSON entry that can be ingested into. For example, Elasticsearch or Splunk.
For example:
```json
{"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update","project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}
```
This message shows that Geo detected that a repository update was needed for project `1`.
## Troubleshooting
For troubleshooting steps, see [Geo Troubleshooting](troubleshooting.md).
This document was moved to [another location](../index.md).
......@@ -44,7 +44,7 @@ In any case, you require:
- A Route53 Hosted Zone managing your domain.
If you have not yet setup a Geo **primary** node and **secondary** node, please consult
[the Geo setup instructions](index.md#setup-instructions).
[the Geo setup instructions](../index.md#setup-instructions).
## Create a traffic policy
......
......@@ -123,7 +123,7 @@ from [owasp.org](https://owasp.org/).
- Geo imposes no additional restrictions on operating system (see the
[GitLab installation](https://about.gitlab.com/install/) page for more
details), however we recommend using the operating systems listed in the [Geo documentation](index.md#requirements-for-running-geo).
details), however we recommend using the operating systems listed in the [Geo documentation](../index.md#requirements-for-running-geo).
### What details regarding required OS components and lock‐down needs have been defined?
......
......@@ -24,12 +24,12 @@ These general update steps are not intended for [high-availability deployments](
To update the Geo nodes when a new GitLab version is released, update **primary**
and all **secondary** nodes:
1. **Optional:** [Pause replication on each **secondary** node.](./index.md#pausing-and-resuming-replication)
1. **Optional:** [Pause replication on each **secondary** node.](../index.md#pausing-and-resuming-replication)
1. Log into the **primary** node.
1. [Update GitLab on the **primary** node using Omnibus](https://docs.gitlab.com/omnibus/update/README.html).
1. Log into each **secondary** node.
1. [Update GitLab on each **secondary** node using Omnibus](https://docs.gitlab.com/omnibus/update/README.html).
1. If you paused replication in step 1, [resume replication on each **secondary**](./index.md#pausing-and-resuming-replication)
1. If you paused replication in step 1, [resume replication on each **secondary**](../index.md#pausing-and-resuming-replication)
1. [Test](#check-status-after-updating) **primary** and **secondary** nodes, and check version in each.
### Check status after updating
......
......@@ -9,7 +9,7 @@ type: howto
# Using a Geo Server **(PREMIUM ONLY)**
After you set up the [database replication and configure the Geo nodes](index.md#setup-instructions), use your closest GitLab node as you would a normal standalone GitLab instance.
After you set up the [database replication and configure the Geo nodes](../index.md#setup-instructions), use your closest GitLab node as you would a normal standalone GitLab instance.
Pushing directly to a **secondary** node (for both HTTP, SSH including Git LFS) was [introduced](https://about.gitlab.com/releases/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
......
---
stage: Enablement
group: Geo
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#designated-technical-writers
type: howto
---
# Setting up Geo
These instructions assume you have a working instance of GitLab. They guide you through:
1. Making your existing instance the **primary** node.
1. Adding **secondary** nodes.
CAUTION: **Caution:**
The steps below should be followed in the order they appear. **Make sure the GitLab version is the same on all nodes.**
## Using Omnibus GitLab
If you installed GitLab using the Omnibus packages (highly recommended):
1. [Install GitLab Enterprise Edition](https://about.gitlab.com/install/) on the server that will serve as the **secondary** node. Do not create an account or log in to the new **secondary** node.
1. [Upload the GitLab License](../../../user/admin_area/license.md) on the **primary** node to unlock Geo. The license must be for [GitLab Premium](https://about.gitlab.com/pricing/) or higher.
1. [Set up the database replication](../replication/database.md) (`primary (read-write) <-> secondary (read-only)` topology).
1. [Configure fast lookup of authorized SSH keys in the database](../../operations/fast_ssh_key_lookup.md). This step is required and needs to be done on **both** the **primary** and **secondary** nodes.
1. [Configure GitLab](../replication/configuration.md) to set the **primary** and **secondary** nodes.
1. Optional: [Configure a secondary LDAP server](../../auth/ldap/index.md) for the **secondary** node. See [notes on LDAP](../index.md#ldap).
1. [Follow the "Using a Geo Server" guide](../replication/using_a_geo_server.md).
## Post-installation documentation
After installing GitLab on the **secondary** nodes and performing the initial configuration, see the [following documentation for post-installation information](../index.md#post-installation-documentation).
......@@ -171,7 +171,7 @@ We will note in the instructions below where these secrets are required.
NOTE: **Note:**
Do not store the GitLab application database and the Praefect
database on the same PostgreSQL server if using
[Geo](../geo/replication/index.md). The replication state is internal to each instance
[Geo](../geo/index.md). The replication state is internal to each instance
of GitLab and should not be replicated.
These instructions help set up a single PostgreSQL database, which creates a single point of
......
......@@ -36,7 +36,7 @@ Learn how to install, configure, update, and maintain your GitLab instance.
- [Omnibus support for log forwarding](https://docs.gitlab.com/omnibus/settings/logs.html#udp-log-shipping-gitlab-enterprise-edition-only) **(STARTER ONLY)**
- [Reference architectures](reference_architectures/index.md): Add additional resources to support more users.
- [Installing GitLab on Amazon Web Services (AWS)](../install/aws/index.md): Set up GitLab on Amazon AWS.
- [Geo](geo/replication/index.md): Replicate your GitLab instance to other geographic locations as a read-only fully operational version. **(PREMIUM ONLY)**
- [Geo](geo/index.md): Replicate your GitLab instance to other geographic locations as a read-only fully operational version. **(PREMIUM ONLY)**
- [Disaster Recovery](geo/disaster_recovery/index.md): Quickly fail-over to a different site with minimal effort in a disaster situation. **(PREMIUM ONLY)**
- [Pivotal Tile](../install/pivotal/index.md): Deploy GitLab as a preconfigured appliance using Ops Manager (BOSH) for Pivotal Cloud Foundry. **(PREMIUM ONLY)**
- [Add License](../user/admin_area/license.md): Upload a license at install time to unlock features that are in paid tiers of GitLab. **(STARTER ONLY)**
......
......@@ -32,10 +32,10 @@ feature for CentOS 6, follow [the instructions on how to build and install a cus
By default, GitLab manages an `authorized_keys` file, which contains all the
public SSH keys for users allowed to access GitLab. However, to maintain a
single source of truth, [Geo](../geo/replication/index.md) needs to be configured to perform SSH fingerprint
single source of truth, [Geo](../geo/index.md) needs to be configured to perform SSH fingerprint
lookups via database lookup.
As part of [setting up Geo](../geo/replication/index.md#setup-instructions),
As part of [setting up Geo](../geo/index.md#setup-instructions),
you will be required to follow the steps outlined below for both the primary and
secondary nodes, but note that the `Write to "authorized keys" file` checkbox
only needs to be unchecked on the primary node since it will be reflected
......
# Geo Rake Tasks **(PREMIUM ONLY)**
The following Rake tasks are for [Geo installations](../geo/replication/index.md).
The following Rake tasks are for [Geo installations](../geo/index.md).
## Git housekeeping
......
......@@ -110,7 +110,7 @@ If you find it necessary, you can run this migration script again to schedule mi
Any error or warning will be logged in Sidekiq's log file.
NOTE: **Note:**
If [Geo](../geo/replication/index.md) is enabled, each project that is successfully migrated
If [Geo](../geo/index.md) is enabled, each project that is successfully migrated
generates an event to replicate the changes on any **secondary** nodes.
You only need the `gitlab:storage:migrate_to_hashed` Rake task to migrate your repositories, but we have additional
......
......@@ -147,7 +147,7 @@ is recommended.
> - Required domain knowledge: Storage replication
> - Supported tiers: [GitLab Premium and Ultimate](https://about.gitlab.com/pricing/)
[GitLab Geo](../geo/replication/index.md) allows you to replicate your GitLab
[GitLab Geo](../geo/index.md) allows you to replicate your GitLab
instance to other geographical locations as a read-only fully operational instance
that can also be promoted in case of disaster.
......
......@@ -37,7 +37,7 @@ Note the following about server hooks:
- [GitLab CI/CD](../ci/README.md).
- [Push Rules](../push_rules/push_rules.md), for a user-configurable Git hook
interface. **(STARTER)**
- Server hooks aren't replicated to [Geo](geo/replication/index.md) secondary nodes.
- Server hooks aren't replicated to [Geo](geo/index.md) secondary nodes.
## Create a server hook for a repository
......
......@@ -304,7 +304,7 @@ repository updates to secondary nodes.
#### GitLab Geo
- Configuration:
- [Omnibus](../administration/geo/replication/index.md#setup-instructions)
- [Omnibus](../administration/geo/setup/index.md)
- [Charts](https://docs.gitlab.com/charts/advanced/geo/)
- [GDK](https://gitlab.com/gitlab-org/gitlab-development-kit/blob/master/doc/howto/geo.md)
- Layer: Core Service (Processor)
......
......@@ -91,7 +91,7 @@ Examples of use cases on feature pages:
- CE and EE: [Issues](../../user/project/issues/index.md#use-cases)
- CE and EE: [Merge Requests](../../user/project/merge_requests/index.md)
- EE-only: [Geo](../../administration/geo/replication/index.md)
- EE-only: [Geo](../../administration/geo/index.md)
- EE-only: [Jenkins integration](../../integration/jenkins.md)
## Prerequisites
......
......@@ -22,7 +22,7 @@ project.feature_available?(:feature_symbol)
## Restricting global features (instance)
However, for features such as [Geo](../administration/geo/replication/index.md) and
However, for features such as [Geo](../administration/geo/index.md) and
[Load balancing](../administration/database_load_balancing.md), which cannot be restricted
to only a subset of projects or namespaces, the check will be made directly in
the instance license.
......
---
redirect_to: '../administration/geo/replication/index.md'
redirect_to: '../administration/geo/index.md'
---
This document was moved to [another location](../administration/geo/replication/index.md).
This document was moved to [another location](../administration/geo/index.md).
......@@ -799,7 +799,7 @@ to request additional material:
- [Scaling GitLab](../../administration/reference_architectures/index.md):
GitLab supports several different types of clustering.
- [Geo replication](../../administration/geo/replication/index.md):
- [Geo replication](../../administration/geo/index.md):
Geo is the solution for widely distributed development teams.
- [Omnibus GitLab](https://docs.gitlab.com/omnibus/) - Everything you need to know
about administering your GitLab instance.
......
......@@ -151,7 +151,7 @@ Support for [PostgreSQL 9.6 and 10 has been removed in GitLab 13.0](https://abou
#### Additional requirements for GitLab Geo
If you're using [GitLab Geo](../administration/geo/replication/index.md):
If you're using [GitLab Geo](../administration/geo/index.md):
- We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external (not managed
......
......@@ -24,7 +24,7 @@ The following are available Rake tasks:
| [Elasticsearch](../integration/elasticsearch.md#gitlab-elasticsearch-rake-tasks) **(STARTER ONLY)** | Maintain Elasticsearch in a GitLab instance. |
| [Enable namespaces](features.md) | Enable usernames and namespaces for user projects. |
| [General maintenance](../administration/raketasks/maintenance.md) | General maintenance and self-check tasks. |
| [Geo maintenance](../administration/raketasks/geo.md) **(PREMIUM ONLY)** | [Geo](../administration/geo/replication/index.md)-related maintenance. |
| [Geo maintenance](../administration/raketasks/geo.md) **(PREMIUM ONLY)** | [Geo](../administration/geo/index.md)-related maintenance. |
| [GitHub import](../administration/raketasks/github_import.md) | Retrieve and import repositories from GitHub. |
| [Import repositories](import.md) | Import bare repositories into your GitLab instance. |
| [Import large project exports](../development/import_project.md#importing-via-a-rake-task) | Import large GitLab [project exports](../user/project/settings/import_export.md). |
......
......@@ -8,7 +8,7 @@ type: howto
# Geo nodes Admin Area **(PREMIUM ONLY)**
You can configure various settings for GitLab Geo nodes. For more information, see
[Geo documentation](../../administration/geo/replication/index.md).
[Geo documentation](../../administration/geo/index.md).
On the primary node, go to **Admin Area > Geo**. On secondary nodes, go to **Admin Area > Geo > Nodes**.
......
......@@ -11,7 +11,7 @@
= link_to s_("GeoNodes|New node"), new_admin_geo_node_path, class: 'btn btn-success ml-auto qa-new-node-link'
%p.page-subtitle.light
= s_('GeoNodes|With %{geo} you can install a special read-only and replicated instance anywhere. Before you add nodes, follow the %{instructions} in the exact order they appear.').html_safe % { geo: link_to('GitLab Geo', help_page_path('administration/geo/replication/index.md'), target: '_blank'), instructions: link_to('setup instructions', help_page_path('administration/geo/replication/index.md', anchor: 'setup-instructions'), target: '_blank') }
= s_('GeoNodes|With %{geo} you can install a special read-only and replicated instance anywhere. Before you add nodes, follow the %{instructions} in the exact order they appear.').html_safe % { geo: link_to('GitLab Geo', help_page_path('administration/geo/index.md'), target: '_blank'), instructions: link_to('setup instructions', help_page_path('administration/geo/setup/index.md'), target: '_blank') }
- if @nodes.any?
#js-geo-nodes{ data: node_vue_list_properties }
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment