# Automatic background verification **[PREMIUM ONLY]**
NOTE: **Note:**
Automatic background verification of repositories and wikis was added in
GitLab EE 10.6 but is enabled by default only in GitLab EE 11.1 and later. You can
disable or enable this feature manually by following
[these instructions](#disabling-or-enabling-the-automatic-background-verification).
Automatic background verification ensures that the transferred data matches a
calculated checksum. If the checksum of the data on the **primary** node matches the checksum of the
data on the **secondary** node, the data transferred successfully. Following a planned failover,
any corrupted data may be **lost**, depending on the extent of the corruption.
If verification fails on the **primary** node, this indicates that Geo is
successfully replicating a corrupted object; restore it from backup or remove it
from the **primary** node to resolve the issue.
If verification succeeds on the **primary** node but fails on the **secondary** node,
this indicates that the object was corrupted during the replication process.
Geo actively tries to correct verification failures by marking the repository to
be resynced with a backoff period. If you want to reset the verification for
these failures, follow [these instructions][reset-verification].
If verification is lagging significantly behind replication, consider giving
the node more time before scheduling a planned failover.
## Disabling or enabling the automatic background verification
Run the following commands in a Rails console on the **primary** node:
```sh
# Omnibus GitLab
gitlab-rails console
# Installation from source
cd /home/git/gitlab
sudo -u git -H bin/rails console RAILS_ENV=production
```
To check if automatic background verification is enabled:
```ruby
Gitlab::Geo.repository_verification_enabled?
```
To disable automatic background verification:
```ruby
Feature.disable('geo_repository_verification')
```
To enable automatic background verification:
```ruby
Feature.enable('geo_repository_verification')
```
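If you prefer not to open an interactive console, the same check can be run non-interactively. A minimal sketch, assuming an Omnibus installation:
```sh
# Print the current state of automatic background verification without an interactive console
sudo gitlab-rails runner "puts Gitlab::Geo.repository_verification_enabled?"
```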
## Repository verification
Navigate to the **Admin Area > Geo** dashboard on the **primary** node and expand
the **Verification information** tab for that node to view automatic checksumming
status for repositories and wikis. Successes are shown in green, pending work
in grey, and failures in red.
![Verification status](img/verification-status-primary.png)
Navigate to the **Admin Area > Geo** dashboard on the **secondary** node and expand
the **Verification information** tab for that node to view automatic verification
status for repositories and wikis. As with checksumming, successes are shown in
green, pending work in grey, and failures in red.
![Verification status](img/verification-status-secondary.png)
## Using checksums to compare Geo nodes
To check the health of Geo **secondary** nodes, we use a checksum over the list of
Git references and their values. The checksum includes `HEAD`, `heads`, `tags`,
`notes`, and GitLab-specific references to ensure true consistency. If two nodes
have the same checksum, then they definitely hold the same references. We compute
the checksum for every node after every update to make sure that they are all
in sync.
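GitLab computes this checksum internally, but as an illustration of the idea, a comparable fingerprint of a repository's references can be produced manually and compared between nodes. This is a sketch only; the exact set of references and hashing scheme Geo uses may differ:
```sh
# List every reference and the object it points to, then hash the sorted list.
# Running this against the same repository on two nodes and comparing the digests
# gives a rough equivalent of the reference checksum described above.
git --git-dir=/path/to/repository.git for-each-ref --format='%(refname) %(objectname)' | sort | sha256sum
```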
## Repository re-verification
> [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8550) in GitLab Enterprise Edition 11.6. Available in [GitLab Premium](https://about.gitlab.com/pricing/).
Due to bugs or transient infrastructure failures, it is possible for Git
repositories to change unexpectedly without being marked for verification.
Geo constantly reverifies the repositories to ensure the integrity of the
data. The default and recommended re-verification interval is 7 days, though
an interval as short as 1 day can be set. Shorter intervals reduce risk but
increase load and vice versa.
Navigate to the **Admin Area > Geo** dashboard on the **primary** node, and
click the **Edit** button for the **primary** node to customize the minimum
re-verification interval:
![Re-verification interval](img/reverification-interval.png)
The automatic background re-verification is enabled by default, but you can
disable it if needed. Run the following commands in a Rails console on the
**primary** node:
```sh
# Omnibus GitLab
gitlab-rails console
# Installation from source
cd /home/git/gitlab
sudo -u git -H bin/rails console RAILS_ENV=production
```
To disable automatic background re-verification:
```ruby
Feature.disable('geo_repository_reverification')
```
To enable automatic background re-verification:
```ruby
Feature.enable('geo_repository_reverification')
```
## Reset verification for projects where verification has failed
Geo actively tries to correct verification failures by marking the repository to
be resynced with a backoff period. If you want to reset them manually, this
Rake task marks projects where verification has failed or the checksum does not match
to be resynced without the backoff period:
For repositories:
- Omnibus Installation
```sh
sudo gitlab-rake geo:verification:repository:reset
```
- Source Installation
```sh
sudo -u git -H bundle exec rake geo:verification:repository:reset RAILS_ENV=production
```
For wikis:
- Omnibus Installation
```sh
sudo gitlab-rake geo:verification:wiki:reset
```
- Source Installation
```sh
sudo -u git -H bundle exec rake geo:verification:wiki:reset RAILS_ENV=production
```
## Current limitations
Until [issue #5064][ee-5064] is completed, background verification doesn't cover
CI job artifacts and traces, LFS objects, or user uploads in file storage.
Verify their integrity manually by following [these instructions][foreground-verification]
on both nodes, and comparing the output between them.
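For reference, the foreground verification described there is performed with Rake tasks along the lines of the following on an Omnibus installation; confirm the exact task names in the linked document for your GitLab version:
```sh
# Verify the integrity of user uploads, CI job artifacts, and LFS objects on this node
sudo gitlab-rake gitlab:uploads:check
sudo gitlab-rake gitlab:artifacts:check
sudo gitlab-rake gitlab:lfs:check
```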
Data in object storage is **not verified**, as the object store is responsible
for ensuring the integrity of the data.
[reset-verification]: background_verification.md#reset-verification-for-projects-where-verification-has-failed
[foreground-verification]: ../../raketasks/check.md
[ee-5064]: https://gitlab.com/gitlab-org/gitlab-ee/issues/5064
# Bring a demoted primary node back online **[PREMIUM ONLY]**
After a failover, it is possible to fail back to the demoted **primary** node to
restore your original configuration. This process consists of two steps:
1. Making the old **primary** node a **secondary** node.
1. Promoting a **secondary** node to a **primary** node.
CAUTION: **Caution:**
If you have any doubts about the consistency of the data on this node, we recommend setting it up from scratch.
## Configure the former **primary** node to be a **secondary** node
Since the former **primary** node will be out of sync with the current **primary** node, the first step is to bring the former **primary** node up to date. Note that deletions of data stored on disk, such as
repositories and uploads, will not be replayed when bringing the former **primary** node back
into sync, which may result in increased disk usage.
Alternatively, you can [set up a new **secondary** GitLab instance][setup-geo] to avoid this.
To bring the former **primary** node up to date:
1. SSH into the former **primary** node that has fallen behind.
1. Make sure all the services are up:
```sh
sudo gitlab-ctl start
```
> **Note 1:** If you [disabled the **primary** node permanently][disaster-recovery-disable-primary],
> you need to undo those steps now. For Debian/Ubuntu you just need to run
> `sudo systemctl enable gitlab-runsvdir`. For CentOS 6, you need to install
> the GitLab instance from scratch and set it up as a **secondary** node by
> following [Setup instructions][setup-geo]. In this case, you don't need to follow the next step.
>
> **Note 2:** If you [changed the DNS records](index.md#step-4-optional-updating-the-primary-domain-dns-record)
> for this node during disaster recovery procedure you may need to [block
> all the writes to this node](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/doc/gitlab-geo/planned-failover.md#block-primary-traffic)
> during this procedure.
1. [Set up database replication][database-replication]. Note that in this
case, **primary** node refers to the current **primary** node, and **secondary** node refers to the
former **primary** node.
If you have lost your original **primary** node, follow the
[setup instructions][setup-geo] to set up a new **secondary** node.
## Promote the **secondary** node to **primary** node
When the initial replication is complete and the **primary** node and **secondary** node are
closely in sync, you can do a [planned failover].
## Restore the **secondary** node
If your objective is to have two nodes again, you need to bring your **secondary**
node back online as well by repeating the first step
([configure the former **primary** node to be a **secondary** node](#configure-the-former-primary-node-to-be-a-secondary-node))
for the **secondary** node.
[setup-geo]: ../replication/index.md#setup-instructions
[database-replication]: ../replication/database.md
[disaster-recovery-disable-primary]: index.md#step-2-permanently-disable-the-primary-node
[planned failover]: planned_failover.md
# Geo configuration (source) **[PREMIUM ONLY]**
NOTE: **Note:**
This documentation applies to GitLab source installations. In GitLab 11.5, this documentation was deprecated and will be removed in a future release.
Please consider [migrating to a GitLab Omnibus installation](https://docs.gitlab.com/omnibus/update/convert_to_omnibus.html). For installations
using the Omnibus GitLab packages, follow the
[**Omnibus Geo nodes configuration**][configuration] guide.
## Configuring a new **secondary** node
NOTE: **Note:**
This is the final step in setting up a **secondary** node. Stages of the setup
process must be completed in the documented order. Before attempting the steps
in this stage, [complete all prior stages](index.md#using-gitlab-installed-from-source-deprecated).
The basic steps of configuring a **secondary** node are to:
- Replicate required configurations between the **primary** and **secondary** nodes.
- Configure a tracking database on each **secondary** node.
- Start GitLab on the **secondary** node.
You are encouraged to first read through all the steps before executing them
in your testing/production environment.
NOTE: **Note:**
**Do not** set up any custom authentication on **secondary** nodes; this will be handled by the **primary** node.
NOTE: **Note:**
**Do not** add anything in the **secondary** node's admin area (**Admin Area > Geo**). This is handled solely by the **primary** node.
### Step 1. Manually replicate secret GitLab values
GitLab stores a number of secret values in the `/home/git/gitlab/config/secrets.yml`
file which *must* match between the **primary** and **secondary** nodes. Until there is
a means of automatically replicating these between nodes (see [gitlab-org/gitlab-ee#3789]), they must
be manually replicated to **secondary** nodes.
1. SSH into the **primary** node, and execute the command below:
```sh
sudo cat /home/git/gitlab/config/secrets.yml
```
This will display the secrets that need to be replicated, in YAML format.
1. SSH into the **secondary** node and log in as the `git` user:
```sh
sudo -i -u git
```
1. Make a backup of any existing secrets:
```sh
mv /home/git/gitlab/config/secrets.yml /home/git/gitlab/config/secrets.yml.`date +%F`
```
1. Copy `/home/git/gitlab/config/secrets.yml` from the **primary** node to the **secondary** node, or
copy-and-paste the file contents between nodes (an `scp` alternative is sketched after this list):
```sh
sudo editor /home/git/gitlab/config/secrets.yml
# paste the output of the `cat` command you ran on the primary
# save and exit
```
1. Ensure the file permissions are correct:
```sh
chown git:git /home/git/gitlab/config/secrets.yml
chmod 0600 /home/git/gitlab/config/secrets.yml
```
1. Restart GitLab:
```sh
service gitlab restart
```
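As an alternative to copying and pasting the file contents manually, you can transfer the file directly over SSH. A sketch, assuming the **primary** node can reach the **secondary** node over SSH as the `git` user (`secondary.example.com` is a placeholder hostname):
```sh
# Run on the primary node; copies the secrets file to the same path on the secondary node
scp /home/git/gitlab/config/secrets.yml git@secondary.example.com:/home/git/gitlab/config/secrets.yml
```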
Once restarted, the **secondary** node will automatically start replicating missing data
from the **primary** node in a process known as backfill. Meanwhile, the **primary** node
will start to notify the **secondary** node of any changes, so that the **secondary** node can
act on those notifications immediately.
Make sure the **secondary** node is running and accessible. You can log in to
the **secondary** node with the same credentials as used for the **primary** node.
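A quick way to confirm the **secondary** node is reachable is to request its sign-in page from any machine; `secondary.example.com` below is a placeholder for the node's URL:
```sh
# Expect an HTTP 200 response if the secondary node is up and serving requests
curl -s -o /dev/null -w "%{http_code}\n" https://secondary.example.com/users/sign_in
```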
### Step 2. Manually replicate the **primary** node's SSH host keys
Read [Manually replicate the **primary** node's SSH host keys](configuration.md#step-2-manually-replicate-the-primary-nodes-ssh-host-keys)
### Step 3. Add the **secondary** GitLab node
1. Navigate to the **primary** node's **Admin Area > Geo**
(`/admin/geo/nodes`) in your browser.
1. Add the **secondary** node by providing its full URL. **Do NOT** check the
**This is a primary node** checkbox.
1. Optionally, choose which namespaces should be replicated by the
**secondary** node. Leave blank to replicate all. Read more in
[selective synchronization](#selective-synchronization).
1. Click the **Add node** button.
1. SSH into your GitLab **secondary** server and restart the services:
```sh
service gitlab restart
```
Check if there are any common issues with your Geo setup by running:
```sh
bundle exec rake gitlab:geo:check
```
1. SSH into your GitLab **primary** server and log in as root to verify the
**secondary** node is reachable and check for any common issues with your Geo setup:
```sh
bundle exec rake gitlab:geo:check
```
Once reconfigured, the **secondary** node will automatically start
replicating missing data from the **primary** node in a process known as backfill.
Meanwhile, the **primary** node will start to notify the **secondary** node of any changes, so
that the **secondary** node can act on those notifications immediately.
Make sure the **secondary** node is running and accessible.
You can log in to the **secondary** node with the same credentials as used for the **primary** node.
### Step 4. Enabling Hashed Storage
Read [Enabling Hashed Storage](configuration.md#step-4-enabling-hashed-storage).
### Step 5. (Optional) Configuring the secondary to trust the primary
You can safely skip this step if your **primary** node uses a CA-issued HTTPS certificate.
If your **primary** node is using a self-signed certificate for *HTTPS* support, you will
need to add that certificate to the **secondary** node's trust store. Retrieve the
certificate from the **primary** node and follow your distribution's instructions for
adding it to the **secondary** node's trust store. In Debian/Ubuntu, you would follow these steps:
```sh
sudo -i
cp <primary_node_certification_file> /usr/local/share/ca-certificates
update-ca-certificates
```
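To confirm the **secondary** node now trusts the certificate, you can make a request with `curl`, which uses the system trust store on Debian/Ubuntu. A sketch, with `primary.example.com` as a placeholder for the **primary** node's URL:
```sh
# Run on the secondary node; fails with a certificate error if the primary's certificate is not trusted
curl --silent --show-error --fail https://primary.example.com/users/sign_in > /dev/null && echo "certificate trusted"
```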
### Step 6. Enable Git access over HTTP/HTTPS
Geo synchronizes repositories over HTTP/HTTPS, and therefore requires this clone
method to be enabled. Navigate to **Admin Area > Settings**
(`/admin/application_settings`) on the **primary** node, and set
`Enabled Git access protocols` to `Both SSH and HTTP(S)` or `Only HTTP(S)`.
### Step 7. Verify proper functioning of the secondary node
Read [Verify proper functioning of the secondary node][configuration-verify-node].
## Selective synchronization
Read [Selective synchronization][configuration-selective-replication].
## Troubleshooting
Read the [troubleshooting document][troubleshooting].
[gitlab-org/gitlab-ee#3789]: https://gitlab.com/gitlab-org/gitlab-ee/issues/3789
[configuration]: configuration.md
[configuration-selective-replication]: configuration.md#selective-synchronization
[configuration-verify-node]: configuration.md#step-7-verify-proper-functioning-of-the-secondary-node
[troubleshooting]: troubleshooting.md
# Docker Registry for a secondary node **[PREMIUM ONLY]**
You can set up a [Docker Registry] on your
**secondary** Geo node that mirrors the one on the **primary** Geo node.
## Storage support
CAUTION: **Warning:**
If you use [local storage][registry-storage]
for the Container Registry you **cannot** replicate it to a **secondary** node.
Docker Registry currently supports a few types of storage. If you choose a
distributed storage (`azure`, `gcs`, `s3`, `swift`, or `oss`) for your Docker
Registry on the **primary** node, you can use the same storage for a **secondary**
Docker Registry as well. For more information, read the
[Load balancing considerations][registry-load-balancing]
when deploying the Registry, and how to set up the storage driver for GitLab's
integrated [Container Registry][registry-storage].
[ee]: https://about.gitlab.com/pricing/
[Docker Registry]: https://docs.docker.com/registry/
[registry-storage]: ../../container_registry.md#container-registry-storage-driver
[registry-load-balancing]: https://docs.docker.com/registry/deploying/#load-balancing-considerations
# Geo with external PostgreSQL instances **[PREMIUM ONLY]**
This document is relevant if you are using a PostgreSQL instance that is *not
managed by Omnibus*. This includes cloud-managed instances like AWS RDS, or
manually installed and configured PostgreSQL instances.
NOTE: **Note**:
We strongly recommend running Omnibus-managed instances as they are actively
developed and tested. We aim to be compatible with most external
(not managed by Omnibus) databases but we do not guarantee compatibility.
## **Primary** node
1. SSH into a GitLab **primary** application server and log in as root:
```sh
sudo -i
```
1. Execute the command below to define the node as the **primary** node:
```sh
gitlab-ctl set-geo-primary-node
```
This command will use your defined `external_url` in `/etc/gitlab/gitlab.rb`.
### Configure the external database to be replicated
To set up an external database, you can either:
- Set up streaming replication yourself (for example, in AWS RDS).
- Perform the Omnibus configuration manually as follows.
#### Leverage your cloud provider's tools to replicate the primary database
If you have a **primary** node set up on AWS EC2 that uses RDS,
you can create a read-only replica in a different region and the
replication process will be managed by AWS. Make sure you've set the Network ACL, Subnet, and
Security Group according to your needs, so the **secondary** application node can access the database.
Skip to the [Configure secondary application node](#configure-secondary-application-nodes-to-use-the-external-read-replica) section below.
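For illustration only, creating such a cross-region read replica with the AWS CLI might look like the sketch below. All identifiers and regions are placeholders; consult the AWS documentation for the options that apply to your setup:
```sh
# Create a read-only replica of the primary RDS instance in another region
aws rds create-db-instance-read-replica \
  --db-instance-identifier gitlab-geo-read-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:gitlab-primary \
  --region eu-west-1
```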
#### Manually configure the primary database for replication
The [geo_primary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
configures the **primary** node's database to be replicated by making changes to
`pg_hba.conf` and `postgresql.conf`. Make the following configuration changes
manually to your external database configuration:
```
##
## Geo Primary Role
## - pg_hba.conf
##
host replication gitlab_replicator <trusted secondary IP>/32 md5
```
```
##
## Geo Primary Role
## - postgresql.conf
##
sql_replication_user = gitlab_replicator
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 50
max_replication_slots = 1 # number of secondary instances
hot_standby = on
```
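After applying these changes and reloading PostgreSQL, you can confirm the replication settings took effect. A sketch, run from any host that can reach the external database (host and credentials are placeholders):
```sh
# Confirm the replication-related settings on the external primary database
psql -h <primary_database_host> -U gitlab -d gitlabhq_production -c "SHOW wal_level;"
psql -h <primary_database_host> -U gitlab -d gitlabhq_production -c "SHOW max_wal_senders;"
```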
## **Secondary** nodes
### Manually configure the replica database
Make the following configuration changes manually to the `postgresql.conf`
of your external replica database:
```
##
## Geo Secondary Role
## - postgresql.conf
##
wal_level = hot_standby
max_wal_senders = 10
wal_keep_segments = 10
hot_standby = on
```
### Configure **secondary** application nodes to use the external read-replica
With Omnibus, the
[geo_secondary_role](https://docs.gitlab.com/omnibus/roles/#gitlab-geo-roles)
has three main functions:
1. Configure the replica database.
1. Configure the tracking database.
1. Enable the [Geo Log Cursor](index.md#geo-log-cursor) (not covered in this section).
To configure the connection to the external read-replica database and enable Log Cursor:
1. SSH into a GitLab **secondary** application server and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
##
## Geo Secondary role
## - configure dependent flags automatically to enable Geo
##
roles ['geo_secondary_role']
# note this is shared between both databases,
# make sure you define the same password in both
gitlab_rails['db_password'] = '<your_password_here>'
gitlab_rails['db_username'] = 'gitlab'
gitlab_rails['db_host'] = '<database_read_replica_host>'
```
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
### Configure the tracking database
**Secondary** nodes use a separate PostgreSQL installation as a tracking
database to keep track of replication status and automatically recover from
potential replication issues. Omnibus automatically configures a tracking database
when `roles ['geo_secondary_role']` is set. For high availability,
refer to [Geo High Availability](https://docs.gitlab.com/ee/administration/high_availability).
If you want to run this database external to Omnibus, please follow the instructions below.
The tracking database requires an [FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
connection with the **secondary** replica database for improved performance.
If you have an external database ready to be used as the tracking database,
follow the instructions below to use it:
NOTE: **Note:**
If you want to use AWS RDS as a tracking database, make sure it has access to
the secondary database. Unfortunately, just assigning the same security group is not enough as
outbound rules do not apply to RDS PostgreSQL databases. Therefore, you need to explicitly add an inbound
rule to the read-replica's security group allowing any TCP traffic from
the tracking database on port 5432.
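For illustration only, adding such an inbound rule with the AWS CLI might look like the following; the security group ID and CIDR are placeholders:
```sh
# Allow PostgreSQL traffic from the tracking database host to the read-replica's security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 5432 \
  --cidr 10.0.1.10/32
```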
1. SSH into a GitLab **secondary** server and log in as root:
```bash
sudo -i
```
1. Edit `/etc/gitlab/gitlab.rb` with the connection params and credentials for
the machine with the PostgreSQL instance:
```ruby
geo_secondary['db_username'] = 'gitlab_geo'
geo_secondary['db_password'] = '<your_password_here>'
geo_secondary['db_host'] = '<tracking_database_host>'
geo_secondary['db_port'] = <tracking_database_port> # change to the correct port
geo_secondary['db_fdw'] = true # enable FDW
geo_postgresql['enable'] = false # don't use internal managed instance
```
1. Save the file and [reconfigure GitLab](../../restart_gitlab.md#omnibus-gitlab-reconfigure)
1. Run the tracking database migrations:
```bash
gitlab-rake geo:db:create
gitlab-rake geo:db:migrate
```
1. Configure the
[PostgreSQL FDW](https://www.postgresql.org/docs/9.6/static/postgres-fdw.html)
connection and credentials:
Save the script below in a file, for example `/tmp/geo_fdw.sh`, and modify the connection
parameters to match your environment. Execute it to set up the FDW connection.
```bash
#!/bin/bash
# Secondary Database connection params:
DB_HOST="<public_ip_or_vpc_private_ip>"
DB_NAME="gitlabhq_production"
DB_USER="gitlab"
DB_PASS="<your_password_here>"
DB_PORT="5432"
# Tracking Database connection params:
GEO_DB_HOST="<public_ip_or_vpc_private_ip>"
GEO_DB_NAME="gitlabhq_geo_production"
GEO_DB_USER="gitlab_geo"
GEO_DB_PORT="5432"
query_exec () {
gitlab-psql -h $GEO_DB_HOST -d $GEO_DB_NAME -p $GEO_DB_PORT -c "${1}"
}
query_exec "CREATE EXTENSION postgres_fdw;"
query_exec "CREATE SERVER gitlab_secondary FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '${DB_HOST}', dbname '${DB_NAME}', port '${DB_PORT}');"
query_exec "CREATE USER MAPPING FOR ${GEO_DB_USER} SERVER gitlab_secondary OPTIONS (user '${DB_USER}', password '${DB_PASS}');"
query_exec "CREATE SCHEMA gitlab_secondary;"
query_exec "GRANT USAGE ON FOREIGN SERVER gitlab_secondary TO ${GEO_DB_USER};"
```
NOTE: **Note:** The script template above uses `gitlab-psql` as it's intended to be executed from the Geo machine,
but you can change it to `psql` and run it from any machine that has access to the database. We also recommend using
`psql` for AWS RDS.
1. Save the file and [restart GitLab](../../restart_gitlab.md#omnibus-gitlab-restart)
1. Populate the FDW tables:
```bash
gitlab-rake geo:db:refresh_foreign_tables
```
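To confirm the foreign tables were created, you can query the tracking database. A sketch using the same `gitlab-psql` invocation style as the script above (the host is a placeholder):
```sh
# List a few of the foreign tables created in the tracking database by the refresh task
gitlab-psql -h <tracking_database_host> -d gitlabhq_geo_production -c "SELECT foreign_table_name FROM information_schema.foreign_tables LIMIT 5;"
```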
# Geo Frequently Asked Questions **[PREMIUM ONLY]**
## What are the minimum requirements to run Geo?
The requirements are listed [on the index page](index.md#requirements-for-running-geo).
## How does Geo know which projects to sync?
On each **secondary** node, there is a read-only replicated copy of the GitLab database.
A **secondary** node also has a tracking database where it stores which projects have been synced.
Geo compares the two databases to find projects that are not yet tracked.
At the start, this tracking database is empty, so Geo will start trying to update every project that it can see in the GitLab database.
For each project to sync:
1. Geo will issue a `git fetch geo --mirror` to get the latest information from the **primary** node.
If there are no changes, the sync will be fast and end quickly. Otherwise, it will pull the latest commits.
1. The **secondary** node will update the tracking database to store the fact that it has synced projects A, B, C, etc.
1. Repeat until all projects are synced.
When someone pushes a commit to the **primary** node, it generates an event in the GitLab database that the repository has changed.
The **secondary** node sees this event, marks the project in question as dirty, and schedules the project to be resynced.
To ensure that problems with pipelines (for example, syncs failing too many times or jobs being lost) don't permanently stop projects syncing, Geo also periodically checks the tracking database for projects that are marked as dirty. This check happens when
the number of concurrent syncs falls below `repos_max_capacity` and there are no new projects waiting to be synced.
Geo also has a checksum feature which computes a SHA256 checksum across all the Git references and the SHA values they point to.
If the refs don't match between the **primary** node and the **secondary** node, then the **secondary** node will mark that project as dirty and try to resync it.
So even if we have an outdated tracking database, the validation should activate and find discrepancies in the repository state and resync.
## Can I use Geo in a disaster recovery situation?
Yes, but there are limitations to what we replicate (see
[What data is replicated to a **secondary** node?](#what-data-is-replicated-to-a-secondary-node)).
Read the documentation for [Disaster Recovery](../disaster_recovery/index.md).
## What data is replicated to a **secondary** node?
We currently replicate project repositories, LFS objects, generated
attachments / avatars and the whole database. This means user accounts,
issues, merge requests, groups, project data, etc., will be available for
query.
## Can I git push to a **secondary** node?
Yes! Pushing directly to a **secondary** node (for both HTTP and SSH, including git-lfs) was [introduced](https://about.gitlab.com/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
## How long does it take to have a commit replicated to a **secondary** node?
All replication operations are asynchronous and are queued to be dispatched. Therefore, it depends on a lot of
factors including the amount of traffic, how big your commit is, the
connectivity between your nodes, your hardware, etc.
## What if the SSH server runs at a different port?
That's totally fine. We use HTTP(S) to fetch repository changes from the **primary** node to all **secondary** nodes.
## Is it possible to set up a Docker Registry for a **secondary** node that mirrors the one on the **primary** node?
Yes. See [Docker Registry for a **secondary** node](docker_registry.md).
# Geo High Availability **[PREMIUM ONLY]**
This document describes a minimal reference architecture for running Geo
in a high availability configuration. If your HA setup differs from the one
described, it is possible to adapt these instructions to your needs.
## Architecture overview
![Geo HA Diagram](https://docs.gitlab.com/ee/administration/img/high_availability/geo-ha-diagram.png)
_[diagram source - gitlab employees only][diagram-source]_
The topology above assumes that the **primary** and **secondary** Geo clusters
are located in two separate locations, on their own virtual network
with private IP addresses. The network is configured such that all machines within
one geographic location can communicate with each other using their private IP addresses.
The IP addresses given are examples and may be different depending on the
network topology of your deployment.
The only external way to access the two Geo deployments is by HTTPS at
`gitlab.us.example.com` and `gitlab.eu.example.com` in the example above.
NOTE: **Note:**
The **primary** and **secondary** Geo deployments must be able to communicate to each other over HTTPS.
## Redis and PostgreSQL High Availability
The **primary** and **secondary** Redis and PostgreSQL should be configured
for high availability. Because of the additional complexity involved
in setting up this configuration for PostgreSQL and Redis,
it is not covered by this Geo HA documentation.
For more information about setting up a highly available PostgreSQL cluster and Redis cluster using the Omnibus package, see the high availability documentation for
[PostgreSQL](../../high_availability/database.md) and
[Redis](../../high_availability/redis.md), respectively.
NOTE: **Note:**
It is possible to use cloud hosted services for PostgreSQL and Redis, but this is beyond the scope of this document.
## Prerequisites: A working GitLab HA cluster
This cluster will serve as the **primary** node. Use the
[GitLab HA documentation](../../high_availability/README.md) to set this up.
## Configure the GitLab cluster to be the **primary** node
The following steps enable a GitLab cluster to serve as the **primary** node.
### Step 1: Configure the **primary** frontend servers
1. Edit `/etc/gitlab/gitlab.rb` and add the following:
```ruby
##
## Enable the Geo primary role
##
roles ['geo_primary_role']
##
## Disable automatic migrations
##
gitlab_rails['auto_migrate'] = false
```
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
NOTE: **Note:** PostgreSQL and Redis should have already been disabled on the
application servers, and connections from the application servers to those
services on the backend servers configured, during normal GitLab HA set up. See
high availability configuration documentation for
[PostgreSQL](https://docs.gitlab.com/ee/administration/high_availability/database.html#configuring-the-application-nodes)
and [Redis](../../high_availability/redis.md#example-configuration-for-the-gitlab-application).
The **primary** database will require modification later, as part of
[step 2](#step-2-configure-the-main-read-only-replica-postgresql-database-on-the-secondary-node).
## Configure a **secondary** node
A **secondary** cluster is similar to any other GitLab HA cluster, with two
major differences:
* The main PostgreSQL database is a read-only replica of the **primary** node's
PostgreSQL database.
* There is also a single PostgreSQL database for the **secondary** cluster,
called the "tracking database", which tracks the synchronization state of
various resources.
Therefore, we will set up the HA components one-by-one, and include deviations
from the normal HA setup.
### Step 1: Configure the Redis and NFS services on the **secondary** node
Configure the following services, again using the non-Geo high availability
documentation:
* [Configuring Redis for GitLab HA](../../high_availability/redis.md) for high
availability.
* [NFS](../../high_availability/nfs.md) which will store data that is
synchronized from the **primary** node.
### Step 2: Configure the main read-only replica PostgreSQL database on the **secondary** node
NOTE: **Note:** The following documentation assumes the database will be run on
only a single machine, rather than as a PostgreSQL cluster.
Configure the [**secondary** database](database.md) as a read-only replica of
the **primary** database.
If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](external_database.md).
### Step 3: Configure the tracking database on the **secondary** node
NOTE: **Note:** This documentation assumes the tracking database will be run on
only a single machine, rather than as a PostgreSQL cluster.
Configure the tracking database.
1. Edit `/etc/gitlab/gitlab.rb` in the tracking database machine, and add the
following:
```ruby
##
## Enable the Geo secondary tracking database
##
geo_postgresql['enable'] = true
geo_postgresql['ha'] = true
```
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
If using an external PostgreSQL instance, refer also to
[Geo with external PostgreSQL instances](external_database.md).
### Step 4: Configure the frontend application servers on the **secondary** node
In the architecture overview, there are two machines running the GitLab
application services. These services are enabled selectively in the
configuration.
Configure the application servers following
[Configuring GitLab for HA](../../high_availability/gitlab.md), then make the
following modifications:
1. Edit `/etc/gitlab/gitlab.rb` on each application server in the **secondary**
cluster, and add the following:
```ruby
##
## Enable the Geo secondary role
##
roles ['geo_secondary_role', 'application_role']
##
## Disable automatic migrations
##
gitlab_rails['auto_migrate'] = false
##
## Configure the connection to the tracking DB. And disable application
## servers from running tracking databases.
##
geo_secondary['db_host'] = '<geo_tracking_db_host>'
geo_secondary['db_password'] = '<geo_tracking_db_password>'
geo_postgresql['enable'] = false
##
## Configure connection to the streaming replica database, if you haven't
## already
##
gitlab_rails['db_host'] = '<replica_database_host>'
gitlab_rails['db_password'] = '<replica_database_password>'
##
## Configure connection to Redis, if you haven't already
##
gitlab_rails['redis_host'] = '<redis_host>'
gitlab_rails['redis_password'] = '<redis_password>'
##
## If you are using custom users not managed by Omnibus, you need to specify
## UIDs and GIDs like below, and ensure they match between servers in a
## cluster to avoid permissions issues
##
user['uid'] = 9000
user['gid'] = 9000
web_server['uid'] = 9001
web_server['gid'] = 9001
registry['uid'] = 9002
registry['gid'] = 9002
```
NOTE: **Note:**
If you set up the PostgreSQL cluster using the Omnibus package and used the
`postgresql['sql_user_password'] = 'md5 digest of secret'` setting, keep in
mind that `gitlab_rails['db_password']` and `geo_secondary['db_password']`
mentioned above contain the plaintext passwords. This is used to let the Rails
servers connect to the databases.
NOTE: **Note:**
Make sure that the current node's IP is listed in the `postgresql['md5_auth_cidr_addresses']` setting of your remote database.
After making these changes, [reconfigure GitLab][gitlab-reconfigure] so the changes take effect.
On the secondary, the following GitLab frontend services will be enabled:
* geo-logcursor
* gitlab-pages
* gitlab-workhorse
* logrotate
* nginx
* registry
* remote-syslog
* sidekiq
* unicorn
Verify these services by running `sudo gitlab-ctl status` on the frontend
application servers.
### Step 5: Set up the LoadBalancer for the **secondary** node
In this topology, a load balancer is required at each geographic location to
route traffic to the application servers.
See [Load Balancer for GitLab HA](../../high_availability/load_balancer.md) for
more information.
[diagram-source]: https://docs.google.com/drawings/d/1z0VlizKiLNXVVVaERFwgsIOuEgjcUqDTWPdQYsE7Z4c/edit
[gitlab-reconfigure]: ../../restart_gitlab.md#omnibus-gitlab-reconfigure
# Geo with Object storage **[PREMIUM ONLY]**
Geo can be used in combination with Object Storage (AWS S3, or
other compatible object storage).
## Configuration
At this time, if object storage is enabled on the
**primary** node, it must also be enabled on each **secondary** node.
**Secondary** nodes can use the same storage bucket as the **primary** node, or
they can use a replicated storage bucket. At this time GitLab does not
take care of content replication in object storage.
For LFS, follow the documentation to
[set up LFS object storage](../../../workflow/lfs/lfs_administration.md#storing-lfs-objects-in-remote-object-storage).
For CI job artifacts, there is similar documentation to configure
[job artifacts object storage](../../job_artifacts.md#using-object-storage).
For user uploads, there is similar documentation to configure [upload object storage](../../uploads.md#using-object-storage-core-only).
You should enable and configure object storage on both **primary** and **secondary**
nodes. Migrating existing data to object storage should be performed on the
**primary** node only. **Secondary** nodes will automatically notice that the migrated
files are now in object storage.
## Replication
When using Amazon S3, you can use
[CRR](https://docs.aws.amazon.com/AmazonS3/latest/dev/crr.html) to
have automatic replication between the bucket used by the **primary** node and
the bucket used by **secondary** nodes.
If you are using Google Cloud Storage, consider using
[Multi-Regional Storage](https://cloud.google.com/storage/docs/storage-classes#multi-regional).
Or you can use the [Storage Transfer Service](https://cloud.google.com/storage/transfer/),
although this only supports daily synchronization.
For manual synchronization, or scheduled by `cron`, please have a look at:
- [`s3cmd sync`](http://s3tools.org/s3cmd-sync)
- [`gsutil rsync`](https://cloud.google.com/storage/docs/gsutil/commands/rsync)
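For example, a one-way synchronization from the **primary** node's bucket to the **secondary** node's bucket could be run (or scheduled with `cron`) with either tool; the bucket names are placeholders:
```sh
# Amazon S3: mirror the primary bucket into the secondary bucket
s3cmd sync s3://gitlab-primary-bucket/ s3://gitlab-secondary-bucket/

# Google Cloud Storage: recursively synchronize the two buckets
gsutil rsync -r gs://gitlab-primary-bucket gs://gitlab-secondary-bucket
```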
# Removing secondary Geo nodes **[PREMIUM ONLY]**
**Secondary** nodes can be removed from the Geo cluster using the Geo admin page of the **primary** node. To remove a **secondary** node:
1. Navigate to **Admin Area > Geo** (`/admin/geo/nodes`).
1. Click the **Remove** button for the **secondary** node you want to remove.
1. Confirm by clicking **Remove** when the prompt appears.
Once removed from the Geo admin page, you must stop and uninstall the **secondary** node:
1. On the **secondary** node, stop GitLab:
```bash
sudo gitlab-ctl stop
```
1. On the **secondary** node, uninstall GitLab:
```bash
# Stop gitlab and remove its supervision process
sudo gitlab-ctl uninstall
# Debian/Ubuntu
sudo dpkg --remove gitlab-ee
# Redhat/Centos
sudo rpm --erase gitlab-ee
```
Once GitLab has been uninstalled from the **secondary** node, the replication slot must be dropped from the **primary** node's database as follows:
1. On the **primary** node, start a PostgreSQL console session:
```bash
sudo gitlab-psql
```
NOTE: **Note:**
Using `gitlab-rails dbconsole` will not work, because managing replication slots requires superuser permissions.
1. Find the name of the relevant replication slot. This is the slot that is specified with `--slot-name` when running the replicate command: `gitlab-ctl replicate-geo-database`.
```sql
SELECT * FROM pg_replication_slots;
```
1. Remove the replication slot for the **secondary** node:
```sql
SELECT pg_drop_replication_slot('<name_of_slot>');
```
# Tuning Geo **[PREMIUM ONLY]**
## Changing the sync capacity values
In the Geo admin page (`/admin/geo/nodes`), there are several variables that
can be tuned to improve performance of Geo:
- Repository sync capacity.
- File sync capacity.
Increasing these values will increase the number of jobs that are scheduled.
However, this may not lead to more downloads in parallel unless the number of
available Sidekiq threads is also increased. For example, if repository sync
capacity is increased from 25 to 50, you may also want to increase the number
of Sidekiq threads from 25 to 50. See the
[Sidekiq concurrency documentation](https://docs.gitlab.com/ee/administration/operations/extra_sidekiq_processes.html#number-of-threads)
for more details.
[//]: # (Please update EE::GitLab::GeoGitAccess::GEO_SERVER_DOCS_URL if this file is moved)
# Using a Geo Server **[PREMIUM ONLY]**
After you set up the [database replication and configure the Geo nodes][req], use your closest GitLab node as you would a normal standalone GitLab instance.
Pushing directly to a **secondary** node (for both HTTP and SSH, including git-lfs) was [introduced](https://about.gitlab.com/2018/09/22/gitlab-11-3-released/) in [GitLab Premium](https://about.gitlab.com/pricing/#self-managed) 11.3.
Example of the output you will see when pushing to a **secondary** node:
```bash
$ git push
> GitLab: You're pushing to a Geo secondary.
> GitLab: We'll help you by proxying this request to the primary: ssh://git@primary.geo/user/repo.git
Everything up-to-date
```
[req]: index.md#setup-instructions