Commit ba55cc76 authored by DJ Mountney, committed by Achilleas Pipinellis

Move the pg replication/failover doc contents around

To have patroni as the default settings, with the repmgr settings moved
to a separate section.

Update note on experimental Patroni support
parent 5f985108
---
title: Use Patroni as the default in the replication docs
merge_request: 50101
author:
type: changed
...@@ -46,22 +46,19 @@ Each database node runs three services: ...@@ -46,22 +46,19 @@ Each database node runs three services:
`PostgreSQL` - The database itself. `PostgreSQL` - The database itself.
`repmgrd` - Communicates with other repmgrd services in the cluster and handles `Patroni` - Communicates with other Patroni services in the cluster and handles
failover when issues with the master server occurs. The failover procedure failover when issues with the leader server occur. The failover procedure
consists of: consists of:
- Selecting a new master for the cluster. - Selecting a new leader for the cluster.
- Promoting the new node to master. - Promoting the new node to leader.
- Instructing remaining servers to follow the new master node. - Instructing remaining servers to follow the new leader node.
- The old master node is automatically evicted from the cluster and should be
rejoined manually once recovered.
`Consul` agent - Monitors the status of each node in the database cluster and `Consul` agent - Communicates with the Consul cluster, which stores the current Patroni state. The agent monitors the status of each node in the database cluster and tracks its health in a service definition on the Consul cluster.
tracks its health in a service definition on the Consul cluster.
### Consul server node ### Consul server node
The Consul server node runs the Consul server service. The Consul server node runs the Consul server service. These nodes must have reached quorum and elected a leader _before_ the Patroni cluster bootstraps; otherwise, database nodes wait until a Consul leader is elected.
### PgBouncer node ### PgBouncer node
...@@ -80,7 +77,7 @@ Each service in the package comes with a set of [default ports](https://docs.git ...@@ -80,7 +77,7 @@ Each service in the package comes with a set of [default ports](https://docs.git
- Application servers connect to either PgBouncer directly via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer) or via a configured Internal Load Balancer (TCP) that serves multiple PgBouncers. - Application servers connect to either PgBouncer directly via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#pgbouncer) or via a configured Internal Load Balancer (TCP) that serves multiple PgBouncers.
- PgBouncer connects to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql) - PgBouncer connects to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
- Repmgr connects to the database servers [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql) - Patroni actively manages the running PostgreSQL processes and configuration.
- PostgreSQL secondaries connect to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql) - PostgreSQL secondaries connect to the primary database server's [PostgreSQL default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#postgresql)
- Consul servers and agents connect to each other's [Consul default ports](https://docs.gitlab.com/omnibus/package-information/defaults.html#consul) - Consul servers and agents connect to each other's [Consul default ports](https://docs.gitlab.com/omnibus/package-information/defaults.html#consul)
...@@ -141,7 +138,7 @@ available database connections. ...@@ -141,7 +138,7 @@ available database connections.
In this document we are assuming 3 database nodes, which makes this configuration: In this document we are assuming 3 database nodes, which makes this configuration:
```ruby ```ruby
postgresql['max_wal_senders'] = 4 patroni['postgresql']['max_wal_senders'] = 4
``` ```
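The arithmetic behind this value can be sketched in Ruby, the language of `gitlab.rb`. This is only an illustration of the "number of database nodes + 1" rule stated in this document; `max_wal_senders_for` is a hypothetical helper and not part of Omnibus GitLab.

```ruby
# Hypothetical helper (not an Omnibus GitLab API): applies the
# "number of database nodes + 1" rule described in this document.
def max_wal_senders_for(db_nodes)
  unless db_nodes.is_a?(Integer) && db_nodes.positive?
    raise ArgumentError, 'db_nodes must be a positive integer'
  end

  db_nodes + 1
end

# With the 3 database nodes assumed in this document, print the
# line that would go into /etc/gitlab/gitlab.rb.
puts "patroni['postgresql']['max_wal_senders'] = #{max_wal_senders_for(3)}"
```

For a cluster of a different size, pass the actual node count and copy the printed line into the configuration.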
As previously mentioned, you'll have to prepare the network subnets that will As previously mentioned, you'll have to prepare the network subnets that will
...@@ -186,18 +183,6 @@ Few notes on the service itself: ...@@ -186,18 +183,6 @@ Few notes on the service itself:
- `/etc/gitlab/gitlab.rb`: hashed, and in plain text - `/etc/gitlab/gitlab.rb`: hashed, and in plain text
- `/var/opt/gitlab/pgbouncer/pg_auth`: hashed - `/var/opt/gitlab/pgbouncer/pg_auth`: hashed
#### Repmgr information
When using default setup, you will only have to prepare the network subnets that will
be allowed to authenticate with the service.
Few notes on the service itself:
- The service runs under the same system account as the database
- In the package, this is by default `gitlab-psql`
- The service will have a superuser database user account generated for it
- This defaults to `gitlab_repmgr`
### Installing Omnibus GitLab ### Installing Omnibus GitLab
First, make sure to [download/install](https://about.gitlab.com/install/) First, make sure to [download/install](https://about.gitlab.com/install/)
...@@ -212,72 +197,80 @@ When installing the GitLab package, do not supply `EXTERNAL_URL` value. ...@@ -212,72 +197,80 @@ When installing the GitLab package, do not supply `EXTERNAL_URL` value.
1. Make sure to [configure the Consul nodes](../consul.md). 1. Make sure to [configure the Consul nodes](../consul.md).
1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information), [`POSTGRESQL_PASSWORD_HASH`](#postgresql-information), the [number of db nodes](#postgresql-information), and the [network address](#network-information) before executing the next step. 1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information), [`POSTGRESQL_PASSWORD_HASH`](#postgresql-information), the [number of db nodes](#postgresql-information), and the [network address](#network-information) before executing the next step.
1. On the master database node, edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section: #### Configuring Patroni cluster
```ruby You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`). When Patroni is enabled
# Disable all components except PostgreSQL and Repmgr and Consul repmgr will be disabled automatically.
roles ['postgres_role']
# PostgreSQL configuration Any PostgreSQL configuration items that control replication, for example `wal_level` and `max_wal_senders`, are strictly
postgresql['listen_address'] = '0.0.0.0' controlled by Patroni, whose values override any settings that you make with the `postgresql[...]` configuration key.
postgresql['hot_standby'] = 'on' Hence, they are all separated and placed under `patroni['postgresql'][...]`. This behavior is limited to replication.
postgresql['wal_level'] = 'replica' Patroni honours any other PostgreSQL configuration that was made with the `postgresql[...]` configuration key. For example,
postgresql['shared_preload_libraries'] = 'repmgr_funcs' `max_wal_senders` by default is set to `5`. If you wish to change this you must set it with the `patroni['postgresql']['max_wal_senders']`
configuration key.
# Disable automatic database migrations NOTE:
gitlab_rails['auto_migrate'] = false The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, you can first ignore
any replication setting of PostgreSQL (it will be overwritten anyway). Then you can remove any `repmgr[...]` or
repmgr-specific configuration as well. In particular, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`.
# Configure the Consul agent Here is an example similar to [the one that was done with repmgr](#configuring-repmgr-nodes):
consul['services'] = %w(postgresql)
# START user configuration ```ruby
# Please set the real values as explained in Required Information section # Disable all components except PostgreSQL, Patroni (or Repmgr), and Consul
# roles['postgres_role']
# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
# Replace X with value of number of db nodes + 1
postgresql['max_wal_senders'] = X
postgresql['max_replication_slots'] = X
# Replace XXX.XXX.XXX.XXX/YY with Network Address # Enable Patroni (which automatically disables Repmgr).
postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY) patroni['enable'] = true
repmgr['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 XXX.XXX.XXX.XXX/YY)
# Replace placeholders: # PostgreSQL configuration
# postgresql['listen_address'] = '0.0.0.0'
# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
# with the addresses gathered for CONSUL_SERVER_NODES
consul['configuration'] = {
retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
}
#
# END user configuration
```
> `postgres_role` was introduced with GitLab 10.3 # Disable automatic database migrations
gitlab_rails['auto_migrate'] = false
1. On secondary nodes, add all the configuration specified above for primary node # Configure the Consul agent
to `/etc/gitlab/gitlab.rb`. In addition, append the following configuration consul['services'] = %w(postgresql)
to inform `gitlab-ctl` that they are standby nodes initially and it need not
attempt to register them as primary node
```ruby # START user configuration
# Specify if a node should attempt to be master on initialization # Please set the real values as explained in Required Information section
repmgr['master_on_initialization'] = false #
``` # Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect. # Replace X with value of number of db nodes + 1 (OPTIONAL the default value is 5)
1. [Enable Monitoring](#enable-monitoring) patroni['postgresql']['max_wal_senders'] = X
patroni['postgresql']['max_replication_slots'] = X
> Please note: # Replace XXX.XXX.XXX.XXX/YY with Network Address
> postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
> - If you want your database to listen on a specific interface, change the configuration:
> `postgresql['listen_address'] = '0.0.0.0'`. # Replace placeholders:
> - If your PgBouncer service runs under a different user account, #
> you also need to specify: `postgresql['pgbouncer_user'] = PGBOUNCER_USERNAME` in # Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
> your configuration. # with the addresses gathered for CONSUL_SERVER_NODES
consul['configuration'] = {
retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
}
#
# END user configuration
```
You do not need an additional or different configuration for replica nodes. As a matter of fact, you don't have to have
a predetermined primary node. Therefore all database nodes use the same configuration.
Once the configuration of a node is done, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure)
on each node for the changes to take effect.
Generally, when the Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure)
becomes the leader. You do not need to sequence the reconfiguration of the nodes; you can run them in parallel or in any order.
If you choose an arbitrary order, you do not have any predetermined master.
NOTE:
As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join
the replicas.
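The precedence between `postgresql[...]` and `patroni['postgresql'][...]` described earlier in this section can be modelled with a short Ruby sketch. It is an illustration only, not Omnibus GitLab code: `effective_settings`, `REPLICATION_KEYS`, and `PATRONI_DEFAULTS` are hypothetical names, the key list contains only settings named in this document, and the only default shown is the `max_wal_senders` value of `5` mentioned above.

```ruby
# Replication settings named in this document. The real list managed by
# Patroni is longer; this subset is an assumption for illustration.
REPLICATION_KEYS = %w[wal_level max_wal_senders max_replication_slots].freeze

# Only the default stated in this document is modelled here.
PATRONI_DEFAULTS = { 'max_wal_senders' => 5 }.freeze

# Hypothetical model of which value wins for each setting.
def effective_settings(postgresql, patroni_postgresql)
  # Non-replication settings made under postgresql[...] are honoured.
  general = postgresql.reject { |key, _| REPLICATION_KEYS.include?(key) }

  # Replication settings are taken only from patroni['postgresql'][...],
  # falling back to defaults; values under postgresql[...] are ignored.
  overrides = patroni_postgresql.select { |key, _| REPLICATION_KEYS.include?(key) }
  general.merge(PATRONI_DEFAULTS.merge(overrides))
end

postgresql = { 'listen_address' => '0.0.0.0', 'max_wal_senders' => 10 }
patroni_postgresql = { 'max_wal_senders' => 4 }

p effective_settings(postgresql, patroni_postgresql)
```

In this model, the `max_wal_senders` value set under `postgresql[...]` has no effect, while `listen_address` passes through unchanged.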
#### Enable Monitoring #### Enable Monitoring
...@@ -298,129 +291,6 @@ If you enable Monitoring, it must be enabled on **all** database servers. ...@@ -298,129 +291,6 @@ If you enable Monitoring, it must be enabled on **all** database servers.
1. Run `sudo gitlab-ctl reconfigure` to compile the configuration. 1. Run `sudo gitlab-ctl reconfigure` to compile the configuration.
#### Database nodes post-configuration
##### Primary node
Select one node as a primary node.
1. Open a database prompt:
```shell
gitlab-psql -d gitlabhq_production
```
1. Enable the `pg_trgm` extension:
```shell
CREATE EXTENSION pg_trgm;
```
1. Enable the `btree_gist` extension:
```shell
CREATE EXTENSION btree_gist;
```
1. Exit the database prompt by typing `\q` and Enter.
1. Verify the cluster is initialized with one node:
```shell
gitlab-ctl repmgr cluster show
```
The output should be similar to the following:
```plaintext
Role | Name | Upstream | Connection String
----------+----------|----------|----------------------------------------
* master | HOSTNAME | | host=HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
```
1. Note down the hostname or IP address in the connection string: `host=HOSTNAME`. We will
refer to the hostname in the next section as `MASTER_NODE_NAME`. If the value
is not an IP address, it will need to be a resolvable name (via DNS or
`/etc/hosts`)
##### Secondary nodes
1. Set up the repmgr standby:
```shell
gitlab-ctl repmgr standby setup MASTER_NODE_NAME
```
Do note that this will remove the existing data on the node. The command
has a wait time.
The output should be similar to the following:
```console
# gitlab-ctl repmgr standby setup MASTER_NODE_NAME
Doing this will delete the entire contents of /var/opt/gitlab/postgresql/data
If this is not what you want, hit Ctrl-C now to exit
To skip waiting, rerun with the -w option
Sleeping for 30 seconds
Stopping the database
Removing the data
Cloning the data
Starting the database
Registering the node with the cluster
ok: run: repmgrd: (pid 19068) 0s
```
1. Verify the node now appears in the cluster:
```shell
gitlab-ctl repmgr cluster show
```
The output should be similar to the following:
```plaintext
Role | Name | Upstream | Connection String
----------+---------|-----------|------------------------------------------------
* master | MASTER | | host=MASTER_NODE_NAME user=gitlab_repmgr dbname=gitlab_repmgr
standby | STANDBY | MASTER | host=STANDBY_HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
```
Repeat the above steps on all secondary nodes.
#### Database checkpoint
Before moving on, make sure the databases are configured correctly. Run the
following command on the **primary** node to verify that replication is working
properly:
```shell
gitlab-ctl repmgr cluster show
```
The output should be similar to:
```plaintext
Role | Name | Upstream | Connection String
----------+--------------|--------------|--------------------------------------------------------------------
* master | MASTER | | host=MASTER port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
standby | STANDBY | MASTER | host=STANDBY port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
```
If the 'Role' column for any node says "FAILED", check the
[Troubleshooting section](#troubleshooting) before proceeding.
Also, check that the check master command works successfully on each node:
```shell
su - gitlab-consul
gitlab-ctl repmgr-check-master || echo 'This node is a standby repmgr node'
```
This command relies on exit codes to tell Consul whether a particular node is a master
or secondary. The most important thing here is that this command does not produce errors.
If there are errors it's most likely due to incorrect `gitlab-consul` database user permissions.
Check the [Troubleshooting section](#troubleshooting) before proceeding.
### Configuring the PgBouncer node ### Configuring the PgBouncer node
1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`CONSUL_PASSWORD_HASH`](#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information) before executing the next step. 1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`CONSUL_PASSWORD_HASH`](#consul-information), and [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information) before executing the next step.
...@@ -605,9 +475,9 @@ Here is a list and description of each machine and the assigned IP: ...@@ -605,9 +475,9 @@ Here is a list and description of each machine and the assigned IP:
- `10.6.0.21`: PgBouncer 1 - `10.6.0.21`: PgBouncer 1
- `10.6.0.22`: PgBouncer 2 - `10.6.0.22`: PgBouncer 2
- `10.6.0.23`: PgBouncer 3 - `10.6.0.23`: PgBouncer 3
- `10.6.0.31`: PostgreSQL master - `10.6.0.31`: PostgreSQL 1
- `10.6.0.32`: PostgreSQL secondary - `10.6.0.32`: PostgreSQL 2
- `10.6.0.33`: PostgreSQL secondary - `10.6.0.33`: PostgreSQL 3
- `10.6.0.41`: GitLab application - `10.6.0.41`: GitLab application
All passwords are set to `toomanysecrets`. Please do not use this password or derived hashes. The `external_url` for GitLab is `http://gitlab.example.com`. All passwords are set to `toomanysecrets`. Please do not use this password or derived hashes. The `external_url` for GitLab is `http://gitlab.example.com`.
...@@ -667,29 +537,28 @@ An internal load balancer (TCP) is then required to be setup to serve each PgBou ...@@ -667,29 +537,28 @@ An internal load balancer (TCP) is then required to be setup to serve each PgBou
#### Example recommended setup for PostgreSQL servers #### Example recommended setup for PostgreSQL servers
##### Primary node On database nodes edit `/etc/gitlab/gitlab.rb`:
On primary node edit `/etc/gitlab/gitlab.rb`:
```ruby ```ruby
# Disable all components except PostgreSQL and Repmgr and Consul # Disable all components except PostgreSQL, Patroni (or Repmgr), and Consul
roles ['postgres_role'] roles ['postgres_role']
# PostgreSQL configuration # PostgreSQL configuration
postgresql['listen_address'] = '0.0.0.0' postgresql['listen_address'] = '0.0.0.0'
postgresql['hot_standby'] = 'on' postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica' postgresql['wal_level'] = 'replica'
postgresql['shared_preload_libraries'] = 'repmgr_funcs'
# Enable Patroni (which automatically disables Repmgr).
patroni['enable'] = true
# Disable automatic database migrations # Disable automatic database migrations
gitlab_rails['auto_migrate'] = false gitlab_rails['auto_migrate'] = false
postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c' postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f' postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
postgresql['max_wal_senders'] = 4 patroni['postgresql']['max_wal_senders'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16) postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
# Configure the Consul agent # Configure the Consul agent
consul['services'] = %w(postgresql) consul['services'] = %w(postgresql)
...@@ -702,61 +571,19 @@ consul['monitoring_service_discovery'] = true ...@@ -702,61 +571,19 @@ consul['monitoring_service_discovery'] = true
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
##### Secondary nodes #### Example recommended setup manual steps
On secondary nodes, edit `/etc/gitlab/gitlab.rb` and add all the configuration After deploying the configuration follow these steps:
added to primary node, noted above. In addition, append the following
configuration:
```ruby 1. Find the primary database node:
# Specify if a node should attempt to be master on initialization
repmgr['master_on_initialization'] = false
```
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect. ```shell
gitlab-ctl get-postgresql-primary
```
###### Example recommended setup for application server 1. On the primary database node:
On the server edit `/etc/gitlab/gitlab.rb`: Enable the `pg_trgm` and `btree_gist` extensions:
```ruby
external_url 'http://gitlab.example.com'
gitlab_rails['db_host'] = '10.6.0.20' # Internal Load Balancer for PgBouncer nodes
gitlab_rails['db_port'] = 6432
gitlab_rails['db_password'] = 'toomanysecrets'
gitlab_rails['auto_migrate'] = false
postgresql['enable'] = false
pgbouncer['enable'] = false
consul['enable'] = true
# Configure Consul agent
consul['watchers'] = %w(postgresql)
pgbouncer['users'] = {
'gitlab-consul': {
password: '5e0e3263571e3704ad655076301d6ebe'
},
'pgbouncer': {
password: '771a8625958a529132abe6f1a4acb19c'
}
}
consul['configuration'] = {
retry_join: %w(10.6.0.11 10.6.0.12 10.6.0.13)
}
```
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
#### Example recommended setup manual steps
After deploying the configuration follow these steps:
1. On `10.6.0.31`, our primary database:
Enable the `pg_trgm` and `btree_gist` extensions:
```shell ```shell
gitlab-psql -d gitlabhq_production gitlab-psql -d gitlabhq_production
...@@ -767,22 +594,6 @@ After deploying the configuration follow these steps: ...@@ -767,22 +594,6 @@ After deploying the configuration follow these steps:
CREATE EXTENSION btree_gist; CREATE EXTENSION btree_gist;
``` ```
1. On `10.6.0.32`, our first standby database:
Make this node a standby of the primary:
```shell
gitlab-ctl repmgr standby setup 10.6.0.21
```
1. On `10.6.0.33`, our second standby database:
Make this node a standby of the primary:
```shell
gitlab-ctl repmgr standby setup 10.6.0.21
```
1. On `10.6.0.41`, our application server: 1. On `10.6.0.41`, our application server:
Set `gitlab-consul` user's PgBouncer password to `toomanysecrets`: Set `gitlab-consul` user's PgBouncer password to `toomanysecrets`:
...@@ -802,15 +613,15 @@ After deploying the configuration follow these steps: ...@@ -802,15 +613,15 @@ After deploying the configuration follow these steps:
This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer setup alongside). This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer setup alongside).
It differs from the [recommended setup](#example-recommended-setup) by moving the Consul servers into the same servers we use for PostgreSQL. It differs from the [recommended setup](#example-recommended-setup) by moving the Consul servers into the same servers we use for PostgreSQL.
The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](../consul.md#outage-recovery) on the same set of machines. The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#manual-failover-procedure-for-patroni) procedures in addition to [Consul outage recovery](../consul.md#outage-recovery) on the same set of machines.
In this example we start with all servers on the same 10.6.0.0/16 private network range, so they can connect to each other freely on those addresses. In this example we start with all servers on the same 10.6.0.0/16 private network range, so they can connect to each other freely on those addresses.
Here is a list and description of each machine and the assigned IP: Here is a list and description of each machine and the assigned IP:
- `10.6.0.21`: PostgreSQL master - `10.6.0.21`: PostgreSQL 1
- `10.6.0.22`: PostgreSQL secondary - `10.6.0.22`: PostgreSQL 2
- `10.6.0.23`: PostgreSQL secondary - `10.6.0.23`: PostgreSQL 3
- `10.6.0.31`: GitLab application - `10.6.0.31`: GitLab application
All passwords are set to `toomanysecrets`. Please do not use this password or derived hashes. All passwords are set to `toomanysecrets`. Please do not use this password or derived hashes.
...@@ -821,9 +632,7 @@ Please note that after the initial configuration, if a failover occurs, the Post ...@@ -821,9 +632,7 @@ Please note that after the initial configuration, if a failover occurs, the Post
#### Example minimal configuration for database servers #### Example minimal configuration for database servers
##### Primary node On database nodes edit `/etc/gitlab/gitlab.rb`:
On primary database node edit `/etc/gitlab/gitlab.rb`:
```ruby ```ruby
# Disable all components except PostgreSQL, Repmgr, and Consul # Disable all components except PostgreSQL, Repmgr, and Consul
...@@ -833,7 +642,9 @@ roles ['postgres_role'] ...@@ -833,7 +642,9 @@ roles ['postgres_role']
postgresql['listen_address'] = '0.0.0.0' postgresql['listen_address'] = '0.0.0.0'
postgresql['hot_standby'] = 'on' postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica' postgresql['wal_level'] = 'replica'
postgresql['shared_preload_libraries'] = 'repmgr_funcs'
# Enable Patroni (which automatically disables Repmgr).
patroni['enable'] = true
# Disable automatic database migrations # Disable automatic database migrations
gitlab_rails['auto_migrate'] = false gitlab_rails['auto_migrate'] = false
...@@ -843,10 +654,9 @@ consul['services'] = %w(postgresql) ...@@ -843,10 +654,9 @@ consul['services'] = %w(postgresql)
postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c' postgresql['pgbouncer_user_password'] = '771a8625958a529132abe6f1a4acb19c'
postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f' postgresql['sql_user_password'] = '450409b85a0223a214b5fb1484f34d0f'
postgresql['max_wal_senders'] = 4 patroni['postgresql']['max_wal_senders'] = 4
postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16) postgresql['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
repmgr['trust_auth_cidr_addresses'] = %w(10.6.0.0/16)
consul['configuration'] = { consul['configuration'] = {
server: true, server: true,
...@@ -856,16 +666,6 @@ consul['configuration'] = { ...@@ -856,16 +666,6 @@ consul['configuration'] = {
[Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
##### Secondary nodes
On secondary nodes, edit `/etc/gitlab/gitlab.rb` and add all the information added
to primary node, noted above. In addition, append the following configuration
```ruby
# Specify if a node should attempt to be master on initialization
repmgr['master_on_initialization'] = false
```
#### Example minimal configuration for application server #### Example minimal configuration for application server
On the server edit `/etc/gitlab/gitlab.rb`: On the server edit `/etc/gitlab/gitlab.rb`:
...@@ -908,555 +708,678 @@ consul['configuration'] = { ...@@ -908,555 +708,678 @@ consul['configuration'] = {
The manual steps for this configuration are the same as for the [example recommended setup](#example-recommended-setup-manual-steps). The manual steps for this configuration are the same as for the [example recommended setup](#example-recommended-setup-manual-steps).
### Failover procedure ### Manual failover procedure for Patroni
By default, if the master database fails, `repmgrd` should promote one of the While Patroni supports automatic failover, you also have the ability to perform
standby nodes to master automatically, and Consul will update PgBouncer with a manual one, where you have two slightly different options:
the new master.
If you need to failover manually, you have two options: - **Failover**: allows you to perform a manual failover when there are no healthy nodes.
You can perform this action on any PostgreSQL node:
**Shutdown the current master database** ```shell
sudo gitlab-ctl patroni failover
```
Run: - **Switchover**: only works when the cluster is healthy and allows you to schedule a switchover (it can happen immediately).
You can perform this action on any PostgreSQL node:
```shell ```shell
gitlab-ctl stop postgresql sudo gitlab-ctl patroni switchover
``` ```
The automated failover process will see this and failover to one of the For further details on this subject, see the
standby nodes. [Patroni documentation](https://patroni.readthedocs.io/en/latest/rest_api.html#switchover-and-failover-endpoints).
**Or perform a manual failover** ## Patroni
1. Ensure the old master node is not still active. NOTE:
1. Login to the server that should become the new master and run: Using Patroni instead of Repmgr is supported for PostgreSQL 11 and required for PostgreSQL 12.
```shell Patroni is an opinionated solution for PostgreSQL high-availability. It takes control of PostgreSQL, overrides its
gitlab-ctl repmgr standby promote configuration and manages its lifecycle (start, stop, restart). This is a more active approach when compared to repmgr.
``` Both repmgr and Patroni are supported and available, but Patroni will be the default (and perhaps the only) option
for PostgreSQL 12 clustering and cascading replication for Geo deployments.
1. If there are any other standby servers in the cluster, have them follow The [architecture](#example-recommended-setup-manual-steps) (that was mentioned above) does not change for Patroni.
the new master server: You do not need any special consideration for Patroni while provisioning your database nodes. Patroni heavily relies on
Consul to store the state of the cluster and elect a leader. Any failure in Consul cluster and its leader election will
propagate to Patroni cluster as well.
```shell Similar to repmgr, Patroni monitors the cluster and handles failover. When the primary node fails it works with Consul
gitlab-ctl repmgr standby follow NEW_MASTER to notify PgBouncer. However, as opposed to repmgr, on failure, Patroni handles the transitioning of the old primary to
``` a replica and rejoins it to the cluster automatically. So you do not need any manual operation for recovering the
cluster as you do with repmgr.
#### Geo secondary site considerations With Patroni the connection flow is slightly different. Patroni on each node connects to Consul agent to join the
cluster. Only after this point does it decide if the node is the primary or a replica. Based on this decision, it configures
and starts PostgreSQL which it communicates with directly over a Unix socket. This implies that if Consul cluster is not
functional or does not have a leader, Patroni and by extension PostgreSQL will not start. Patroni also exposes a REST
API which can be accessed via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#patroni)
on each node.
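As a rough illustration of using that REST API, the Ruby sketch below asks a node whether it currently holds the leader role. It assumes Patroni's documented behaviour that `GET /` answers `200` only on the node running as primary with the leader lock, and it assumes port `8008`; check the linked defaults page for the port in your installation. `role_from_status` and `patroni_role` are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'

# Map the HTTP status of Patroni's primary health check (GET /) to a role.
# Assumption: 200 means the node is the leader, anything else means it is not.
def role_from_status(code)
  code.to_i == 200 ? :leader : :not_leader
end

# Query a node's Patroni REST API. The default port is an assumption.
def patroni_role(host, port: 8008)
  response = Net::HTTP.get_response(URI("http://#{host}:#{port}/"))
  role_from_status(response.code)
end

# Example (requires network access to a database node):
#   patroni_role('10.6.0.31')
```

Keeping the status mapping in its own function makes it possible to exercise the logic without a running cluster.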
When a Geo secondary site is replicating from a primary site that uses `repmgr` and `PgBouncer`, [replicating through PgBouncer is not supported](https://github.com/pgbouncer/pgbouncer/issues/382#issuecomment-517911529) and the secondary must replicate directly from the leader node in the `repmgr` cluster. Therefore, when there is a failover in the `repmgr` cluster, you will need to manually re-point your secondary site to replicate from the new leader with: ### Database authorization for Patroni
```shell Patroni uses a Unix socket to manage the PostgreSQL instance. Therefore, the connection from the `local` socket must be trusted.
sudo gitlab-ctl replicate-geo-database --host=<new_leader_ip> --replication-slot=<slot_name>
```
Otherwise, the replication will not happen anymore, even if the original node gets re-added as a follower node. This will re-sync your secondary site database and may take a long time depending on the amount of data to sync. Also, replicas use the replication user (`gitlab_replicator` by default) to communicate with the leader. For this user,
you can choose between `trust` and `md5` authentication. If you set `postgresql['sql_replication_password']`,
Patroni will use `md5` authentication, otherwise it falls back to `trust`. You must specify the cluster CIDR in
`postgresql['md5_auth_cidr_addresses']` or `postgresql['trust_auth_cidr_addresses']` respectively.
### Restore procedure ### Interacting with Patroni cluster
If a node fails, it can be removed from the cluster, or added back as a standby You can use `gitlab-ctl patroni members` to check the status of the cluster members. To check the status of each node,
after it has been restored to service. `gitlab-ctl patroni` provides two additional sub-commands, `check-leader` and `check-replica` which indicate if a node
is the primary or a replica.
#### Remove a standby from the cluster When Patroni is enabled, you don't have direct control over `postgresql` service. Patroni will signal PostgreSQL's startup,
shutdown, and restart. For example, to shut down PostgreSQL on a node, you must shut down Patroni on the same node
with:
From any other node in the cluster, run: ```shell
sudo gitlab-ctl stop patroni
```
```shell Note that stopping or restarting Patroni service on the leader node will trigger the automatic failover. If you
gitlab-ctl repmgr standby unregister --node=X want to signal Patroni to reload its configuration or restart PostgreSQL process without triggering the failover, you
``` must use the `reload` or `restart` sub-commands of `gitlab-ctl patroni` instead. These two sub-commands are wrappers of
the same `patronictl` commands.
where X is the value of node in `repmgr.conf` on the old server. ### Recovering the Patroni cluster
To find this, you can use: To recover the old primary and rejoin it to the cluster as a replica, you can simply start Patroni with:
```shell ```shell
awk -F = '$1 == "node" { print $2 }' /var/opt/gitlab/postgresql/repmgr.conf sudo gitlab-ctl start patroni
``` ```
It will output something like: No further configuration or intervention is needed.
```plaintext ### Maintenance procedure for Patroni
959789412
```
Then you will use this ID to unregister the node: With Patroni enabled, you can run a planned maintenance. If you want to do some maintenance work on one node and you
don't want Patroni to manage it, you can put it into maintenance mode:
```shell ```shell
gitlab-ctl repmgr standby unregister --node=959789412 sudo gitlab-ctl patroni pause
``` ```
#### Add a node as a standby server When Patroni runs in a paused mode, it does not change the state of PostgreSQL. Once you are done you can resume Patroni:
From the standby node, run: ```shell
sudo gitlab-ctl patroni resume
```
```shell For further details, see [Patroni documentation on this subject](https://patroni.readthedocs.io/en/latest/pause.html).
gitlab-ctl repmgr standby follow NEW_MASTER
gitlab-ctl restart repmgrd
```
WARNING: ### Switching from repmgr to Patroni
When the server is brought back online, and before
you switch it to a standby node, repmgr will report that there are two masters.
If there are any clients that are still attempting to write to the old master,
this will cause a split, and the old master will need to be resynced from
scratch by performing a `gitlab-ctl repmgr standby setup NEW_MASTER`.
#### Add a failed master back into the cluster as a standby node WARNING:
Although switching from repmgr to Patroni is fairly straightforward, the other way around is not. Rolling back from
Patroni to repmgr can be complicated and may involve deletion of data directory. If you need to do that, please contact
GitLab support.
Once `repmgrd` and PostgreSQL are running, the node will need to follow the new master You can switch an existing database cluster to use Patroni instead of repmgr with the following steps:
as a standby node.
```shell 1. Stop repmgr on all replica nodes and, lastly, on the primary node:
gitlab-ctl repmgr standby follow NEW_MASTER
```
Once the node is following the new master as a standby, the node needs to be ```shell
[unregistered from the cluster on the new master node](#remove-a-standby-from-the-cluster). sudo gitlab-ctl stop repmgrd
```
Once the old master node has been unregistered from the cluster, it will need 1. Stop PostgreSQL on all replica nodes:
to be set up as a new standby:
```shell ```shell
gitlab-ctl repmgr standby setup NEW_MASTER sudo gitlab-ctl stop postgresql
``` ```
Failure to unregister and re-add the old master node can lead to subsequent failovers NOTE:
not working. Ensure that there is no `walsender` process running on the primary node.
`ps aux | grep walsender` must not show any running process.
### Alternate configurations 1. On the primary node, [configure Patroni](#configuring-patroni-cluster). Remove `repmgr` and any other
repmgr-specific configuration. Also remove any configuration that is related to PostgreSQL replication.
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on the primary node. It will become
the leader. You can check this with:
#### Database authorization ```shell
sudo gitlab-ctl tail patroni
```
By default, we give any host on the database network the permission to perform 1. Repeat the last two steps for all replica nodes. `gitlab.rb` should look the same on all nodes.
repmgr operations using PostgreSQL's `trust` method. If you do not want this 1. Optional: You can remove `gitlab_repmgr` database and role on the primary.
level of trust, there are alternatives.
You can trust only the specific nodes that will be database clusters, or you ### Upgrading PostgreSQL major version in a Patroni cluster
can require md5 authentication.
#### Trust specific addresses As of GitLab 13.3, PostgreSQL 11.7 and 12.3 are both shipped with Omnibus GitLab, and as of GitLab 13.7
PostgreSQL 12 is used by default. If you want to upgrade to PostgreSQL 12 in versions prior to GitLab 13.7,
you must ask for it explicitly.
If you know the IP address, or FQDN of all database and PgBouncer nodes in the WARNING:
cluster, you can trust only those nodes. The procedure for upgrading PostgreSQL in a Patroni cluster is different than when upgrading using repmgr.
The following outlines the key differences and important considerations that need to be accounted for when
upgrading PostgreSQL.
In `/etc/gitlab/gitlab.rb` on all of the database nodes, set Here are a few key facts that you must consider before upgrading PostgreSQL:
`repmgr['trust_auth_cidr_addresses']` to an array of strings containing all of
the addresses.
If setting to a node's FQDN, they must have a corresponding PTR record in DNS. - The main point is that you will have to **shut down the Patroni cluster**. This means that your
If setting to a node's IP address, specify it as `XXX.XXX.XXX.XXX/32`. GitLab deployment will be down for the duration of database upgrade or, at least, as long as your leader
node is upgraded. This can be **a significant downtime depending on the size of your database**.
For example: - Upgrading PostgreSQL creates a new data directory with new control data. From Patroni's perspective
this is a new cluster that needs to be bootstrapped again. Therefore, as part of the upgrade procedure,
the cluster state, which is stored in Consul, will be wiped out. Once the upgrade is completed, Patroni
will be instructed to bootstrap a new cluster. **Note that this will change your _cluster ID_**.
```ruby - The procedures for upgrading leader and replicas are not the same. That is why it is important to use the
repmgr['trust_auth_cidr_addresses'] = %w(192.168.1.44/32 db2.example.com) right procedure on each node.
```
#### MD5 Authentication - Upgrading a replica node **deletes the data directory and resynchronizes it** from the leader using the
configured replication method (currently `pg_basebackup` is the only available option). It might take some
time for the replica to catch up with the leader, depending on the size of your database.
If you are running on an untrusted network, repmgr can use md5 authentication - An overview of the upgrade procedure is outlined in [Patroni's documentation](https://patroni.readthedocs.io/en/latest/existing_data.html#major-upgrade-of-postgresql-version).
with a [`.pgpass` file](https://www.postgresql.org/docs/11/libpq-pgpass.html) You can still use `gitlab-ctl pg-upgrade` which implements this procedure with a few adjustments.
to authenticate.
You can specify by IP address, FQDN, or by subnet, using the same format as in Considering these, you should carefully plan your PostgreSQL upgrade:
the previous section:
1. On the current master node, create a password for the `gitlab` and 1. Find out which node is the leader and which node is a replica:
`gitlab_repmgr` user:
```shell ```shell
gitlab-psql -d template1 gitlab-ctl patroni members
template1=# \password gitlab_repmgr
Enter password: ****
Confirm password: ****
template1=# \password gitlab
``` ```
1. On each database node: NOTE:
`gitlab-ctl pg-upgrade` tries to detect the role of the node. If for any reason the auto-detection
does not work or you believe it did not detect the role correctly, you can use the `--leader` or `--replica`
arguments to manually override it.
1. Edit `/etc/gitlab/gitlab.rb`: 1. Stop Patroni **only on replicas**.
1. Ensure `repmgr['trust_auth_cidr_addresses']` is **not** set
1. Set `postgresql['md5_auth_cidr_addresses']` to the desired value
1. Set `postgresql['sql_replication_user'] = 'gitlab_repmgr'`
1. Reconfigure with `gitlab-ctl reconfigure`
1. Restart PostgreSQL with `gitlab-ctl restart postgresql`
1. Create a `.pgpass` file. Enter the `gitlab_repmgr` password twice ```shell
when asked: sudo gitlab-ctl stop patroni
```
```shell 1. Enable the maintenance mode on the **application node**:
gitlab-ctl write-pgpass --user gitlab_repmgr --hostuser gitlab-psql --database '*'
```
1. On each PgBouncer node, edit `/etc/gitlab/gitlab.rb`: ```shell
1. Ensure `gitlab_rails['db_password']` is set to the plaintext password for sudo gitlab-ctl deploy-page up
the `gitlab` database user ```
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect
## Troubleshooting 1. Upgrade PostgreSQL on **the leader node** and make sure that the upgrade is completed successfully:
### Consul and PostgreSQL changes not taking effect ```shell
sudo gitlab-ctl pg-upgrade -V 12
```
Due to the potential impacts, `gitlab-ctl reconfigure` only reloads Consul and PostgreSQL, it will not restart the services. However, not all changes can be activated by reloading. 1. Check the status of the leader and cluster. You can only proceed if you have a healthy leader:
To restart either service, run `gitlab-ctl restart SERVICE` ```shell
gitlab-ctl patroni check-leader
For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start afterwards with `gitlab-ctl start repmgrd`. # OR
On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner. gitlab-ctl patroni members
```
### `gitlab-ctl repmgr-check-master` command produces errors 1. You can now disable the maintenance mode on the **application node**:
If this command displays errors about database permissions it is likely that something failed during ```shell
install, resulting in the `gitlab-consul` database user getting incorrect permissions. Follow these sudo gitlab-ctl deploy-page down
steps to fix the problem: ```
1. On the master database node, connect to the database prompt - `gitlab-psql -d template1` 1. Upgrade PostgreSQL **on replicas** (you can do this in parallel on all of them):
1. Delete the `gitlab-consul` user - `DROP USER "gitlab-consul";`
1. Exit the database prompt - `\q`
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) and the user will be re-added with the proper permissions.
1. Change to the `gitlab-consul` user - `su - gitlab-consul`
1. Try the check command again - `gitlab-ctl repmgr-check-master`.
Now there should not be errors. If errors still occur then there is another problem. ```shell
sudo gitlab-ctl pg-upgrade -V 12
```
### PgBouncer error `ERROR: pgbouncer cannot connect to server` NOTE:
Reverting a PostgreSQL upgrade with `gitlab-ctl revert-pg-upgrade` has the same considerations as
`gitlab-ctl pg-upgrade`. You should follow the same procedure by first stopping the replicas,
then reverting the leader, and finally reverting the replicas.
You may get this error when running `gitlab-rake gitlab:db:configure` or you ## Repmgr
may see the error in the PgBouncer log file.
```plaintext NOTE:
PG::ConnectionBad: ERROR: pgbouncer cannot connect to server Using Patroni instead of Repmgr is supported for PostgreSQL 11 and required for PostgreSQL 12.
```
The problem may be that your PgBouncer node's IP address is not included in the ### Configuring Repmgr Nodes
`trust_auth_cidr_addresses` setting in `/etc/gitlab/gitlab.rb` on the database nodes.
You can confirm that this is the issue by checking the PostgreSQL log on the master 1. On the master database node, edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section:
database node. If you see the following error then `trust_auth_cidr_addresses`
is the problem.
```plaintext ```ruby
2018-03-29_13:59:12.11776 FATAL: no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off # Disable all components except PostgreSQL and Repmgr and Consul
``` roles ['postgres_role']
To fix the problem, add the IP address to `/etc/gitlab/gitlab.rb`. # PostgreSQL configuration
postgresql['listen_address'] = '0.0.0.0'
postgresql['hot_standby'] = 'on'
postgresql['wal_level'] = 'replica'
postgresql['shared_preload_libraries'] = 'repmgr_funcs'
```ruby # Disable automatic database migrations
postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>) gitlab_rails['auto_migrate'] = false
```
[Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect. # Configure the Consul agent
consul['services'] = %w(postgresql)
### Issues with other components # START user configuration
# Please set the real values as explained in Required Information section
#
# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
# Replace X with value of number of db nodes + 1
postgresql['max_wal_senders'] = X
postgresql['max_replication_slots'] = X
If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of their specific documentation page: # Replace XXX.XXX.XXX.XXX/YY with Network Address
postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY)
repmgr['trust_auth_cidr_addresses'] = %w(127.0.0.1/32 XXX.XXX.XXX.XXX/YY)
- [Consul](../consul.md#troubleshooting-consul) # Replace placeholders:
- [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting) #
# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z
# with the addresses gathered for CONSUL_SERVER_NODES
consul['configuration'] = {
retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z)
}
#
# END user configuration
```
## Patroni > `postgres_role` was introduced with GitLab 10.3
NOTE: 1. On secondary nodes, add all the configuration specified above for primary node
Starting from GitLab 13.1, Patroni is available for **experimental** use to replace repmgr. Due to its to `/etc/gitlab/gitlab.rb`. In addition, append the following configuration
experimental nature, Patroni support is **subject to change without notice.** to inform `gitlab-ctl` that they are standby nodes initially and it need not
attempt to register them as primary node
Patroni is an opinionated solution for PostgreSQL high-availability. It takes the control of PostgreSQL, overrides its ```ruby
configuration and manages its lifecycle (start, stop, restart). This is a more active approach when compared to repmgr. # Specify if a node should attempt to be master on initialization
Both repmgr and Patroni are supported and available, but Patroni will be the default (and perhaps the only) option repmgr['master_on_initialization'] = false
for PostgreSQL 12 clustering and cascading replication for Geo deployments. ```
The [architecture](#example-recommended-setup-manual-steps) (that was mentioned above) does not change for Patroni. 1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
You do not need any special consideration for Patroni while provisioning your database nodes. Patroni heavily relies on 1. [Enable Monitoring](#enable-monitoring)
Consul to store the state of the cluster and elect a leader. Any failure in Consul cluster and its leader election will
propagate to Patroni cluster as well.
Similar to repmgr, Patroni monitors the cluster and handles failover. When the primary node fails it works with Consul > Please note:
to notify PgBouncer. However, as opposed to repmgr, on failure, Patroni handles the transitioning of the old primary to >
a replica and rejoins it to the cluster automatically. So you do not need any manual operation for recovering the > - If you want your database to listen on a specific interface, change the configuration:
cluster as you do with repmgr. > `postgresql['listen_address'] = '0.0.0.0'`.
> - If your PgBouncer service runs under a different user account,
> you also need to specify: `postgresql['pgbouncer_user'] = PGBOUNCER_USERNAME` in
> your configuration.
With Patroni the connection flow is slightly different. Patroni on each node connects to Consul agent to join the #### Database nodes post-configuration
cluster. Only after this point it decides if the node is the primary or a replica. Based on this decision, it configures
and starts PostgreSQL which it communicates with directly over a Unix socket. This implies that if Consul cluster is not
functional or does not have a leader, Patroni and by extension PostgreSQL will not start. Patroni also exposes a REST
API which can be accessed via its [default port](https://docs.gitlab.com/omnibus/package-information/defaults.html#patroni)
on each node.
### Configuring Patroni cluster ##### Primary node
You must enable Patroni explicitly to be able to use it (with `patroni['enable'] = true`). When Patroni is enabled Select one node as a primary node.
repmgr will be disabled automatically.
Any PostgreSQL configuration item that controls replication, for example `wal_level`, `max_wal_senders`, etc, are strictly 1. Open a database prompt:
controlled by Patroni and will override the original settings that you make with the `postgresql[...]` configuration key.
Hence, they are all separated and placed under `patroni['postgresql'][...]`. This behavior is limited to replication.
Patroni honours any other PostgreSQL configuration that was made with the `postgresql[...]` configuration key. For example,
`max_wal_senders` by default is set to `5`. If you wish to change this you must set it with the `patroni['postgresql']['max_wal_senders']`
configuration key.
The configuration of a Patroni node is very similar to that of a repmgr node, but shorter. When Patroni is enabled, first you can ignore ```shell
any replication setting of PostgreSQL (it will be overwritten anyway). Then you can remove any `repmgr[...]` or gitlab-psql -d gitlabhq_production
repmgr-specific configuration as well. Especially, make sure that you remove `postgresql['shared_preload_libraries'] = 'repmgr_funcs'`. ```
Here is an example similar to [the one that was done with repmgr](#configuring-the-database-nodes): 1. Enable the `pg_trgm` extension:
```ruby ```shell
# Disable all components except PostgreSQL and Repmgr and Consul CREATE EXTENSION pg_trgm;
roles['postgres_role'] ```
# Enable Patroni 1. Enable the `btree_gist` extension:
patroni['enable'] = true
# PostgreSQL configuration ```shell
postgresql['listen_address'] = '0.0.0.0' CREATE EXTENSION btree_gist;
```
# Disable automatic database migrations 1. Exit the database prompt by typing `\q` and Enter.
gitlab_rails['auto_migrate'] = false
# Configure the Consul agent 1. Verify the cluster is initialized with one node:
consul['services'] = %w(postgresql)
# START user configuration ```shell
# Please set the real values as explained in Required Information section gitlab-ctl repmgr cluster show
# ```
# Replace PGBOUNCER_PASSWORD_HASH with a generated md5 value
postgresql['pgbouncer_user_password'] = 'PGBOUNCER_PASSWORD_HASH'
# Replace POSTGRESQL_PASSWORD_HASH with a generated md5 value
postgresql['sql_user_password'] = 'POSTGRESQL_PASSWORD_HASH'
# Replace X with value of number of db nodes + 1 (OPTIONAL the default value is 5) The output should be similar to the following:
patroni['postgresql']['max_wal_senders'] = X
patroni['postgresql']['max_replication_slots'] = X
# Replace XXX.XXX.XXX.XXX/YY with Network Address ```plaintext
postgresql['trust_auth_cidr_addresses'] = %w(XXX.XXX.XXX.XXX/YY) Role | Name | Upstream | Connection String
----------+----------|----------|----------------------------------------
* master | HOSTNAME | | host=HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
```
# Replace placeholders: 1. Note down the hostname or IP address in the connection string: `host=HOSTNAME`. We will
# refer to the hostname in the next section as `MASTER_NODE_NAME`. If the value
# Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z is not an IP address, it will need to be a resolvable name (via DNS or
# with the addresses gathered for CONSUL_SERVER_NODES `/etc/hosts`)
consul['configuration'] = {
retry_join: %w(Y.Y.Y.Y consul1.gitlab.example.com Z.Z.Z.Z) ##### Secondary nodes
}
# 1. Set up the repmgr standby:
# END user configuration
```shell
gitlab-ctl repmgr standby setup MASTER_NODE_NAME
```
Do note that this will remove the existing data on the node. The command
has a wait time.
The output should be similar to the following:
```console
# gitlab-ctl repmgr standby setup MASTER_NODE_NAME
Doing this will delete the entire contents of /var/opt/gitlab/postgresql/data
If this is not what you want, hit Ctrl-C now to exit
To skip waiting, rerun with the -w option
Sleeping for 30 seconds
Stopping the database
Removing the data
Cloning the data
Starting the database
Registering the node with the cluster
ok: run: repmgrd: (pid 19068) 0s
```
1. Verify the node now appears in the cluster:
```shell
gitlab-ctl repmgr cluster show
```
The output should be similar to the following:
```plaintext
Role | Name | Upstream | Connection String
----------+---------|-----------|------------------------------------------------
* master | MASTER | | host=MASTER_NODE_NAME user=gitlab_repmgr dbname=gitlab_repmgr
standby | STANDBY | MASTER | host=STANDBY_HOSTNAME user=gitlab_repmgr dbname=gitlab_repmgr
```
Repeat the above steps on all secondary nodes.
#### Database checkpoint
Before moving on, make sure the databases are configured correctly. Run the
following command on the **primary** node to verify that replication is working
properly:
```shell
gitlab-ctl repmgr cluster show
``` ```
You do not need an additional or different configuration for replica nodes. As a matter of fact, you don't have to have The output should be similar to:
a predetermined primary node. Therefore all database nodes use the same configuration.
Once the configuration of a node is done, you must [reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) ```plaintext
on each node for the changes to take effect. Role | Name | Upstream | Connection String
----------+--------------|--------------|--------------------------------------------------------------------
* master | MASTER | | host=MASTER port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
standby | STANDBY | MASTER | host=STANDBY port=5432 user=gitlab_repmgr dbname=gitlab_repmgr
```
Generally, when Consul cluster is ready, the first node that [reconfigures](../restart_gitlab.md#omnibus-gitlab-reconfigure) If the 'Role' column for any node says "FAILED", check the
becomes the leader. You do not need to sequence the nodes reconfiguration. You can run them in parallel or in any order. [Troubleshooting section](#troubleshooting) before proceeding.
If you choose an arbitrary order you do not have any predetermined master.
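The "FAILED" check on the `gitlab-ctl repmgr cluster show` output can be scripted. The following is a minimal sketch, not part of GitLab itself: `failed_nodes` is a hypothetical helper name, and it assumes that a failed row uses the same `|`-separated column layout as the sample output shown in this document.

```shell
# Print the Name column of every row whose Role column reads FAILED.
# Reads the output of `gitlab-ctl repmgr cluster show` on standard input.
# Assumption: failed rows use the same column layout as healthy rows.
failed_nodes() {
  awk -F'|' '
    {
      role = $1
      name = $2
      gsub(/[ *\t]/, "", role)             # drop padding and the "*" marker
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the node name
      if (role == "FAILED") print name
    }'
}

# Usage on a database node:
#   gitlab-ctl repmgr cluster show | failed_nodes
```

An empty result means no row reported `FAILED`. Any printed name is a node to investigate in the Troubleshooting section before proceeding.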
As opposed to repmgr, once the nodes are reconfigured you do not need any further action or additional command to join Also, check that the check master command works successfully on each node:
the replicas.
#### Database authorization for Patroni ```shell
su - gitlab-consul
gitlab-ctl repmgr-check-master || echo 'This node is a standby repmgr node'
```
Patroni uses Unix socket to manage PostgreSQL instance. Therefore, the connection from the `local` socket must be trusted. This command relies on exit codes to tell Consul whether a particular node is a master
or secondary. The most important thing here is that this command does not produce errors.
If there are errors it's most likely due to incorrect `gitlab-consul` database user permissions.
Check the [Troubleshooting section](#troubleshooting) before proceeding.
Also, replicas use the replication user (`gitlab_replicator` by default) to communicate with the leader. For this user, ### Repmgr failover procedure
you can choose between `trust` and `md5` authentication. If you set `postgresql['sql_replication_password']`,
Patroni will use `md5` authentication, otherwise it falls back to `trust`. You must specify the cluster CIDR in
`postgresql['md5_auth_cidr_addresses']` or `postgresql['trust_auth_cidr_addresses']` respectively.
### Interacting with Patroni cluster By default, if the master database fails, `repmgrd` should promote one of the
standby nodes to master automatically, and Consul will update PgBouncer with
the new master.
If you need to fail over manually, you have two options:
**Shutdown the current master database**
Run:
```shell
gitlab-ctl stop postgresql
```
The automated failover process will see this and fail over to one of the
standby nodes.
**Or perform a manual failover**
1. Ensure the old master node is not still active.
1. Log in to the server that should become the new master and run:
```shell
gitlab-ctl repmgr standby promote
```
1. If there are any other standby servers in the cluster, have them follow
the new master server:
```shell
gitlab-ctl repmgr standby follow NEW_MASTER
```
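The order of the manual failover steps matters: promote first, then have the remaining standbys follow. As a rough sketch only, the helper below prints which command to run on which node. The function name `failover_plan` and the host names are hypothetical. The function only prints the plan and does not run anything.

```shell
# Print, in order, the command to run on each node for a manual failover.
# $1 is the node to promote; any remaining arguments are the other standbys.
failover_plan() {
  new_master="$1"
  shift
  printf '%s: gitlab-ctl repmgr standby promote\n' "$new_master"
  for standby in "$@"; do
    printf '%s: gitlab-ctl repmgr standby follow %s\n' "$standby" "$new_master"
  done
}

# Hypothetical host names, for illustration only.
failover_plan db2.example.com db3.example.com
```

Confirm that the old master is no longer active before running the first command of the plan.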
#### Geo secondary site considerations
When a Geo secondary site is replicating from a primary site that uses `repmgr` and `PgBouncer`, [replicating through PgBouncer is not supported](https://github.com/pgbouncer/pgbouncer/issues/382#issuecomment-517911529) and the secondary must replicate directly from the leader node in the `repmgr` cluster. Therefore, when there is a failover in the `repmgr` cluster, you will need to manually re-point your secondary site to replicate from the new leader with:
```shell
sudo gitlab-ctl replicate-geo-database --host=<new_leader_ip> --replication-slot=<slot_name>
```
Otherwise, the replication will not happen anymore, even if the original node gets re-added as a follower node. This will re-sync your secondary site database and may take a long time depending on the amount of data to sync.
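Because the re-point triggers a full re-sync, it can help to check the arguments before running it. The sketch below assumes a hypothetical helper named `geo_repoint_command` that refuses empty values and prints the command for review rather than executing it. The IP address and slot name are placeholders.

```shell
# Build the re-point command for a Geo secondary and print it for review.
# $1 is the new leader's IP address, $2 is the replication slot name.
geo_repoint_command() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: geo_repoint_command NEW_LEADER_IP SLOT_NAME" >&2
    return 1
  fi
  printf 'sudo gitlab-ctl replicate-geo-database --host=%s --replication-slot=%s\n' "$1" "$2"
}

# Placeholder values, for illustration only.
geo_repoint_command 10.0.0.12 geo_secondary_slot
```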
### Repmgr Restore procedure
If a node fails, it can be removed from the cluster, or added back as a standby
after it has been restored to service.
#### Remove a standby from the cluster
From any other node in the cluster, run:
You can use `gitlab-ctl patroni members` to check the status of the cluster members. To check the status of each node, ```shell
`gitlab-ctl patroni` provides two additional sub-commands, `check-leader` and `check-replica` which indicate if a node gitlab-ctl repmgr standby unregister --node=X
is the primary or a replica. ```
When Patroni is enabled, you don't have direct control over `postgresql` service. Patroni will signal PostgreSQL's startup, where X is the value of node in `repmgr.conf` on the old server.
shutdown, and restart. For example, to shut down PostgreSQL on a node, you must shut down Patroni on the same node
with:
```shell To find this, you can use:
sudo gitlab-ctl stop patroni
```
Note that stopping or restarting Patroni service on the leader node will trigger the automatic failover. If you ```shell
want to signal Patroni to reload its configuration or restart PostgreSQL process without triggering the failover, you awk -F = '$1 == "node" { print $2 }' /var/opt/gitlab/postgresql/repmgr.conf
must use the `reload` or `restart` sub-commands of `gitlab-ctl patroni` instead. These two sub-commands are wrappers of ```
the same `patronictl` commands.
### Manual failover procedure for Patroni It will output something like:
While Patroni supports automatic failover, you also have the ability to perform ```plaintext
a manual one, where you have two slightly different options: 959789412
```
- **Failover**: allows you to perform a manual failover when there are no healthy nodes. Then you will use this ID to unregister the node:
You can perform this action in any PostgreSQL node:
```shell ```shell
sudo gitlab-ctl patroni failover gitlab-ctl repmgr standby unregister --node=959789412
``` ```
- **Switchover**: only works when the cluster is healthy and allows you to schedule a switchover (it can happen immediately). #### Add a node as a standby server
You can perform this action in any PostgreSQL node:
From the standby node, run:
```shell ```shell
sudo gitlab-ctl patroni switchover gitlab-ctl repmgr standby follow NEW_MASTER
gitlab-ctl restart repmgrd
``` ```
For further details on this subject, see the WARNING:
[Patroni documentation](https://patroni.readthedocs.io/en/latest/rest_api.html#switchover-and-failover-endpoints). When the server is brought back online, and before
you switch it to a standby node, repmgr will report that there are two masters.
If there are any clients that are still attempting to write to the old master,
this will cause a split, and the old master will need to be resynced from
scratch by performing a `gitlab-ctl repmgr standby setup NEW_MASTER`.
### Recovering the Patroni cluster #### Add a failed master back into the cluster as a standby node
To recover the old primary and rejoin it to the cluster as a replica, you can simply start Patroni with: Once `repmgrd` and PostgreSQL are running, the node will need to follow the new master
as a standby node.
```shell ```shell
sudo gitlab-ctl start patroni gitlab-ctl repmgr standby follow NEW_MASTER
``` ```
No further configuration or intervention is needed. Once the node is following the new master as a standby, the node needs to be
[unregistered from the cluster on the new master node](#remove-a-standby-from-the-cluster).
### Maintenance procedure for Patroni Once the old master node has been unregistered from the cluster, it will need
to be set up as a new standby:
With Patroni enabled, you can run a planned maintenance. If you want to do some maintenance work on one node and you ```shell
don't want Patroni to manage it, you can put it into maintenance mode: gitlab-ctl repmgr standby setup NEW_MASTER
```
```shell Failure to unregister and re-add the old master node can lead to subsequent failovers
sudo gitlab-ctl patroni pause not working.
```
When Patroni runs in a paused mode, it does not change the state of PostgreSQL. Once you are done you can resume Patroni: ### Alternate configurations
```shell #### Database authorization
sudo gitlab-ctl patroni resume
```
For further details, see [Patroni documentation on this subject](https://patroni.readthedocs.io/en/latest/pause.html). By default, we give any host on the database network the permission to perform
repmgr operations using PostgreSQL's `trust` method. If you do not want this
level of trust, there are alternatives.
### Switching from repmgr to Patroni You can trust only the specific nodes that will be database clusters, or you
can require md5 authentication.
WARNING: #### Trust specific addresses
Although switching from repmgr to Patroni is fairly straightforward, the other way around is not. Rolling back from
Patroni to repmgr can be complicated and may involve deletion of the data directory. If you need to do that, please contact
GitLab support.
You can switch an existing database cluster to use Patroni instead of repmgr with the following steps: If you know the IP address or FQDN of all database and PgBouncer nodes in the
cluster, you can trust only those nodes.
1. Stop repmgr on all replica nodes and lastly on the primary node: In `/etc/gitlab/gitlab.rb` on all of the database nodes, set
`repmgr['trust_auth_cidr_addresses']` to an array of strings containing all of
the addresses.
```shell If setting to a node's FQDN, it must have a corresponding PTR record in DNS.
sudo gitlab-ctl stop repmgrd If setting to a node's IP address, specify it as `XXX.XXX.XXX.XXX/32`.
```
1. Stop PostgreSQL on all replica nodes: For example:
```shell ```ruby
sudo gitlab-ctl stop postgresql repmgr['trust_auth_cidr_addresses'] = %w(192.168.1.44/32 db2.example.com)
``` ```
NOTE: #### MD5 Authentication
Ensure that there is no `walsender` process running on the primary node.
`ps aux | grep walsender` must not show any running process.
1. On the primary node, [configure Patroni](#configuring-patroni-cluster). Remove `repmgr` and any other If you are running on an untrusted network, repmgr can use md5 authentication
repmgr-specific configuration. Also remove any configuration that is related to PostgreSQL replication. with a [`.pgpass` file](https://www.postgresql.org/docs/11/libpq-pgpass.html)
1. [Reconfigure Omnibus GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on the primary node. It will become to authenticate.
the leader. You can check this with:
```shell You can specify by IP address, FQDN, or by subnet, using the same format as in
sudo gitlab-ctl tail patroni the previous section:
```
1. Repeat the last two steps for all replica nodes. `gitlab.rb` should look the same on all nodes. 1. On the current master node, create a password for the `gitlab` and
1. Optional: You can remove the `gitlab_repmgr` database and role on the primary. `gitlab_repmgr` users:
### Upgrading PostgreSQL major version in a Patroni cluster ```shell
gitlab-psql -d template1
template1=# \password gitlab_repmgr
Enter password: ****
Confirm password: ****
template1=# \password gitlab
```
As of GitLab 13.3, PostgreSQL 11.7 and 12.3 are both shipped with Omnibus GitLab. GitLab still 1. On each database node:
uses PostgreSQL 11 by default. Therefore `gitlab-ctl pg-upgrade` does not automatically upgrade
to PostgreSQL 12. If you want to upgrade to PostgreSQL 12, you must ask for it explicitly.
WARNING: 1. Edit `/etc/gitlab/gitlab.rb`:
The procedure for upgrading PostgreSQL in a Patroni cluster is different than when upgrading using repmgr. 1. Ensure `repmgr['trust_auth_cidr_addresses']` is **not** set
The following outlines the key differences and important considerations that need to be accounted for when 1. Set `postgresql['md5_auth_cidr_addresses']` to the desired value
upgrading PostgreSQL. 1. Set `postgresql['sql_replication_user'] = 'gitlab_repmgr'`
1. Reconfigure with `gitlab-ctl reconfigure`
1. Restart PostgreSQL with `gitlab-ctl restart postgresql`
Here are a few key facts that you must consider before upgrading PostgreSQL: 1. Create a `.pgpass` file. Enter the `gitlab_repmgr` password twice
when asked:
- The main point is that you will have to **shut down the Patroni cluster**. This means that your ```shell
GitLab deployment will be down for the duration of the database upgrade or, at least, for as long as your leader gitlab-ctl write-pgpass --user gitlab_repmgr --hostuser gitlab-psql --database '*'
node is upgraded. This can be **a significant downtime depending on the size of your database**. ```
- Upgrading PostgreSQL creates a new data directory with new control data. From Patroni's perspective 1. On each PgBouncer node, edit `/etc/gitlab/gitlab.rb`:
this is a new cluster that needs to be bootstrapped again. Therefore, as part of the upgrade procedure, 1. Ensure `gitlab_rails['db_password']` is set to the plaintext password for
the cluster state, which is stored in Consul, will be wiped out. Once the upgrade is completed, Patroni the `gitlab` database user
will be instructed to bootstrap a new cluster. **Note that this will change your _cluster ID_**. 1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect
- The procedures for upgrading leader and replicas are not the same. That is why it is important to use the ## Troubleshooting
right procedure on each node.
- Upgrading a replica node **deletes the data directory and resynchronizes it** from the leader using the ### Consul and PostgreSQL changes not taking effect
configured replication method (currently `pg_basebackup` is the only available option). It might take some
time for the replica to catch up with the leader, depending on the size of your database.
- An overview of the upgrade procedure is outlined in [Patroni's documentation](https://patroni.readthedocs.io/en/latest/existing_data.html#major-upgrade-of-postgresql-version). Due to the potential impacts, `gitlab-ctl reconfigure` only reloads Consul and PostgreSQL; it does not restart the services. However, not all changes can be activated by reloading.
You can still use `gitlab-ctl pg-upgrade` which implements this procedure with a few adjustments.
Considering these, you should carefully plan your PostgreSQL upgrade: To restart either service, run `gitlab-ctl restart SERVICE`
1. Find out which node is the leader and which node is a replica: For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start afterwards with `gitlab-ctl start repmgrd`.
```shell On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner.
gitlab-ctl patroni members
```
NOTE: ### `gitlab-ctl repmgr-check-master` command produces errors
`gitlab-ctl pg-upgrade` tries to detect the role of the node. If for any reason the auto-detection
does not work or you believe it did not detect the role correctly, you can use the `--leader` or `--replica`
arguments to manually override it.
1. Stop Patroni **only on replicas**. If this command displays errors about database permissions it is likely that something failed during
install, resulting in the `gitlab-consul` database user getting incorrect permissions. Follow these
steps to fix the problem:
```shell 1. On the master database node, connect to the database prompt - `gitlab-psql -d template1`
sudo gitlab-ctl stop patroni 1. Delete the `gitlab-consul` user - `DROP USER "gitlab-consul";`
``` 1. Exit the database prompt - `\q`
1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) and the user will be re-added with the proper permissions.
1. Change to the `gitlab-consul` user - `su - gitlab-consul`
1. Try the check command again - `gitlab-ctl repmgr-check-master`.
1. Enable the maintenance mode on the **application node**: Now there should not be errors. If errors still occur then there is another problem.
```shell ### PgBouncer error `ERROR: pgbouncer cannot connect to server`
sudo gitlab-ctl deploy-page up
```
1. Upgrade PostgreSQL on **the leader node** and make sure that the upgrade is completed successfully: You may get this error when running `gitlab-rake gitlab:db:configure` or you
may see the error in the PgBouncer log file.
```shell ```plaintext
sudo gitlab-ctl pg-upgrade -V 12 PG::ConnectionBad: ERROR: pgbouncer cannot connect to server
``` ```
1. Check the status of the leader and cluster. You can only proceed if you have a healthy leader: The problem may be that your PgBouncer node's IP address is not included in the
`trust_auth_cidr_addresses` setting in `/etc/gitlab/gitlab.rb` on the database nodes.
```shell You can confirm that this is the issue by checking the PostgreSQL log on the master
gitlab-ctl patroni check-leader database node. If you see the following error then `trust_auth_cidr_addresses`
is the problem.
# OR ```plaintext
2018-03-29_13:59:12.11776 FATAL: no pg_hba.conf entry for host "123.123.123.123", user "pgbouncer", database "gitlabhq_production", SSL off
```
gitlab-ctl patroni members To fix the problem, add the IP address to `/etc/gitlab/gitlab.rb`.
```
1. You can now disable the maintenance mode on the **application node**: ```ruby
postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
```
```shell [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes to take effect.
sudo gitlab-ctl deploy-page down
```
1. Upgrade PostgreSQL **on replicas** (you can do this in parallel on all of them): ### Issues with other components
```shell If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of their specific documentation page:
sudo gitlab-ctl pg-upgrade -V 12
```
NOTE: - [Consul](../consul.md#troubleshooting-consul)
Reverting PostgreSQL upgrade with `gitlab-ctl revert-pg-upgrade` has the same considerations as - [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)
`gitlab-ctl pg-upgrade`. You should follow the same procedure by first stopping the replicas,
then reverting the leader, and finally reverting the replicas.
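
The per-role ordering of the Patroni upgrade procedure above can be sketched as a small dry-run helper. This is only an illustration: `upgrade_plan` is a hypothetical function name, not part of `gitlab-ctl`, and it only prints the commands already listed in the procedure instead of running them. Target version 12 is taken from the steps above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints (does not execute) the commands
# for one node, in the order given by the upgrade procedure.
upgrade_plan() {
  role="$1"
  case "$role" in
    replica)
      # Replicas are stopped first and upgraded only after the leader.
      echo "sudo gitlab-ctl stop patroni"
      echo "# wait until the leader is upgraded and healthy"
      echo "sudo gitlab-ctl pg-upgrade -V 12"
      ;;
    leader)
      # The leader is upgraded while replicas are stopped, then checked.
      echo "sudo gitlab-ctl pg-upgrade -V 12"
      echo "gitlab-ctl patroni check-leader"
      ;;
    *)
      echo "usage: upgrade_plan leader|replica" >&2
      return 1
      ;;
  esac
}

upgrade_plan replica
```

Printing the plan rather than executing it keeps the sketch safe to run on any machine. The maintenance page steps on the application node are not covered here.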