Commit 0d452261 authored by Achilleas Pipinellis

Merge branch 'gitaly-docs-improvements' into 'master'

Simplify failover documentation

Closes gitaly#2584

See merge request gitlab-org/gitlab!28405
parents 643567db c5925bca
...@@ -256,9 +256,9 @@ application server, or a Gitaly node. ...@@ -256,9 +256,9 @@ application server, or a Gitaly node.
```ruby ```ruby
# Name of storage hash must match storage name in git_data_dirs on GitLab # Name of storage hash must match storage name in git_data_dirs on GitLab
# server ('praefect') and in git_data_dirs on Gitaly nodes ('gitaly-1') # server ('storage_1') and in git_data_dirs on Gitaly nodes ('gitaly-1')
praefect['virtual_storages'] = { praefect['virtual_storages'] = {
'praefect' => { 'storage_1' => {
'gitaly-1' => { 'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075', 'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN', 'token' => 'PRAEFECT_INTERNAL_TOKEN',
...@@ -430,7 +430,7 @@ documentation](index.md#3-gitaly-server-configuration). ...@@ -430,7 +430,7 @@ documentation](index.md#3-gitaly-server-configuration).
gitlab-ctl restart gitaly gitlab-ctl restart gitaly
``` ```
**Complete these steps for each Gitaly node!** **The steps above must be completed for each Gitaly node!**
After all Gitaly nodes are configured, you can run the Praefect connection After all Gitaly nodes are configured, you can run the Praefect connection
checker to verify Praefect can connect to all Gitaly servers in the Praefect checker to verify Praefect can connect to all Gitaly servers in the Praefect
...@@ -442,6 +442,34 @@ config. ...@@ -442,6 +442,34 @@ config.
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dial-nodes sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dial-nodes
``` ```
1. Enable automatic failover by editing `/etc/gitlab/gitlab.rb`:
```ruby
praefect['failover_enabled'] = true
```
When automatic failover is enabled, Praefect checks the health of internal
Gitaly nodes. If a certain number of health checks fail for the primary,
Praefect promotes one of the secondaries to primary and demotes the old primary
to a secondary.
Manual failover is possible by updating `praefect['virtual_storages']` and
nominating a new primary node.
NOTE: **Note:** Automatic failover is not yet supported for setups with
multiple Praefect nodes. There is currently no coordination between Praefect
nodes, which could result in two Praefect instances thinking two different
Gitaly nodes are the primary. Follow issue
[#2547](https://gitlab.com/gitlab-org/gitaly/-/issues/2547) for
updates.
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure):
```shell
gitlab-ctl reconfigure
```
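A minimal sketch of the manual failover edit mentioned above, assuming the `praefect['virtual_storages']` hash layout shown earlier in this file. The `promote` helper is hypothetical and is not part of Omnibus GitLab. It only illustrates moving the `'primary' => true` entry from one Gitaly node to another before reconfiguring:

```ruby
# Hypothetical helper (not part of Omnibus GitLab): returns a copy of one
# virtual storage's node hash with `new_primary` marked as the only primary.
def promote(nodes, new_primary)
  raise ArgumentError, "unknown node: #{new_primary}" unless nodes.key?(new_primary)

  nodes.to_h do |name, settings|
    # Drop any existing primary flag, then set it only on the chosen node.
    updated = settings.reject { |key, _| key == 'primary' }
    updated['primary'] = true if name == new_primary
    [name, updated]
  end
end

# Placeholder values copied from the configuration example in this document.
virtual_storages = {
  'storage_1' => {
    'gitaly-1' => { 'address' => 'tcp://GITALY_HOST:8075', 'token' => 'PRAEFECT_INTERNAL_TOKEN', 'primary' => true },
    'gitaly-2' => { 'address' => 'tcp://GITALY_HOST:8075', 'token' => 'PRAEFECT_INTERNAL_TOKEN' }
  }
}

virtual_storages['storage_1'] = promote(virtual_storages['storage_1'], 'gitaly-2')
primaries = virtual_storages['storage_1'].select { |_, settings| settings['primary'] }.keys
puts primaries.inspect
```

In practice you would make the equivalent change by hand in `/etc/gitlab/gitlab.rb` and then run `gitlab-ctl reconfigure`. The helper only shows that exactly one node should carry the primary flag afterwards.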
### GitLab ### GitLab
To complete this section you will need: To complete this section you will need:
...@@ -560,133 +588,80 @@ Particular attention should be shown to: ...@@ -560,133 +588,80 @@ Particular attention should be shown to:
- Deselect the **default** storage location - Deselect the **default** storage location
- Select the **praefect** storage location - Select the **praefect** storage location
![Update repository storage](img/praefect_storage_v12_10.png)
1. Verify everything is still working by creating a new project. Check the 1. Verify everything is still working by creating a new project. Check the
"Initialize repository with a README" box so that there is content in the "Initialize repository with a README" box so that there is content in the
repository that can be viewed. If the project is created, and you can see the repository that can be viewed. If the project is created, and you can see the
README file, it works! README file, it works!
Congratulations! You have configured a highly available Praefect cluster. ### Grafana
### Failover
There are two ways to do a failover from one internal Gitaly node to another as the primary. Manually, or automatically. Grafana is included with GitLab, and can be used to monitor your Praefect
cluster. See [Grafana Dashboard
As an example, in this `config.toml` we have one virtual storage named "default" with two internal Gitaly nodes behind it. Service](https://docs.gitlab.com/omnibus/settings/grafana.html)
One is deemed the "primary". This means that read and write traffic will go to `internal_storage_0`, and writes for detailed documentation.
will get replicated to `internal_storage_1`:
```toml
socket_path = "/path/to/Praefect.socket"
# failover_enabled will enable automatic failover
failover_enabled = false
[logging]
format = "json"
level = "info"
[[virtual_storage]]
name = "default"
[[virtual_storage.node]]
name = "internal_storage_0"
address = "tcp://localhost:9999"
primary = true
token = "supersecret"
[[virtual_storage.node]] To get started quickly:
name = "internal_storage_1"
address = "tcp://localhost:9998"
token = "supersecret"
```
#### Manual failover 1. SSH into the **GitLab** node and login as root:
In order to failover from using one internal Gitaly node to using another, a manual failover step can be used. Unless `failover_enabled` is set to `true` ```shell
in the `config.toml`, the only way to fail over from one primary to using another node as the primary is to do a manual failover. sudo -i
```
1. Move `primary = true` from the current `[[virtual_storage.node]]` to another node in `/etc/gitlab/gitlab.rb`: 1. Enable the Grafana login form by editing `/etc/gitlab/gitlab.rb`.
```ruby ```ruby
praefect['virtual_storages'] = { grafana['disable_login_form'] = false
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# no longer the primary
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
# this is the new primary
'primary' => true
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
}
}
}
``` ```
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure). 1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):
On a restart, Praefect will send write traffic to `internal_storage_1`. `internal_storage_0` is the new secondary now, ```shell
and replication jobs will be created to replicate repository data to `internal_storage_0` **from** `internal_storage_1` gitlab-ctl reconfigure
```
#### Automatic failover 1. Set the Grafana admin password. This command will prompt you to enter a new
password:
When automatic failover is enabled, Praefect will do automatic detection of the health of internal Gitaly nodes. If the ```shell
primary has a certain amount of health checks fail, it will decide to promote one of the secondaries to be primary, and gitlab-ctl set-grafana-password
demote the primary to be a secondary. ```
1. To enable automatic failover, edit `/etc/gitlab/gitlab.rb`: 1. In your web browser, open `/-/grafana` (e.g.
`https://gitlab.example.com/-/grafana`) on your GitLab server.
```ruby Login using the password you set, and the username `admin`.
# failover_enabled turns on automatic failover
praefect['failover_enabled'] = true
praefect['virtual_storages'] = {
'praefect' => {
'gitaly-1' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN',
'primary' => true
},
'gitaly-2' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
},
'gitaly-3' => {
'address' => 'tcp://GITALY_HOST:8075',
'token' => 'PRAEFECT_INTERNAL_TOKEN'
}
}
}
```
1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure). 1. Go to **Explore** and query `gitlab_build_info` to verify that you are
getting metrics from all your machines.
Below is the picture when Praefect starts up with the config.toml above: Congratulations! You've configured an observable highly available Praefect
cluster.
```mermaid ## Automatic failover and leader election
graph TD
A[Praefect] -->|Mutator RPC| B(internal_storage_0)
B --> |Replication|C[internal_storage_1]
```
Let's say suddenly `internal_storage_0` goes down. Praefect will detect this and Praefect regularly checks the health of each backend Gitaly node. This
automatically switch over to `internal_storage_1`, and `internal_storage_0` will serve as a secondary: information can be used to automatically fail over to a new primary node if the
current primary node is found to be unhealthy.
```mermaid - **Manual:** Automatic failover is disabled. The primary node can be
graph TD reconfigured in `/etc/gitlab/gitlab.rb` on the Praefect node. Modify the
A[Praefect] -->|Mutator RPC| B(internal_storage_1) `praefect['virtual_storages']` field by moving the `'primary' => true` entry to promote
B --> |Replication|C[internal_storage_0] a different Gitaly node to primary. In the steps above, `gitaly-1` was set to
``` the primary.
- **Memory:** Enabled by setting `praefect['failover_enabled'] = true` in
`/etc/gitlab/gitlab.rb` on the Praefect node. If a sufficient number of health
checks fail for the current primary backend Gitaly node, a new primary will
be elected. **Do not use with multiple Praefect nodes!** Using with multiple
Praefect nodes is likely to result in a split brain.
- **PostgreSQL:** Coming soon. See issue
[#2547](https://gitlab.com/gitlab-org/gitaly/-/issues/2547) for updates.
NOTE: **Note:** Currently this feature is supported only for setups that have 1 Praefect instance. With multiple Praefect instances running, It is likely that we will implement support for Consul, and a cloud native
for example behind a load balancer, `failover_enabled` should be disabled. The reason is that there strategy in the future.
is no coordination that currently happens across different Praefect instances, so there could be a situation where
two Praefect instances think two different Gitaly nodes are the primary.
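The in-memory strategy and its split-brain risk can be sketched as follows. This is an illustration under stated assumptions, not Praefect's actual implementation: the `InMemoryElector` class, its method names, and the failure threshold of 3 are all hypothetical.

```ruby
# Illustrative in-memory election. The class name, method names, and the
# threshold value are assumptions, not Praefect's actual implementation.
class InMemoryElector
  attr_reader :primary

  def initialize(nodes, primary:, failure_threshold: 3)
    @failures = nodes.to_h { |node| [node, 0] } # consecutive failed checks per node
    @primary = primary
    @failure_threshold = failure_threshold
  end

  # Record one health check result, then re-elect if the primary is unhealthy.
  def record_check(node, healthy)
    @failures[node] = healthy ? 0 : @failures[node] + 1
    elect if @failures[@primary] >= @failure_threshold
    @primary
  end

  private

  # Promote the first node that is still below the failure threshold.
  def elect
    candidate = @failures.keys.find { |node| @failures[node] < @failure_threshold }
    @primary = candidate if candidate
  end
end

nodes = %w[gitaly-1 gitaly-2 gitaly-3]

# One Praefect node sees three failed checks for gitaly-1 and fails over.
elector = InMemoryElector.new(nodes, primary: 'gitaly-1')
3.times { elector.record_check('gitaly-1', false) }
puts elector.primary

# A second Praefect node keeps its own counters, so it can disagree.
other = InMemoryElector.new(nodes, primary: 'gitaly-1')
other.record_check('gitaly-1', true)
puts other.primary
```

Because each elector holds its state only in its own memory, the two instances end up with different primaries. This is the split brain that the warning about multiple Praefect nodes describes.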
## Backend Node Recovery ## Backend Node Recovery
...@@ -711,49 +686,6 @@ The command will return a list of repositories that were found to be ...@@ -711,49 +686,6 @@ The command will return a list of repositories that were found to be
inconsistent against the current primary. Each of these inconsistencies will inconsistent against the current primary. Each of these inconsistencies will
also be logged with an accompanying replication job ID. also be logged with an accompanying replication job ID.
## Grafana
Grafana is included with GitLab, and can be used to monitor your Praefect
cluster. See [Grafana Dashboard
Service](https://docs.gitlab.com/omnibus/settings/grafana.html)
for detailed documentation.
To get started quickly:
1. SSH into the **GitLab** node and login as root:
```shell
sudo -i
```
1. Enable the Grafana login form by editing `/etc/gitlab/gitlab.rb`.
```ruby
grafana['disable_login_form'] = false
```
1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):
```shell
gitlab-ctl reconfigure
```
1. Set the Grafana admin password. This command will prompt you to enter a new
password:
```shell
gitlab-ctl set-grafana-password
```
1. In your web browser, open `/-/grafana` (e.g.
`https://gitlab.example.com/-/grafana`) on your GitLab server.
Login using the password you set, and the username `admin`.
1. Go to **Explore** and query `gitlab_build_info` to verify that you are
getting metrics from all your machines.
## Migrating existing repositories to Praefect ## Migrating existing repositories to Praefect
If your GitLab instance already has repositories, these won't be migrated If your GitLab instance already has repositories, these won't be migrated
......