---
stage: Create
group: Gitaly
info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
type: reference
---

# Configure Gitaly Cluster **(FREE SELF)**

Configure Gitaly Cluster using either:

- Gitaly Cluster configuration instructions available as part of
  [reference architectures](../reference_architectures/index.md) for installations of up to:
  - [3000 users](../reference_architectures/3k_users.md#configure-gitaly-cluster).
  - [5000 users](../reference_architectures/5k_users.md#configure-gitaly-cluster).
  - [10,000 users](../reference_architectures/10k_users.md#configure-gitaly-cluster).
  - [25,000 users](../reference_architectures/25k_users.md#configure-gitaly-cluster).
  - [50,000 users](../reference_architectures/50k_users.md#configure-gitaly-cluster).
- The custom configuration instructions that follow on this page.

Smaller GitLab installations may need only [Gitaly itself](index.md).

NOTE:
Upgrade instructions for Omnibus GitLab installations
[are available](https://docs.gitlab.com/omnibus/update/#gitaly-cluster).

## Requirements for configuring a Gitaly Cluster

The minimum recommended configuration for a Gitaly Cluster requires:

- 1 load balancer
- 1 PostgreSQL server (PostgreSQL 11 or newer)
- 3 Praefect nodes
- 3 Gitaly nodes (1 primary, 2 secondary)

See the [design
document](https://gitlab.com/gitlab-org/gitaly/-/blob/master/doc/design_ha.md)
for implementation details.

NOTE:
If not set in GitLab, feature flags are read as false from the console and Praefect uses their
default value. The default value depends on the GitLab version.

## Setup Instructions

If you [installed](https://about.gitlab.com/install/) GitLab using the Omnibus
package (highly recommended), follow the steps below:

1. [Preparation](#preparation)
1. [Configure the Praefect database](#postgresql)
1. [Configure the Praefect proxy/router](#praefect)
1. [Configure each Gitaly node](#gitaly) (once for each Gitaly node)
1. [Configure the load balancer](#load-balancer)
1. [Update the GitLab server configuration](#gitlab)
1. [Configure Grafana](#grafana)

### Preparation

Before beginning, you should already have a working GitLab instance. [Learn how
to install GitLab](https://about.gitlab.com/install/).

Provision a PostgreSQL server (PostgreSQL 11 or newer).

Prepare all your new nodes by [installing
GitLab](https://about.gitlab.com/install/).

- At least 1 Praefect node (minimal storage required)
- 3 Gitaly nodes (high CPU, high memory, fast storage)
- 1 GitLab server

You need the IP/host address for each node.

1. `LOAD_BALANCER_SERVER_ADDRESS`: the IP/host address of the load balancer
1. `POSTGRESQL_SERVER_ADDRESS`: the IP/host address of the PostgreSQL server
1. `PRAEFECT_HOST`: the IP/host address of the Praefect server
1. `GITALY_HOST_*`: the IP or host address of each Gitaly server
1. `GITLAB_HOST`: the IP/host address of the GitLab server

If you are using a cloud provider, you can look up the addresses for each server through your cloud provider's management console.

If you are using Google Cloud Platform, SoftLayer, or any other vendor that provides a virtual private cloud (VPC) you can use the private addresses for each cloud instance (corresponds to "internal address" for Google Cloud Platform) for `PRAEFECT_HOST`, `GITALY_HOST_*`, and `GITLAB_HOST`.
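If you script parts of this setup, you can substitute these placeholders into the configuration snippets on this page programmatically. The following Ruby sketch shows one way to do that. `fill_placeholders` and the example address are hypothetical. The pattern recognizes only the placeholder names listed above and assumes that `GITALY_HOST_*` is numbered, such as `GITALY_HOST_1`.

```ruby
# Hypothetical helper: replace the address placeholders used on this page
# (LOAD_BALANCER_SERVER_ADDRESS, POSTGRESQL_SERVER_ADDRESS, PRAEFECT_HOST,
# GITALY_HOST_1, GITALY_HOST_2, ..., GITLAB_HOST) with real addresses.
PLACEHOLDER_PATTERN = /\b(?:LOAD_BALANCER_SERVER_ADDRESS|POSTGRESQL_SERVER_ADDRESS|PRAEFECT_HOST|GITALY_HOST_\d+|GITLAB_HOST)\b/

def fill_placeholders(template, addresses)
  template.gsub(PLACEHOLDER_PATTERN) do |name|
    # Fail loudly instead of leaving a placeholder in the generated config.
    addresses.fetch(name) { raise KeyError, "no address given for #{name}" }
  end
end

snippet = "'address' => 'tcp://GITALY_HOST_1:8075',"
puts fill_placeholders(snippet, { 'GITALY_HOST_1' => '10.0.0.21' })
```

The helper raises an error for a missing address rather than leaving the placeholder in place. This prevents an unreplaced name from reaching `/etc/gitlab/gitlab.rb`.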

#### Secrets

The communication between components is secured with different secrets, which
are described below. Before you begin, generate a unique secret for each, and
make note of it. This enables you to replace these placeholder tokens
with secure tokens as you complete the setup process.

1. `GITLAB_SHELL_SECRET_TOKEN`: this is used by Git hooks to make callback HTTP
   API requests to GitLab when accepting a Git push. This secret is shared with
   GitLab Shell for legacy reasons.
1. `PRAEFECT_EXTERNAL_TOKEN`: repositories hosted on your Praefect cluster can
   only be accessed by Gitaly clients that carry this token.
1. `PRAEFECT_INTERNAL_TOKEN`: this token is used for replication traffic inside
   your Praefect cluster. This is distinct from `PRAEFECT_EXTERNAL_TOKEN`
   because Gitaly clients must not be able to access internal nodes of the
   Praefect cluster directly; that could lead to data loss.
1. `PRAEFECT_SQL_PASSWORD`: this password is used by Praefect to connect to
   PostgreSQL.

We note in the instructions below where these secrets are required.
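One way to generate these secrets is with Ruby's standard `securerandom` library. The following is a minimal sketch, not a GitLab tool. It prints one random 64-character hexadecimal value (256 bits) for each placeholder. Hexadecimal values contain no quotes or backslashes, so you can paste them into `gitlab.rb` and SQL statements without escaping.

```ruby
require 'securerandom'

# Placeholder names used throughout this page.
SECRET_NAMES = %w[
  GITLAB_SHELL_SECRET_TOKEN
  PRAEFECT_EXTERNAL_TOKEN
  PRAEFECT_INTERNAL_TOKEN
  PRAEFECT_SQL_PASSWORD
].freeze

# Generate one independent random secret per placeholder.
# SecureRandom.hex(32) returns 64 hexadecimal characters (32 random bytes).
def generate_secrets(names = SECRET_NAMES)
  names.to_h { |name| [name, SecureRandom.hex(32)] }
end

generate_secrets.each { |name, value| puts "#{name}=#{value}" }
```

The script prints the secrets to your terminal, so run it somewhere private and store the values securely.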

NOTE:
Omnibus GitLab installations can use `gitlab-secrets.json` for `GITLAB_SHELL_SECRET_TOKEN`.

### PostgreSQL

NOTE:
Do not store the GitLab application database and the Praefect
database on the same PostgreSQL server if using
[Geo](../geo/index.md). The replication state is internal to each instance
of GitLab and should not be replicated.

These instructions help set up a single PostgreSQL database, which creates a single point of
failure. The following options are available:

- For non-Geo installations, either:
  - Use one of the documented [PostgreSQL setups](../postgresql/index.md).
  - Use your own third-party database setup, if fault tolerance is required.
- For Geo instances, either:
  - Set up a separate [PostgreSQL instance](https://www.postgresql.org/docs/11/high-availability.html).
  - Use a cloud-managed PostgreSQL service. AWS
    [Relational Database Service](https://aws.amazon.com/rds/) is recommended.

To complete this section you need:

- 1 Praefect node
- 1 PostgreSQL server (PostgreSQL 11 or newer)
  - An SQL user with permissions to create databases

In this section, you configure the PostgreSQL server from the Praefect node,
using the `psql` client that is installed by Omnibus GitLab.

1. SSH into the **Praefect** node and log in as root:

   ```shell
   sudo -i
   ```

1. Connect to the PostgreSQL server with administrative access. This is likely
   the `postgres` user. The database `template1` is used because it is created
   by default on all PostgreSQL servers.

   ```shell
   /opt/gitlab/embedded/bin/psql -U postgres -d template1 -h POSTGRESQL_SERVER_ADDRESS
   ```

   Create a new user `praefect` to be used by Praefect. Replace
   `PRAEFECT_SQL_PASSWORD` with the strong password you generated in the
   preparation step.

   ```sql
   CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD 'PRAEFECT_SQL_PASSWORD';
   ```

1. Reconnect to the PostgreSQL server, this time as the `praefect` user:

   ```shell
   /opt/gitlab/embedded/bin/psql -U praefect -d template1 -h POSTGRESQL_SERVER_ADDRESS
   ```

   Create a new database `praefect_production`. Creating the database while
   connected as the `praefect` user makes `praefect` its owner, which ensures
   the user has access to it.

   ```sql
   CREATE DATABASE praefect_production WITH ENCODING=UTF8;
   ```

The database used by Praefect is now configured.
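The `CREATE ROLE` statement above embeds `PRAEFECT_SQL_PASSWORD` in a single-quoted SQL string literal. A password that contains a single quote (`'`) would therefore end the literal early. If you generate the statement from a script, you can use a minimal sketch like the following, which doubles embedded single quotes (the way PostgreSQL escapes them). The helper names are hypothetical. The sketch assumes `standard_conforming_strings` is `on`, the default since PostgreSQL 9.1, so backslashes need no escaping.

```ruby
# Hypothetical helpers for building the statement from the section above.

# Quote a value as a PostgreSQL string literal by doubling single quotes.
# Assumes standard_conforming_strings = on, so backslashes are literal.
def sql_string_literal(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

def create_praefect_role_sql(password)
  "CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD #{sql_string_literal(password)};"
end

puts create_praefect_role_sql("example'password")
# Prints: CREATE ROLE praefect WITH LOGIN CREATEDB PASSWORD 'example''password';
```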

If you see Praefect database errors after configuring PostgreSQL, see
[troubleshooting steps](index.md#relation-does-not-exist-errors).

#### PgBouncer

To reduce PostgreSQL resource consumption, we recommend setting up and configuring
[PgBouncer](https://www.pgbouncer.org/) in front of the PostgreSQL instance. To do
this, set the corresponding IP or host address of the PgBouncer instance in
`/etc/gitlab/gitlab.rb` by changing the following settings:

- `praefect['database_host']`, for the address.
- `praefect['database_port']`, for the port.

Although PgBouncer manages resources more efficiently, Praefect still requires a
direct connection to the PostgreSQL database because it uses the
[LISTEN](https://www.postgresql.org/docs/11/sql-listen.html)
feature, which is [not supported](https://www.pgbouncer.org/features.html) by
PgBouncer with `pool_mode = transaction`.
Set `praefect['database_host_no_proxy']` and `praefect['database_port_no_proxy']`
to a direct connection, and not a PgBouncer connection.

Save the changes to `/etc/gitlab/gitlab.rb` and
[reconfigure Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure).

This documentation doesn't provide PgBouncer installation instructions,
but you can:

- Find instructions on the [official website](https://www.pgbouncer.org/install.html).
- Use a [Docker image](https://hub.docker.com/r/edoburu/pgbouncer/).

In addition to the base PgBouncer configuration options, set the following values in
your `pgbouncer.ini` file:

- The [Praefect PostgreSQL database](#postgresql) in the `[databases]` section:

   ```ini
   [databases]
   * = host=POSTGRESQL_SERVER_ADDRESS port=5432 auth_user=praefect
   ```

- [`pool_mode`](https://www.pgbouncer.org/config.html#pool_mode)
  and [`ignore_startup_parameters`](https://www.pgbouncer.org/config.html#ignore_startup_parameters)
  in the `[pgbouncer]` section:

   ```ini
   [pgbouncer]
   pool_mode = transaction
   ignore_startup_parameters = extra_float_digits
   ```

The `praefect` user and its password should be included in the file (default is
`userlist.txt`) used by PgBouncer if the [`auth_file`](https://www.pgbouncer.org/config.html#auth_file)
configuration option is set.

NOTE:
By default PgBouncer uses port `6432` to accept incoming
connections. You can change it by setting the [`listen_port`](https://www.pgbouncer.org/config.html#listen_port)
configuration option. We recommend setting it to the default port value (`5432`) used by
PostgreSQL instances. Otherwise you should change the configuration parameter
`praefect['database_port']` for each Praefect instance to the correct value.
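You can check a `pgbouncer.ini` file against the values above before you point Praefect at it. The following script is a sketch, not a full PgBouncer configuration parser. Its INI reader is deliberately minimal and hypothetical: it handles section headers, `key = value` lines, and `;`/`#` comments only. The script flags:

- A `pool_mode` other than `transaction`.
- A missing `extra_float_digits` in `ignore_startup_parameters`.
- A `listen_port` (PgBouncer's default is `6432`) that differs from the `praefect['database_port']` value you pass in.

```ruby
# Minimal INI reader: "[section]" headers, "key = value" lines, and full-line
# ";" or "#" comments. Everything else is ignored.
def parse_ini(text)
  sections = Hash.new { |hash, name| hash[name] = {} }
  current = nil
  text.each_line do |raw|
    line = raw.strip
    next if line.empty? || line.start_with?(';', '#')

    if (header = line.match(/\A\[(.+)\]\z/))
      current = header[1].strip
    elsif current && (pair = line.match(/\A([^=]+?)\s*=\s*(.*)\z/))
      sections[current][pair[1].strip] = pair[2].strip
    end
  end
  sections
end

# Returns a list of human-readable problems; an empty list means the checked
# settings match the recommendations on this page.
def pgbouncer_problems(text, praefect_database_port: 5432)
  settings = parse_ini(text)['pgbouncer']
  problems = []

  problems << 'pool_mode should be transaction' unless settings['pool_mode'] == 'transaction'

  ignored = settings.fetch('ignore_startup_parameters', '').split(',').map(&:strip)
  unless ignored.include?('extra_float_digits')
    problems << 'ignore_startup_parameters should include extra_float_digits'
  end

  # PgBouncer listens on 6432 unless listen_port is set.
  listen_port = Integer(settings.fetch('listen_port', '6432'))
  if listen_port != praefect_database_port
    problems << "listen_port #{listen_port} differs from praefect['database_port'] (#{praefect_database_port})"
  end

  problems
end
```

For example, call `pgbouncer_problems(File.read(path))` with the path of your PgBouncer configuration file. The location depends on how you installed PgBouncer.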

### Praefect

> In [GitLab 13.4 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2634), Praefect nodes can no longer be designated as `primary`.

If there are multiple Praefect nodes:

- Complete the following steps for **each** node.
- Designate one node as the "deploy node", and configure it first.

To complete this section you need a [configured PostgreSQL server](#postgresql), including:

- IP/host address (`POSTGRESQL_SERVER_ADDRESS`)
- Password (`PRAEFECT_SQL_PASSWORD`)

Praefect should be run on a dedicated node. Do not run Praefect on the
application server or on a Gitaly node.

1. SSH into the **Praefect** node and log in as root:

   ```shell
   sudo -i
   ```

1. Disable all other services by editing `/etc/gitlab/gitlab.rb`:

   ```ruby
   # Disable all other services on the Praefect node
   postgresql['enable'] = false
   redis['enable'] = false
   nginx['enable'] = false
   alertmanager['enable'] = false
   prometheus['enable'] = false
   grafana['enable'] = false
   puma['enable'] = false
   sidekiq['enable'] = false
   gitlab_workhorse['enable'] = false
   gitaly['enable'] = false

   # Enable only the Praefect service
   praefect['enable'] = true

   # Prevent database connections during 'gitlab-ctl reconfigure'
   gitlab_rails['auto_migrate'] = false
   praefect['auto_migrate'] = false
   ```

1. Configure **Praefect** to listen on network interfaces by editing
   `/etc/gitlab/gitlab.rb`:

   ```ruby
   praefect['listen_addr'] = '0.0.0.0:2305'

   # Enable Prometheus metrics access to Praefect. You must use firewalls
   # to restrict access to this address/port.
   praefect['prometheus_listen_addr'] = '0.0.0.0:9652'
   ```

1. Configure a strong `auth_token` for **Praefect** by editing
   `/etc/gitlab/gitlab.rb`. This is needed by clients outside the cluster
   (like GitLab Shell) to communicate with the Praefect cluster:

   ```ruby
   praefect['auth_token'] = 'PRAEFECT_EXTERNAL_TOKEN'
   ```

1. Configure **Praefect** to connect to the PostgreSQL database by editing
   `/etc/gitlab/gitlab.rb`.

   You need to replace `POSTGRESQL_SERVER_ADDRESS` with the IP/host address
   of the database, and `PRAEFECT_SQL_PASSWORD` with the strong password set
   above.

   ```ruby
   praefect['database_host'] = 'POSTGRESQL_SERVER_ADDRESS'
   praefect['database_port'] = 5432
   praefect['database_user'] = 'praefect'
   praefect['database_password'] = 'PRAEFECT_SQL_PASSWORD'
   praefect['database_dbname'] = 'praefect_production'
   praefect['database_host_no_proxy'] = 'POSTGRESQL_SERVER_ADDRESS'
   praefect['database_port_no_proxy'] = 5432
   ```

   To use a TLS client certificate, set the following options:

   ```ruby
   # Connect to PostgreSQL using a TLS client certificate
   # praefect['database_sslcert'] = '/path/to/client-cert'
   # praefect['database_sslkey'] = '/path/to/client-key'

   # Trust a custom certificate authority
   # praefect['database_sslrootcert'] = '/path/to/rootcert'
   ```

   By default, Praefect refuses to make an unencrypted connection to
   PostgreSQL. You can override this by uncommenting the following line:

   ```ruby
   # praefect['database_sslmode'] = 'disable'
   ```

1. Configure the **Praefect** cluster to connect to each Gitaly node in the
   cluster by editing `/etc/gitlab/gitlab.rb`.

   The virtual storage's name must match the configured storage name in GitLab
   configuration. In a later step, we configure the storage name as `default`,
   so we use `default` here as well. This cluster has three Gitaly nodes `gitaly-1`,
   `gitaly-2`, and `gitaly-3`, which are intended to be replicas of each other.

   WARNING:
   If you have data on an already existing storage called
   `default`, you should configure the virtual storage with another name and
   [migrate the data to the Gitaly Cluster storage](#migrate-to-gitaly-cluster)
   afterwards.

   Replace `PRAEFECT_INTERNAL_TOKEN` with a strong secret, which is used by
   Praefect when communicating with Gitaly nodes in the cluster. This token is
   distinct from the `PRAEFECT_EXTERNAL_TOKEN`.

   Replace `GITALY_HOST_*` with the IP or host address of each Gitaly node.

   More Gitaly nodes can be added to the cluster to increase the number of
   replicas. More clusters can also be added for very large GitLab instances.

   NOTE:
   When adding additional Gitaly nodes to a virtual storage, all storage names
   within that virtual storage must be unique. Additionally, all Gitaly node
   addresses referenced in the Praefect configuration must be unique.

   ```ruby
   # Name of storage hash must match storage name in git_data_dirs on GitLab
   # server ('default') and in git_data_dirs on Gitaly nodes ('gitaly-1')
   praefect['virtual_storages'] = {
     'default' => {
       'nodes' => {
         'gitaly-1' => {
           'address' => 'tcp://GITALY_HOST_1:8075',
           'token'   => 'PRAEFECT_INTERNAL_TOKEN',
         },
         'gitaly-2' => {
           'address' => 'tcp://GITALY_HOST_2:8075',
           'token'   => 'PRAEFECT_INTERNAL_TOKEN'
         },
         'gitaly-3' => {
           'address' => 'tcp://GITALY_HOST_3:8075',
           'token'   => 'PRAEFECT_INTERNAL_TOKEN'
         }
       }
     }
   }
   ```

   NOTE:
   In [GitLab 13.8 and earlier](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/4988),
   Gitaly nodes were configured directly under the virtual storage, and not under the `nodes` key.

1. In [GitLab 13.1 and later](https://gitlab.com/groups/gitlab-org/-/epics/2013), enable [distribution of reads](#distributed-reads).

1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
   Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure):

   ```shell
   gitlab-ctl reconfigure
   ```

1. For:

   - The "deploy node":
     1. Enable Praefect auto-migration again by setting `praefect['auto_migrate'] = true` in
        `/etc/gitlab/gitlab.rb`.
     1. To ensure database migrations are only run during reconfigure and not automatically on
        upgrade, run:

        ```shell
        sudo touch /etc/gitlab/skip-auto-reconfigure
        ```

   - The other nodes, you can leave the settings as they are. Though
     `/etc/gitlab/skip-auto-reconfigure` isn't required, you may want to set it to prevent GitLab
     running reconfigure automatically when running commands such as `apt-get update`. This way any
     additional configuration changes can be done and then reconfigure can be run manually.

1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
   Praefect](../restart_gitlab.md#omnibus-gitlab-reconfigure) again:

   ```shell
   gitlab-ctl reconfigure
   ```

1. To ensure that Praefect [has updated its Prometheus listen
   address](https://gitlab.com/gitlab-org/gitaly/-/issues/2734), [restart
   Praefect](../restart_gitlab.md#omnibus-gitlab-restart):

   ```shell
   gitlab-ctl restart praefect
   ```

1. Verify that Praefect can reach PostgreSQL:

   ```shell
   sudo -u git /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml sql-ping
   ```

   If the check fails, make sure you have followed the steps correctly. If you
   edit `/etc/gitlab/gitlab.rb`, remember to run `sudo gitlab-ctl reconfigure`
   again before trying the `sql-ping` command.

**The steps above must be completed for each Praefect node!**
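Before you run `gitlab-ctl reconfigure` on each Praefect node, you can catch copy-and-paste mistakes in `praefect['virtual_storages']`, such as two nodes that point at the same Gitaly address. The following hypothetical Ruby check is not part of GitLab. It works on a copy of that hash and assumes the `nodes` layout shown above (GitLab versions after 13.8). It does not test connectivity. The `sql-ping` command above and the `dial-nodes` check after the Gitaly configuration cover that.

```ruby
require 'uri'

# Hypothetical pre-flight check for a praefect['virtual_storages'] hash.
# It reports virtual storages without Gitaly nodes, node addresses that are
# not tcp:// or tls:// URLs, Gitaly addresses used more than once, and
# missing tokens.
def virtual_storage_problems(virtual_storages)
  problems = []
  first_user_of = {} # address => "virtual_storage/node" that used it first

  virtual_storages.each do |vs_name, vs|
    nodes = vs.fetch('nodes', {})
    problems << "#{vs_name}: no Gitaly nodes configured" if nodes.empty?

    nodes.each do |node_name, node|
      where = "#{vs_name}/#{node_name}"
      address = node['address'].to_s
      scheme = (URI.parse(address).scheme rescue nil)
      unless %w[tcp tls].include?(scheme)
        problems << "#{where}: address #{address.inspect} should start with tcp:// or tls://"
      end

      if first_user_of.key?(address)
        problems << "#{where}: address #{address} is already used by #{first_user_of[address]}"
      else
        first_user_of[address] = where
      end

      problems << "#{where}: missing token" if node['token'].to_s.empty?
    end
  end

  problems
end
```

An empty result means only that the hash is internally consistent. It does not mean the addresses and tokens are correct for your environment.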

#### Enabling TLS support

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/1698) in GitLab 13.2.

Praefect supports TLS encryption. To communicate with a Praefect instance that listens
for secure connections, you must:

- Use a `tls://` URL scheme in the `gitaly_address` of the corresponding storage entry
  in the GitLab configuration.
- Bring your own certificates because this isn't provided automatically. The certificate
  corresponding to each Praefect server must be installed on that Praefect server.

Additionally, the certificate, or its certificate authority, must be installed on all Gitaly servers
and on all Praefect clients that communicate with it, following the procedure described in
[GitLab custom certificate configuration](https://docs.gitlab.com/omnibus/settings/ssl.html#install-custom-public-certificates) and repeated below.

Note the following:

- The certificate must specify the address you use to access the Praefect server. If
  addressing the Praefect server by:

  - Hostname, you can either use the Common Name field for this, or add it as a Subject
    Alternative Name.
  - IP address, you must add it as a Subject Alternative Name to the certificate.

- You can configure Praefect servers with both an unencrypted listening address
  `listen_addr` and an encrypted listening address `tls_listen_addr` at the same time.
  This allows you to do a gradual transition from unencrypted to encrypted traffic, if
  necessary.
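To check whether a certificate names the address that clients dial, you can use Ruby's `OpenSSL::SSL.verify_certificate_identity`. It checks DNS and IP Subject Alternative Name entries, and falls back to the Common Name only when the certificate has no such entries. This is a sketch, not a GitLab tool. Ruby's check may be more lenient than the TLS library that Praefect and its clients use, so treat a passing result as a sanity check rather than proof. The `certificate_covers?` and `certificate_file_covers?` helper names are hypothetical.

```ruby
require 'openssl'

# True when `cert` names `address` in a DNS or IP Subject Alternative Name,
# or in its Common Name when the certificate has no DNS/IP SAN entries.
def certificate_covers?(cert, address)
  OpenSSL::SSL.verify_certificate_identity(cert, address)
end

# Convenience wrapper for a PEM file, such as /etc/gitlab/ssl/cert.pem.
def certificate_file_covers?(path, address)
  certificate_covers?(OpenSSL::X509::Certificate.new(File.read(path)), address)
end
```

For example, on a Praefect server, call `certificate_file_covers?('/etc/gitlab/ssl/cert.pem', address)` with the hostname or IP address that your clients use to reach it.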

To configure Praefect with TLS:

**For Omnibus GitLab**

1. Create certificates for Praefect servers.

1. On the Praefect servers, create the `/etc/gitlab/ssl` directory and copy your key
   and certificate there:

   ```shell
   sudo mkdir -p /etc/gitlab/ssl
   sudo chmod 755 /etc/gitlab/ssl
   sudo cp key.pem cert.pem /etc/gitlab/ssl/
   sudo chmod 644 /etc/gitlab/ssl/key.pem /etc/gitlab/ssl/cert.pem
   ```

1. Edit `/etc/gitlab/gitlab.rb` and add:

   ```ruby
   praefect['tls_listen_addr'] = "0.0.0.0:3305"
   praefect['certificate_path'] = "/etc/gitlab/ssl/cert.pem"
   praefect['key_path'] = "/etc/gitlab/ssl/key.pem"
   ```

1. Save the file and [reconfigure](../restart_gitlab.md#omnibus-gitlab-reconfigure).

1. On the Praefect clients (including each Gitaly server), copy the certificates,
   or their certificate authority, into `/etc/gitlab/trusted-certs`:

   ```shell
   sudo cp cert.pem /etc/gitlab/trusted-certs/
   ```

1. On the Praefect clients (except Gitaly servers), edit `git_data_dirs` in
   `/etc/gitlab/gitlab.rb` as follows:

   ```ruby
   git_data_dirs({
     "default" => {
       "gitaly_address" => 'tls://LOAD_BALANCER_SERVER_ADDRESS:3305',
       "gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
     }
   })
   ```

1. Save the file and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure).

**For installations from source**

1. Create certificates for Praefect servers.
1. On the Praefect servers, create the `/etc/gitlab/ssl` directory and copy your key and certificate
   there:

   ```shell
   sudo mkdir -p /etc/gitlab/ssl
   sudo chmod 755 /etc/gitlab/ssl
   sudo cp key.pem cert.pem /etc/gitlab/ssl/
   sudo chmod 644 /etc/gitlab/ssl/key.pem /etc/gitlab/ssl/cert.pem
   ```

1. On the Praefect clients (including each Gitaly server), copy the certificates,
   or their certificate authority, into the system trusted certificates:

   ```shell
   sudo cp cert.pem /usr/local/share/ca-certificates/praefect.crt
   sudo update-ca-certificates
   ```

1. On the Praefect clients (except Gitaly servers), edit `storages` in
   `/home/git/gitlab/config/gitlab.yml` as follows:

   ```yaml
   gitlab:
     repositories:
       storages:
         default:
           gitaly_address: tls://LOAD_BALANCER_SERVER_ADDRESS:3305
           path: /some/local/path
   ```

   NOTE:
   `/some/local/path` should be set to a local folder that exists, although no
   data is stored in this folder. This requirement is scheduled to be removed when
   [this issue](https://gitlab.com/gitlab-org/gitaly/-/issues/1282) is resolved.

1. Save the file and [restart GitLab](../restart_gitlab.md#installations-from-source).
1. Copy all Praefect server certificates, or their certificate authority, to the system
   trusted certificates on each Gitaly server so that Gitaly servers trust the
   Praefect certificate when they call the Praefect server:

   ```shell
   sudo cp cert.pem /usr/local/share/ca-certificates/praefect.crt
   sudo update-ca-certificates
   ```

1. Edit `/home/git/praefect/config.toml` and add:

   ```toml
   tls_listen_addr = '0.0.0.0:3305'

   [tls]
   certificate_path = '/etc/gitlab/ssl/cert.pem'
   key_path = '/etc/gitlab/ssl/key.pem'
   ```

1. Save the file and [restart GitLab](../restart_gitlab.md#installations-from-source).

### Gitaly

NOTE:
Complete these steps for **each** Gitaly node.

To complete this section you need:

- [Configured Praefect node](#praefect)
- 3 (or more) servers, with GitLab installed, to be configured as Gitaly nodes.
  These should be dedicated nodes; do not run other services on them.

Every Gitaly server assigned to the Praefect cluster needs to be configured. The
configuration is the same as a normal [standalone Gitaly server](index.md),
except:

- The storage names are exposed to Praefect, not GitLab
- The secret token is shared with Praefect, not GitLab

The configuration of all Gitaly nodes in the Praefect cluster can be identical,
because we rely on Praefect to route operations correctly.

Pay particular attention to the following:

- The `gitaly['auth_token']` configured in this section must match the `token`
  value under `praefect['virtual_storages']['nodes']` on the Praefect node. This was set
  in the [previous section](#praefect). This document uses the placeholder
  `PRAEFECT_INTERNAL_TOKEN` throughout.
- The storage names in `git_data_dirs` configured in this section must match the
  storage names under `praefect['virtual_storages']` on the Praefect node. This
  was set in the [previous section](#praefect). This document uses `gitaly-1`,
  `gitaly-2`, and `gitaly-3` as Gitaly storage names.
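These values live in different `gitlab.rb` files, so they can easily fall out of step. The following hypothetical cross-check compares one Gitaly node's settings with the Praefect node's `praefect['virtual_storages']` hash. It confirms two things:

- Every storage name in `git_data_dirs` is a node under `praefect['virtual_storages']`.
- The token Praefect sends to that storage equals `gitaly['auth_token']`.

It is a sketch that works on plain Ruby values you copy from each file, not a GitLab command.

```ruby
# Hypothetical cross-check between the Praefect node's
# praefect['virtual_storages'] and one Gitaly node's gitaly['auth_token'] and
# git_data_dirs, all passed in as plain Ruby hashes/strings.
def gitaly_node_problems(virtual_storages, gitaly_auth_token, git_data_dirs)
  # Map each Gitaly storage name (for example 'gitaly-1') to the token
  # Praefect uses when it calls that storage.
  praefect_tokens = {}
  virtual_storages.each_value do |vs|
    vs.fetch('nodes', {}).each { |storage, node| praefect_tokens[storage] = node['token'] }
  end

  git_data_dirs.keys.each_with_object([]) do |storage, problems|
    if !praefect_tokens.key?(storage)
      problems << "#{storage} is in git_data_dirs but not under praefect['virtual_storages']"
    elsif praefect_tokens[storage] != gitaly_auth_token
      problems << "token for #{storage} in praefect['virtual_storages'] differs from gitaly['auth_token']"
    end
  end
end
```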

For more information on Gitaly server configuration, see our [Gitaly
documentation](configure_gitaly.md#configure-gitaly-servers).

1. SSH into the **Gitaly** node and log in as root:

   ```shell
   sudo -i
   ```

1. Disable all other services by editing `/etc/gitlab/gitlab.rb`:

   ```ruby
   # Disable all other services on the Gitaly node
   postgresql['enable'] = false
   redis['enable'] = false
   nginx['enable'] = false
   grafana['enable'] = false
   puma['enable'] = false
   sidekiq['enable'] = false
   gitlab_workhorse['enable'] = false
   prometheus_monitoring['enable'] = false

   # Enable only the Gitaly service
   gitaly['enable'] = true

   # Enable Prometheus if needed
   prometheus['enable'] = true

   # Prevent database connections during 'gitlab-ctl reconfigure'
   gitlab_rails['auto_migrate'] = false
   ```

1. Configure **Gitaly** to listen on network interfaces by editing
   `/etc/gitlab/gitlab.rb`:

   ```ruby
   # Make Gitaly accept connections on all network interfaces.
   # Use firewalls to restrict access to this address/port.
   gitaly['listen_addr'] = '0.0.0.0:8075'

   # Enable Prometheus metrics access to Gitaly. You must use firewalls
   # to restrict access to this address/port.
   gitaly['prometheus_listen_addr'] = '0.0.0.0:9236'
   ```

1. Configure a strong `auth_token` for **Gitaly** by editing
   `/etc/gitlab/gitlab.rb`. This is needed by clients to communicate with
   this Gitaly node. Typically, this token is the same for all Gitaly
   nodes.

   ```ruby
   gitaly['auth_token'] = 'PRAEFECT_INTERNAL_TOKEN'
   ```

1. Configure the GitLab Shell secret token, which is needed for `git push` operations. Either:

   - Method 1:

     1. Copy `/etc/gitlab/gitlab-secrets.json` from the Gitaly client to the same path on the Gitaly
        servers and any other Gitaly clients.
     1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on Gitaly servers.

   - Method 2:

     1. Edit `/etc/gitlab/gitlab.rb`.
     1. Replace `GITLAB_SHELL_SECRET_TOKEN` with the real secret.

        ```ruby
        gitlab_shell['secret_token'] = 'GITLAB_SHELL_SECRET_TOKEN'
        ```

1. Configure an `internal_api_url`, which is also needed for `git push` operations:

   ```ruby
   # Configure the gitlab-shell API callback URL. Without this, `git push` will
   # fail. This can be your front door GitLab URL or an internal load balancer.
   # Examples: 'https://gitlab.example.com', 'http://1.2.3.4'
   gitlab_rails['internal_api_url'] = 'http://GITLAB_HOST'
   ```

1. Configure the storage location for Git data by setting `git_data_dirs` in
   `/etc/gitlab/gitlab.rb`. Each Gitaly node should have a unique storage name
   (such as `gitaly-1`).

   Instead of configuring `git_data_dirs` uniquely for each Gitaly node, it is
   often easier to include the configuration for all Gitaly nodes on every
   Gitaly node. This is supported because the Praefect `virtual_storages`
   configuration maps each storage name (such as `gitaly-1`) to a specific node, and
   requests are routed accordingly. This means every Gitaly node in your fleet
   can share the same configuration.

   ```ruby
   # You can include the data dirs for all nodes in the same config, because
   # Praefect will only route requests according to the addresses provided in the
   # prior step.
   git_data_dirs({
     "gitaly-1" => {
       "path" => "/var/opt/gitlab/git-data"
     },
     "gitaly-2" => {
       "path" => "/var/opt/gitlab/git-data"
     },
     "gitaly-3" => {
       "path" => "/var/opt/gitlab/git-data"
     }
   })
   ```

1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
   Gitaly](../restart_gitlab.md#omnibus-gitlab-reconfigure):

   ```shell
   gitlab-ctl reconfigure
   ```

1. To ensure that Gitaly [has updated its Prometheus listen
   address](https://gitlab.com/gitlab-org/gitaly/-/issues/2734), [restart
   Gitaly](../restart_gitlab.md#omnibus-gitlab-restart):

   ```shell
   gitlab-ctl restart gitaly
   ```

**The steps above must be completed for each Gitaly node!**

After all Gitaly nodes are configured, run the Praefect connection
checker to verify Praefect can connect to all Gitaly servers in the Praefect
configuration.

1. SSH into each **Praefect** node and run the Praefect connection checker:

   ```shell
   sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dial-nodes
   ```

### Load Balancer

In a fault-tolerant Gitaly configuration, a load balancer is needed to route
internal traffic from the GitLab application to the Praefect nodes. The
specifics on which load balancer to use or the exact configuration is beyond the
scope of the GitLab documentation.

NOTE:
The load balancer must be configured to accept traffic from the Gitaly nodes in
addition to the GitLab nodes. Some requests handled by
[`gitaly-ruby`](configure_gitaly.md#gitaly-ruby) sidecar processes call into the main Gitaly
process. `gitaly-ruby` uses the Gitaly address set in the GitLab server's
`git_data_dirs` setting to make this connection.

If you manage fault-tolerant systems like GitLab, you probably already have a load balancer
of choice. Some examples include [HAProxy](https://www.haproxy.org/)
(open-source), [Google Internal Load Balancer](https://cloud.google.com/load-balancing/docs/internal/),
[AWS Elastic Load Balancer](https://aws.amazon.com/elasticloadbalancing/), F5
BIG-IP LTM, and Citrix NetScaler. This documentation outlines the ports
and protocols you need to configure.

| LB Port | Backend Port | Protocol |
|:--------|:-------------|:---------|
| 2305    | 2305         | TCP      |
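With this table, clients connect to the load balancer on TCP port 2305, and the load balancer forwards the connection to port 2305 on the Praefect nodes. A plain TCP connection attempt is a quick way to confirm that the port is open from each GitLab node and each Gitaly node (the load balancer must accept traffic from both). This sketch uses Ruby's standard `Socket.tcp`. The `tcp_reachable?` name is hypothetical. A successful connection shows only that the port is open, not that the backend is a healthy Praefect node.

```ruby
require 'socket'

# Hypothetical helper: true if a TCP connection to host:port succeeds within
# `timeout` seconds. It opens and immediately closes the connection without
# sending any data, so it does not check gRPC or Praefect health.
def tcp_reachable?(host, port, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

For example, run `tcp_reachable?(address, 2305)` on each GitLab and Gitaly node, with `address` set to the real value of `LOAD_BALANCER_SERVER_ADDRESS`.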

### GitLab

To complete this section you need:

- [Configured Praefect node](#praefect)
- [Configured Gitaly nodes](#gitaly)
776

777 778 779 780 781 782
The Praefect cluster needs to be exposed as a storage location to the GitLab
application. This is done by updating the `git_data_dirs`.

Particular attention should be shown to:

- the storage name added to `git_data_dirs` in this section must match the
783
  storage name under `praefect['virtual_storages']` on the Praefect node(s). This
784
  was set in the [Praefect](#praefect) section of this guide. This document uses
785
  `default` as the Praefect storage name.
786 787 788 789 790 791 792

1. SSH into the **GitLab** node and login as root:

   ```shell
   sudo -i
   ```

1. Configure the `external_url` so that GitLab serves files from the correct
   endpoint, by editing `/etc/gitlab/gitlab.rb`.

   Replace `GITLAB_SERVER_URL` with the real external-facing
   URL on which the current GitLab instance is serving:

   ```ruby
   external_url 'GITLAB_SERVER_URL'
   ```

1. Disable the default Gitaly service running on the GitLab host. It isn't needed
   because GitLab connects to the configured cluster.

   WARNING:
   If you have existing data stored on the default Gitaly storage,
   you should [migrate the data to your Gitaly Cluster storage](#migrate-to-gitaly-cluster)
   first.

   ```ruby
   gitaly['enable'] = false
   ```

1. Add the Praefect cluster as a storage location by editing
   `/etc/gitlab/gitlab.rb`.

   You need to replace:

   - `LOAD_BALANCER_SERVER_ADDRESS` with the IP address or hostname of the load
     balancer.
   - `PRAEFECT_EXTERNAL_TOKEN` with the real secret.

   If you are using TLS, the `gitaly_address` should begin with `tls://`.

   ```ruby
   git_data_dirs({
     "default" => {
       "gitaly_address" => "tcp://LOAD_BALANCER_SERVER_ADDRESS:2305",
       "gitaly_token" => 'PRAEFECT_EXTERNAL_TOKEN'
     }
   })
   ```

1. Configure the GitLab Shell secret token so that callbacks from Gitaly nodes during a `git push`
   are properly authenticated. Either:

   - Method 1:

     1. Copy `/etc/gitlab/gitlab-secrets.json` from the Gitaly client to the same path on the Gitaly
        servers and any other Gitaly clients.
     1. [Reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure) on Gitaly servers.

   - Method 2:

     1. Edit `/etc/gitlab/gitlab.rb`.
     1. Replace `GITLAB_SHELL_SECRET_TOKEN` with the real secret.

        ```ruby
        gitlab_shell['secret_token'] = 'GITLAB_SHELL_SECRET_TOKEN'
        ```

1. Add Prometheus monitoring settings by editing `/etc/gitlab/gitlab.rb`. If Prometheus
   is enabled on a different node, make edits on that node instead.

   You need to replace:

   - `PRAEFECT_HOST_*` with the IP address or hostname of each Praefect node.
   - `GITALY_HOST_*` with the IP address or hostname of each Gitaly node.

   ```ruby
   prometheus['scrape_configs'] = [
     {
       'job_name' => 'praefect',
       'static_configs' => [
         'targets' => [
           'PRAEFECT_HOST_1:9652', # praefect-1
           'PRAEFECT_HOST_2:9652', # praefect-2
           'PRAEFECT_HOST_3:9652', # praefect-3
         ]
       ]
     },
     {
       'job_name' => 'praefect-gitaly',
       'static_configs' => [
         'targets' => [
           'GITALY_HOST_1:9236', # gitaly-1
           'GITALY_HOST_2:9236', # gitaly-2
           'GITALY_HOST_3:9236', # gitaly-3
         ]
       ]
     }
   ]
   ```

1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):

   ```shell
   gitlab-ctl reconfigure
   ```

1. Verify that the Git hooks on each Gitaly node can reach GitLab. On each Gitaly node, run:

   ```shell
   /opt/gitlab/embedded/bin/gitaly-hooks check /var/opt/gitlab/gitaly/config.toml
   ```

1. Verify that GitLab can reach Praefect:

   ```shell
   gitlab-rake gitlab:gitaly:check
   ```

1. Check in **Admin Area > Settings > Repository > Repository storage** that the Praefect storage
   is configured to store new repositories. Following this guide, the `default` storage should have
   weight 100 to store all new repositories.

1. Verify everything is working by creating a new project. Select the
   "Initialize repository with a README" checkbox so that there is content in the
   repository to view. If the project is created, and you can see the
   README file, it works!

#### Use TCP for existing GitLab instances

When adding Gitaly Cluster to an existing Gitaly instance, the existing Gitaly storage
must use a TCP address. If `gitaly_address` is not specified, a Unix socket is used,
which prevents communication with the cluster.

For example:

```ruby
git_data_dirs({
  'default' => { 'gitaly_address' => 'tcp://old-gitaly.internal:8075' },
  'cluster' => {
    'gitaly_address' => 'tcp://<load_balancer_server_address>:2305',
    'gitaly_token' => '<praefect_external_token>'
  }
})
```

See [Mixed Configuration](configure_gitaly.md#mixed-configuration) for further information on
running multiple Gitaly storages.
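
A storage without `gitaly_address` falls back to a Unix socket, so one quick sanity check is that every `git_data_dirs` entry uses a `tcp://` or `tls://` address. The sketch below applies that rule to a plain Ruby hash shaped like the example above. The `non_tcp_storages` helper is hypothetical, and it does not parse `/etc/gitlab/gitlab.rb` itself.

```ruby
# Hypothetical helper: returns the names of storages whose gitaly_address is
# missing (so a Unix socket would be used) or is not a tcp:// or tls:// URL.
def non_tcp_storages(git_data_dirs)
  git_data_dirs.reject do |_name, settings|
    settings['gitaly_address'].to_s.start_with?('tcp://', 'tls://')
  end.keys
end

git_data_dirs = {
  'default' => { 'path' => '/var/opt/gitlab/git-data' }, # no gitaly_address: Unix socket
  'cluster' => {
    'gitaly_address' => 'tcp://<load_balancer_server_address>:2305',
    'gitaly_token' => '<praefect_external_token>'
  }
}

p non_tcp_storages(git_data_dirs) # => ["default"]
```

An empty result means every storage in the hash is reachable over TCP or TLS.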

### Grafana

Grafana is included with GitLab, and can be used to monitor your Praefect
cluster. See [Grafana Dashboard
Service](https://docs.gitlab.com/omnibus/settings/grafana.html)
for detailed documentation.

To get started quickly:

1. SSH into the **GitLab** node (or whichever node has Grafana enabled) and log in as root:

   ```shell
   sudo -i
   ```

1. Enable the Grafana login form by editing `/etc/gitlab/gitlab.rb`.

   ```ruby
   grafana['disable_login_form'] = false
   ```

1. Save the changes to `/etc/gitlab/gitlab.rb` and [reconfigure
   GitLab](../restart_gitlab.md#omnibus-gitlab-reconfigure):

   ```shell
   gitlab-ctl reconfigure
   ```

1. Set the Grafana administrator password. This command prompts you to enter a new
   password:

   ```shell
   gitlab-ctl set-grafana-password
   ```

1. In your web browser, open `/-/grafana` (such as
   `https://gitlab.example.com/-/grafana`) on your GitLab server.

   Log in using the password you set, and the username `admin`.

1. Go to **Explore** and query `gitlab_build_info` to verify that you are
   getting metrics from all your machines.

Congratulations! You've configured an observable fault-tolerant Praefect
cluster.

## Distributed reads

> - Introduced in GitLab 13.1 in [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) with feature flag `gitaly_distributed_reads` set to disabled.
> - [Made generally available and enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/2951) in GitLab 13.3.
> - [Disabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3178) in GitLab 13.5.
> - [Enabled by default](https://gitlab.com/gitlab-org/gitaly/-/issues/3334) in GitLab 13.8.
> - [Feature flag removed](https://gitlab.com/gitlab-org/gitaly/-/issues/3383) in GitLab 13.11.

Praefect supports distribution of read operations across Gitaly nodes that are
configured for the virtual storage.

All RPCs marked with the `ACCESSOR` option, like
[GetBlob](https://gitlab.com/gitlab-org/gitaly/-/blob/v12.10.6/proto/blob.proto#L16),
are redirected to an up-to-date and healthy Gitaly node.

_Up to date_ in this context means that:

- There are no replication operations scheduled for this node.
- The last replication operation is in the _completed_ state.

If there are no such nodes, or any other error occurs during node selection, the primary
node is chosen to serve the request.
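
The read-routing rule above can be sketched as follows. This is an illustrative Ruby model only, not Praefect's implementation (Praefect is written in Go). The `Node` struct and its `healthy`, `pending_replication_jobs`, and `last_replication_state` fields are assumptions that mirror the conditions listed above. Picking randomly among several eligible nodes is also an assumption of this sketch.

```ruby
# Simplified model of a Gitaly node as seen by Praefect (assumed fields).
Node = Struct.new(:name, :healthy, :pending_replication_jobs, :last_replication_state,
                  keyword_init: true)

# "Up to date": no replication scheduled and the last replication completed.
def up_to_date?(node)
  node.pending_replication_jobs.zero? && node.last_replication_state == :completed
end

# Choose a node for a read (ACCESSOR) RPC. Falls back to the primary when no
# node qualifies or when selection raises an error.
def read_node(nodes, primary, rng: Random.new)
  candidates = nodes.select { |n| n.healthy && up_to_date?(n) }
  candidates.empty? ? primary : candidates.sample(random: rng)
rescue StandardError
  primary
end

primary = Node.new(name: 'gitaly-1', healthy: true,
                   pending_replication_jobs: 0, last_replication_state: :completed)
stale = Node.new(name: 'gitaly-2', healthy: true,
                 pending_replication_jobs: 2, last_replication_state: :completed)
puts read_node([primary, stale], primary).name # gitaly-1: the only up-to-date node
```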

To track distribution of read operations, you can use the `gitaly_praefect_read_distribution`
Prometheus counter metric. It has two labels:

- `virtual_storage`.
- `storage`.

They reflect configuration defined for this instance of Praefect.

## Strong consistency

> - Introduced in GitLab 13.1 in [alpha](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga), disabled by default.
> - Entered [beta](https://about.gitlab.com/handbook/product/gitlab-the-product/#alpha-beta-ga) in GitLab 13.2, disabled by default.
> - In GitLab 13.3, disabled unless primary-wins voting strategy is disabled.
> - From GitLab 13.4, enabled by default.
> - From GitLab 13.5, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency.
> - From GitLab 13.6, primary-wins voting strategy and `gitaly_reference_transactions_primary_wins` feature flag were removed from the source code.

Praefect guarantees eventual consistency by replicating all writes to secondary nodes
after the write to the primary Gitaly node has happened.

Praefect can instead provide strong consistency by creating a transaction and writing
changes to all Gitaly nodes at once.
If enabled, transactions are only available for a subset of RPCs. For more
information, see the [strong consistency epic](https://gitlab.com/groups/gitlab-org/-/epics/1189).

To enable strong consistency:

- In GitLab 13.5 and later, you must use Git v2.28.0 or higher on Gitaly nodes to enable strong consistency.
- In GitLab 13.4 and later, the strong consistency voting strategy has been improved and enabled by default.
  Instead of requiring all nodes to agree, only the primary and half of the secondaries need to agree.
- In GitLab 13.3, reference transactions are enabled by default with a primary-wins strategy.
  This strategy causes all transactions to succeed for the primary and thus does not ensure strong consistency.
  To enable strong consistency, disable the `:gitaly_reference_transactions_primary_wins` feature flag.
- In GitLab 13.2, enable the `:gitaly_reference_transactions` feature flag.
- In GitLab 13.1, enable the `:gitaly_reference_transactions` and `:gitaly_hooks_rpc`
  feature flags.
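
The GitLab 13.4 and later voting rule, where the primary and half of the secondaries must agree, can be expressed as a small predicate. This is an illustrative Ruby sketch, not Praefect's code. It assumes that "half" means at least half of the secondaries, rounded up for an odd count, and the `vote_passes?` name is hypothetical.

```ruby
# Hypothetical predicate for the 13.4+ voting strategy: the primary must agree,
# and at least half of the secondaries (assumed: rounded up) must agree too.
def vote_passes?(primary_agrees:, agreeing_secondaries:, total_secondaries:)
  return false unless primary_agrees

  agreeing_secondaries * 2 >= total_secondaries
end

# Three-node cluster (1 primary, 2 secondaries):
puts vote_passes?(primary_agrees: true, agreeing_secondaries: 1, total_secondaries: 2)  # true
puts vote_passes?(primary_agrees: false, agreeing_secondaries: 2, total_secondaries: 2) # false
```

In contrast, the pre-13.4 strategy described above required every node to agree, which this predicate would express as `agreeing_secondaries == total_secondaries`.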

Changing feature flags requires [access to the Rails console](../feature_flags.md#start-the-gitlab-rails-console).
In the Rails console, enable or disable the flags as required. For example:

```ruby
Feature.enable(:gitaly_reference_transactions)
Feature.disable(:gitaly_reference_transactions_primary_wins)
```

To monitor strong consistency, you can use the following Prometheus metrics:

- `gitaly_praefect_transactions_total`: Number of transactions created and
  voted on.
- `gitaly_praefect_subtransactions_per_transaction_total`: Number of times
  nodes cast a vote for a single transaction. This can happen multiple times if
  multiple references are getting updated in a single transaction.
- `gitaly_praefect_voters_per_transaction_total`: Number of Gitaly nodes taking
  part in a transaction.
- `gitaly_praefect_transactions_delay_seconds`: Server-side delay introduced by
  waiting for the transaction to be committed.
- `gitaly_hook_transaction_voting_delay_seconds`: Client-side delay introduced
  by waiting for the transaction to be committed.

## Replication factor

Replication factor is the number of copies Praefect maintains of a given repository. A higher
replication factor offers better redundancy and distribution of read workload, but also results
in a higher storage cost. By default, Praefect replicates repositories to every storage in a
virtual storage.

### Configure replication factor

WARNING:
Configurable replication factors require [repository-specific primary nodes](#repository-specific-primary-nodes) to be used.

Praefect supports configuring a replication factor on a per-repository basis, by assigning
specific storage nodes to host a repository.

Praefect does not store the actual replication factor, but assigns enough storages to host the repository
so the desired replication factor is met. If a storage node is later removed from the virtual storage,
the replication factor of repositories assigned to the storage is decreased accordingly.

You can configure:

- A default replication factor for each virtual storage that is applied to newly-created repositories.
  The configuration is added to the `/etc/gitlab/gitlab.rb` file:

  ```ruby
  praefect['virtual_storages'] = {
    'default' => {
      'default_replication_factor' => 1,
      # ...
    }
  }
  ```

- A replication factor for an existing repository using the `set-replication-factor` sub-command.
  `set-replication-factor` automatically assigns or unassigns random storage nodes as
  necessary to reach the desired replication factor. The repository's primary node is
  always assigned first and is never unassigned.

  ```shell
  sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage <virtual-storage> -repository <relative-path> -replication-factor <replication-factor>
  ```

  - `-virtual-storage` is the virtual storage the repository is located in.
  - `-repository` is the repository's relative path in the storage.
  - `-replication-factor` is the desired replication factor of the repository. The minimum value is
    `1`, as the primary needs a copy of the repository. The maximum replication factor is the number of
    storages in the virtual storage.

  On success, the assigned host storages are printed. For example:

  ```shell
  $ sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml set-replication-factor -virtual-storage default -repository @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git -replication-factor 2

  current assignments: gitaly-1, gitaly-2
  ```
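
The assignment behavior described for `set-replication-factor` can be modeled as follows. This Ruby sketch is an illustration under stated assumptions, not the Praefect implementation. `assign_storages` is a hypothetical name, storages are plain strings, and randomness comes from Ruby's `Array#shuffle`. It keeps the primary assigned first, never unassigns it, and enforces the documented bounds of `1` to the number of storages in the virtual storage.

```ruby
# Hypothetical model of `set-replication-factor` for one repository.
# The primary is always assigned first and is never unassigned; other
# storages are added or removed at random to reach the requested factor.
def assign_storages(primary:, current:, all_storages:, factor:, rng: Random.new)
  unless factor.between?(1, all_storages.size)
    raise ArgumentError, "replication factor must be between 1 and #{all_storages.size}"
  end

  secondaries = current - [primary]
  if secondaries.size + 1 > factor
    # Too many copies: keep a random subset of the secondaries.
    [primary] + secondaries.shuffle(random: rng).take(factor - 1)
  else
    # Too few copies: add random storages that are not yet assigned.
    unassigned = all_storages - [primary] - secondaries
    [primary] + secondaries + unassigned.shuffle(random: rng).take(factor - 1 - secondaries.size)
  end
end

storages = %w[gitaly-1 gitaly-2 gitaly-3]
assigned = assign_storages(primary: 'gitaly-1', current: ['gitaly-1'],
                           all_storages: storages, factor: 2)
puts "current assignments: #{assigned.join(', ')}"
```

Because the extra storage is chosen at random, repeated runs may print either `gitaly-2` or `gitaly-3` as the second assignment.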

## Automatic failover and primary election strategies

Praefect regularly checks the health of each Gitaly node. This is used to automatically fail over
to a newly-elected primary Gitaly node if the current primary node is found to be unhealthy.

We recommend using [repository-specific primary nodes](#repository-specific-primary-nodes). This is
[planned to be the only available election strategy](https://gitlab.com/gitlab-org/gitaly/-/issues/3574)
from GitLab 14.0.

### Repository-specific primary nodes

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3492) in GitLab 13.12.

Gitaly Cluster supports electing repository-specific primary Gitaly nodes. Repository-specific
Gitaly primary nodes are enabled in `/etc/gitlab/gitlab.rb` by setting
`praefect['failover_election_strategy'] = 'per_repository'`.

Praefect's [deprecated election strategies](#deprecated-election-strategies):

- Elected a primary Gitaly node for each virtual storage, which was used as the primary node for
  each repository in the virtual storage.
- Prevented horizontal scaling of a virtual storage. The primary Gitaly node needed a replica of
  each repository and thus became the bottleneck.

The `per_repository` election strategy solves this problem by electing a primary Gitaly node separately for each
repository. Combined with [configurable replication factors](#configure-replication-factor), you can
horizontally scale storage capacity and distribute write load across Gitaly nodes.

Primary elections are run when:

- Praefect starts up.
- The cluster's consensus of a Gitaly node's health changes.

A Gitaly node is considered:

- Healthy if `>=50%` Praefect nodes have successfully health checked the Gitaly node in the
  previous ten seconds.
- Unhealthy otherwise.
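
The health rule can be sketched as a simple majority check over recent health-check results. In this illustrative Ruby model, `checks` maps each Praefect node's name to the time of its last successful health check of one Gitaly node. That data shape and the `gitaly_healthy?` name are assumptions. The 50% threshold and ten-second window come from the description above.

```ruby
HEALTH_WINDOW = 10 # seconds, as described above

# Hypothetical model: checks maps Praefect node name => Time of its last
# successful health check of one Gitaly node (nil if none).
def gitaly_healthy?(checks, praefect_count, now: Time.now)
  recent = checks.count { |_praefect, at| at && now - at <= HEALTH_WINDOW }
  recent * 2 >= praefect_count # at least 50% of Praefect nodes
end

now = Time.now
checks = { 'praefect-1' => now - 2, 'praefect-2' => now - 30, 'praefect-3' => nil }
puts gitaly_healthy?(checks, 3, now: now) # false: only 1 of 3 checks is recent
```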

During an election run, Praefect elects a new primary Gitaly node for each repository that has
an unhealthy primary Gitaly node. The election is made:

- Randomly from healthy secondary Gitaly nodes that are the most up to date.
- Only from Gitaly nodes assigned to the host repository.

If there are no healthy secondary nodes for a repository:

- The unhealthy primary node is demoted and the repository is left without a primary node.
- Operations that require a primary node fail until a primary is successfully elected.
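
The election rules above can be sketched as follows. This Ruby model is illustrative only. The `Replica` struct, its `writes_applied` field (a stand-in for "how up to date" a replica is), and `elect_primary` are assumptions, not Praefect's data model.

```ruby
# Simplified replica model; writes_applied stands in for "how up to date".
Replica = Struct.new(:storage, :healthy, :assigned, :writes_applied, keyword_init: true)

# Elect a new primary for a repository whose current primary is unhealthy.
# Returns the new primary's storage name, or nil if no healthy assigned
# secondary exists (the repository is then left without a primary).
def elect_primary(replicas, current_primary, rng: Random.new)
  candidates = replicas.select do |r|
    r.storage != current_primary && r.healthy && r.assigned
  end
  return nil if candidates.empty?

  # Pick randomly among the most up-to-date healthy assigned secondaries.
  newest = candidates.map(&:writes_applied).max
  candidates.select { |r| r.writes_applied == newest }.sample(random: rng).storage
end

replicas = [
  Replica.new(storage: 'gitaly-1', healthy: false, assigned: true, writes_applied: 10),
  Replica.new(storage: 'gitaly-2', healthy: true, assigned: true, writes_applied: 9),
  Replica.new(storage: 'gitaly-3', healthy: true, assigned: true, writes_applied: 7)
]
puts elect_primary(replicas, 'gitaly-1') # gitaly-2
```

The gap between `gitaly-1` (10) and `gitaly-2` (9) in this example is the kind of unreplicated write that can lead to the read-only mode and data loss checks described later on this page.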

#### Migrate to repository-specific primary Gitaly nodes

New Gitaly Clusters can start using the `per_repository` election strategy immediately.

To migrate existing clusters:

1. Praefect nodes didn't historically keep database records of every repository stored on the cluster. When
   the `per_repository` election strategy is configured, Praefect expects to have database records of
   each repository. A [background migration](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2749) is
   included in GitLab 13.6 and later to create any missing database records for repositories. Before migrating,
   verify that the migration has run by checking Praefect's logs for the `repository importer finished`
   message. The `virtual_storages` field contains the names of virtual storages and whether they've had
   any missing database records created.

   For example, the `default` virtual storage has been successfully migrated:

   ```json
   {"level":"info","msg":"repository importer finished","pid":19752,"time":"2021-04-28T11:41:36.743Z","virtual_storages":{"default":true}}
   ```

   If a virtual storage has not been successfully migrated, it would have `false` next to it:

   ```json
   {"level":"info","msg":"repository importer finished","pid":19752,"time":"2021-04-28T11:41:36.743Z","virtual_storages":{"default":false}}
   ```

   The migration runs when Praefect starts up. If the migration is unsuccessful, you can restart
   a Praefect node to reattempt it. The migration only runs when the `sql` election strategy is configured.

1. Running two different election strategies side by side can cause a split brain, where different
   Praefect nodes consider repositories to have different primaries. To avoid this, either:

   - If a short downtime is acceptable:

      1. Shut down all Praefect nodes before changing the election strategy. Do this by running `gitlab-ctl stop praefect` on the Praefect nodes.

      1. On the Praefect nodes, configure the election strategy in `/etc/gitlab/gitlab.rb` with `praefect['failover_election_strategy'] = 'per_repository'`.

      1. Run `gitlab-ctl reconfigure && gitlab-ctl start` to reconfigure and start the Praefect nodes.

   - If downtime is unacceptable:

      1. Determine which Gitaly node is [the current primary](index.md#determine-primary-gitaly-node).

      1. Comment out the secondary Gitaly nodes from the virtual storage's configuration in `/etc/gitlab/gitlab.rb`
      on all Praefect nodes. This ensures there's only one Gitaly node configured, causing both of the election
      strategies to elect the same Gitaly node as the primary.

      1. Run `gitlab-ctl reconfigure` on all Praefect nodes. Wait until all Praefect processes have restarted and
      the old processes have exited. This can take up to one minute.

      1. On all Praefect nodes, configure the election strategy in `/etc/gitlab/gitlab.rb` with
      `praefect['failover_election_strategy'] = 'per_repository'`.

      1. Run `gitlab-ctl reconfigure` on all Praefect nodes. Wait until all of the Praefect processes have restarted and
      the old processes have exited. This can take up to one minute.

      1. Uncomment the secondary Gitaly node configuration commented out in the earlier step on all Praefect nodes.

      1. Run `gitlab-ctl reconfigure` on all Praefect nodes to reconfigure and restart the Praefect processes.
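
To check for the `repository importer finished` entry from step 1 among many log lines, you can filter Praefect's JSON log output. The following Ruby sketch assumes one JSON object per line, as in the examples in step 1. The `importer_results` helper is hypothetical, and the sample input is copied from those examples.

```ruby
require 'json'

# Hypothetical helper: returns the virtual_storages hash from the last
# "repository importer finished" entry in the given log lines, or nil.
def importer_results(lines)
  result = nil
  lines.each do |line|
    entry = begin
      JSON.parse(line)
    rescue JSON::ParserError
      nil # not a JSON log line; skip it
    end
    next unless entry.is_a?(Hash) && entry['msg'] == 'repository importer finished'

    result = entry['virtual_storages']
  end
  result
end

# Sample input copied from the example log entry in step 1.
log = <<~LOG
  {"level":"info","msg":"repository importer finished","pid":19752,"time":"2021-04-28T11:41:36.743Z","virtual_storages":{"default":false}}
LOG

results = importer_results(log.each_line)
if results.nil?
  puts 'No "repository importer finished" entry found.'
else
  pending = results.reject { |_storage, migrated| migrated }.keys
  puts pending.empty? ? 'All virtual storages migrated.' : "Not migrated: #{pending.join(', ')}"
end
```

On a Praefect node, you could pass `File.foreach(path)` instead of the sample string, where `path` is the location of Praefect's log file on your installation.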

### Deprecated election strategies

WARNING:
The following election strategies are deprecated and are scheduled for removal in GitLab 14.0.
Migrate to [repository-specific primary nodes](#repository-specific-primary-nodes).

- **PostgreSQL:** Enabled by default until GitLab 14.0, and equivalent to:
  `praefect['failover_election_strategy'] = 'sql'`.

  This configuration option:

  - Allows multiple Praefect nodes to coordinate via the PostgreSQL database to elect a primary
    Gitaly node.
  - Causes Praefect nodes to elect a new primary Gitaly node, monitor its health, and elect a new primary
    Gitaly node if the current one is not reached within 10 seconds by a majority of the Praefect
    nodes.
- **Memory:** Enabled by setting `praefect['failover_election_strategy'] = 'local'`
  in `/etc/gitlab/gitlab.rb` on the Praefect node.

  If a sufficient number of health checks fail for the current primary Gitaly node, a new primary is
  elected. **Do not use with multiple Praefect nodes!** Using with multiple Praefect nodes is
  likely to result in a split brain.

## Primary node failure

Gitaly Cluster recovers from a failing primary Gitaly node by promoting a healthy secondary as the
new primary.

To minimize data loss, Gitaly Cluster:

- Switches repositories that are outdated on the new primary to [read-only mode](#read-only-mode).
- Elects the secondary with the fewest unreplicated writes from the primary to be the new primary.
  Because there can still be some unreplicated writes, [data loss can occur](#check-for-data-loss).

### Read-only mode

> - Introduced in GitLab 13.0 as [generally available](https://about.gitlab.com/handbook/product/gitlab-the-product/#generally-available-ga).
> - Between GitLab 13.0 and GitLab 13.2, read-only mode applied to the whole virtual storage and occurred whenever failover occurred.
> - [In GitLab 13.3 and later](https://gitlab.com/gitlab-org/gitaly/-/issues/2862), read-only mode applies on a per-repository basis and only occurs if a new primary is out of date.

When Gitaly Cluster switches to a new primary, repositories enter read-only mode if they are out of
date. This can happen after failing over to an outdated secondary. Read-only mode eases data
recovery efforts by preventing writes that may conflict with the unreplicated writes on other nodes.

To enable writes again, an administrator can:

1. [Check](#check-for-data-loss) for data loss.
1. Attempt to [recover](#data-recovery) missing data.
1. Either [enable writes](#enable-writes-or-accept-data-loss) in the virtual storage or
   [accept data loss](#enable-writes-or-accept-data-loss) if necessary, depending on the version of
   GitLab.

### Check for data loss

The Praefect `dataloss` sub-command identifies replicas that are likely to be outdated. This can help
identify potential data loss after a failover. The following parameters are
available:

- `-virtual-storage` that specifies which virtual storage to check. The default behavior is to
  display outdated replicas of read-only repositories as they might require administrator action.
- In GitLab 13.3 and later, `-partially-replicated` that specifies whether to display a list of
  [outdated replicas of writable repositories](#outdated-replicas-of-writable-repositories).

NOTE:
`dataloss` is still in beta and the output format is subject to change.

To check for repositories with outdated primaries, run:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>]
```

Every configured virtual storage is checked if none is specified:

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss
```

Repositories which have assigned storage nodes that contain an outdated copy of the repository are listed
in the output. This information is printed for each repository:

- A repository's relative path to the storage directory identifies each repository and groups the related
  information.
- The repository's current status is printed in parentheses next to the disk path. If the repository's primary
  is outdated, the repository is in `read-only` mode and can't accept writes. Otherwise, the mode is `writable`.
- The primary field lists the repository's current primary. If the repository has no primary, the field shows
  `No Primary`.
- The In-Sync Storages lists replicas which have replicated the latest successful write and all writes
  preceding it.
- The Outdated Storages lists replicas which contain an outdated copy of the repository. Replicas which have no copy
  of the repository but should contain it are also listed here. The maximum number of changes the replica is missing
  is listed next to the replica. Note that the outdated replicas may be fully up to date or contain
  later changes, but Praefect can't guarantee it.

Whether a replica is assigned to host the repository is listed with each replica's status. `assigned host` is printed
next to replicas which are assigned to store the repository. The text is omitted if the replica contains a copy of
1326
the repository but is not assigned to store the repository. Such replicas aren't kept in-sync by Praefect, but may
1327 1328 1329
act as replication sources to bring assigned replicas up to date.

Example output:

```shell
Virtual storage: default
  Outdated repositories:
    @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (read-only):
      Primary: gitaly-1
      In-Sync Storages:
        gitaly-2, assigned host
      Outdated Storages:
        gitaly-1 is behind by 3 changes or less, assigned host
        gitaly-3 is behind by 3 changes or less
```

A confirmation is printed out when every repository is writable. For example:

```shell
Virtual storage: default
  All repositories are writable!
```

#### Outdated replicas of writable repositories

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/3019) in GitLab 13.3.

To also list information about repositories whose primary is up to date but one or more assigned
replicas are outdated, use the `-partially-replicated` flag.

A repository is writable if the primary has the latest changes. Secondaries might be temporarily
outdated while they are waiting to replicate the latest changes.

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss [-virtual-storage <virtual-storage>] [-partially-replicated]
```

Example output:

```shell
Virtual storage: default
  Outdated repositories:
    @hashed/3f/db/3fdba35f04dc8c462986c992bcf875546257113072a909c162f7e470e581e278.git (writable):
      Primary: gitaly-1
      In-Sync Storages:
        gitaly-1, assigned host
      Outdated Storages:
        gitaly-2 is behind by 3 changes or less, assigned host
        gitaly-3 is behind by 3 changes or less
```

With the `-partially-replicated` flag set, a confirmation is printed out if every assigned replica is fully up to
date. For example:

```shell
Virtual storage: default
  All repositories are up to date!
```

### Check repository checksums

To check a project's repository checksums across all Gitaly nodes, run the
[replicas Rake task](../raketasks/praefect.md#replica-checksums) on the main GitLab node.

### Enable writes or accept data loss

Praefect provides the following sub-commands to re-enable writes:

- In GitLab 13.2 and earlier, `enable-writes` to re-enable virtual storage for writes after data
  recovery attempts.

   ```shell
   sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml enable-writes -virtual-storage <virtual-storage>
   ```

- [In GitLab 13.3](https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2415) and later,
  `accept-dataloss` to accept data loss and re-enable writes for repositories after data recovery
  attempts have failed. Accepting data loss causes the current version of the repository on the
  authoritative storage to be considered latest. Other storages are brought up to date with the
  authoritative storage by scheduling replication jobs.

  ```shell
  sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml accept-dataloss -virtual-storage <virtual-storage> -repository <relative-path> -authoritative-storage <storage-name>
  ```

WARNING:
`accept-dataloss` causes permanent data loss by overwriting other versions of the repository. Data
[recovery efforts](#data-recovery) must be performed before using it.

## Data recovery

If a Gitaly node fails replication jobs for any reason, it ends up hosting outdated versions of the
affected repositories. Praefect provides tools for:

- [Automatic](#automatic-reconciliation) reconciliation, for GitLab 13.4 and later.
- [Manual](#manual-reconciliation) reconciliation, for:
  - GitLab 13.3 and earlier.
  - Repositories upgraded to GitLab 13.4 and later without entries in the `repositories` table. In
    GitLab 13.6 and later, [a migration is run](https://gitlab.com/gitlab-org/gitaly/-/issues/3033)
    when Praefect starts for these repositories.

These tools reconcile the outdated repositories to bring them fully up to date again.

### Automatic reconciliation

> [Introduced](https://gitlab.com/gitlab-org/gitaly/-/issues/2717) in GitLab 13.4.

Praefect automatically reconciles repositories that are not up to date. By default, this is done every
five minutes. For each outdated repository on a healthy Gitaly node, Praefect picks a
random, fully up-to-date replica of the repository on another healthy Gitaly node to replicate from. A
replication job is scheduled only if there are no other replication jobs pending for the target
repository.
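
One reconciliation pass, as described above, can be sketched as follows. This is an illustrative Ruby model under stated assumptions, not Praefect's implementation. Replicas are plain hashes with hypothetical `:storage`, `:healthy`, and `:up_to_date` keys, and `pending_targets` is an assumed set of target storages that already have a replication job pending for this repository.

```ruby
require 'set'

# Hypothetical model of one reconciliation pass for a single repository.
# Returns [source, target] pairs for the replication jobs to schedule.
def reconcile(replicas, pending_targets, rng: Random.new)
  # Possible sources: healthy replicas that are fully up to date.
  sources = replicas.select { |r| r[:healthy] && r[:up_to_date] }
  return [] if sources.empty?

  replicas.filter_map do |target|
    next if target[:up_to_date] || !target[:healthy]   # only outdated, healthy targets
    next if pending_targets.include?(target[:storage]) # a job is already pending

    [sources.sample(random: rng)[:storage], target[:storage]]
  end
end

replicas = [
  { storage: 'gitaly-1', healthy: true,  up_to_date: true },
  { storage: 'gitaly-2', healthy: true,  up_to_date: false },
  { storage: 'gitaly-3', healthy: true,  up_to_date: false },
  { storage: 'gitaly-4', healthy: false, up_to_date: false }
]
p reconcile(replicas, Set['gitaly-3']) # => [["gitaly-1", "gitaly-2"]]
```

In this example, `gitaly-3` is skipped because a job is already pending for it, and `gitaly-4` is skipped because it is unhealthy.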

The reconciliation frequency can be changed in the configuration. The value can be any valid
[Go duration value](https://golang.org/pkg/time/#ParseDuration). Values of 0 or below disable the feature.

Examples:

```ruby
praefect['reconciliation_scheduling_interval'] = '5m' # the default value
```

```ruby
praefect['reconciliation_scheduling_interval'] = '30s' # reconcile every 30 seconds
```

```ruby
praefect['reconciliation_scheduling_interval'] = '0' # disable the feature
```
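
Go rejects some strings that look valid. For example, `5` without a unit is rejected. To check a value before setting it, you can use the following simplified Ruby re-implementation of the Go duration format. It is not the parser Praefect uses, because Praefect parses the value in Go. It assumes, consistent with the examples above, that a value of `0` or lower disables reconciliation. `parse_go_duration` and `reconciliation_enabled?` are hypothetical helper names.

```ruby
# Nanoseconds per unit for the units accepted by Go's time.ParseDuration.
# The "µs" spellings are omitted to keep this sketch ASCII-only.
NANOS_PER_UNIT = {
  'ns' => 1,
  'us' => 1_000,
  'ms' => 1_000_000,
  's' => 1_000_000_000,
  'm' => 60 * 1_000_000_000,
  'h' => 3_600 * 1_000_000_000
}.freeze

# One "<number><unit>" component, for example "5m" or "1.5h". "ms" is listed
# before "s" and "m" so that it is not split into two units.
DURATION_COMPONENT = /\A(\d+(?:\.\d*)?|\.\d+)(ns|us|ms|s|m|h)/

# Simplified parser for Go duration strings such as "5m", "30s", or "1h30m".
# Returns the duration in seconds; raises ArgumentError for invalid input.
def parse_go_duration(value)
  rest = value.dup
  sign = 1
  if rest.start_with?('-', '+')
    sign = -1 if rest.start_with?('-')
    rest = rest[1..]
  end
  return 0.0 if rest == '0' # Go accepts a bare "0" without a unit

  raise ArgumentError, "invalid duration #{value.inspect}" if rest.empty?

  nanos = 0.0
  until rest.empty?
    match = DURATION_COMPONENT.match(rest)
    raise ArgumentError, "invalid duration #{value.inspect}" unless match

    nanos += match[1].to_f * NANOS_PER_UNIT.fetch(match[2])
    rest = match.post_match
  end
  sign * nanos / 1e9
end

# Hypothetical helper: reconciliation runs only for positive intervals.
def reconciliation_enabled?(value)
  parse_go_duration(value).positive?
end

%w[5m 30s 0].each do |value|
  puts "#{value}: #{parse_go_duration(value)}s, enabled=#{reconciliation_enabled?(value)}"
end
```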

### Manual reconciliation

WARNING:
The `reconcile` sub-command is deprecated and scheduled for removal in GitLab 14.0. Use
[automatic reconciliation](#automatic-reconciliation) instead. Manual reconciliation may
produce excess replication jobs and is limited in functionality. Manual reconciliation does
not work when [repository-specific primary nodes](#repository-specific-primary-nodes) are
enabled.

The Praefect `reconcile` sub-command manually reconciles two Gitaly nodes. It replicates every
repository that is on a later version on the reference storage to the target storage.

```shell
sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml reconcile -virtual <virtual-storage> -reference <up-to-date-storage> -target <outdated-storage> -f
```

- Replace the placeholder `<virtual-storage>` with the virtual storage containing the Gitaly node storage to be checked.
- Replace the placeholder `<up-to-date-storage>` with the name of the Gitaly storage containing up-to-date repositories.
- Replace the placeholder `<outdated-storage>` with the name of the Gitaly storage containing outdated repositories.

## Migrate to Gitaly Cluster

Whether you are migrating to Gitaly Cluster because of [NFS support deprecation](index.md#nfs-deprecation-notice)
or to move away from single Gitaly nodes, the basic process involves:

1. Create the required storage.
1. Create and configure Gitaly Cluster.
1. [Move the repositories](#move-repositories).

When creating the storage, see the
[repository storage recommendations](faq.md#what-are-some-repository-storage-recommendations).

### Move repositories

To migrate to Gitaly Cluster, existing repositories stored outside Gitaly Cluster must be
moved. There is no automatic migration, but you can schedule the moves with the GitLab API.

GitLab repositories can be associated with projects, groups, and snippets. Each of these types
has a separate API to schedule the move of its repositories. To move all repositories
on a GitLab instance, moves for each of these types must be scheduled for each storage.
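
The following Ruby sketch builds one bulk move request per repository type and source storage, and sends them with the same endpoints and JSON body as the `curl` examples in this section. `build_move_requests`, `schedule_move`, and the `GITLAB_TOKEN` environment variable are hypothetical names. Without `GITLAB_TOKEN`, the script only prints a dry run. Group moves are a Premium feature, so remove `group_repository_storage_moves` from the list on other tiers.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Bulk storage move endpoints, one per repository type.
MOVE_ENDPOINTS = %w[
  project_repository_storage_moves
  snippet_repository_storage_moves
  group_repository_storage_moves
].freeze

# Builds one request for every combination of source storage and repository type.
def build_move_requests(base_url, source_storages, destination_storage)
  source_storages.product(MOVE_ENDPOINTS).map do |source, endpoint|
    {
      url: "#{base_url}/api/v4/#{endpoint}",
      body: { source_storage_name: source, destination_storage_name: destination_storage }
    }
  end
end

# POSTs a single scheduling request and returns the HTTP response.
def schedule_move(request, token)
  uri = URI(request[:url])
  http_request = Net::HTTP::Post.new(uri)
  http_request['PRIVATE-TOKEN'] = token
  http_request['Content-Type'] = 'application/json'
  http_request.body = JSON.generate(request[:body])
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(http_request)
  end
end

# Replace the placeholders before setting GITLAB_TOKEN.
requests = build_move_requests('https://gitlab.example.com',
                               ['<original_storage_name>'], '<cluster_storage_name>')
requests.each do |request|
  if ENV['GITLAB_TOKEN']
    response = schedule_move(request, ENV['GITLAB_TOKEN'])
    puts "#{request[:url]}: HTTP #{response.code}"
  else
    puts "dry run: POST #{request[:url]} #{JSON.generate(request[:body])}"
  end
end
```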

Each repository is read-only for the duration of its move and becomes writable again only after
the move has completed.

After creating and configuring Gitaly Cluster:

1. Ensure all storages are accessible to the GitLab instance. In this example, these are
   `<original_storage_name>` and `<cluster_storage_name>`.
1. [Configure repository storage weights](../repository_storage_paths.md#configure-where-new-repositories-are-stored)
   so that the Gitaly Cluster receives all new projects. This stops new projects from being created
   on existing Gitaly nodes while the migration is in progress.
1. Schedule repository moves for:
   - [Projects](#bulk-schedule-project-moves).
   - [Snippets](#bulk-schedule-snippet-moves).
   - [Groups](#bulk-schedule-group-moves). **(PREMIUM SELF)**

#### Bulk schedule project moves

1. [Schedule repository storage moves for all projects on a storage shard](../../api/project_repository_storage_moves.md#schedule-repository-storage-moves-for-all-projects-on-a-storage-shard) using the API. For example:

   ```shell
   curl --request POST --header "Private-Token: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/project_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/project_repository_storage_moves.md#retrieve-all-project-repository-storage-moves)
   using the API. The query indicates one of the following:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, [query projects](../../api/projects.md#list-all-projects)
   using the API to confirm that all projects have moved. No projects should be returned
   with the `repository_storage` field set to the old storage.

   ```shell
   curl --header "Private-Token: <your_access_token>" --header "Content-Type: application/json" \
   "https://gitlab.example.com/api/v4/projects?repository_storage=<original_storage_name>"
   ```

   Alternatively, use [the Rails console](../operations/rails_console.md) to
   confirm that all projects have moved. Run the following in the Rails console:

   ```ruby
   ProjectRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.
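
To check move progress from a script instead of reading the raw API response, you can use the following Ruby sketch. It fetches the first page of a "retrieve all" moves endpoint and counts moves by outcome. It treats `finished` as success, assumes a failed move has the state `failed`, and counts every other state as in progress, which is a simplification. The same approach works for the snippet and group move endpoints. `fetch_moves`, `move_status`, `summarize_moves`, and the `GITLAB_TOKEN` environment variable are hypothetical names.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Fetches the first page of storage moves from a "retrieve all" endpoint,
# for example project_repository_storage_moves. Later pages are not requested.
def fetch_moves(base_url, endpoint, token)
  uri = URI("#{base_url}/api/v4/#{endpoint}")
  request = Net::HTTP::Get.new(uri)
  request['PRIVATE-TOKEN'] = token
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
  JSON.parse(response.body)
end

# Maps the `state` field of one move to :finished, :failed, or :in_progress.
def move_status(move)
  case move['state']
  when 'finished' then :finished
  when 'failed' then :failed
  else :in_progress
  end
end

# Counts moves per status, for example { finished: 10, in_progress: 2 }.
def summarize_moves(moves)
  moves.group_by { |move| move_status(move) }.transform_values(&:count)
end

if ENV['GITLAB_TOKEN']
  moves = fetch_moves('https://gitlab.example.com', 'project_repository_storage_moves', ENV['GITLAB_TOKEN'])
  summary = summarize_moves(moves)
  puts summary
  puts 'Some moves failed: reschedule them.' if summary.fetch(:failed, 0).positive?
end
```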

#### Bulk schedule snippet moves

1. [Schedule repository storage moves for all snippets on a storage shard](../../api/snippet_repository_storage_moves.md#schedule-repository-storage-moves-for-all-snippets-on-a-storage-shard) using the API. For example:

   ```shell
   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/snippet_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/snippet_repository_storage_moves.md#retrieve-all-snippet-repository-storage-moves)
   using the API. The query indicates one of the following:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, use [the Rails console](../operations/rails_console.md) to
   confirm that all snippets have moved. No snippets should be returned for the original
   storage. Run the following in the Rails console:

   ```ruby
   SnippetRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.

#### Bulk schedule group moves **(PREMIUM SELF)**

1. [Schedule repository storage moves for all groups on a storage shard](../../api/group_repository_storage_moves.md#schedule-repository-storage-moves-for-all-groups-on-a-storage-shard) using the API. For example:

   ```shell
   curl --request POST --header "PRIVATE-TOKEN: <your_access_token>" \
        --header "Content-Type: application/json" \
        --data '{"source_storage_name":"<original_storage_name>","destination_storage_name":"<cluster_storage_name>"}' \
        "https://gitlab.example.com/api/v4/group_repository_storage_moves"
   ```

1. [Query the most recent repository moves](../../api/group_repository_storage_moves.md#retrieve-all-group-repository-storage-moves)
   using the API. The query indicates one of the following:
   - The moves have completed successfully. The `state` field is `finished`.
   - The moves are in progress. Re-query the repository move until it completes successfully.
   - The moves have failed. Most failures are temporary and are solved by rescheduling the move.

1. After the moves are complete, use [the Rails console](../operations/rails_console.md) to
   confirm that all groups have moved. No groups should be returned for the original
   storage. Run the following in the Rails console:

   ```ruby
   GroupWikiRepository.for_repository_storage('<original_storage_name>')
   ```

1. Repeat for each storage as required.
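
To check all three repository types in one pass, you can combine the Rails console queries above. This sketch assumes that each model responds to `for_repository_storage`, as those queries use it. `remaining_repositories` is a hypothetical helper name. `GroupWikiRepository` may not be defined on every installation, so the sketch skips models that are missing. Every count should be `0` after all moves for the storage have finished.

```ruby
# Returns { model name => number of repositories still on `storage` }.
def remaining_repositories(storage, models)
  models.to_h { |model| [model.name, model.for_repository_storage(storage).count] }
end

# Only runs inside the Rails console, where the GitLab models are loaded.
if defined?(Rails)
  models = %w[ProjectRepository SnippetRepository GroupWikiRepository]
    .select { |name| Object.const_defined?(name) }
    .map { |name| Object.const_get(name) }
  puts remaining_repositories('<original_storage_name>', models)
end
```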