Commit b50d611a authored by Achilleas Pipinellis

Move Consul docs to a new location

Move Consul docs to a new location outside of the high_availability
dir which is being deprecated.
parent a8292c2c
---
type: reference
---
# How to set up Consul **(PREMIUM ONLY)**
GitLab Premium includes a bundled version of [Consul](https://www.consul.io/),
a service networking solution that you can manage by using `/etc/gitlab/gitlab.rb`.

A Consul cluster consists of both
[server and client agents](https://www.consul.io/docs/agent).
The servers run on their own nodes and the clients run on other nodes that, in
turn, communicate with the servers.
## Configure the Consul nodes

> - `consul_role` was introduced in GitLab 10.3.
NOTE: **Important:**
Before proceeding, refer to the
[available reference architectures](reference_architectures/index.md#available-reference-architectures)
to find out how many Consul server nodes you should have.
On **each** Consul server node perform the following:
1. Follow the instructions to [install](https://about.gitlab.com/install/)
GitLab by choosing your preferred platform, but do not supply the
`EXTERNAL_URL` value when asked.
1. Edit `/etc/gitlab/gitlab.rb`, and add the following, replacing the values
   noted in the `retry_join` section. In the example below, there are three
   nodes: two denoted by their IP addresses and one by its FQDN. You can use
   either notation:
   ```ruby
   # Disable all components except Consul
   roles ['consul_role']

   # Consul nodes: can be FQDN or IP, separated by whitespace
   consul['configuration'] = {
     server: true,
     retry_join: %w(10.10.10.1 consul1.gitlab.example.com 10.10.10.2)
   }

   # Disable auto migrations
   gitlab_rails['auto_migrate'] = false
   ```
1. [Reconfigure GitLab](restart_gitlab.md#omnibus-gitlab-reconfigure) for the changes
to take effect.
1. Run the following command to verify that Consul is configured correctly and
   that all server nodes are communicating:

   ```shell
   sudo /opt/gitlab/embedded/bin/consul members
   ```

   The output should be similar to:

   ```plaintext
   Node               Address               Status  Type    Build  Protocol  DC
   CONSUL_NODE_ONE    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_TWO    XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   CONSUL_NODE_THREE  XXX.XXX.XXX.YYY:8301  alive   server  0.9.2  2         gitlab_consul
   ```
If the results display any nodes with a status that isn't `alive`, or if any
of the three nodes are missing, see the [Troubleshooting section](#troubleshooting-consul).
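Beyond membership, the bundled binary also supports the stock `consul info`
subcommand, which summarizes the local agent's Raft state:

```shell
# Summarize the local agent's runtime state. In the "raft" section, check
# "state" (Leader or Follower) and "num_peers" to confirm consensus health.
sudo /opt/gitlab/embedded/bin/consul info
```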
## Upgrade the Consul nodes
To upgrade your Consul nodes, upgrade the GitLab package.
Nodes should be:
- Members of a healthy cluster prior to upgrading the Omnibus GitLab package.
- Upgraded one node at a time.
Identify any existing health issues in the cluster by running the following command
on each node. The command returns an empty array if the cluster is healthy:
```shell
curl http://127.0.0.1:8500/v1/health/state/critical
```
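As a sketch, a one-node-at-a-time upgrade pass could look like the following.
The `apt-get` commands are an assumption for Debian/Ubuntu installs; use your
platform's package manager otherwise:

```shell
# 1. Confirm the cluster is healthy before touching this node (expect: []).
curl http://127.0.0.1:8500/v1/health/state/critical

# 2. Upgrade the GitLab package (Debian/Ubuntu shown).
sudo apt-get update && sudo apt-get install gitlab-ee

# 3. Verify the node rejoined and all members are alive before moving on.
sudo /opt/gitlab/embedded/bin/consul members
```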
Consul nodes communicate using the Raft protocol. If the current leader goes
offline, a leader election must take place. A leader node must exist to facilitate
synchronization across the cluster. If too many nodes go offline at the same time,
the cluster loses quorum and cannot elect a leader due to
[broken consensus](https://www.consul.io/docs/internals/consensus.html).
Consult the [troubleshooting section](#troubleshooting-consul) if the cluster is not
able to recover after the upgrade. The [outage recovery](#outage-recovery) may
be of particular interest.
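To check leadership directly, you can query the agent's standard status endpoints:

```shell
# Returns the address of the current Raft leader, for example "10.10.10.1:8300".
# An empty string ("") means no leader is elected and quorum is lost.
curl http://127.0.0.1:8500/v1/status/leader

# Lists the Raft peers (server nodes) participating in consensus.
curl http://127.0.0.1:8500/v1/status/peers
```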
NOTE: **Note:**
GitLab uses Consul to store only transient data that is easily regenerated. If
the bundled Consul was not used by any process other than GitLab itself, then
[rebuilding the cluster from scratch](#recreate-from-scratch) is fine.
## Troubleshooting Consul
Below are some useful operations should you need to debug any issues.
You can see any error logs by running:
```shell
sudo gitlab-ctl tail consul
```
### Check the cluster membership
To determine which nodes are part of the cluster, run the following on any member in the cluster:
```shell
sudo /opt/gitlab/embedded/bin/consul members
```
The output should be similar to:
```plaintext
Node      Address         Status  Type    Build  Protocol  DC
consul-a  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
consul-b  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
consul-c  XX.XX.X.Y:8301  alive   server  0.9.0  2         gitlab_consul
db-a      XX.XX.X.Y:8301  alive   client  0.9.0  2         gitlab_consul
db-b      XX.XX.X.Y:8301  alive   client  0.9.0  2         gitlab_consul
```
Ideally all nodes will have a `Status` of `alive`.
### Restart Consul
If it becomes necessary to restart Consul, it is important to do so in
a controlled manner to maintain quorum. If quorum is lost, you must follow the
Consul [outage recovery](#outage-recovery) process to recover the cluster.
To be safe, it's recommended that you restart Consul on only one node at a time to
ensure the cluster remains intact. For larger clusters, it is possible to restart
multiple nodes at a time. See the
[Consul consensus document](https://www.consul.io/docs/internals/consensus.html#deployment-table)
for the number of failures a cluster of a given size can tolerate; this is also the
number of simultaneous restarts it can sustain.
To restart Consul:
```shell
sudo gitlab-ctl restart consul
```
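For example, a cautious rolling restart can wait for the cluster to report a
leader before moving on to the next node. This is a minimal sketch, assuming the
agent's HTTP API is reachable on the default port:

```shell
# Run on one Consul server node at a time.
sudo gitlab-ctl restart consul

# Wait until the local agent responds and reports a leader ("" means none).
until leader=$(curl -sf http://127.0.0.1:8500/v1/status/leader) &&
      [ -n "$leader" ] && [ "$leader" != '""' ]; do
  sleep 5
done

# Confirm all members are alive before restarting the next node.
sudo /opt/gitlab/embedded/bin/consul members
```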
### Consul nodes unable to communicate
By default, Consul attempts to
[bind](https://www.consul.io/docs/agent/options.html#_bind) to `0.0.0.0`, but
advertises the first private IP address on the node for other Consul nodes
to communicate with it. If the other nodes cannot communicate with a node on
this address, the cluster has a failed status.
If you are running into this issue, you will see messages like the following in `gitlab-ctl tail consul` output:
```plaintext
2017-09-25_19:53:39.90821 2017/09/25 19:53:39 [WARN] raft: no known peers, aborting election
2017-09-25_19:53:41.74356 2017/09/25 19:53:41 [ERR] agent: failed to sync remote state: No cluster leader
```
To fix this:

1. Pick an address on each node that all of the other nodes can reach it through.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   gitlab-ctl reconfigure
   ```
If you still see the errors, you may have to
[erase the Consul database and reinitialize](#recreate-from-scratch) on the affected node.
### Consul does not start - multiple private IPs

If a node has multiple private IPs, Consul cannot determine which of the
private addresses to advertise, and it immediately exits on start.
You will see messages like the following in `gitlab-ctl tail consul` output:
```plaintext
2017-11-09_17:41:45.52876 ==> Starting Consul agent...
2017-11-09_17:41:45.53057 ==> Error creating agent: Failed to get advertise address: Multiple private IPs found. Please configure one.
```
To fix this:

1. Pick an address on the node that all of the other nodes can reach it through.
1. Update your `/etc/gitlab/gitlab.rb`:

   ```ruby
   consul['configuration'] = {
     ...
     bind_addr: 'IP ADDRESS'
   }
   ```

1. Reconfigure GitLab:

   ```shell
   gitlab-ctl reconfigure
   ```
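To list the candidate addresses on the node before picking one, standard
tooling is enough; for example:

```shell
# Show the IPv4 addresses assigned to this node; pick the one that
# the other Consul nodes can route to, and use it as bind_addr.
ip -4 addr show | grep inet
```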
### Outage recovery
If you lose enough Consul nodes in the cluster to break quorum, the cluster
is considered failed and will not function without manual intervention.
In that case, you can either recreate the nodes from scratch or attempt a
recovery.
#### Recreate from scratch
By default, GitLab does not store anything in the Consul node that cannot be
recreated. To erase the Consul database and reinitialize:
```shell
sudo gitlab-ctl stop consul
sudo rm -rf /var/opt/gitlab/consul/data
sudo gitlab-ctl start consul
```
After this, the node should start back up, and the rest of the server agents rejoin it.
Shortly after that, the client agents should rejoin as well.
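To confirm that the recreated node rejoined and a leader was re-elected, reuse
the earlier checks:

```shell
sudo /opt/gitlab/embedded/bin/consul members
curl http://127.0.0.1:8500/v1/status/leader
```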
#### Recover a failed node
If you have taken advantage of Consul to store other data and want to restore
the failed node, follow the
[Consul guide](https://learn.hashicorp.com/consul/day-2-operations/outage)
to recover a failed cluster.
The document at the old location was replaced with a redirect:

```plaintext
---
redirect_to: ../consul.md
---

This document was moved to [another location](../consul.md).
```
The commit also updates cross-references in other documents to point at the new location:

```diff
@@ -203,7 +203,7 @@ When installing the GitLab package, do not supply `EXTERNAL_URL` value.
 ### Configuring the Database nodes

-1. Make sure to [configure the Consul nodes](../high_availability/consul.md).
+1. Make sure to [configure the Consul nodes](../consul.md).
 1. Make sure you collect [`CONSUL_SERVER_NODES`](#consul-information), [`PGBOUNCER_PASSWORD_HASH`](#pgbouncer-information), [`POSTGRESQL_PASSWORD_HASH`](#postgresql-information), the [number of db nodes](#postgresql-information), and the [network address](#network-information) before executing the next step.
 1. On the master database node, edit `/etc/gitlab/gitlab.rb` replacing values noted in the `# START user configuration` section:
@@ -795,7 +795,7 @@ After deploying the configuration follow these steps:
 This example uses 3 PostgreSQL servers, and 1 application node (with PgBouncer setup alongside).
 It differs from the [recommended setup](#example-recommended-setup) by moving the Consul servers into the same servers we use for PostgreSQL.
-The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](../high_availability/consul.md#outage-recovery) on the same set of machines.
+The trade-off is between reducing server counts, against the increased operational complexity of needing to deal with PostgreSQL [failover](#failover-procedure) and [restore](#restore-procedure) procedures in addition to [Consul outage recovery](../consul.md#outage-recovery) on the same set of machines.
 In this example we start with all servers on the same 10.6.0.0/16 private network range, they can connect to each freely other on those addresses.
@@ -1087,7 +1087,7 @@ To restart either service, run `gitlab-ctl restart SERVICE`
 For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start afterwards with `gitlab-ctl start repmgrd`.
-On the Consul server nodes, it is important to [restart the Consul service](../high_availability/consul.md#restart-consul) in a controlled manner.
+On the Consul server nodes, it is important to [restart the Consul service](../consul.md#restart-consul) in a controlled manner.
 ### `gitlab-ctl repmgr-check-master` command produces errors
@@ -1136,7 +1136,7 @@ postgresql['trust_auth_cidr_addresses'] = %w(123.123.123.123/32 <other_cidrs>)
 If you're running into an issue with a component not outlined here, be sure to check the troubleshooting section of their specific documentation page.
-- [Consul](../high_availability/consul.md#troubleshooting-consul)
+- [Consul](../consul.md#troubleshooting-consul)
 - [PostgreSQL](https://docs.gitlab.com/omnibus/settings/database.html#troubleshooting)
 - [GitLab application](../high_availability/gitlab.md#troubleshooting)
```

```diff
@@ -524,7 +524,7 @@ To restart either service, run `gitlab-ctl restart SERVICE`
 For PostgreSQL, it is usually safe to restart the master node by default. Automatic failover defaults to a 1 minute timeout. Provided the database returns before then, nothing else needs to be done. To be safe, you can stop `repmgrd` on the standby nodes first with `gitlab-ctl stop repmgrd`, then start afterwards with `gitlab-ctl start repmgrd`.
-On the Consul server nodes, it is important to restart the Consul service in a controlled fashion. Read our [Consul documentation](../high_availability/consul.md#restarting-the-server-cluster) for instructions on how to restart the service.
+On the Consul server nodes, it is important to restart the Consul service in a controlled fashion. Read our [Consul documentation](../consul.md#restart-consul) for instructions on how to restart the service.
 ### `gitlab-ctl repmgr-check-master` command produces errors
```

```diff
@@ -247,7 +247,7 @@ GitLab can be considered to have two layers from a process perspective:
 - [Project page](https://github.com/hashicorp/consul/blob/master/README.md)
 - Configuration:
-  - [Omnibus](../administration/high_availability/consul.md)
+  - [Omnibus](../administration/consul.md)
   - [Charts](https://docs.gitlab.com/charts/installation/deployment.html#postgresql)
 - Layer: Core Service (Data)
 - GitLab.com: [Consul](../user/gitlab_com/index.md#consul)
```