Commit 439b0fa9 authored by Will Chandler

Enable prometheus, update dataloss

Update listed gitaly config to enable prometheus. We are encouraging
use of grafana, so this will be a useful default.

Also add a note that explicitly states that `dataloss` must be run on
a Praefect node.
parent 59e5a8ef
@@ -396,7 +396,6 @@ documentation](index.md#configure-gitaly-servers).
 postgresql['enable'] = false
 redis['enable'] = false
 nginx['enable'] = false
-prometheus['enable'] = false
 grafana['enable'] = false
 puma['enable'] = false
 sidekiq['enable'] = false
@@ -406,6 +405,9 @@ documentation](index.md#configure-gitaly-servers).
 # Enable only the Gitaly service
 gitaly['enable'] = true

+# Enable Prometheus if needed
+prometheus['enable'] = true
+
 # Prevent database connections during 'gitlab-ctl reconfigure'
 gitlab_rails['rake_cache_clear'] = false
 gitlab_rails['auto_migrate'] = false
@@ -739,7 +741,9 @@ strategy in the future.
 ## Identifying Impact of a Primary Node Failure

-When a primary Gitaly node fails, there is a chance of data loss. Data loss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The Praefect `dataloss` sub-command helps identify these cases by counting the number of dead replication jobs for each repository within a given time frame.
+When a primary Gitaly node fails, there is a chance of data loss. Data loss can occur if there were outstanding replication jobs the secondaries did not manage to process before the failure. The `dataloss` Praefect sub-command helps identify these cases by counting the number of dead replication jobs for each repository. This command must be executed on a Praefect node.
+
+A time frame to search can be specified with `-from` and `-to`:

 ```shell
 sudo /opt/gitlab/embedded/bin/praefect -config /var/opt/gitlab/praefect/config.toml dataloss -from <rfc3339-time> -to <rfc3339-time>
...
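The `-from` and `-to` flags expect RFC 3339 timestamps. As a minimal sketch (assuming GNU `date` is available, as it is on typical Omnibus GitLab hosts), a one-hour search window could be built like this:

```shell
# Build RFC 3339 (UTC) timestamps covering the last hour.
# Assumes GNU date; BSD date uses different flags for relative times.
FROM=$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)
TO=$(date -u +%Y-%m-%dT%H:%M:%SZ)
echo "Searching for dead replication jobs from $FROM to $TO"
```

These values could then be passed to the `dataloss` sub-command on a Praefect node as `-from "$FROM" -to "$TO"`.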