Commit c2aed6e4 authored by Suzanne Selhorn

Merge branch 'Fix-Vale-issues-for-(/administration/geo/disaster_recovery/planned_failover.md)-#332111' into 'master'

Fix Vale issues for planned_failover.md

See merge request gitlab-org/gitlab!63024
parents 4154ab19 be1049c3
@@ -35,7 +35,7 @@ required scheduled maintenance period significantly.
 A common strategy for keeping this period as short as possible for data stored
 in files is to use `rsync` to transfer the data. An initial `rsync` can be
 performed ahead of the maintenance window; subsequent `rsync`s (including a
-final transfer inside the maintenance window) will then transfer only the
+final transfer inside the maintenance window) then transfer only the
 *changes* between the **primary** node and the **secondary** nodes.
 Repository-centric strategies for using `rsync` effectively can be found in the
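The two-pass `rsync` approach described in this hunk can be sketched as a small shell function. This is a minimal sketch, not the procedure from the GitLab docs. The `sync_files` name and the example paths are hypothetical, and remote destinations assume SSH access from the primary to the secondary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mirror SRC into DEST.
# DEST may be a local path or a remote "host:/path" target reachable over SSH.
sync_files() {
  local src="$1" dest="$2"
  # -a: recurse and preserve permissions, timestamps and (when run as root) ownership.
  # rsync skips files whose size and modification time already match on the
  # destination, so a repeat run copies only what changed since the last one.
  # --delete: remove destination files that no longer exist on the source.
  rsync -a --delete "${src%/}/" "${dest%/}/"
}

# Example usage (hypothetical paths):
#   sync_files /var/opt/gitlab/gitlab-rails/uploads \
#     secondary.example.com:/var/opt/gitlab/gitlab-rails/uploads
# Run once ahead of the maintenance window, then again inside it.
```

The second run inside the window is usually much faster than the first because unchanged files are compared but not copied.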
@@ -50,7 +50,7 @@ this command reports `ERROR - Replication is not up-to-date` even if
 replication is actually up-to-date. This bug was fixed in GitLab 13.8 and
 later.
-Run this command to list out all preflight checks and automatically check if replication and verification are complete before scheduling a planned failover to ensure the process will go smoothly:
+Run this command to list out all preflight checks and automatically check if replication and verification are complete before scheduling a planned failover to ensure the process goes smoothly:
 ```shell
 gitlab-ctl promotion-preflight-checks
@@ -73,7 +73,7 @@ In GitLab 12.4, you can optionally allow GitLab to manage replication of Object
 Database settings are automatically replicated to the **secondary** node, but the
 `/etc/gitlab/gitlab.rb` file must be set up manually, and differs between
 nodes. If features such as Mattermost, OAuth or LDAP integration are enabled
-on the **primary** node but not the **secondary** node, they will be lost during failover.
+on the **primary** node but not the **secondary** node, they are lost during failover.
 Review the `/etc/gitlab/gitlab.rb` file for both nodes and ensure the **secondary** node
 supports everything the **primary** node does **before** scheduling a planned failover.
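For the `/etc/gitlab/gitlab.rb` review, a rough first pass is to compare the active settings of both files. The sketch below assumes you have already copied both files to one machine. The `missing_on_secondary` name is hypothetical. It prints lines that are active on the primary but absent verbatim on the secondary. Expect legitimate node-specific differences, such as URLs and Geo roles, in the output. Treat it as a checklist for manual review rather than a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical helper (requires bash for process substitution).
# Prints active lines of the primary's gitlab.rb that do not appear
# verbatim in the secondary's copy.
missing_on_secondary() {
  local primary_rb="$1" secondary_rb="$2"
  # Drop comment and blank lines, then sort so comm(1) can compare the two sets.
  # comm -23 keeps only lines unique to the first (primary) input.
  comm -23 \
    <(grep -Ev '^[[:space:]]*(#|$)' "$primary_rb" | sort -u) \
    <(grep -Ev '^[[:space:]]*(#|$)' "$secondary_rb" | sort -u)
}

# Example usage (hypothetical local copies):
#   missing_on_secondary ./primary-gitlab.rb ./secondary-gitlab.rb
```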
@@ -119,7 +119,7 @@ time to complete
 If any objects are failing to replicate, this should be investigated before
 scheduling the maintenance window. Following a planned failover, anything that
-failed to replicate will be **lost**.
+failed to replicate is **lost**.
 You can use the [Geo status API](../../../api/geo_nodes.md#retrieve-project-sync-or-verification-failures-that-occurred-on-the-current-node) to review failed objects and
 the reasons for failure.
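The failures endpoint behind the linked Geo status API can be called with `curl`. This is a minimal sketch. It assumes the `/geo_nodes/current/failures` path and the optional `failure_type` filter from that API page, so verify both against your GitLab version. It also assumes an administrator personal access token. The function names are hypothetical. Send the request to the **secondary** node, because the endpoint reports failures for the node that receives it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the request URL for the Geo failures endpoint.
geo_failures_url() {
  local base_url="${1%/}" failure_type="${2:-}"
  local url="$base_url/api/v4/geo_nodes/current/failures"
  # Optional filter, for example "sync" or "verification" (assumed values;
  # check the API docs for what your version accepts).
  if [ -n "$failure_type" ]; then
    url="$url?failure_type=$failure_type"
  fi
  printf '%s\n' "$url"
}

# Hypothetical helper: fetch the failures as JSON from the secondary node.
list_geo_failures() {
  local base_url="$1" token="$2" failure_type="${3:-}"
  curl --silent --show-error --fail \
    --header "PRIVATE-TOKEN: $token" \
    "$(geo_failures_url "$base_url" "$failure_type")"
}

# Example usage (hypothetical host and token variable):
#   list_geo_failures https://secondary.example.com "$ADMIN_TOKEN" sync
```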
@@ -136,9 +136,9 @@ This [content was moved to another location](background_verification.md).
 On the **primary** node, navigate to **Admin Area > Messages**, add a broadcast
 message. You can check under **Admin Area > Geo** to estimate how long it
-will take to finish syncing. An example message would be:
-> A scheduled maintenance will take place at XX:XX UTC. We expect it to take
+takes to finish syncing. An example message would be:
+> A scheduled maintenance takes place at XX:XX UTC. We expect it to take
 > less than 1 hour.
 ## Prevent updates to the **primary** node
@@ -151,7 +151,7 @@ be disabled on the primary site:
 1. Disable non-Geo periodic background jobs on the **primary** node by navigating
 to **Admin Area > Monitoring > Background Jobs > Cron**, pressing `Disable All`,
 and then pressing `Enable` for the `geo_sidekiq_cron_config_worker` cron job.
-This job will re-enable several other cron jobs that are essential for planned
+This job re-enables several other cron jobs that are essential for planned
 failover to complete successfully.
 ## Finish replicating and verifying all data
@@ -161,7 +161,7 @@ be disabled on the primary site:
 1. On the **primary** node, navigate to **Admin Area > Monitoring > Background Jobs > Queues**
 and wait for all queues except those with `geo` in the name to drop to 0.
 These queues contain work that has been submitted by your users; failing over
-before it is completed will cause the work to be lost.
+before it is completed causes the work to be lost.
 1. On the **primary** node, navigate to **Admin Area > Geo** and wait for the
 following conditions to be true of the **secondary** node you are failing over to:
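Waiting for the non-Geo queues to drain can also be scripted as a polling loop. This sketch assumes a single Omnibus node where `gitlab-rails runner` is available. `wait_for_zero` and `count_non_geo_jobs` are hypothetical helpers. The Ruby one-liner uses the generic Sidekiq API (`Sidekiq::Queue.all`), not a GitLab-specific interface.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run CMD every SECONDS until it prints 0.
# Usage: wait_for_zero SECONDS CMD [ARGS...]
wait_for_zero() {
  local interval="$1"; shift
  local count
  while :; do
    count="$("$@")"
    # Stop with an error instead of looping forever on non-numeric output.
    case "$count" in
      ''|*[!0-9]*) printf 'unexpected output: %s\n' "$count" >&2; return 1 ;;
    esac
    printf 'remaining: %s\n' "$count" >&2
    if [ "$count" -eq 0 ]; then
      return 0
    fi
    sleep "$interval"
  done
}

# Hypothetical counter: total jobs in Sidekiq queues without "geo" in the name.
# gitlab-rails runner boots the whole application, so each call takes a while.
count_non_geo_jobs() {
  sudo gitlab-rails runner \
    'puts Sidekiq::Queue.all.reject { |q| q.name.include?("geo") }.sum(&:size)'
}

# Example usage on the primary node:
#   wait_for_zero 60 count_non_geo_jobs
```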
@@ -176,15 +176,15 @@ be disabled on the primary site:
 to verify the integrity of CI artifacts, LFS objects, and uploads in file
 storage.
-At this point, your **secondary** node will contain an up-to-date copy of everything the
-**primary** node has, meaning nothing will be lost when you fail over.
+At this point, your **secondary** node contains an up-to-date copy of everything the
+**primary** node has, meaning nothing is lost when you fail over.
 ## Promote the **secondary** node
 Finally, follow the [Disaster Recovery docs](index.md) to promote the
-**secondary** node to a **primary** node. This process will cause a brief outage on the **secondary** node, and users may need to log in again.
-Once it is completed, the maintenance window is over! Your new **primary** node will now
-begin to diverge from the old one. If problems do arise at this point, failing
+**secondary** node to a **primary** node. This process causes a brief outage on the **secondary** node, and users may need to log in again.
+Once it is completed, the maintenance window is over! Your new **primary** node now
+begins to diverge from the old one. If problems do arise at this point, failing
 back to the old **primary** node [is possible](bring_primary_back.md), but likely to result
 in the loss of any data uploaded to the new **primary** in the meantime.
...