Commit ceaefe24 authored by Ash McKenzie

Merge branch 'elasticsearch-disaster-recovery-logging' into 'master'

Improve Gitlab::Elastic::Indexer update logging and document possible use of logs

See merge request gitlab-org/gitlab!50482
parents e8af934c 72f17829
@@ -337,3 +337,48 @@
cluster.routing.allocation.disk.watermark.high: 10gb
Restart Elasticsearch, and the `read_only_allow_delete` block will clear on its own.
_from "Disk-based Shard Allocation | Elasticsearch Reference" [5.6](https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html#disk-allocator) and [6.x](https://www.elastic.co/guide/en/elasticsearch/reference/6.7/disk-allocator.html)_
### Disaster recovery/data loss/backups
The use of Elasticsearch in GitLab is only ever as a secondary data store.
This means that all of the data stored in Elasticsearch can always be derived
again from other data sources, specifically PostgreSQL and Gitaly. Therefore, if
the Elasticsearch data store is ever corrupted for whatever reason, you can
simply reindex everything from scratch.
If your Elasticsearch index is incredibly large, it may be too time-consuming,
or cause too much downtime, to reindex from scratch. There is no built-in
mechanism for automatically finding discrepancies and resyncing an
Elasticsearch index that has fallen out of sync, but one useful approach is to
inspect the logs for all of the updates that occurred in a time range you
believe may have been missed. This information is very low level and only
useful for operators that are familiar with the GitLab codebase. It is
documented here in case it is useful for others. The relevant logs that could
theoretically be used to figure out what needs to be replayed are listed
below, followed by a rough sketch of how they might be replayed:
1. All non-repository updates that were synced can be found in
   [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
   searching for
   [`track_items`](https://gitlab.com/gitlab-org/gitlab/-/blob/1e60ea99bd8110a97d8fc481e2f41cab14e63d31/ee/app/services/elastic/process_bookkeeping_service.rb#L25).
   These can be replayed by sending the items through
   `::Elastic::ProcessBookkeepingService.track!` again.
1. All repository updates that occurred can be found in
   [`elasticsearch.log`](../administration/logs.md#elasticsearchlog) by
   searching for
   [`indexing_commit_range`](https://gitlab.com/gitlab-org/gitlab/-/blob/6f9d75dd3898536b9ec2fb206e0bd677ab59bd6d/ee/lib/gitlab/elastic/indexer.rb#L41).
   Replaying these requires resetting the
   [`IndexStatus#last_commit/last_wiki_commit`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/models/index_status.rb)
   to the oldest `from_sha` in the logs and then triggering another index of
   the project using
   [`ElasticCommitIndexerWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_commit_indexer_worker.rb).
1. All project deletes that occurred can be found in
   [`sidekiq.log`](../administration/logs.md#sidekiqlog) by searching for
   [`ElasticDeleteProjectWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/app/workers/elastic_delete_project_worker.rb).
   These deletes can be replayed by triggering another
   `ElasticDeleteProjectWorker`.
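For illustration, here is a rough, untested sketch of what replaying each
category might look like from a Rails console. Each `indexing_commit_range`
entry is a JSON line carrying the `project_id`, `from_sha`, `to_sha`, and
`index_wiki` fields added in this merge request. In the sketch,
`missed_items`, `project_id`, and `oldest_from_sha` are placeholders you would
derive from the extracted log lines yourself, and the worker argument lists
are assumptions based on the current worker signatures.

```ruby
# 1. Non-repository updates: feed the logged `track_items` entries back
#    through the bookkeeping service. Assumes the logged items can be
#    reconstructed with Gitlab::Elastic::DocumentReference.deserialize.
refs = missed_items.map { |item| Gitlab::Elastic::DocumentReference.deserialize(item) }
::Elastic::ProcessBookkeepingService.track!(*refs)

# 2. Repository updates: reset the index status to the oldest `from_sha`
#    seen in the `indexing_commit_range` entries, then re-enqueue the
#    indexer (use `last_wiki_commit` and the wiki flag for wiki updates).
project = Project.find(project_id) # project_id taken from the logs
project.index_status.update!(last_commit: oldest_from_sha)
ElasticCommitIndexerWorker.perform_async(project.id)

# 3. Project deletes: re-enqueue the delete worker with the project's
#    database ID and Elasticsearch routing ID.
ElasticDeleteProjectWorker.perform_async(project.id, project.es_id)
```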
With the above methods, and by taking regular
[Elasticsearch snapshots](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html),
we should be able to recover from different kinds of data loss issues in a
relatively short period of time compared to indexing everything from
scratch.
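The snapshot half of this strategy is a pair of REST calls against the
Elasticsearch snapshot API. Below is a minimal Ruby sketch assuming a local
node; the repository name `gitlab_backup` and the `/mnt/es_backups` location
are made-up examples, and the location must be whitelisted under `path.repo`
in `elasticsearch.yml`.

```ruby
require 'net/http'
require 'json'

es = URI('http://localhost:9200')

Net::HTTP.start(es.host, es.port) do |http|
  # One-time setup: register a filesystem snapshot repository.
  repo = Net::HTTP::Put.new('/_snapshot/gitlab_backup')
  repo['Content-Type'] = 'application/json'
  repo.body = { type: 'fs', settings: { location: '/mnt/es_backups' } }.to_json
  puts http.request(repo).body

  # Take a snapshot of all indices and block until it completes.
  snap = Net::HTTP::Put.new('/_snapshot/gitlab_backup/snapshot_1?wait_for_completion=true')
  puts http.request(snap).body
end
```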
@@ -980,3 +980,11 @@
results and assuming that basic search is supported in that scope. This "basic
search" will behave as though you don't have Advanced Search enabled at all for
your instance and search using other data sources (ie. PostgreSQL data and Git
data).
### Data recovery: Elasticsearch is a secondary data store only
The use of Elasticsearch in GitLab is only ever as a secondary data store.
This means that all of the data stored in Elasticsearch can always be derived
again from other data sources, specifically PostgreSQL and Gitaly. Therefore, if
the Elasticsearch data store is ever corrupted for whatever reason, you can
simply reindex everything from scratch.
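For a single project this can be done from a Rails console; the sketch below
is illustrative only and assumes the EE service
`Elastic::ProcessInitialBookkeepingService` is available. For a whole
instance, the documented `gitlab:elastic:index` Rake task is the supported
path.

```ruby
# Rebuild one project's Elasticsearch documents from the primary data
# stores (PostgreSQL and Gitaly).
project = Project.find(42) # example ID

# Queues the project record, its database-backed documents (issues, notes,
# and so on), and its repository for indexing from scratch.
::Elastic::ProcessInitialBookkeepingService.backfill_projects!(project)
```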
@@ -38,9 +38,11 @@ module Gitlab
         return update_index_status(Gitlab::Git::BLANK_SHA) unless commit
 
         repository.__elasticsearch__.elastic_writing_targets.each do |target|
-          Sidekiq.logger.debug(message: "Indexation running for #{project.id} #{from_sha}..#{commit.sha}",
+          logger.debug(message: "indexing_commit_range",
                        project_id: project.id,
-                       wiki: index_wiki?)
+                       from_sha: from_sha,
+                       to_sha: commit.sha,
+                       index_wiki: index_wiki?)
           run_indexer!(commit.sha, target)
         end
@@ -178,10 +180,10 @@ module Gitlab
       # rubocop: disable CodeReuse/ActiveRecord
       def update_index_status(to_sha)
         unless Project.exists?(id: project.id)
-          Gitlab::Elasticsearch::Logger.build.debug(
+          logger.debug(
             message: 'Index status could not be updated as the project does not exist',
             project_id: project.id,
-            wiki: index_wiki?
+            index_wiki: index_wiki?
           )
           return false
         end
@@ -204,6 +206,10 @@ module Gitlab
         project.reload_index_status
       end
       # rubocop: enable CodeReuse/ActiveRecord
+
+      def logger
+        @logger ||= ::Gitlab::Elasticsearch::Logger.build
+      end
     end
   end
 end
@@ -396,13 +396,14 @@ RSpec.describe Gitlab::Elastic::Indexer do
       before do
         allow(Gitlab::Elasticsearch::Logger).to receive(:build).and_return(logger_double)
         allow(indexer).to receive(:run_indexer!) { Project.where(id: project.id).delete_all }
+        allow(logger_double).to receive(:debug)
       end
 
       it 'does not raises an exception and prints log message' do
         expect(logger_double).to receive(:debug).with(
           message: 'Index status could not be updated as the project does not exist',
           project_id: project.id,
-          wiki: false
+          index_wiki: false
         )
         expect(IndexStatus).not_to receive(:safe_find_or_create_by!).with(project_id: project.id)
         expect { indexer.run }.not_to raise_error