Commit bbde8d09 authored by Dmitry Gruzd

Advanced Search: Estimate ES cluster size

This change introduces a new Rake task, estimate_cluster_size,
which estimates the required ES cluster size from the total repository size.
parent 2d12dc1c
@@ -56,6 +56,12 @@ A few notes on CPU and storage:
to any spinning media for Elasticsearch. In testing, nodes that use SSD storage
see boosts in both query and indexing performance.
- The [`estimate_cluster_size`](#gitlab-advanced-search-rake-tasks) Rake task estimates the
Advanced Search storage requirements in advance. The Rake task uses total repository size
for the calculation. [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/221177) in GitLab 13.10.
Keep in mind, these are **minimum requirements** for Elasticsearch.
Heavily-used Elasticsearch clusters will likely require considerably more
resources.
@@ -421,8 +427,9 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:index_snippets`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Performs an Elasticsearch import that indexes the snippets data. |
| [`sudo gitlab-rake gitlab:elastic:projects_not_indexed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Displays which projects are not indexed. |
| [`sudo gitlab-rake gitlab:elastic:reindex_cluster`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Schedules a zero-downtime cluster reindexing task. This feature should be used with an index that was created after GitLab 13.0. |
| [`sudo gitlab-rake gitlab:elastic:mark_reindex_failed`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Mark the most recent re-index job as failed. |
| [`sudo gitlab-rake gitlab:elastic:list_pending_migrations`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | List pending migrations. Pending migrations include those that have not yet started, have started but not finished, and those that are halted. |
| [`sudo gitlab-rake gitlab:elastic:estimate_cluster_size`](https://gitlab.com/gitlab-org/gitlab/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Get an estimate of cluster size based on the total repository size. |
### Environment variables
---
title: 'Advanced Search: Estimate Elasticsearch cluster size'
merge_request: 54430
author:
type: added
@@ -163,6 +163,21 @@ namespace :gitlab do
      end
    end
    desc "GitLab | Elasticsearch | Estimate Cluster size"
    task estimate_cluster_size: :environment do
      include ActionView::Helpers::NumberHelper
      total_size = Namespace::RootStorageStatistics.sum(:repository_size).to_i
      total_size_human = number_to_human_size(total_size, delimiter: ',', precision: 1, significant: false)
      estimated_cluster_size = total_size * 0.5
      estimated_cluster_size_human = number_to_human_size(estimated_cluster_size, delimiter: ',', precision: 1, significant: false)
      puts "This GitLab instance repository size is #{total_size_human}."
      puts "By our estimates for such repository size, your cluster size should be at least #{estimated_cluster_size_human}.".color(:green)
      puts 'Please note that it is possible to index only selected namespaces/projects by using Elasticsearch indexing restrictions.'
    end
    def project_id_batches(&blk)
      relation = Project.all
@@ -231,4 +231,18 @@ RSpec.describe 'gitlab:elastic namespace rake tasks', :elastic do
      end
    end
  end
  describe 'estimate_cluster_size' do
    subject { run_rake_task('gitlab:elastic:estimate_cluster_size') }
    before do
      create(:namespace_root_storage_statistics, repository_size: 1.megabyte)
      create(:namespace_root_storage_statistics, repository_size: 10.megabyte)
      create(:namespace_root_storage_statistics, repository_size: 30.megabyte)
    end
    it 'outputs estimates' do
      expect { subject }.to output(/your cluster size should be at least 20.5 MB/).to_stdout
    end
  end
end
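For readers who want to check the arithmetic outside a GitLab instance, the estimate above can be sketched in plain Ruby. This is a minimal sketch, not the task itself: `human_size`, `estimate_cluster_size`, `ESTIMATE_RATIO`, and `UNITS` are hypothetical names introduced here, and `human_size` is only a rough stand-in for Rails' `number_to_human_size`, so its formatting may differ slightly from the real task's output. The 0.5 ratio and the 1 MB, 10 MB, and 30 MB inputs are taken from the rake task and spec in this commit.

```ruby
# Standalone sketch of the estimate; no Rails or ActiveSupport required.

# Ratio applied by the rake task in this commit (total_size * 0.5).
ESTIMATE_RATIO = 0.5

UNITS = %w[Bytes KB MB GB TB].freeze

# Hypothetical stand-in for number_to_human_size:
# divides by 1024 until the value fits the unit, then keeps one decimal place.
def human_size(bytes)
  value = bytes.to_f
  unit = 0
  while value >= 1024 && unit < UNITS.size - 1
    value /= 1024
    unit += 1
  end
  format('%.1f %s', value, UNITS[unit])
end

# repository_sizes: repository sizes in bytes, one per root namespace.
def estimate_cluster_size(repository_sizes)
  repository_sizes.sum * ESTIMATE_RATIO
end

megabyte = 1024 * 1024
sizes = [1 * megabyte, 10 * megabyte, 30 * megabyte]

puts "Repository size: #{human_size(sizes.sum)}"
puts "Estimated cluster size: #{human_size(estimate_cluster_size(sizes))}"
```

With the spec's three namespaces the total is 41 MB, so the second line reports 20.5 MB, the same figure the spec expects from the real task.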