Commit 3aa12e23 authored by Dmitrijs Finaskins, committed by Kati Paizee

Update elasticsearch.md to replace future tense with present tense

parent 7f502df4
@@ -207,7 +207,7 @@ The following Elasticsearch settings are available:
| `Password` | The password of your Elasticsearch instance. |
| `Number of Elasticsearch shards` | Elasticsearch indexes are split into multiple shards for performance reasons. In general, you should use at least 5 shards, and indexes with tens of millions of documents need to have more shards ([see below](#guidance-on-choosing-optimal-cluster-configuration)). Changes to this value do not take effect until the index is recreated. You can read more about tradeoffs in the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html). |
| `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value increases total disk space required by the index. |
-| `Limit namespaces and projects that can be indexed` | Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none will be indexed. [Read more below](#limit-namespaces-and-projects).
+| `Limit namespaces and projects that can be indexed` | Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none are indexed. [Read more below](#limit-namespaces-and-projects).
| `Using AWS hosted Elasticsearch with IAM credentials` | Sign your Elasticsearch requests using [AWS IAM authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html), [AWS EC2 Instance Profile Credentials](https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-iam-instance-profile.html#getting-started-create-iam-instance-profile-cli), or [AWS ECS Tasks Credentials](https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-iam-roles.html). Please refer to [Identity and Access Management in Amazon Elasticsearch Service](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-ac.html) for details of AWS hosted Elasticsearch domain access policy configuration. |
| `AWS Region` | The AWS region in which your Elasticsearch service is located. |
| `AWS Access Key` | The AWS access key. |
@@ -233,7 +233,7 @@ If you select `Limit namespaces and projects that can be indexed`, more options
You can select namespaces and projects to index exclusively. Note that if the namespace is a group, it includes
any subgroups and projects belonging to those subgroups to be indexed as well.
-Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search will not provide a code or commit scope. This is possible only in the scope of an indexed namespace. There is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second.
+Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search does not provide a code or commit scope. This is possible only in the scope of an indexed namespace. There is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second.
You can filter the selection dropdown by writing part of the namespace or project name you're interested in.
@@ -355,7 +355,7 @@ Setting this value too high can have adverse performance impacts as your cluster
may become heavily saturated with searches and writes. Setting this value too
low may lead the reindexing process to take a very long time to complete.
-The best value for this will depend on your cluster size, whether you're willing
+The best value for this depends on your cluster size, whether you're willing
to accept some degraded search performance during reindexing, and how important
it is for the reindex to finish quickly and resume indexing.
@@ -430,23 +430,23 @@ In order to debug issues with the migrations you can check the [`elasticsearch.l
### Retry a halted migration
Some migrations are built with a retry limit. If the migration cannot finish within the retry limit,
-it will be halted and a notification will be displayed in the Advanced Search integration settings.
+it is halted and a notification is displayed in the Advanced Search integration settings.
It is recommended to check the [`elasticsearch.log` file](../administration/logs.md#elasticsearchlog) to
debug why the migration was halted and make any changes before retrying the migration. Once you believe you've
-fixed the cause of the failure, click "Retry migration", and the migration will be scheduled to be retried
+fixed the cause of the failure, click "Retry migration", and the migration is scheduled to be retried
in the background.
If you cannot get the migration to succeed, you may
consider the [last resort to recreate the index from
scratch](#last-resort-to-recreate-an-index). This may allow you to skip over
-the problem because a newly created index will skip all migrations as the index
-will be recreated with the correct up-to-date schema.
+the problem because a newly created index skips all migrations as the index
+is recreated with the correct up-to-date schema.
### All migrations must be finished before doing a major upgrade
Before doing a major version GitLab upgrade, you should have completed all
migrations that exist up until the latest minor version before that major
-version. If you have halted migrations, these will need to be resolved and
+version. If you have halted migrations, these need to be resolved and
[retried](#retry-a-halted-migration) before proceeding with a major version
upgrade. Read more about [upgrading to a new major
version](../update/index.md#upgrading-to-a-new-major-version).
@@ -468,7 +468,7 @@ The following are some available Rake tasks:
| [`sudo gitlab-rake gitlab:elastic:resume_indexing`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Resumes Elasticsearch indexing. |
| [`sudo gitlab-rake gitlab:elastic:index_projects`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Iterates over all projects and queues Sidekiq jobs to index them in the background. |
| [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. |
-| [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command will result in a complete wipe of the index, and it should be used with caution. |
+| [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Deletes all instances of IndexStatus for all projects. Note that this command results in a complete wipe of the index, and it should be used with caution. |
| [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Generates empty indexes (the default index and a separate issues index) and assigns an alias for each on the Elasticsearch side only if it doesn't already exist. |
| [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Removes the GitLab indexes and aliases (if they exist) on the Elasticsearch instance. |
| [`sudo gitlab-rake gitlab:elastic:recreate_index`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/ee/lib/tasks/gitlab/elastic.rake) | Wrapper task for `gitlab:elastic:delete_index` and `gitlab:elastic:create_empty_index`. |
@@ -501,7 +501,7 @@ I, [2019-03-04T21:27:05.215266 #3384] INFO -- : Indexing GitLab User / test (ID
## Advanced Search index scopes
-When performing a search, the GitLab index will use the following scopes:
+When performing a search, the GitLab index uses the following scopes:
| Scope Name | What it searches |
| ---------------- | ---------------------- |
@@ -521,25 +521,25 @@ When performing a search, the GitLab index will use the following scopes:
For basic guidance on choosing a cluster configuration you may refer to [Elastic Cloud Calculator](https://cloud.elastic.co/pricing). You can find more information below.
-- Generally, you will want to use at least a 2-node cluster configuration with one replica, which will allow you to have resilience. If your storage usage is growing quickly, you may want to plan horizontal scaling (adding more nodes) beforehand.
-- It's not recommended to use HDD storage with the search cluster, because it will take a hit on performance. It's better to use SSD storage (NVMe or SATA SSD drives for example).
+- Generally, you want to use at least a 2-node cluster configuration with one replica, which allows you to have resilience. If your storage usage is growing quickly, you may want to plan horizontal scaling (adding more nodes) beforehand.
+- It's not recommended to use HDD storage with the search cluster, because it takes a hit on performance. It's better to use SSD storage (NVMe or SATA SSD drives for example).
- You can use the [GitLab Performance Tool](https://gitlab.com/gitlab-org/quality/performance) to benchmark search performance with different search cluster sizes and configurations.
- `Heap size` should be set to no more than 50% of your physical RAM. Additionally, it shouldn't be set to more than the threshold for zero-based compressed oops. The exact threshold varies, but 26 GB is safe on most systems, but can also be as large as 30 GB on some systems. See [Heap size settings](https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html#heap-size-settings) and [Setting JVM options](https://www.elastic.co/guide/en/elasticsearch/reference/current/jvm-options.html) for more details.
- Number of CPUs (CPU cores) per node usually corresponds to the `Number of Elasticsearch shards` setting described below.
-- A good guideline is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This will generally help the cluster stay in good health.
+- A good guideline is to ensure you keep the number of shards per node below 20 per GB heap it has configured. A node with a 30GB heap should therefore have a maximum of 600 shards, but the further below this limit you can keep it the better. This generally helps the cluster stay in good health.
- Number of Elasticsearch shards:
- Small shards result in small segments, which increases overhead. Aim to keep the average shard size between at least a few GB and a few tens of GB.
- Another consideration is the number of documents. To determine the number of shards to use, sum the numbers in the **Menu > Admin > Dashboard > Statistics** pane (the number of documents to be indexed), divide by 5 million, and add 5. For example:
- If you have fewer than about 2,000,000 documents, use the default of 5 shards
- 10,000,000 documents: `10000000/5000000 + 5` = 7 shards
- 100,000,000 documents: `100000000/5000000 + 5` = 25 shards
-- `refresh_interval` is a per index setting. You may want to adjust that from default `1s` to a bigger value if you don't need data in real-time. This will change how soon you will see fresh results. If that's important for you, you should leave it as close as possible to the default value.
+- `refresh_interval` is a per index setting. You may want to adjust that from default `1s` to a bigger value if you don't need data in real-time. This changes how soon you see fresh results. If that's important for you, you should leave it as close as possible to the default value.
- You might want to raise [`indices.memory.index_buffer_size`](https://www.elastic.co/guide/en/elasticsearch/reference/current/indexing-buffer.html) to 30% or 40% if you have a lot of heavy indexing operations.
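The shard arithmetic in the list above can be sketched in a few lines of Python. This is only an illustration of the two rules of thumb quoted in the documentation (documents divided by 5 million plus 5, and at most 20 shards per GB of heap). The function names are hypothetical, and rounding the division down is an assumption, since the text does not say how to round.

```python
# Illustrative constants taken from the guidance text in this hunk.
DOCS_PER_EXTRA_SHARD = 5_000_000   # "divide by 5 million"
BASE_SHARDS = 5                    # "and add 5"
MAX_SHARDS_PER_GB_HEAP = 20        # "below 20 per GB heap"


def recommended_shards(total_documents: int) -> int:
    """Shard count suggested by the documents/5,000,000 + 5 rule.

    Assumption: the division is rounded down (the docs do not specify).
    """
    return total_documents // DOCS_PER_EXTRA_SHARD + BASE_SHARDS


def max_shards_for_heap(heap_gb: int) -> int:
    """Upper bound on shards per node for a given heap size in GB."""
    return heap_gb * MAX_SHARDS_PER_GB_HEAP


if __name__ == "__main__":
    for docs in (10_000_000, 100_000_000):
        print(docs, "documents:", recommended_shards(docs), "shards")
    print("30 GB heap:", max_shards_for_heap(30), "shards at most")
```

The two worked examples in the list (10,000,000 and 100,000,000 documents) and the 30 GB heap example are the inputs used here, so the printed values can be checked against the documentation text directly.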
### Advanced Search integration settings guidance
-- The `Number of Elasticsearch shards` setting usually corresponds with the number of CPUs available in your cluster. For example, if you have a 3-node cluster with 4 cores each, this means you will benefit from having at least 3*4=12 shards in the cluster. It's only possible to change the shards number by using [Split index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html) or by reindexing to a different index with a changed number of shards.
-- The `Number of Elasticsearch replicas` setting should most of the time be equal to `1` (each shard will have 1 replica). Using `0` is not recommended, because losing one node will corrupt the index.
+- The `Number of Elasticsearch shards` setting usually corresponds with the number of CPUs available in your cluster. For example, if you have a 3-node cluster with 4 cores each, this means you benefit from having at least 3*4=12 shards in the cluster. It's only possible to change the shards number by using [Split index API](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-split-index.html) or by reindexing to a different index with a changed number of shards.
+- The `Number of Elasticsearch replicas` setting should most of the time be equal to `1` (each shard has 1 replica). Using `0` is not recommended, because losing one node corrupts the index.
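As a rough illustration of the two settings above, the following Python sketch computes the CPU-based minimum shard count and the total number of shard copies the cluster then holds. The function names are hypothetical and not part of GitLab or Elasticsearch. The only fact assumed beyond the text is that each replica is one extra full copy of every primary shard.

```python
def min_shards_for_cpus(nodes: int, cores_per_node: int) -> int:
    """Minimum shard count suggested by the 'one shard per CPU core' guideline."""
    return nodes * cores_per_node


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Primaries plus their replicas; each replica is a full copy of a shard."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    shards = min_shards_for_cpus(nodes=3, cores_per_node=4)
    print("primary shards:", shards)
    print("copies with 1 replica:", total_shard_copies(shards, replicas=1))
```

With the 3-node, 4-core example from the text this gives 12 primary shards, and a replica setting of `1` doubles the number of shard copies stored, which is why raising the replica count increases the disk space the index needs.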
### Indexing large instances
@@ -548,7 +548,7 @@ This section may be helpful in the event that the other
due to large volumes of data being indexed.
WARNING:
-Indexing a large instance will generate a lot of Sidekiq jobs.
+Indexing a large instance generates a lot of Sidekiq jobs.
Make sure to prepare for this task by having a [Scalable and Highly Available
Setup](../administration/reference_architectures/index.md) or creating [extra
Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).