Commit 9d5b7b8d authored by Markus Koller, committed by Achilleas Pipinellis

Document ES web indexing

parent 3ca9a6bd
@@ -128,8 +128,10 @@ total are being tracked in [epic &153](https://gitlab.com/groups/gitlab-org/-/ep
## Enabling Elasticsearch ## Enabling Elasticsearch
In order to enable Elasticsearch, you need to have admin access. Go to In order to enable Elasticsearch, you need to have admin access. Navigate to
**Admin > Settings > Integrations** and find the "Elasticsearch" section. **Admin Area** (wrench icon), then **Settings > Integrations** and expand the **Elasticsearch** section.
Click **Save changes** for the changes to take effect.
The following Elasticsearch settings are available: The following Elasticsearch settings are available:
@@ -171,171 +173,222 @@ from the Elasticsearch index as expected.
To disable the Elasticsearch integration: To disable the Elasticsearch integration:
1. Navigate to the **Admin > Settings > Integrations** 1. Navigate to the **Admin Area** (wrench icon), then **Settings > Integrations**.
1. Find the 'Elasticsearch' section and uncheck 'Search with Elasticsearch enabled' 1. Expand the **Elasticsearch** section and uncheck **Elasticsearch indexing**
and 'Elasticsearch indexing' and **Search with Elasticsearch enabled**.
1. Click **Save** for the changes to take effect 1. Click **Save changes** for the changes to take effect.
1. (Optional) Delete the existing index by running the command `sudo gitlab-rake gitlab:elastic:delete_index` 1. (Optional) Delete the existing index by running one of these commands:
```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:delete_index
# Installations from source
bundle exec rake gitlab:elastic:delete_index RAILS_ENV=production
```
## Adding GitLab's data to the Elasticsearch index ## Adding GitLab's data to the Elasticsearch index
### Indexing small instances (database size less than 500 MiB, size of repos less than 5 GiB) While Elasticsearch indexing is enabled, new changes in your GitLab instance will be automatically indexed as they happen.
To backfill existing data, you can use one of the methods below to index it in background jobs.
Configure Elasticsearch's host and port in **Admin > Settings**. Then index the data using one of the following commands: ### Indexing through the administration UI
```sh > [Introduced](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/15390) in [GitLab Starter](https://about.gitlab.com/pricing/) 12.3.
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index
# Installations from source To index via the admin area:
bundle exec rake gitlab:elastic:index RAILS_ENV=production
``` 1. Navigate to the **Admin Area** (wrench icon), then **Settings > Integrations** and expand the **Elasticsearch** section.
1. [Enable **Elasticsearch indexing** and configure your host and port](#enabling-elasticsearch).
1. Create empty indexes using one of the following commands:
```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:create_empty_index
# Installations from source
bundle exec rake gitlab:elastic:create_empty_index RAILS_ENV=production
```
1. Click **Index all projects**.
1. Click **Check progress** in the confirmation message to see the status of the background jobs.
1. Personal snippets need to be indexed manually by running one of these commands:
```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index_snippets
# Installations from source
bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production
```
1. After the indexing has completed, enable [**Search with Elasticsearch**](#enabling-elasticsearch).
### Indexing through Rake tasks
#### Indexing small instances
CAUTION: **Warning**:
This will delete your existing indexes.
If the database size is less than 500 MiB, and the size of all hosted repos is less than 5 GiB:
1. [Enable **Elasticsearch indexing** and configure your host and port](#enabling-elasticsearch).
1. Index your data using one of the following commands:
```sh
# Omnibus installations
sudo gitlab-rake gitlab:elastic:index
# Installations from source
bundle exec rake gitlab:elastic:index RAILS_ENV=production
```
After it completes the indexing process, [enable Elasticsearch searching](elasticsearch.md#enabling-elasticsearch). 1. After the indexing has completed, enable [**Search with Elasticsearch**](#enabling-elasticsearch).
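
The size thresholds above can be turned into a quick check. The following is a minimal sketch, not part of GitLab: `suggest_indexing_path` is a hypothetical helper, and it assumes you have already measured the database size and the total repository size yourself and pass both as whole numbers of MiB (5 GiB is 5120 MiB).

```sh
#!/bin/sh
# Hypothetical helper (not a GitLab command): suggest which indexing section
# applies, given the database size and total repository size in whole MiB.
# Thresholds are taken from the text: database < 500 MiB, repos < 5 GiB.
suggest_indexing_path() {
  db_mib=$1
  repos_mib=$2
  if [ "$db_mib" -lt 500 ] && [ "$repos_mib" -lt 5120 ]; then
    # Small instance: a single gitlab:elastic:index run is expected to suffice.
    echo "small"
  else
    # Large instance: create_empty_index, then index_projects via Sidekiq.
    echo "large"
  fi
}

suggest_indexing_path 120 2048   # both values below the thresholds
suggest_indexing_path 800 2048   # database above the 500 MiB threshold
```

Both values must be below their threshold for the small-instance path to apply; if either one is at or above it, the sketch points to the large-instance steps.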
### Indexing large instances #### Indexing large instances
WARNING: **Warning**: CAUTION: **Warning**:
Performing asynchronous indexing, as this will describe, will generate a lot of sidekiq jobs. Performing asynchronous indexing will generate a lot of Sidekiq jobs.
Make sure to prepare for this task by either [Horizontally Scaling](../administration/high_availability/README.md#basic-scaling) Make sure to prepare for this task by either [Horizontally Scaling](../administration/high_availability/README.md#basic-scaling)
or creating [extra sidekiq processes](../administration/operations/extra_sidekiq_processes.md) or creating [extra Sidekiq processes](../administration/operations/extra_sidekiq_processes.md)
Configure Elasticsearch's host and port in **Admin > Settings > Integrations**. Then create empty indexes using one of the following commands: 1. [Enable **Elasticsearch indexing** and configure your host and port](#enabling-elasticsearch).
1. Create empty indexes using one of the following commands:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:create_empty_index sudo gitlab-rake gitlab:elastic:create_empty_index
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:create_empty_index RAILS_ENV=production bundle exec rake gitlab:elastic:create_empty_index RAILS_ENV=production
``` ```
Indexing large Git repositories can take a while. To speed up the process, you 1. Indexing large Git repositories can take a while. To speed up the process, you
can temporarily disable auto-refreshing and replicating. In our experience, you can expect a 20% can temporarily disable auto-refreshing and replicating. In our experience, you can expect a 20%
decrease in indexing time. We'll enable them when indexing is done. This step is optional! decrease in indexing time. We'll enable them when indexing is done. This step is optional!
```bash ```bash
curl --request PUT localhost:9200/gitlab-production/_settings --data '{ curl --request PUT localhost:9200/gitlab-production/_settings --data '{
"index" : { "index" : {
"refresh_interval" : "-1", "refresh_interval" : "-1",
"number_of_replicas" : 0 "number_of_replicas" : 0
} }' } }'
``` ```
Then enable Elasticsearch indexing and run project indexing tasks: 1. Index projects and their associated data:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects sudo gitlab-rake gitlab:elastic:index_projects
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:index_projects RAILS_ENV=production bundle exec rake gitlab:elastic:index_projects RAILS_ENV=production
``` ```
This enqueues a Sidekiq job for each project that needs to be indexed. This enqueues a Sidekiq job for each project that needs to be indexed.
You can view the jobs in the admin panel (they are placed in the `elastic_indexer` You can view the jobs in **Admin Area > Monitoring > Background Jobs > Queues Tab**
queue), or you can query indexing status using a rake task: and click `elastic_indexer`, or you can query indexing status using a rake task:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects_status sudo gitlab-rake gitlab:elastic:index_projects_status
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:index_projects_status RAILS_ENV=production bundle exec rake gitlab:elastic:index_projects_status RAILS_ENV=production
Indexing is 65.55% complete (6555/10000 projects) Indexing is 65.55% complete (6555/10000 projects)
``` ```
If you want to limit the index to a range of projects you can provide the If you want to limit the index to a range of projects you can provide the
`ID_FROM` and `ID_TO` parameters: `ID_FROM` and `ID_TO` parameters:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000 sudo gitlab-rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000 RAILS_ENV=production bundle exec rake gitlab:elastic:index_projects ID_FROM=1001 ID_TO=2000 RAILS_ENV=production
``` ```
Where `ID_FROM` and `ID_TO` are project IDs. Both parameters are optional. Where `ID_FROM` and `ID_TO` are project IDs. Both parameters are optional.
The above examples will index all projects starting with ID `1001` up to (and including) ID `2000`. The above example will index all projects from ID `1001` up to (and including) ID `2000`.
TIP: **Troubleshooting:** TIP: **Troubleshooting:**
Sometimes the project indexing jobs queued by `gitlab:elastic:index_projects` Sometimes the project indexing jobs queued by `gitlab:elastic:index_projects`
can get interrupted. This may happen for many reasons, but it's always safe can get interrupted. This may happen for many reasons, but it's always safe
to run the indexing task again - it will skip those repositories that have to run the indexing task again. It will skip repositories that have
already been indexed. already been indexed.
As the indexer stores the last commit SHA of every indexed repository in the As the indexer stores the last commit SHA of every indexed repository in the
database, you can run the indexer with the special parameter `UPDATE_INDEX` and database, you can run the indexer with the special parameter `UPDATE_INDEX` and
it will check every project repository again to make sure that every commit in it will check every project repository again to make sure that every commit in
that repository is indexed, it can be useful in case if your index is outdated: a repository is indexed, which can be useful in case if your index is outdated:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000 sudo gitlab-rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000 RAILS_ENV=production bundle exec rake gitlab:elastic:index_projects UPDATE_INDEX=true ID_TO=1000 RAILS_ENV=production
``` ```
You can also use the `gitlab:elastic:clear_index_status` Rake task to force the You can also use the `gitlab:elastic:clear_index_status` Rake task to force the
indexer to "forget" all progress, so retrying the indexing process from the indexer to "forget" all progress, so it will retry the indexing process from the
start. start.
The `index_projects` command enqueues jobs to index all project and wiki 1. Personal snippets are not associated with a project and need to be indexed separately
repositories, and most database content. However, snippets still need to be by running one of these commands:
indexed separately. To do so, run one of these commands:
```sh ```sh
# Omnibus installations # Omnibus installations
sudo gitlab-rake gitlab:elastic:index_snippets sudo gitlab-rake gitlab:elastic:index_snippets
# Installations from source # Installations from source
bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production
``` ```
Enable replication and refreshing again after indexing (only if you previously disabled it): 1. Enable replication and refreshing again after indexing (only if you previously disabled it):
```bash ```bash
curl --request PUT localhost:9200/gitlab-production/_settings --data '{ curl --request PUT localhost:9200/gitlab-production/_settings --data '{
"index" : { "index" : {
"number_of_replicas" : 1, "number_of_replicas" : 1,
"refresh_interval" : "1s" "refresh_interval" : "1s"
} }' } }'
``` ```
A force merge should be called after enabling the refreshing above. A force merge should be called after enabling the refreshing above.
For Elasticsearch 6.x, before proceeding with the force merge, the index should be in read-only mode: For Elasticsearch 6.x, the index should be in read-only mode before proceeding with the force merge:
```bash ```bash
curl --request PUT localhost:9200/gitlab-production/_settings --data '{ curl --request PUT localhost:9200/gitlab-production/_settings --data '{
"settings": { "settings": {
"index.blocks.write": true "index.blocks.write": true
} }' } }'
``` ```
Then, initiate the force merge: Then, initiate the force merge:
```bash ```bash
curl --request POST 'http://localhost:9200/gitlab-production/_forcemerge?max_num_segments=5' curl --request POST 'http://localhost:9200/gitlab-production/_forcemerge?max_num_segments=5'
``` ```
After this, if your index is in read-only, switch back to read-write: After this, if your index is in read-only mode, switch back to read-write:
```bash ```bash
curl --request PUT localhost:9200/gitlab-production/_settings --data '{ curl --request PUT localhost:9200/gitlab-production/_settings --data '{
"settings": { "settings": {
"index.blocks.write": false "index.blocks.write": false
} }' } }'
``` ```
Enable Elasticsearch search in **Admin > Settings > Integrations**. That's it. Enjoy it! 1. After the indexing has completed, enable [**Search with Elasticsearch**](#enabling-elasticsearch).
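
The `ID_FROM` and `ID_TO` parameters described above make it possible to enqueue projects in batches rather than all at once. The following is a minimal sketch under stated assumptions: `print_index_batches` is a hypothetical helper, the batch size is an arbitrary choice, and the script only prints the Omnibus commands (a dry run) so that you can review them before running anything.

```sh
#!/bin/sh
# Hypothetical helper: print one index_projects command per project ID range.
# Nothing is executed; the output is meant to be reviewed first.
# Usage: print_index_batches FIRST_ID LAST_ID BATCH_SIZE
print_index_batches() {
  from=$1
  last=$2
  size=$3
  while [ "$from" -le "$last" ]; do
    to=$((from + size - 1))
    # Clamp the final batch so it never runs past LAST_ID.
    if [ "$to" -gt "$last" ]; then
      to=$last
    fi
    echo "sudo gitlab-rake gitlab:elastic:index_projects ID_FROM=$from ID_TO=$to"
    from=$((to + 1))
  done
}

print_index_batches 1 2500 1000
```

Because `ID_TO` is inclusive, each batch starts one ID after the previous batch ended, so no project is enqueued twice.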
### Index limit ### Indexing limitations
Currently for repository and snippet files, GitLab would only index up to 1 MB of content, in order to avoid indexing timeout. For repository and snippet files, GitLab will only index up to 1 MiB of content, in order to avoid indexing timeouts.
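
To get a rough idea of which files exceed this limit, you can scan a local checkout. The following is a minimal sketch, assuming the limit is 1 MiB (1048576 bytes) and applies to each file's size on disk; `list_files_over_limit` is a hypothetical helper, not a GitLab command.

```sh
#!/bin/sh
# Hypothetical helper: list regular files under a directory that are larger
# than 1 MiB (1048576 bytes), skipping any .git directory.
# Usage: list_files_over_limit /path/to/checkout
list_files_over_limit() {
  dir=${1:-.}
  # "-size +1048576c" matches files strictly larger than 1048576 bytes.
  find "$dir" -type d -name .git -prune -o -type f -size +1048576c -print
}
```

Files reported by this check are candidates for being only partially searchable.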
## GitLab Elasticsearch Rake Tasks ## GitLab Elasticsearch Rake Tasks
@@ -352,7 +405,7 @@ There are several rake tasks available to you via the command line:
- [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake) - [`sudo gitlab-rake gitlab:elastic:index_projects_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100. - This determines the overall status of the indexing. It is done by counting the total number of indexed projects, dividing by a count of the total number of projects, then multiplying by 100.
- [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake) - [`sudo gitlab-rake gitlab:elastic:create_empty_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This generates an empty index on the Elasticsearch side. - This generates an empty index on the Elasticsearch side, deleting the existing one if present.
- [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake) - [`sudo gitlab-rake gitlab:elastic:clear_index_status`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
- This deletes all instances of IndexStatus for all projects. - This deletes all instances of IndexStatus for all projects.
- [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake) - [`sudo gitlab-rake gitlab:elastic:delete_index`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/tasks/gitlab/elastic.rake)
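
The percentage reported by `index_projects_status` follows the arithmetic described above: indexed projects divided by total projects, multiplied by 100. The following is a minimal sketch of that calculation only, not the Rake task's actual implementation; `indexing_percent` is a hypothetical helper, and returning `0.00` for an instance with no projects is an assumption made here to avoid dividing by zero.

```sh
#!/bin/sh
# Hypothetical helper: compute the indexing percentage from two counts.
# Usage: indexing_percent INDEXED_PROJECTS TOTAL_PROJECTS
indexing_percent() {
  awk -v indexed="$1" -v total="$2" 'BEGIN {
    # Assumption: report 0.00 when there are no projects at all.
    if (total == 0) { print "0.00"; exit }
    printf "%.2f\n", indexed / total * 100
  }'
}

indexing_percent 6555 10000
```

With the counts from the sample status output (6555 of 10000 projects), this prints `65.55`.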
@@ -468,7 +521,7 @@ Here are some common pitfalls and how to overcome them:
pp s.search_objects.to_a pp s.search_objects.to_a
``` ```
See [Elasticsearch Index Scopes](elasticsearch.md#elasticsearch-index-scopes) for more information on searching for specific types of data. See [Elasticsearch Index Scopes](#elasticsearch-index-scopes) for more information on searching for specific types of data.
- **I indexed all the repositories but then switched Elasticsearch servers and now I can't find anything** - **I indexed all the repositories but then switched Elasticsearch servers and now I can't find anything**
......