Commit 700cc059 authored by Kati Paizee

Merge branch 'fneill-doc-fixes-2' into 'master'

Fix style and language issues in Elasticsearch page

See merge request gitlab-org/gitlab!69743
parents 30cce575 110d1999
......@@ -684,7 +684,7 @@ indexed, which have a separate limit. For more information, read
- For self-managed installations, the field length is unlimited by default.
You can configure this limit for self-managed installations when you
[enable Elasticsearch](../integration/elasticsearch.md#enabling-advanced-search).
[enable Elasticsearch](../integration/elasticsearch.md#enable-advanced-search).
Set the limit to `0` to disable it.
## Wiki limits
......
......@@ -47,8 +47,7 @@ Adding a new service follows the same [merge request workflow](contributing/merg
The first iteration should be to add the ability to connect and use the service as an externally installed component. Often this involves providing settings in GitLab to connect to the service, or allow connections from it. And then shipping documentation on how to install and configure the service with GitLab.
NOTE:
[Elasticsearch](../integration/elasticsearch.md#installing-elasticsearch) is an example of a service that has been integrated this way. And many of the other services, including internal projects like Gitaly, started off as separately installed alternatives.
[Elasticsearch](../integration/elasticsearch.md#install-elasticsearch) is an example of a service that has been integrated this way. Many of the other services, including internal projects like Gitaly, started off as separately installed alternatives.
**For services that depend on the existing GitLab codebase:**
......
......@@ -9,7 +9,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
This area is to maintain a compendium of useful information when working with Elasticsearch.
Information on how to enable Elasticsearch and perform the initial indexing is in
the [Elasticsearch integration documentation](../integration/elasticsearch.md#enabling-advanced-search).
the [Elasticsearch integration documentation](../integration/elasticsearch.md#enable-advanced-search).
## Deep Dive
......
......@@ -68,7 +68,7 @@ Keep in mind, these are **minimum requirements** for Elasticsearch.
Heavily-used Elasticsearch clusters likely require considerably more
resources.
## Installing Elasticsearch
## Install Elasticsearch
Elasticsearch is *not* included in the Omnibus packages or when you install from
source. You must [install it separately](https://www.elastic.co/guide/en/elasticsearch/reference/7.x/install-elasticsearch.html "Elasticsearch 7.x installation documentation").
......@@ -81,27 +81,23 @@ it yourself or use a cloud hosted offering like Elastic's [Elasticsearch Service
service. Running Elasticsearch on the same server as GitLab is not recommended
and can cause a degradation in GitLab instance performance.
**For a single node Elasticsearch cluster the functional cluster health status
is yellow** (will never be green) because the primary shard is allocated but
replicas can not be as there is no other node to which Elasticsearch can assign a
replica.
**For a single node Elasticsearch cluster, the functional cluster health status
is always yellow and never green**. This is due to allocation of the primary shard. Replicas cannot be allocated as there is no other node to which Elasticsearch can assign a replica.
After the data is added to the database or repository and [Elasticsearch is
enabled in the Admin Area](#enabling-advanced-search) the search index is
enabled in the Admin Area](#enable-advanced-search), the search index is
updated automatically.
## Upgrading to a new Elasticsearch major version
## Upgrade to a new Elasticsearch major version
Since Elasticsearch can read and use indices created in the previous major version, you don't need to change anything in the GitLab configuration when upgrading Elasticsearch.
Elasticsearch can read and use indices created in the previous major version, so you don't need to change anything in the GitLab configuration when upgrading Elasticsearch.
The only thing worth noting is that if you have created your current index before GitLab 13.0, you might want to reindex from scratch (which implicitly creates an alias) in order to use some features, for example [Zero downtime reindexing](#zero-downtime-reindexing). Once you do that, you are able to perform zero-downtime reindexing and will benefit from any future features that make use of the alias.
If you created your current index before GitLab 13.0, you might want to reindex from scratch (which implicitly creates an alias) to use some features, for example [Zero downtime reindexing](#zero-downtime-reindexing). After you reindex, you can perform zero-downtime reindexing and benefit from future features that make use of the alias.
If you are unsure when your current index was created,
you can check whether it was created after GitLab 13.0 by using the
[Elasticsearch cat aliases API](https://www.elastic.co/guide/en/elasticsearch/reference/7.11/cat-alias.html).
If the list of aliases returned contains an entry for `gitlab-production` that points to an index
To check if your current index was created after GitLab 13.0, use the [Elasticsearch cat aliases API](https://www.elastic.co/guide/en/elasticsearch/reference/7.11/cat-alias.html).
If the returned list of aliases contains an entry for `gitlab-production` that points to an index
named `gitlab-production-<numerical timestamp>`, your index was created after GitLab 13.0.
If the `gitlab-production` alias is missing, you need to reindex from scratch to use
If the `gitlab-production` alias is missing, you must reindex from scratch to use
features such as Zero-downtime reindexing.
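As a minimal sketch of the alias check described above, the following Python function inspects the JSON form of the cat aliases response (for example, from `GET /_cat/aliases?format=json`). The function name, the digits-only pattern for `<numerical timestamp>`, and the sample index name are assumptions for illustration, not part of GitLab.

```python
import json
import re

# Names taken from the text above; the digits-only suffix is an assumption
# about what "<numerical timestamp>" looks like.
ALIAS_NAME = "gitlab-production"
INDEX_PATTERN = re.compile(r"^gitlab-production-\d+$")


def index_created_after_13_0(cat_aliases_json: str) -> bool:
    """Return True if the alias points to a timestamped index.

    `cat_aliases_json` is the body returned by the cat aliases API when
    called with `format=json`: a list of objects that include the
    `alias` and `index` keys.
    """
    rows = json.loads(cat_aliases_json)
    return any(
        row.get("alias") == ALIAS_NAME
        and INDEX_PATTERN.match(row.get("index", "")) is not None
        for row in rows
    )


# Hypothetical sample response; a real cluster returns its own index names.
sample = '[{"alias": "gitlab-production", "index": "gitlab-production-20210901"}]'
print(index_created_after_13_0(sample))
```

If the function returns `False`, the `gitlab-production` alias is missing or points elsewhere, which corresponds to the case where you must reindex from scratch.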
## Elasticsearch repository indexer
......@@ -126,7 +122,7 @@ First, we need to install some dependencies, then we build and install
the indexer itself.
This project relies on [International Components for Unicode](http://site.icu-project.org/) (ICU) for text encoding,
therefore we need to ensure the development packages for your platform are
therefore we must ensure the development packages for your platform are
installed before running `make`.
#### Debian / Ubuntu
......@@ -154,7 +150,7 @@ brew install icu4c
export PKG_CONFIG_PATH="/usr/local/opt/icu4c/lib/pkgconfig:$PKG_CONFIG_PATH"
```
### Building and installing
### Build and install
To build and install the indexer, run:
......@@ -166,7 +162,7 @@ sudo -u git -H bundle exec rake gitlab:indexer:install[$indexer_path] RAILS_ENV=
cd $indexer_path && sudo make install
```
The `gitlab-elasticsearch-indexer` will be installed to `/usr/local/bin`.
The `gitlab-elasticsearch-indexer` is installed to `/usr/local/bin`.
You can change the installation path with the `PREFIX` environment variable.
Please remember to pass the `-E` flag to `sudo` if you do so.
......@@ -177,19 +173,19 @@ Example:
PREFIX=/usr sudo -E make install
```
After installation, be sure to [enable Elasticsearch](#enabling-advanced-search).
After installation, be sure to [enable Elasticsearch](#enable-advanced-search).
NOTE:
If you see an error such as `Permission denied - /home/git/gitlab-elasticsearch-indexer/` while indexing, you
may need to set the `production -> elasticsearch -> indexer_path` setting in your `gitlab.yml` file to
`/usr/local/bin/gitlab-elasticsearch-indexer`, which is where the binary is installed.
## Enabling Advanced Search
## Enable Advanced Search
For GitLab instances with more than 50GB repository data you can follow the instructions for [Indexing large
instances](#indexing-large-instances) below.
To enable Advanced Search, you need to have admin access to GitLab:
To enable Advanced Search, you must have admin access to GitLab:
1. On the top bar, select **Menu > Admin**.
1. On the left sidebar, select **Settings > Advanced Search**.
......@@ -206,7 +202,7 @@ To enable Advanced Search, you need to have admin access to GitLab:
1. Select **Index all projects**.
1. Select **Check progress** in the confirmation message to see the status of
the background jobs.
1. Personal snippets need to be indexed using another Rake task:
1. Personal snippets must be indexed using another Rake task:
```shell
# Omnibus installations
......@@ -216,7 +212,7 @@ To enable Advanced Search, you need to have admin access to GitLab:
bundle exec rake gitlab:elastic:index_snippets RAILS_ENV=production
```
1. After the indexing has completed, enable **Search with Elasticsearch enabled** and select **Save changes**.
1. After indexing completes, enable **Search with Elasticsearch enabled** and select **Save changes**.
NOTE:
When your Elasticsearch cluster is down while Elasticsearch is enabled,
......@@ -237,8 +233,8 @@ The following Elasticsearch settings are available:
| `Username` | The `username` of your Elasticsearch instance. |
| `Password` | The password of your Elasticsearch instance. |
| `Number of Elasticsearch shards` | Elasticsearch indexes are split into multiple shards for performance reasons. In general, you should use at least 5 shards, and indexes with tens of millions of documents need to have more shards ([see below](#guidance-on-choosing-optimal-cluster-configuration)). Changes to this value do not take effect until the index is recreated. You can read more about tradeoffs in the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html). |
| `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value will greatly increase total disk space required by the index. |
| `Limit namespaces and projects that can be indexed` | Enabling this will allow you to select namespaces and projects to index. All other namespaces and projects will use database search instead. If you enable this option but do not select any namespaces or projects, none will be indexed. [Read more below](#limiting-namespaces-and-projects).
| `Number of Elasticsearch replicas` | Each Elasticsearch shard can have a number of replicas. These are a complete copy of the shard, and can provide increased query performance or resilience against hardware failure. Increasing this value increases total disk space required by the index. |
| `Limit namespaces and projects that can be indexed` | Enabling this allows you to select namespaces and projects to index. All other namespaces and projects use database search instead. If you enable this option but do not select any namespaces or projects, none will be indexed. [Read more below](#limit-namespaces-and-projects).
| `Using AWS hosted Elasticsearch with IAM credentials` | Sign your Elasticsearch requests using [AWS IAM authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html), [AWS EC2 Instance Profile Credentials](https://docs.aws.amazon.com/codedeploy/latest/userguide/getting-started-create-iam-instance-profile.html#getting-started-create-iam-instance-profile-cli), or [AWS ECS Tasks Credentials](https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-iam-roles.html). Please refer to [Identity and Access Management in Amazon Elasticsearch Service](https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-ac.html) for details of AWS hosted Elasticsearch domain access policy configuration. |
| `AWS Region` | The AWS region in which your Elasticsearch service is located. |
| `AWS Access Key` | The AWS access key. |
......@@ -255,31 +251,31 @@ Sidekiq performance. Return them to their default values if you see increased `s
in your Sidekiq logs. For more information, see
[issue 322147](https://gitlab.com/gitlab-org/gitlab/-/issues/322147).
### Limiting namespaces and projects
### Limit namespaces and projects
If you select `Limit namespaces and projects that can be indexed`, more options will become available.
If you select `Limit namespaces and projects that can be indexed`, more options become available.
![limit namespaces and projects options](img/limit_namespaces_projects_options.png)
You can select namespaces and projects to index exclusively. Note that if the namespace is a group it will include
You can select namespaces and projects to index exclusively. Note that if the namespace is a group, it includes
any subgroups and projects belonging to those subgroups to be indexed as well.
Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search will not provide a code or commit scope. This will be possible only in the scope of an indexed namespace. Currently there is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second.
Advanced Search only provides cross-group code/commit search (global) if all name-spaces are indexed. In this particular scenario where only a subset of namespaces are indexed, a global search will not provide a code or commit scope. This is possible only in the scope of an indexed namespace. There is no way to code/commit search in multiple indexed namespaces (when only a subset of namespaces has been indexed). For example if two groups are indexed, there is no way to run a single code search on both. You can only run a code search on the first group and then on the second.
You can filter the selection dropdown by writing part of the namespace or project name you're interested in.
![limit namespace filter](img/limit_namespace_filter.png)
NOTE:
If no namespaces or projects are selected, no Advanced Search indexing will take place.
If no namespaces or projects are selected, no Advanced Search indexing takes place.
WARNING:
If you have already indexed your instance, you will have to regenerate the index in order to delete all existing data
for filtering to work correctly. To do this run the Rake tasks `gitlab:elastic:recreate_index` and
`gitlab:elastic:clear_index_status`. Afterwards, removing a namespace or a project from the list will delete the data
If you have already indexed your instance, you must regenerate the index to delete all existing data
for filtering to work correctly. To do this, run the Rake tasks `gitlab:elastic:recreate_index` and
`gitlab:elastic:clear_index_status`. Afterwards, removing a namespace or a project from the list deletes the data
from the Elasticsearch index as expected.
## Enabling custom language analyzers
## Enable custom language analyzers
You can improve the language support for Chinese and Japanese languages by utilizing [`smartcn`](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html) and/or [`kuromoji`](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) analysis plugins from Elastic.
......@@ -303,7 +299,7 @@ For guidance on what to install, see the following Elasticsearch language plugin
| `Enable Japanese (kuromoji) custom analyzer: Indexing` | Enables or disables Japanese language support using [`kuromoji`](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) custom analyzer for newly created indices.|
| `Enable Japanese (kuromoji) custom analyzer: Search` | Enables or disables using [`kuromoji`](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) fields for Advanced Search. Please only enable this after [installing the plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html), enabling custom analyzer indexing and recreating the index.|
## Disabling Advanced Search
## Disable Advanced Search
To disable the Elasticsearch integration:
......@@ -327,7 +323,7 @@ The idea behind this reindexing method is to leverage the [Elasticsearch reindex
and Elasticsearch index alias feature to perform the operation. We set up an index alias which connects to a
`primary` index which is used by GitLab for reads/writes. When reindexing process starts, we temporarily pause
the writes to the `primary` index. Then, we create another index and invoke the Reindex API which migrates the
index data onto the new index. Once the reindexing job is complete, we switch to the new index by connecting the
index data onto the new index. After the reindexing job is complete, we switch to the new index by connecting the
index alias to it which becomes the new `primary` index. At the end, we resume the writes and normal operation resumes.
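The alias-based flow described above can be illustrated with a small in-memory model. This is a sketch only: the class and method names are hypothetical, no Elasticsearch calls are made, and queuing writes while they are paused is an assumption about how paused writes are handled.

```python
class AliasedSearchIndex:
    """Toy model of an index alias that points to a `primary` index."""

    def __init__(self, alias, first_index):
        self.alias = alias
        self.indices = {first_index: {}}  # index name -> {doc_id: document}
        self.primary = first_index        # index the alias currently points to
        self.writes_paused = False
        self.pending_writes = []          # assumption: paused writes are queued
        self._target = None

    def write(self, doc_id, doc):
        if self.writes_paused:
            self.pending_writes.append((doc_id, doc))
        else:
            self.indices[self.primary][doc_id] = doc

    def read(self, doc_id):
        # Reads always go through the alias, so they hit the current primary.
        return self.indices[self.primary].get(doc_id)

    def start_reindex(self, new_index):
        # Pause writes, then copy the data onto a new index.
        self.writes_paused = True
        self.indices[new_index] = dict(self.indices[self.primary])
        self._target = new_index

    def finish_reindex(self):
        # Point the alias at the new index, then resume writes.
        self.primary = self._target
        self.writes_paused = False
        queued, self.pending_writes = self.pending_writes, []
        for doc_id, doc in queued:
            self.write(doc_id, doc)


search = AliasedSearchIndex("gitlab-production", "gitlab-production-1")
search.write("1", "first document")
search.start_reindex("gitlab-production-2")
search.write("2", "written while paused")
search.finish_reindex()
print(search.primary, search.read("1"), search.read("2"))
```

Because readers only ever address the alias, they keep working against the old index until the switch, which is what makes the operation zero-downtime for searches.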
### Trigger the reindex via the Advanced Search administration
......@@ -350,7 +346,7 @@ After this process is completed, the original index is scheduled to be deleted a
14 days. You can cancel this action by pressing the **Cancel** button on the same
page you triggered the reindexing process.
While the reindexing is running, you will be able to follow its progress under that same section.
While the reindexing is running, you can follow its progress under that same section.
#### Elasticsearch zero-downtime reindexing
......@@ -486,8 +482,8 @@ version](../update/index.md#upgrading-to-a-new-major-version).
Rake tasks are available to:
- [Build and install](#building-and-installing) the indexer.
- Delete indexes when [disabling Elasticsearch](#disabling-advanced-search).
- [Build and install](#build-and-install) the indexer.
- Delete indexes when [disabling Elasticsearch](#disable-advanced-search).
- Add GitLab data to an index.
The following are some available Rake tasks:
......@@ -573,7 +569,7 @@ For basic guidance on choosing a cluster configuration you may refer to [Elastic
### Indexing large instances
This section may be helpful in the event that the other
[basic instructions](#enabling-advanced-search) cause problems
[basic instructions](#enable-advanced-search) cause problems
due to large volumes of data being indexed.
WARNING:
......@@ -582,7 +578,7 @@ Make sure to prepare for this task by having a [Scalable and Highly Available
Setup](../administration/reference_architectures/index.md) or creating [extra
Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).
1. [Configure your Elasticsearch host and port](#enabling-advanced-search).
1. [Configure your Elasticsearch host and port](#enable-advanced-search).
1. Create empty indexes:
```shell
......@@ -603,7 +599,7 @@ Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).
bundle exec rake gitlab:elastic:clear_index_status RAILS_ENV=production
```
1. [Enable **Elasticsearch indexing**](#enabling-advanced-search).
1. [Enable **Elasticsearch indexing**](#enable-advanced-search).
1. Indexing large Git repositories can take a while. To speed up the process, you can [tune for indexing speed](https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html#tune-for-indexing-speed):
- You can temporarily disable [`refresh`](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html), the operation responsible for making changes to an index available to search.
......@@ -734,7 +730,7 @@ Sidekiq processes](../administration/operations/extra_sidekiq_processes.md).
} }'
```
1. After the indexing has completed, enable [**Search with Elasticsearch enabled**](#enabling-advanced-search).
1. After the indexing has completed, enable [**Search with Elasticsearch enabled**](#enable-advanced-search).
### Deleted documents
......@@ -836,7 +832,7 @@ be necessary for you to [reindex](#zero-downtime-reindexing) after updating GitL
### I indexed all the repositories but I can't get any hits for my search term in the UI
Make sure you indexed all the database data [as stated above](#enabling-advanced-search).
Make sure you indexed all the database data [as stated above](#enable-advanced-search).
If there aren't any results (hits) in the UI search, check if you are seeing the same results via the rails console (`sudo gitlab-rails console`):
......@@ -857,7 +853,7 @@ More [complex Elasticsearch API calls](https://www.elastic.co/guide/en/elasticse
It is important to understand at which level the problem is manifesting (UI, Rails code, Elasticsearch side) to be able to [troubleshoot further](../administration/troubleshooting/elasticsearch.md#search-results-workflow).
NOTE:
The above instructions are not to be used for scenarios that only index a [subset of namespaces](#limiting-namespaces-and-projects).
The above instructions are not to be used for scenarios that only index a [subset of namespaces](#limit-namespaces-and-projects).
See [Elasticsearch Index Scopes](#advanced-search-index-scopes) for more information on searching for specific types of data.
......@@ -953,7 +949,7 @@ Gitlab::Elastic::Indexer::Error: time="2020-01-23T09:13:00Z" level=fatal msg="he
```
You probably have not used either `http://` or `https://` as part of your value in the **"URL"** field of the Elasticsearch Integration Menu. Please make sure you are using either `http://` or `https://` in this field as the [Elasticsearch client for Go](https://github.com/olivere/elastic) that we are using [needs the prefix for the URL to be accepted as valid](https://github.com/olivere/elastic/commit/a80af35aa41856dc2c986204e2b64eab81ccac3a).
Once you have corrected the formatting of the URL, delete the index (via the [dedicated Rake task](#gitlab-advanced-search-rake-tasks)) and [reindex the content of your instance](#enabling-advanced-search).
Once you have corrected the formatting of the URL, delete the index (via the [dedicated Rake task](#gitlab-advanced-search-rake-tasks)) and [reindex the content of your instance](#enable-advanced-search).
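A quick way to catch this mistake before saving the setting is to check the value for an explicit scheme. The helper below is a hypothetical sketch, not part of GitLab or the Go client, and the example URLs are placeholders.

```python
def has_http_scheme(url: str) -> bool:
    """Return True if the URL starts with http:// or https://."""
    return url.strip().lower().startswith(("http://", "https://"))


# Placeholder values; substitute the URL of your own Elasticsearch instance.
for candidate in ("http://localhost:9200", "localhost:9200"):
    print(candidate, has_http_scheme(candidate))
```

A value that fails this check is the kind the **URL** field rejects, so add the prefix before you delete the index and reindex.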
### My Elasticsearch cluster has a plugin and the integration is not working
......
......@@ -103,7 +103,7 @@ sudo -u git -H make
### 8. Install/Update `gitlab-elasticsearch-indexer` **(PREMIUM SELF)**
Please follow the [install instruction](../integration/elasticsearch.md#installing-elasticsearch).
Please follow the [install instruction](../integration/elasticsearch.md#install-elasticsearch).
### 9. Start application
......
......@@ -88,7 +88,7 @@ sudo -u git -H bundle exec rake cache:clear RAILS_ENV=production
### 4. Install `gitlab-elasticsearch-indexer` **(PREMIUM SELF)**
Please follow the [install instruction](../integration/elasticsearch.md#installing-elasticsearch).
Please follow the [install instruction](../integration/elasticsearch.md#install-elasticsearch).
### 5. Start application
......
......@@ -37,7 +37,7 @@ To access the default page for Admin Area settings:
| Option | Description |
| ------ | ----------- |
| [Elasticsearch](../../../integration/elasticsearch.md#enabling-advanced-search) | Elasticsearch integration. Elasticsearch AWS IAM. |
| [Elasticsearch](../../../integration/elasticsearch.md#enable-advanced-search) | Elasticsearch integration. Elasticsearch AWS IAM. |
| [Kroki](../../../administration/integration/kroki.md#enable-kroki-in-gitlab) | Allow rendering of diagrams in AsciiDoc and Markdown documents using [kroki.io](https://kroki.io). |
| [Mailgun](../../../administration/integration/mailgun.md) | Enable your GitLab instance to receive invite email bounce events from Mailgun, if it is your email provider. |
| [PlantUML](../../../administration/integration/plantuml.md) | Allow rendering of PlantUML diagrams in documents. |
......