Commit 48b62da3 authored by Dmitry Gruzd, committed by Dylan Griffith

Add Chinese and Japanese languages

This MR implements support for two Elasticsearch analysis plugins, smartcn
and kuromoji, to improve Chinese and Japanese language support. This is
an optional feature and requires installing the plugin(s).
parent 33a3220f
# frozen_string_literal: true

class AddEsCustomAnalyzersSettings < ActiveRecord::Migration[6.0]
  DOWNTIME = false

  def change
    add_column :application_settings, :elasticsearch_analyzers_smartcn_enabled, :bool, null: false, default: false
    add_column :application_settings, :elasticsearch_analyzers_smartcn_search, :bool, null: false, default: false
    add_column :application_settings, :elasticsearch_analyzers_kuromoji_enabled, :bool, null: false, default: false
    add_column :application_settings, :elasticsearch_analyzers_kuromoji_search, :bool, null: false, default: false
  end
end
1bd99d7d6b972ea66495f21358e3b8731532219fcf75731bf643c312eb56820d
\ No newline at end of file
@@ -9297,6 +9297,10 @@ CREATE TABLE application_settings (
    encrypted_ci_jwt_signing_key text,
    encrypted_ci_jwt_signing_key_iv text,
    container_registry_expiration_policies_worker_capacity integer DEFAULT 0 NOT NULL,
    elasticsearch_analyzers_smartcn_enabled boolean DEFAULT false NOT NULL,
    elasticsearch_analyzers_smartcn_search boolean DEFAULT false NOT NULL,
    elasticsearch_analyzers_kuromoji_enabled boolean DEFAULT false NOT NULL,
    elasticsearch_analyzers_kuromoji_search boolean DEFAULT false NOT NULL,
    secret_detection_token_revocation_enabled boolean DEFAULT false NOT NULL,
    secret_detection_token_revocation_url text,
    encrypted_secret_detection_token_revocation_token text,
@@ -246,6 +246,29 @@ for filtering to work correctly. To do this run the Rake tasks `gitlab:elastic:r
`gitlab:elastic:clear_index_status`. Afterwards, removing a namespace or a project from the list will delete the data
from the Elasticsearch index as expected.
## Enabling custom language analyzers
You can improve language support for Chinese and Japanese by using the [smartcn](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html) and/or [kuromoji](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) analysis plugins from Elastic.
To enable language support:
1. Install the desired plugin(s). For installation instructions, see the [Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/plugins/7.9/installation.html). The plugin(s) must be installed on every node in the cluster, and each node must be restarted after installation. For a list of plugins, see the table later in this section.
1. Navigate to the **Admin Area** (wrench icon), then **Settings > General**.
1. Expand the **Advanced Search** section and locate **Custom analyzers: language support**.
1. Enable plugin(s) support for **Indexing**.
1. Click **Save changes** for the changes to take effect.
1. Trigger [Zero downtime reindexing](#zero-downtime-reindexing) or reindex everything from scratch to create a new index with the updated mappings.
1. After the previous step completes, enable plugin(s) support for **Search**.
For guidance on which plugin to install, see the following settings and the Elasticsearch language plugins they require:
| Parameter | Description |
|-----------------------------------------------|-------------|
| `Enable smartcn custom analyzer: Indexing` | Enables or disables Chinese language support using the [smartcn](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html) custom analyzer for newly created indices. |
| `Enable smartcn custom analyzer: Search` | Enables or disables using [smartcn](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html) fields for Advanced Search. Enable this only after [installing the plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html), enabling custom analyzer indexing, and recreating the index. |
| `Enable kuromoji custom analyzer: Indexing` | Enables or disables Japanese language support using the [kuromoji](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) custom analyzer for newly created indices. |
| `Enable kuromoji custom analyzer: Search` | Enables or disables using [kuromoji](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html) fields for Advanced Search. Enable this only after [installing the plugin](https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html), enabling custom analyzer indexing, and recreating the index. |
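The table implies an ordering rule: a **Search** setting only takes effect once the matching **Indexing** setting is also on. The following standalone Ruby sketch models that rule, loosely following the `custom_analyzers_enabled` and `custom_analyzers_search` methods added in this MR. The `SETTINGS` hash and the `indexing_analyzers` and `search_analyzers` methods are hypothetical stand-ins for `Gitlab::CurrentSettings` and the module's private methods; they are not part of GitLab.

```ruby
# Hypothetical stand-in for Gitlab::CurrentSettings: a plain hash holding
# the four application settings added by this MR.
SETTINGS = {
  elasticsearch_analyzers_smartcn_enabled: true,
  elasticsearch_analyzers_smartcn_search: true,
  elasticsearch_analyzers_kuromoji_enabled: false,
  elasticsearch_analyzers_kuromoji_search: true
}.freeze

ANALYZERS = %w[smartcn kuromoji].freeze

# Analyzers whose subfields are added to the mappings of newly created indices.
def indexing_analyzers(settings)
  ANALYZERS.select { |name| settings[:"elasticsearch_analyzers_#{name}_enabled"] }
end

# Analyzers whose subfields are queried: a Search flag only counts when the
# matching Indexing flag is also enabled.
def search_analyzers(settings)
  indexing_analyzers(settings).select { |name| settings[:"elasticsearch_analyzers_#{name}_search"] }
end

p indexing_analyzers(SETTINGS) # => ["smartcn"]
p search_analyzers(SETTINGS)   # => ["smartcn"]
```

With these sample values, kuromoji is left out of search because its Indexing setting is off, even though its Search setting is on.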
## Disabling Advanced Search
To disable the Elasticsearch integration:
@@ -40,6 +40,10 @@ module EE
:elasticsearch_namespace_ids,
:elasticsearch_project_ids,
:elasticsearch_client_request_timeout,
:elasticsearch_analyzers_smartcn_enabled,
:elasticsearch_analyzers_smartcn_search,
:elasticsearch_analyzers_kuromoji_enabled,
:elasticsearch_analyzers_kuromoji_search,
:enforce_namespace_storage_limit,
:geo_status_timeout,
:geo_node_allowed_ips,
@@ -119,6 +119,10 @@ module EE
elasticsearch_shards: 5,
elasticsearch_url: ENV['ELASTIC_URL'] || 'http://localhost:9200',
elasticsearch_client_request_timeout: 0,
elasticsearch_analyzers_smartcn_enabled: false,
elasticsearch_analyzers_smartcn_search: false,
elasticsearch_analyzers_kuromoji_enabled: false,
elasticsearch_analyzers_kuromoji_search: false,
email_additional_text: nil,
enforce_namespace_storage_limit: false,
enforce_pat_expiration: true,
@@ -148,6 +148,41 @@
- else
  = f.text_field :elasticsearch_project_ids, class: 'js-elasticsearch-projects', value: elasticsearch_project_ids, data: { selected: elasticsearch_objects_options(@application_setting.elasticsearch_limited_projects(true)).to_json }
.sub-section
  %h4= _('Custom analyzers: language support')
  %h5
    = _('Chinese language support using')
    %a{ href: 'https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-smartcn.html' }
      = _('smartcn custom analyzer')
  .form-group
    .form-check
      = f.check_box :elasticsearch_analyzers_smartcn_enabled, class: 'form-check-input'
      = f.label :elasticsearch_analyzers_smartcn_enabled, class: 'form-check-label' do
        = _('Enable smartcn custom analyzer: Indexing')
  .form-group
    .form-check
      = f.check_box :elasticsearch_analyzers_smartcn_search, class: 'form-check-input', disabled: !Gitlab::CurrentSettings.elasticsearch_analyzers_smartcn_enabled?
      = f.label :elasticsearch_analyzers_smartcn_search, class: 'form-check-label' do
        = _('Enable smartcn custom analyzer: Search')
      .form-text.gl-text-gray-600
        = _('Please only enable search after installing the plugin, enabling indexing and recreating the index')
  %h5
    = _('Japanese language support using')
    %a{ href: 'https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-kuromoji.html' }
      = _('kuromoji custom analyzer')
  .form-group
    .form-check
      = f.check_box :elasticsearch_analyzers_kuromoji_enabled, class: 'form-check-input'
      = f.label :elasticsearch_analyzers_kuromoji_enabled, class: 'form-check-label' do
        = _('Enable kuromoji custom analyzer: Indexing')
  .form-group
    .form-check
      = f.check_box :elasticsearch_analyzers_kuromoji_search, class: 'form-check-input', disabled: !Gitlab::CurrentSettings.elasticsearch_analyzers_kuromoji_enabled?
      = f.label :elasticsearch_analyzers_kuromoji_search, class: 'form-check-label' do
        = _('Enable kuromoji custom analyzer: Search')
      .form-text.gl-text-gray-600
        = _('Please only enable search after installing the plugin, enabling indexing and recreating the index')
.sub-section
  %h4= _('Elasticsearch AWS IAM credentials')
  .form-group
---
title: 'Advanced Search: Add optional Chinese and Japanese languages support'
merge_request: 45513
author:
type: added
@@ -55,6 +55,8 @@ module Elastic
end

def basic_query_hash(fields, query)
  fields = CustomLanguageAnalyzers.add_custom_analyzers_fields(fields)

  query_hash =
    if query.present?
      {
# frozen_string_literal: true

module Elastic
  module Latest
    module CustomLanguageAnalyzers
      class << self
        SUPPORTED_FIELDS = %i{title description}.freeze

        def custom_analyzers_mappings(type: :text)
          hash = { doc: { properties: {} } }

          SUPPORTED_FIELDS.each do |field|
            hash[:doc][:properties][field] = {
              fields: custom_analyzers_fields(type: type)
            }
          end

          hash
        end

        def custom_analyzers_fields(type:)
          custom_analyzers_enabled.each_with_object({}) do |analyzer, hash|
            hash[analyzer.to_sym] = {
              analyzer: analyzer,
              type: type
            }
          end
        end

        def add_custom_analyzers_fields(fields)
          search_analyzers = custom_analyzers_search
          return fields if search_analyzers.blank?

          fields_names = fields.map { |m| m[/\w+/] }

          SUPPORTED_FIELDS.each do |field|
            next unless fields_names.include?(field.to_s)

            search_analyzers.each do |analyzer|
              fields << "#{field}.#{analyzer}"
            end
          end

          fields
        end

        private

        def custom_analyzers_enabled
          [].tap do |enabled|
            enabled << 'smartcn' if ::Gitlab::CurrentSettings.elasticsearch_analyzers_smartcn_enabled
            enabled << 'kuromoji' if ::Gitlab::CurrentSettings.elasticsearch_analyzers_kuromoji_enabled
          end
        end

        def custom_analyzers_search
          enabled_analyzers = custom_analyzers_enabled

          [].tap do |analyzers|
            analyzers << 'smartcn' if enabled_analyzers.include?('smartcn') && ::Gitlab::CurrentSettings.elasticsearch_analyzers_smartcn_search
            analyzers << 'kuromoji' if enabled_analyzers.include?('kuromoji') && ::Gitlab::CurrentSettings.elasticsearch_analyzers_kuromoji_search
          end
        end
      end
    end
  end
end
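`add_custom_analyzers_fields` matches query fields by their bare name (`m[/\w+/]`), so a boosted entry such as `title^2` still triggers expansion, while the appended `title.<analyzer>` subfields carry no boost. A minimal standalone sketch of that step follows, assuming the list of search-enabled analyzers is passed in explicitly instead of being read from `Gitlab::CurrentSettings`; `expand_fields` is a hypothetical name, not part of GitLab.

```ruby
SUPPORTED_FIELDS = %i[title description].freeze

# Sketch of the field-expansion step: for every supported field present in the
# query's field list (possibly boosted, e.g. "title^2"), append one
# "<field>.<analyzer>" subfield per search-enabled analyzer.
def expand_fields(fields, search_analyzers)
  return fields if search_analyzers.empty?

  # "title^2"[/\w+/] => "title": drop the boost suffix before comparing names.
  base_names = fields.map { |f| f[/\w+/] }

  extra = SUPPORTED_FIELDS.flat_map do |field|
    next [] unless base_names.include?(field.to_s)

    search_analyzers.map { |analyzer| "#{field}.#{analyzer}" }
  end

  # Build a new array instead of appending to the caller's array.
  fields + extra
end

p expand_fields(%w[title^2 confidential], %w[smartcn kuromoji])
# => ["title^2", "confidential", "title.smartcn", "title.kuromoji"]
```

One deliberate difference from the MR's method: the sketch returns `fields + extra` rather than appending with `<<`, so it also works on frozen arrays. The MR's version mutates its argument, which is why its spec passes `original_fields.dup`.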
@@ -3,6 +3,17 @@
module Gitlab
  module Elastic
    class Helper
      ES_ENABLED_CLASSES = [
        Project,
        Issue,
        MergeRequest,
        Snippet,
        Note,
        Milestone,
        ProjectWiki,
        Repository
      ].freeze

      attr_reader :version, :client
      attr_accessor :target_name
@@ -28,6 +39,19 @@ module Gitlab
        end
      end

      def default_settings
        ES_ENABLED_CLASSES.inject({}) do |settings, klass|
          settings.deep_merge(klass.__elasticsearch__.settings.to_hash)
        end
      end

      def default_mappings
        mappings = ES_ENABLED_CLASSES.inject({}) do |m, klass|
          m.deep_merge(klass.__elasticsearch__.mappings.to_hash)
        end

        mappings.deep_merge(::Elastic::Latest::CustomLanguageAnalyzers.custom_analyzers_mappings)
      end

      def create_empty_index(with_alias: true, options: {})
        new_index_name = options[:index_name] || "#{target_name}-#{Time.now.strftime("%Y%m%d-%H%M")}"
@@ -35,24 +59,10 @@ module Gitlab
          raise "Index under '#{with_alias ? target_name : new_index_name}' already exists, use `recreate_index` to recreate it."
        end

        settings = {}
        settings = default_settings
        mappings = {}
        [
          Project,
          Issue,
          MergeRequest,
          Snippet,
          Note,
          Milestone,
          ProjectWiki,
          Repository
        ].each do |klass|
          settings.deep_merge!(klass.__elasticsearch__.settings.to_hash)
          mappings.deep_merge!(klass.__elasticsearch__.mappings.to_hash)
        end
        settings.merge!(options[:settings]) if options[:settings]
        mappings = default_mappings
        mappings.merge!(options[:mappings]) if options[:mappings]
        create_index_options = {
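`default_mappings` combines the per-model mappings and the custom analyzer mappings with `deep_merge`, so the analyzer `fields` entries are added alongside each model's existing `title` and `description` definitions instead of replacing them. `options[:mappings]`, by contrast, is applied with a shallow `merge!`, so a caller-supplied top-level key replaces the computed one wholesale. The sketch below contrasts the two behaviors; `deep_merge` here is a small hypothetical helper standing in for ActiveSupport's `Hash#deep_merge` so the example runs without Rails, and the sample mappings are illustrative only.

```ruby
# Minimal recursive merge standing in for ActiveSupport's Hash#deep_merge:
# nested hashes are merged key by key, any other value is replaced.
def deep_merge(left, right)
  left.merge(right) do |_key, old_val, new_val|
    old_val.is_a?(Hash) && new_val.is_a?(Hash) ? deep_merge(old_val, new_val) : new_val
  end
end

# Hypothetical model mapping and analyzer overlay, shaped like the
# doc -> properties -> field hashes built in this MR.
model_mappings = { doc: { properties: { title: { type: 'text', index_options: 'positions' } } } }
analyzer_mappings = {
  doc: { properties: { title: { fields: { smartcn: { analyzer: 'smartcn', type: :text } } } } }
}

# Deep merge keeps the model's settings and adds the analyzer subfields.
merged = deep_merge(model_mappings, analyzer_mappings)
p merged[:doc][:properties][:title].keys # => [:type, :index_options, :fields]

# A shallow merge replaces the whole :doc entry, dropping the model's :type.
shallow = model_mappings.merge(analyzer_mappings)
p shallow[:doc][:properties][:title].keys # => [:fields]
```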
@@ -39,6 +39,20 @@ RSpec.describe Gitlab::Elastic::Helper do
    end
  end

  describe '#default_mappings' do
    context 'custom analyzers' do
      let(:custom_analyzers_mappings) { { doc: { properties: { title: { fields: { custom: true } } } } } }

      before do
        allow(::Elastic::Latest::CustomLanguageAnalyzers).to receive(:custom_analyzers_mappings).and_return(custom_analyzers_mappings)
      end

      it 'merges custom language analyzers mappings' do
        expect(helper.default_mappings[:doc][:properties][:title]).to include(custom_analyzers_mappings[:doc][:properties][:title])
      end
    end
  end

  describe '#create_empty_index' do
    context 'with an empty cluster' do
      context 'with alias and index' do
# frozen_string_literal: true

require 'fast_spec_helper'
require 'rspec-parameterized'

RSpec.describe Elastic::Latest::CustomLanguageAnalyzers do
  describe '.custom_analyzers_mappings' do
    before do
      allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_smartcn_enabled).and_return(true)
      allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_kuromoji_enabled).and_return(true)
    end

    it 'returns correct structure' do
      expect(described_class.custom_analyzers_mappings).to eq(
        {
          doc: {
            properties: {
              title: {
                fields: described_class.custom_analyzers_fields(type: :text)
              },
              description: {
                fields: described_class.custom_analyzers_fields(type: :text)
              }
            }
          }
        }
      )
    end
  end

  describe '.custom_analyzers_fields' do
    using RSpec::Parameterized::TableSyntax

    where(:smartcn_enabled, :kuromoji_enabled, :expected_result) do
      false | false | {}
      true  | false | { smartcn: { analyzer: 'smartcn', type: :text } }
      false | true  | { kuromoji: { analyzer: 'kuromoji', type: :text } }
      true  | true  | { smartcn: { analyzer: 'smartcn', type: :text }, kuromoji: { analyzer: 'kuromoji', type: :text } }
    end

    with_them do
      before do
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_smartcn_enabled).and_return(smartcn_enabled)
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_kuromoji_enabled).and_return(kuromoji_enabled)
      end

      it 'returns correct config' do
        expect(described_class.custom_analyzers_fields(type: :text)).to eq(expected_result)
      end
    end
  end

  describe '.add_custom_analyzers_fields' do
    using RSpec::Parameterized::TableSyntax

    let!(:original_fields) { %w(title^2 confidential).freeze }

    where(:smartcn_enabled, :kuromoji_enabled, :smartcn_search, :kuromoji_search, :expected_additional_fields) do
      false | false | false | false | []
      false | false | true  | true  | []
      true  | true  | false | false | []
      true  | true  | true  | false | %w(title.smartcn)
      true  | true  | false | true  | %w(title.kuromoji)
      true  | true  | true  | true  | %w(title.smartcn title.kuromoji)
    end

    with_them do
      before do
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_smartcn_enabled).and_return(smartcn_enabled)
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_kuromoji_enabled).and_return(kuromoji_enabled)
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_smartcn_search).and_return(smartcn_search)
        allow(::Gitlab::CurrentSettings).to receive(:elasticsearch_analyzers_kuromoji_search).and_return(kuromoji_search)
      end

      it 'returns correct fields' do
        expect(described_class.add_custom_analyzers_fields(original_fields.dup)).to eq(original_fields + expected_additional_fields)
      end
    end
  end
end
@@ -5164,6 +5164,9 @@ msgstr ""
msgid "Child epic doesn't exist."
msgstr ""

msgid "Chinese language support using"
msgstr ""

msgid "Choose %{strong_open}Create archive%{strong_close} and wait for archiving to complete."
msgstr ""
@@ -7910,6 +7913,9 @@ msgstr ""
msgid "Custom Git clone URL for HTTP(S)"
msgstr ""

msgid "Custom analyzers: language support"
msgstr ""

msgid "Custom hostname (for private commit emails)"
msgstr ""
@@ -9924,6 +9930,12 @@ msgstr ""
msgid "Enable integration"
msgstr ""

msgid "Enable kuromoji custom analyzer: Indexing"
msgstr ""

msgid "Enable kuromoji custom analyzer: Search"
msgstr ""

msgid "Enable maintenance mode"
msgstr ""
@@ -9960,6 +9972,12 @@ msgstr ""
msgid "Enable shared runners for this group"
msgstr ""

msgid "Enable smartcn custom analyzer: Indexing"
msgstr ""

msgid "Enable smartcn custom analyzer: Search"
msgstr ""

msgid "Enable snowplow tracking"
msgstr ""
@@ -14892,6 +14910,9 @@ msgstr ""
msgid "January"
msgstr ""

msgid "Japanese language support using"
msgstr ""

msgid "Jira Issues"
msgstr ""
@@ -19795,6 +19816,9 @@ msgstr ""
msgid "Please note that this application is not provided by GitLab and you should verify its authenticity before allowing access."
msgstr ""

msgid "Please only enable search after installing the plugin, enabling indexing and recreating the index"
msgstr ""

msgid "Please provide a name"
msgstr ""
@@ -31589,6 +31613,9 @@ msgstr ""
msgid "jigsaw is not defined"
msgstr ""

msgid "kuromoji custom analyzer"
msgstr ""

msgid "last commit:"
msgstr ""
@@ -32216,6 +32243,9 @@ msgstr ""
msgid "sign in"
msgstr ""

msgid "smartcn custom analyzer"
msgstr ""

msgid "sort:"
msgstr ""