Commit be796903 authored by Nick Thomas

Merge branch '328-versioned-search' into 'master'

Elasticsearch versioned schema for Snippet

See merge request gitlab-org/gitlab-ee!14428
parents 229e99f0 e0e399c4
@@ -3,5 +3,3 @@
class PersonalSnippet < Snippet
include WithUploads
end
PersonalSnippet.prepend_if_ee('EE::PersonalSnippet')
@@ -12,5 +12,3 @@ class ProjectSnippet < Snippet
participant :author
participant :notes_with_associations
end
ProjectSnippet.prepend_if_ee('EE::ProjectSnippet')
@@ -5,46 +5,42 @@
require 'gitlab/current_settings'
Gitlab.ee do
require 'elasticsearch/model'
### Modified from elasticsearch-model/lib/elasticsearch/model.rb
[
Elasticsearch::Model::Client::ClassMethods,
Elasticsearch::Model::Naming::ClassMethods,
Elasticsearch::Model::Indexing::ClassMethods,
Elasticsearch::Model::Searching::ClassMethods
].each do |mod|
Elasticsearch::Model::Proxy::ClassMethodsProxy.include mod
end
[
Elasticsearch::Model::Client::InstanceMethods,
Elasticsearch::Model::Naming::InstanceMethods,
Elasticsearch::Model::Indexing::InstanceMethods,
Elasticsearch::Model::Serializing::InstanceMethods
].each do |mod|
Elasticsearch::Model::Proxy::InstanceMethodsProxy.include mod
end
Elasticsearch::Model::Proxy::InstanceMethodsProxy.class_eval <<-CODE, __FILE__, __LINE__ + 1
def as_indexed_json(options={})
target.respond_to?(:as_indexed_json) ? target.__send__(:as_indexed_json, options) : super
end
CODE
### Monkey patches
Elasticsearch::Model::Response::Records.prepend GemExtensions::Elasticsearch::Model::Response::Records
Elasticsearch::Model::Adapter::Multiple::Records.prepend GemExtensions::Elasticsearch::Model::Adapter::Multiple::Records
Elasticsearch::Model::Indexing::InstanceMethods.prepend GemExtensions::Elasticsearch::Model::Indexing::InstanceMethods
module Elasticsearch
module Model
module Client
# This mutex is only used to synchronize *creation* of a new client, so
# all including classes can share the same client instance
CLIENT_MUTEX = Mutex.new
cattr_accessor :cached_client
cattr_accessor :cached_config
module ClassMethods
# Override the default ::Elasticsearch::Model::Client implementation to
# return a client configured from application settings. All including
# classes will use the same instance, which is refreshed automatically
# if the settings change.
#
# _client is present to match the arity of the overridden method, where
# it is also not used.
#
# @return [Elasticsearch::Transport::Client]
def client(_client = nil)
store = ::Elasticsearch::Model::Client
store::CLIENT_MUTEX.synchronize do
config = Gitlab::CurrentSettings.elasticsearch_config
if store.cached_client.nil? || config != store.cached_config
store.cached_client = ::Gitlab::Elastic::Client.build(config)
store.cached_config = config
end
end
store.cached_client
end
end
end
end
end
Elasticsearch::Model::Adapter::ActiveRecord::Importing.prepend GemExtensions::Elasticsearch::Model::Adapter::ActiveRecord::Importing
Elasticsearch::Model::Client::InstanceMethods.prepend GemExtensions::Elasticsearch::Model::Client
Elasticsearch::Model::Client::ClassMethods.prepend GemExtensions::Elasticsearch::Model::Client
Elasticsearch::Model::ClassMethods.prepend GemExtensions::Elasticsearch::Model::Client
Elasticsearch::Model.singleton_class.prepend GemExtensions::Elasticsearch::Model::Client
end
@@ -148,6 +148,36 @@ Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/
- Searches can have their own analyzers. Remember to check them when editing analyzers
- `Character` filters (as opposed to token filters) always replace the original character, so they're not a good choice as they can hinder exact searches
## Architecture
GitLab uses `elasticsearch-rails` to handle communication with the Elasticsearch server. However, in order to achieve zero-downtime deployment during schema changes, an extra abstraction layer is built in to allow:
* Indexing (writes) to multiple indexes, with different mappings
* Switching to a different index for searches (reads) on the fly
Currently we are in the process of migrating models to this new design (e.g. `Snippet`), and it is hardwired to work with a single version for now.
Traditionally, `elasticsearch-rails` provides class- and instance-level `__elasticsearch__` proxy methods. If you call `Issue.__elasticsearch__`, you get an instance of `Elasticsearch::Model::Proxy::ClassMethodsProxy`, and if you call `Issue.first.__elasticsearch__`, you get an instance of `Elasticsearch::Model::Proxy::InstanceMethodsProxy`. These proxy objects talk to the Elasticsearch server directly.
In the new design, `__elasticsearch__` instead represents one extra layer of proxy. It keeps multiple versions of the actual proxy objects and forwards read and write calls to the proxy of the intended version.
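A condensed sketch of how calls are dispatched, based on the `MultiVersionClassProxy` and `MultiVersionInstanceProxy` classes and their specs in this merge request (`query_hash` stands in for any Elasticsearch query):

```ruby
# `__elasticsearch__` now returns a multi-version proxy instead of the
# elasticsearch-rails proxy itself.
Snippet.__elasticsearch__                   # => Elastic::MultiVersionClassProxy

# A proxy for a specific schema version can be requested explicitly.
Snippet.__elasticsearch__.version('V12p1')  # => Elastic::V12p1::SnippetClassProxy

# Read calls are forwarded to the single reading target...
Snippet.__elasticsearch__.search(query_hash)

# ...while write calls (e.g. `refresh_index!`, or `index_document` on an
# instance proxy) fan out to every writing target.
Snippet.__elasticsearch__.refresh_index!
```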
The `elasticsearch-rails` way of specifying each model's mappings and other settings is to create a module for the model to include. In the new design, however, each model has its own subclassed proxy object, in which the settings reside. For example, snippet-related settings used to live in the `SnippetsSearch` module; in the new design they live in `SnippetClassProxy` (a subclass of `Elasticsearch::Model::Proxy::ClassMethodsProxy`). This reduces namespace pollution in model classes.
The global configuration for each version now lives in the `Elastic::(Version)::Config` class. You can change mappings there.
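For example, adding a field would look roughly like this; a minimal sketch condensed from the `Elastic::Latest::Config` class in this merge request, with `:some_new_field` as a hypothetical field name:

```ruby
module Elastic
  module Latest
    module Config
      # All fields are declared together (ES6 allows a single type per index),
      # so a new field is added to the shared mappings block.
      mappings dynamic: 'strict' do
        indexes :some_new_field, type: :keyword # hypothetical new field
      end
    end
  end
end
```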
### Creating a new version of the schema
Currently GitLab still works with a single version of the settings. Once multi-version support is implemented, multiple versions of the settings can exist in different folders (e.g. `ee/lib/elastic/v12p1` and `ee/lib/elastic/v12p3`). To keep a continuous git history, the latest version lives under the `/latest` folder but is aliased as the latest version.
If the current version is `v12p1`, and we need to create a new version for `v12p3`, the steps are as follows:
1. Copy the entire folder of `v12p1` as `v12p3`
1. Change the namespace for files under the `v12p3` folder from `V12p1` to `V12p3` (these files remain aliases to `Latest`; see the sketch after this list)
1. Delete the `v12p1` folder
1. Copy the entire folder of `latest` as `v12p1`
1. Change the namespace for files under the `v12p1` folder from `Latest` to `V12p1`
1. Make changes to `Latest` as needed
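When a version matches `Latest`, its files are plain constant aliases; for example, `ee/lib/elastic/v12p1/snippet_class_proxy.rb` in this merge request is simply:

```ruby
# frozen_string_literal: true

module Elastic
  module V12p1
    SnippetClassProxy = Elastic::Latest::SnippetClassProxy
  end
end
```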
## Troubleshooting
### Getting `flood stage disk watermark [95%] exceeded`
......
@@ -26,192 +26,10 @@ module Elastic
# ES6 requires a single type per index
document_type 'doc'
settings \
index: {
number_of_shards: AsJSON.new { Gitlab::CurrentSettings.elasticsearch_shards },
number_of_replicas: AsJSON.new { Gitlab::CurrentSettings.elasticsearch_replicas },
codec: 'best_compression',
analysis: {
analyzer: {
default: {
tokenizer: 'standard',
filter: %w(standard lowercase my_stemmer)
},
my_ngram_analyzer: {
tokenizer: 'my_ngram_tokenizer',
filter: ['lowercase']
}
},
filter: {
my_stemmer: {
type: 'stemmer',
name: 'light_english'
}
},
tokenizer: {
my_ngram_tokenizer: {
type: 'nGram',
min_gram: 2,
max_gram: 3,
token_chars: %w(letter digit)
}
}
}
}
# Since we can't have multiple types in ES6 but still want to be able to use JOINs, we must declare all our
# fields together instead of per model
mappings dynamic: 'strict' do
### Shared fields
indexes :id, type: :integer
indexes :created_at, type: :date
indexes :updated_at, type: :date
# ES6-compatible way of having a parent; this is shared by all document types.
# Please note that if we add a parent to `project`, we'll have to use that "grand-parent" as the routing value
# for all children of project, so it is not advised.
indexes :join_field, type: :join,
relations: {
project: %i(
issue
merge_request
milestone
note
blob
wiki_blob
commit
)
}
# ES6 requires a single type per index, so we implement our own "type"
indexes :type, type: :keyword
indexes :iid, type: :integer
indexes :title, type: :text,
index_options: 'offsets'
indexes :description, type: :text,
index_options: 'offsets'
indexes :state, type: :text
indexes :project_id, type: :integer
indexes :author_id, type: :integer
## Projects and Snippets
indexes :visibility_level, type: :integer
### ISSUES
indexes :confidential, type: :boolean
# The field assignee_id no longer exists in the issues table.
# Nevertheless we keep this field as-is, because we don't want users to have to rebuild the index,
# and Elasticsearch treats arrays transparently:
# you can write an array of integers to any integer field without changing the mapping,
# and you can query those items just like a single integer value.
indexes :assignee_id, type: :integer
### MERGE REQUESTS
indexes :target_branch, type: :text,
index_options: 'offsets'
indexes :source_branch, type: :text,
index_options: 'offsets'
indexes :merge_status, type: :text
indexes :source_project_id, type: :integer
indexes :target_project_id, type: :integer
### NOTES
indexes :note, type: :text,
index_options: 'offsets'
indexes :issue do
indexes :assignee_id, type: :integer
indexes :author_id, type: :integer
indexes :confidential, type: :boolean
end
# ES6 gets rid of the "index: :not_analyzed" option, but a keyword type behaves the same:
# it is not analyzed and is only searchable by its exact value.
indexes :noteable_type, type: :keyword
indexes :noteable_id, type: :keyword
### PROJECTS
indexes :name, type: :text,
index_options: 'offsets'
indexes :path, type: :text,
index_options: 'offsets'
indexes :name_with_namespace, type: :text,
index_options: 'offsets',
analyzer: :my_ngram_analyzer
indexes :path_with_namespace, type: :text,
index_options: 'offsets'
indexes :namespace_id, type: :integer
indexes :archived, type: :boolean
indexes :issues_access_level, type: :integer
indexes :merge_requests_access_level, type: :integer
indexes :snippets_access_level, type: :integer
indexes :wiki_access_level, type: :integer
indexes :repository_access_level, type: :integer
indexes :last_activity_at, type: :date
indexes :last_pushed_at, type: :date
### SNIPPETS
indexes :file_name, type: :text,
index_options: 'offsets'
indexes :content, type: :text,
index_options: 'offsets'
### REPOSITORIES
indexes :blob do
indexes :type, type: :keyword
indexes :id, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :rid, type: :keyword
indexes :oid, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :commit_sha, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :path, type: :text,
analyzer: :path_analyzer
indexes :file_name, type: :text,
analyzer: :code_analyzer,
search_analyzer: :code_search_analyzer
indexes :content, type: :text,
index_options: 'offsets',
analyzer: :code_analyzer,
search_analyzer: :code_search_analyzer
indexes :language, type: :keyword
end
indexes :commit do
indexes :type, type: :keyword
indexes :id, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :rid, type: :keyword
indexes :sha, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :author do
indexes :name, type: :text, index_options: 'offsets'
indexes :email, type: :text, index_options: 'offsets'
indexes :time, type: :date, format: :basic_date_time_no_millis
end
indexes :committer do
indexes :name, type: :text, index_options: 'offsets'
indexes :email, type: :text, index_options: 'offsets'
indexes :time, type: :date, format: :basic_date_time_no_millis
end
indexes :message, type: :text, index_options: 'offsets'
end
end
# A temporary solution to keep only one copy of the settings;
# will be removed in https://gitlab.com/gitlab-org/gitlab-ee/issues/12548
__elasticsearch__.instance_variable_set(:@settings, Elastic::Latest::Config.settings)
__elasticsearch__.instance_variable_set(:@mapping, Elastic::Latest::Config.mappings)
after_commit on: :create do
if Gitlab::CurrentSettings.elasticsearch_indexing? && self.searchable?
......
# frozen_string_literal: true
module Elastic
module ApplicationVersionedSearch
extend ActiveSupport::Concern
FORWARDABLE_INSTANCE_METHODS = [:es_id, :es_parent].freeze
FORWARDABLE_CLASS_METHODS = [:elastic_search, :es_import, :nested?, :es_type, :index_name, :document_type, :mapping, :mappings, :settings, :import].freeze
def __elasticsearch__(&block)
@__elasticsearch__ ||= ::Elastic::MultiVersionInstanceProxy.new(self)
end
# Should be overridden in the models where some records should be skipped
def searchable?
self.use_elasticsearch?
end
def use_elasticsearch?
self.project&.use_elasticsearch?
end
def es_type
self.class.es_type
end
included do
delegate(*FORWARDABLE_INSTANCE_METHODS, to: :__elasticsearch__)
class << self
delegate(*FORWARDABLE_CLASS_METHODS, to: :__elasticsearch__)
end
# Add to the registry if it's a class (and not an intermediate module)
Elasticsearch::Model::Registry.add(self) if self.is_a?(Class)
after_commit on: :create do
if Gitlab::CurrentSettings.elasticsearch_indexing? && self.searchable?
ElasticIndexerWorker.perform_async(:index, self.class.to_s, self.id, self.es_id)
end
end
after_commit on: :update do
if Gitlab::CurrentSettings.elasticsearch_indexing? && self.searchable?
ElasticIndexerWorker.perform_async(
:update,
self.class.to_s,
self.id,
self.es_id,
changed_fields: self.previous_changes.keys
)
end
end
after_commit on: :destroy do
if Gitlab::CurrentSettings.elasticsearch_indexing? && self.searchable?
ElasticIndexerWorker.perform_async(
:delete,
self.class.to_s,
self.id,
self.es_id,
es_parent: self.es_parent
)
end
end
end
class_methods do
def __elasticsearch__
@__elasticsearch__ ||= ::Elastic::MultiVersionClassProxy.new(self)
end
end
end
end
@@ -4,92 +4,17 @@ module Elastic
module SnippetsSearch
extend ActiveSupport::Concern
included do
include ApplicationSearch
def as_indexed_json(options = {})
# We don't use as_json(only: ...) because it calls all virtual and serialized attributes
# https://gitlab.com/gitlab-org/gitlab-ee/issues/349
data = {}
[
:id,
:title,
:file_name,
:content,
:created_at,
:updated_at,
:project_id,
:author_id,
:visibility_level
].each do |attr|
data[attr.to_s] = safely_read_attribute_for_elasticsearch(attr)
end
# ES6 is now single-type per index, so we implement our own typing
data['type'] = es_type
data
end
def use_elasticsearch?
# FIXME: check project.use_elasticsearch? for ProjectSnippets?
# see https://gitlab.com/gitlab-org/gitlab-ee/issues/11850
::Gitlab::CurrentSettings.elasticsearch_indexing?
end
def self.elastic_search(query, options: {})
query_hash = basic_query_hash(%w(title file_name), query)
query_hash = filter(query_hash, options[:user])
include ApplicationVersionedSearch
self.__elasticsearch__.search(query_hash)
end
def self.elastic_search_code(query, options: {})
query_hash = basic_query_hash(%w(content), query)
query_hash = filter(query_hash, options[:user])
self.__elasticsearch__.search(query_hash)
end
def self.filter(query_hash, user)
return query_hash if user && user.full_private_access?
filter = if user
{
bool: {
should: [
{ term: { author_id: user.id } },
{ terms: { project_id: authorized_project_ids_for_user(user) } },
{
bool: {
filter: { terms: { visibility_level: [Snippet::PUBLIC, Snippet::INTERNAL] } },
must_not: { exists: { field: 'project_id' } }
}
}
]
}
}
else
{
bool: {
filter: { term: { visibility_level: Snippet::PUBLIC } },
must_not: { exists: { field: 'project_id' } }
}
}
end
query_hash[:query][:bool][:filter] = filter
query_hash
end
def use_elasticsearch?
# FIXME: check project.use_elasticsearch? for ProjectSnippets?
# see https://gitlab.com/gitlab-org/gitlab-ee/issues/11850
::Gitlab::CurrentSettings.elasticsearch_indexing?
end
def self.authorized_project_ids_for_user(user)
if Ability.allowed?(user, :read_cross_project)
user.authorized_projects.pluck(:id)
else
[]
end
included do
class << self
delegate :elastic_search_code, to: :__elasticsearch__
end
end
end
......
# frozen_string_literal: true
module EE
module PersonalSnippet
extend ActiveSupport::Concern
prepended do
include Elastic::SnippetsSearch
end
end
end
# frozen_string_literal: true
module EE
module ProjectSnippet
extend ActiveSupport::Concern
prepended do
include Elastic::SnippetsSearch
end
end
end
# frozen_string_literal: true
# Defer evaluation from class-definition time to index-creation time
module Elastic
class AsJSON
def initialize(&blk)
@blk = blk
end
def call
@blk.call
end
def as_json(*args, &blk)
call
end
end
end
# frozen_string_literal: true
# Stores stable methods for ApplicationClassProxy
# which are unlikely to change from version to version.
module Elastic
module ClassProxyUtil
extend ActiveSupport::Concern
def initialize(target)
super(target)
config = version_namespace.const_get('Config')
@index_name = config.index_name
@document_type = config.document_type
@settings = config.settings
@mapping = config.mapping
end
### Multi-version utils
alias_method :real_class, :class
def version_namespace
self.class.parent
end
class_methods do
def methods_for_all_write_targets
%i(refresh_index!)
end
def methods_for_one_write_target
%i(import create_index! delete_index!)
end
end
end
end
# frozen_string_literal: true
# Stores stable methods for ApplicationInstanceProxy
# which are unlikely to change from version to version.
module Elastic
module InstanceProxyUtil
extend ActiveSupport::Concern
def initialize(target)
super(target)
config = version_namespace.const_get('Config')
@index_name = config.index_name
@document_type = config.document_type
end
### Multi-version utils
def real_class
self.singleton_class.superclass
end
def version_namespace
real_class.parent
end
class_methods do
def methods_for_all_write_targets
[:index_document, :delete_document, :update_document, :update_document_attributes]
end
def methods_for_one_write_target
[]
end
end
private
# Some attributes are actually complicated methods. Bad data can cause
# them to raise exceptions. When this happens, we still want the remainder
# of the object to be saved, so silently swallow the errors
def safely_read_attribute_for_elasticsearch(attr_name)
target.send(attr_name) # rubocop:disable GitlabSecurity/PublicSend
rescue => err
target.logger.warn("Elasticsearch failed to read #{attr_name} for #{target.class} #{target.id}: #{err}")
nil
end
end
end
# frozen_string_literal: true
module Elastic
module Latest
class ApplicationClassProxy < Elasticsearch::Model::Proxy::ClassMethodsProxy
include ClassProxyUtil
# Should be overridden for all nested models
def nested?
false
end
def es_type
target.name.underscore
end
def es_import(**options)
transform = lambda do |r|
proxy = r.__elasticsearch__.version(version_namespace)
{ index: { _id: proxy.es_id, data: proxy.as_indexed_json } }.tap do |data|
data[:index][:routing] = proxy.es_parent if proxy.es_parent
end
end
options[:transform] = transform
self.import(options)
end
private
def highlight_options(fields)
es_fields = fields.map { |field| field.split('^').first }.each_with_object({}) do |field, memo|
memo[field.to_sym] = {}
end
{ fields: es_fields }
end
def basic_query_hash(fields, query)
query_hash =
if query.present?
{
query: {
bool: {
must: [{
simple_query_string: {
fields: fields,
query: query,
default_operator: :and
}
}],
filter: [{
term: { type: self.es_type }
}]
}
}
}
else
{
query: {
bool: {
must: { match_all: {} }
}
},
track_scores: true
}
end
query_hash[:sort] = [
{ updated_at: { order: :desc } },
:_score
]
query_hash[:highlight] = highlight_options(fields)
query_hash
end
def iid_query_hash(iid)
{
query: {
bool: {
filter: [{ term: { iid: iid } }]
}
}
}
end
# Builds an elasticsearch query that will select child documents from a
# set of projects, taking user access rules into account.
def project_ids_filter(query_hash, options)
project_query = project_ids_query(
options[:current_user],
options[:project_ids],
options[:public_and_internal_projects],
options[:features]
)
query_hash[:query][:bool][:filter] ||= []
query_hash[:query][:bool][:filter] << {
has_parent: {
parent_type: "project",
query: {
bool: project_query
}
}
}
query_hash
end
# Builds an elasticsearch query that will select projects the user is
# granted access to.
#
# If a project feature(s) is specified, it indicates interest in child
# documents gated by that project feature - e.g., "issues". The feature's
# visibility level must be taken into account.
def project_ids_query(user, project_ids, public_and_internal_projects, features = nil)
# When reading cross project is not allowed, only allow searching a
# single project, so the `:read_*` ability is only checked once.
unless Ability.allowed?(user, :read_cross_project)
project_ids = [] if project_ids.is_a?(Array) && project_ids.size > 1
end
# At least one condition must be present, so pick no projects for
# anonymous users.
# Pick private, internal and public projects the user is a member of.
# Pick all private projects for admins & auditors.
conditions = [pick_projects_by_membership(project_ids, features)]
if public_and_internal_projects
# Skip internal projects for anonymous and external users.
# Others are given access to all internal projects. Admins & auditors
# get access to internal projects where the feature is private.
conditions << pick_projects_by_visibility(Project::INTERNAL, user, features) if user && !user.external?
# All users, including anonymous, can access public projects.
# Admins & auditors get access to public projects where the feature is
# private.
conditions << pick_projects_by_visibility(Project::PUBLIC, user, features)
end
{ should: conditions }
end
# Most users come with a list of projects they are members of, which may
# be a mix of public, internal or private. Grant access to them all, as
# long as the project feature is not disabled.
#
# Admins & auditors are given access to all private projects. Access to
# internal or public projects where the project feature is private is not
# granted here.
def pick_projects_by_membership(project_ids, features = nil)
condition =
if project_ids == :any
{ term: { visibility_level: Project::PRIVATE } }
else
{ terms: { id: project_ids } }
end
limit_by_feature(condition, features, include_members_only: true)
end
# Grant access to projects of the specified visibility level to the user.
#
# If a project feature is specified, access is only granted if the feature
# is enabled or, for admins & auditors, private.
def pick_projects_by_visibility(visibility, user, features)
condition = { term: { visibility_level: visibility } }
limit_by_feature(condition, features, include_members_only: user&.full_private_access?)
end
# If a project feature(s) is specified, access is dependent on its visibility
# level being enabled (or private if `include_members_only: true`).
#
# This method is a no-op if no project feature is specified.
# It accepts an array of features or a single feature; when an array is provided,
# it checks whether any of the features is enabled.
#
# Always denies access to projects when the features are disabled - even to
# admins & auditors - as stale child documents may be present.
def limit_by_feature(condition, features, include_members_only:)
return condition unless features
features = Array(features)
features.map do |feature|
limit =
if include_members_only
{ terms: { "#{feature}_access_level" => [::ProjectFeature::ENABLED, ::ProjectFeature::PRIVATE] } }
else
{ term: { "#{feature}_access_level" => ::ProjectFeature::ENABLED } }
end
{ bool: { filter: [condition, limit] } }
end
end
end
end
end
# frozen_string_literal: true
module Elastic
module Latest
class ApplicationInstanceProxy < Elasticsearch::Model::Proxy::InstanceMethodsProxy
include InstanceProxyUtil
def es_parent
"project_#{target.project_id}" unless target.is_a?(Project) || target&.project_id.nil?
end
def es_type
self.class.es_type
end
def es_id
"#{es_type}_#{target.id}"
end
private
def generic_attributes
{
'join_field' => {
'name' => es_type,
'parent' => es_parent
},
'type' => es_type
}
end
end
end
end
# frozen_string_literal: true
module Elastic
module Latest
module Config
# To obtain settings and mappings methods
extend Elasticsearch::Model::Indexing::ClassMethods
extend Elasticsearch::Model::Naming::ClassMethods
self.index_name = [Rails.application.class.parent_name.downcase, Rails.env].join('-')
# ES6 requires a single type per index
self.document_type = 'doc'
settings \
index: {
number_of_shards: Elastic::AsJSON.new { Gitlab::CurrentSettings.elasticsearch_shards },
number_of_replicas: Elastic::AsJSON.new { Gitlab::CurrentSettings.elasticsearch_replicas },
codec: 'best_compression',
analysis: {
analyzer: {
default: {
tokenizer: 'standard',
filter: %w(standard lowercase my_stemmer)
},
my_ngram_analyzer: {
tokenizer: 'my_ngram_tokenizer',
filter: ['lowercase']
}
},
filter: {
my_stemmer: {
type: 'stemmer',
name: 'light_english'
}
},
tokenizer: {
my_ngram_tokenizer: {
type: 'nGram',
min_gram: 2,
max_gram: 3,
token_chars: %w(letter digit)
}
}
}
}
# Since we can't have multiple types in ES6 but still want to be able to use JOINs, we must declare all our
# fields together instead of per model
mappings dynamic: 'strict' do
### Shared fields
indexes :id, type: :integer
indexes :created_at, type: :date
indexes :updated_at, type: :date
# ES6-compatible way of having a parent; this is shared by all document types.
# Please note that if we add a parent to `project`, we'll have to use that "grand-parent" as the routing value
# for all children of project, so it is not advised.
indexes :join_field, type: :join,
relations: {
project: %i(
issue
merge_request
milestone
note
blob
wiki_blob
commit
)
}
# ES6 requires a single type per index, so we implement our own "type"
indexes :type, type: :keyword
indexes :iid, type: :integer
indexes :title, type: :text,
index_options: 'offsets'
indexes :description, type: :text,
index_options: 'offsets'
indexes :state, type: :text
indexes :project_id, type: :integer
indexes :author_id, type: :integer
## Projects and Snippets
indexes :visibility_level, type: :integer
### ISSUES
indexes :confidential, type: :boolean
# The field assignee_id no longer exists in the issues table.
# Nevertheless we keep this field as-is, because we don't want users to have to rebuild the index,
# and Elasticsearch treats arrays transparently:
# you can write an array of integers to any integer field without changing the mapping,
# and you can query those items just like a single integer value.
indexes :assignee_id, type: :integer
### MERGE REQUESTS
indexes :target_branch, type: :text,
index_options: 'offsets'
indexes :source_branch, type: :text,
index_options: 'offsets'
indexes :merge_status, type: :text
indexes :source_project_id, type: :integer
indexes :target_project_id, type: :integer
### NOTES
indexes :note, type: :text,
index_options: 'offsets'
indexes :issue do
indexes :assignee_id, type: :integer
indexes :author_id, type: :integer
indexes :confidential, type: :boolean
end
# ES6 gets rid of the "index: :not_analyzed" option, but a keyword type behaves the same:
# it is not analyzed and is only searchable by its exact value.
indexes :noteable_type, type: :keyword
indexes :noteable_id, type: :keyword
### PROJECTS
indexes :name, type: :text,
index_options: 'offsets'
indexes :path, type: :text,
index_options: 'offsets'
indexes :name_with_namespace, type: :text,
index_options: 'offsets',
analyzer: :my_ngram_analyzer
indexes :path_with_namespace, type: :text,
index_options: 'offsets'
indexes :namespace_id, type: :integer
indexes :archived, type: :boolean
indexes :issues_access_level, type: :integer
indexes :merge_requests_access_level, type: :integer
indexes :snippets_access_level, type: :integer
indexes :wiki_access_level, type: :integer
indexes :repository_access_level, type: :integer
indexes :last_activity_at, type: :date
indexes :last_pushed_at, type: :date
### SNIPPETS
indexes :file_name, type: :text,
index_options: 'offsets'
indexes :content, type: :text,
index_options: 'offsets'
### REPOSITORIES
indexes :blob do
indexes :type, type: :keyword
indexes :id, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :rid, type: :keyword
indexes :oid, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :commit_sha, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :path, type: :text,
analyzer: :path_analyzer
indexes :file_name, type: :text,
analyzer: :code_analyzer,
search_analyzer: :code_search_analyzer
indexes :content, type: :text,
index_options: 'offsets',
analyzer: :code_analyzer,
search_analyzer: :code_search_analyzer
indexes :language, type: :keyword
end
indexes :commit do
indexes :type, type: :keyword
indexes :id, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :rid, type: :keyword
indexes :sha, type: :text,
index_options: 'offsets',
analyzer: :sha_analyzer
indexes :author do
indexes :name, type: :text, index_options: 'offsets'
indexes :email, type: :text, index_options: 'offsets'
indexes :time, type: :date, format: :basic_date_time_no_millis
end
indexes :committer do
indexes :name, type: :text, index_options: 'offsets'
indexes :email, type: :text, index_options: 'offsets'
indexes :time, type: :date, format: :basic_date_time_no_millis
end
indexes :message, type: :text, index_options: 'offsets'
end
end
end
end
end
# frozen_string_literal: true
module Elastic
module Latest
class SnippetClassProxy < ApplicationClassProxy
def elastic_search(query, options: {})
query_hash = basic_query_hash(%w(title file_name), query)
query_hash = filter(query_hash, options[:user])
search(query_hash)
end
def elastic_search_code(query, options: {})
query_hash = basic_query_hash(%w(content), query)
query_hash = filter(query_hash, options[:user])
search(query_hash)
end
def es_type
target.base_class.name.underscore
end
private
def filter(query_hash, user)
return query_hash if user && user.full_private_access?
filter =
if user
{
bool: {
should: [
{ term: { author_id: user.id } },
{ terms: { project_id: authorized_project_ids_for_user(user) } },
{
bool: {
filter: [
{ terms: { visibility_level: [Snippet::PUBLIC, Snippet::INTERNAL] } },
{ term: { type: self.es_type } }
],
must_not: { exists: { field: 'project_id' } }
}
}
]
}
}
else
{
bool: {
filter: [
{ term: { visibility_level: Snippet::PUBLIC } },
{ term: { type: self.es_type } }
],
must_not: { exists: { field: 'project_id' } }
}
}
end
query_hash[:query][:bool][:filter] = filter
query_hash
end
def authorized_project_ids_for_user(user)
if Ability.allowed?(user, :read_cross_project)
user.authorized_projects.pluck_primary_key
else
[]
end
end
end
end
end
# frozen_string_literal: true
module Elastic
module Latest
class SnippetInstanceProxy < ApplicationInstanceProxy
def as_indexed_json(options = {})
# We don't use as_json(only: ...) because it calls all virtual and serialized attributes
# https://gitlab.com/gitlab-org/gitlab-ee/issues/349
data = {}
[
:id,
:title,
:file_name,
:content,
:created_at,
:updated_at,
:project_id,
:author_id,
:visibility_level
].each do |attr|
data[attr.to_s] = safely_read_attribute_for_elasticsearch(attr)
end
# ES6 is now single-type per index, so we implement our own typing
data['type'] = es_type
data
end
end
end
end
# frozen_string_literal: true
# Version-agnostic proxy that decides which version of the elastic target to use, based on whether a method reads or writes
module Elastic
class MultiVersionClassProxy
include MultiVersionUtil
def initialize(data_target)
@data_target = data_target
@data_class = get_data_class(data_target)
generate_forwarding
end
def version(version)
super.tap do |elastic_target|
elastic_target.extend Elasticsearch::Model::Importing::ClassMethods
elastic_target.extend Elasticsearch::Model::Adapter.from_class(@data_class).importing_mixin
end
end
def proxy_class_name
"#{@data_class.name}ClassProxy"
end
end
end
# frozen_string_literal: true
# Version-agnostic proxy that decides which version of the elastic target to use, based on whether a method reads or writes
module Elastic
class MultiVersionInstanceProxy
include MultiVersionUtil
def initialize(data_target)
@data_target = data_target
@data_class = get_data_class(data_target.class)
generate_forwarding
end
def proxy_class_name
"#{@data_class.name}InstanceProxy"
end
end
end
# frozen_string_literal: true
module Elastic
module MultiVersionUtil
extend ActiveSupport::Concern
include Gitlab::Utils::StrongMemoize
attr_reader :data_class, :data_target
# TODO: remove once multi-version is functional https://gitlab.com/gitlab-org/gitlab-ee/issues/10156
TARGET_VERSION = 'V12p1'
# @param version [String, Module] can be a string "V12p1" or a module (Elastic::V12p1)
def version(version)
version = Elastic.const_get(version) if version.is_a?(String)
version.const_get(proxy_class_name).new(data_target)
end
private
# TODO: load from db table https://gitlab.com/gitlab-org/gitlab-ee/issues/12555
def elastic_reading_target
strong_memoize(:elastic_reading_target) do
version(TARGET_VERSION)
end
end
# TODO: load from db table https://gitlab.com/gitlab-org/gitlab-ee/issues/12555
def elastic_writing_targets
strong_memoize(:elastic_writing_targets) do
[elastic_reading_target]
end
end
def get_data_class(klass)
klass < ActiveRecord::Base ? klass.base_class : klass
end
def generate_forwarding
methods_for_all_write_targets = elastic_writing_targets.first.real_class.methods_for_all_write_targets
methods_for_one_write_target = elastic_writing_targets.first.real_class.methods_for_one_write_target
methods_for_all_write_targets.each do |method|
self.class.forward_to_all_write_targets(method)
end
read_methods = elastic_reading_target.real_class.public_instance_methods
read_methods -= methods_for_all_write_targets
read_methods -= methods_for_one_write_target
read_methods -= self.class.instance_methods
read_methods.delete(:method_missing)
read_methods.each do |method|
self.class.forward_read_method(method)
end
end
class_methods do
def forward_read_method(method)
return if respond_to?(method)
delegate method, to: :elastic_reading_target
end
def forward_to_all_write_targets(method)
return if respond_to?(method)
define_method(method) do |*args|
responses = elastic_writing_targets.map do |elastic_target|
elastic_target.public_send(method, *args) # rubocop:disable GitlabSecurity/PublicSend
end
responses.find { |response| response['_shards']['successful'] == 0 } || responses.last
end
end
end
end
end
# frozen_string_literal: true
module Elastic
module V12p1
ApplicationClassProxy = Elastic::Latest::ApplicationClassProxy
end
end
# frozen_string_literal: true
module Elastic
module V12p1
ApplicationInstanceProxy = Elastic::Latest::ApplicationInstanceProxy
end
end
# frozen_string_literal: true
module Elastic
module V12p1
Config = Elastic::Latest::Config
end
end
# frozen_string_literal: true
module Elastic
module V12p1
SnippetClassProxy = Elastic::Latest::SnippetClassProxy
end
end
# frozen_string_literal: true
module Elastic
module V12p1
SnippetInstanceProxy = Elastic::Latest::SnippetInstanceProxy
end
end
# frozen_string_literal: true
module GemExtensions
module Elasticsearch
module Model
module Adapter
module ActiveRecord
module Importing
def __transform
lambda { |model| { index: { _id: model.id, data: model.__elasticsearch__.version(version_namespace).as_indexed_json } } }
end
end
end
end
end
end
end
# frozen_string_literal: true
# Override `__elasticsearch__.client` to
# return a client configured from application settings. All including
# classes will use the same instance, which is refreshed automatically
# if the settings change.
#
# _client is present to match the arity of the overridden method, where
# it is also not used.
module GemExtensions
module Elasticsearch
module Model
module Client
CLIENT_MUTEX = Mutex.new
cattr_accessor :cached_client
cattr_accessor :cached_config
def client(_client = nil)
store = ::GemExtensions::Elasticsearch::Model::Client
store::CLIENT_MUTEX.synchronize do
config = Gitlab::CurrentSettings.elasticsearch_config
if store.cached_client.nil? || config != store.cached_config
store.cached_client = ::Gitlab::Elastic::Client.build(config)
store.cached_config = config
end
end
store.cached_client
end
end
end
end
end
@@ -19,8 +19,8 @@ module Gitlab
ProjectWiki,
Repository
].each do |klass|
settings.deep_merge!(klass.settings.to_hash)
mappings.deep_merge!(klass.mappings.to_hash)
settings.deep_merge!(klass.__elasticsearch__.settings.to_hash)
mappings.deep_merge!(klass.__elasticsearch__.mappings.to_hash)
end
client = Project.__elasticsearch__.client
......
# frozen_string_literal: true
require 'spec_helper'
describe Elastic::Latest::Config do
describe '.document_type' do
it 'returns config' do
expect(described_class.document_type).to eq('doc')
end
end
describe '.settings' do
it 'returns config' do
expect(described_class.settings).to be_a(Elasticsearch::Model::Indexing::Settings)
end
end
describe '.mappings' do
it 'returns config' do
expect(described_class.mapping).to be_a(Elasticsearch::Model::Indexing::Mappings)
end
end
end
# frozen_string_literal: true
require 'spec_helper'
describe Elastic::MultiVersionClassProxy do
subject { described_class.new(ProjectSnippet) }
describe '#version' do
it 'returns class proxy in specified version' do
result = subject.version('V12p1')
expect(result).to be_a(Elastic::V12p1::SnippetClassProxy)
expect(result.target).to eq(ProjectSnippet)
end
end
describe 'method forwarding' do
let(:old_target) { double(:old_target) }
let(:new_target) { double(:new_target) }
let(:response) do
{ "_index" => "gitlab-test", "_type" => "doc", "_id" => "snippet_1", "_version" => 3, "result" => "updated", "_shards" => { "total" => 2, "successful" => 1, "failed" => 0 }, "created" => false }
end
before do
allow(subject).to receive(:elastic_reading_target).and_return(old_target)
allow(subject).to receive(:elastic_writing_targets).and_return([old_target, new_target])
end
it 'forwards methods which should touch all write targets' do
Elastic::V12p1::SnippetClassProxy.methods_for_all_write_targets.each do |method|
expect(new_target).to receive(method).and_return(response)
expect(old_target).to receive(method).and_return(response)
subject.public_send(method)
end
end
it 'forwards read methods to only reading target' do
expect(old_target).to receive(:search)
expect(new_target).not_to receive(:search)
subject.search
expect(subject).not_to respond_to(:method_missing)
end
it 'does not forward write methods which should touch specific version' do
Elastic::V12p1::SnippetClassProxy.methods_for_one_write_target.each do |method|
expect(subject).not_to respond_to(method)
end
end
end
end
# frozen_string_literal: true
require 'spec_helper'
describe Elastic::MultiVersionInstanceProxy do
let(:snippet) { create(:project_snippet) }
subject { described_class.new(snippet) }
describe '#version' do
it 'returns instance proxy in specified version' do
result = subject.version('V12p1')
expect(result).to be_a(Elastic::V12p1::SnippetInstanceProxy)
expect(result.target).to eq(snippet)
end
end
describe 'method forwarding' do
let(:old_target) { double(:old_target) }
let(:new_target) { double(:new_target) }
let(:response) do
{ "_index" => "gitlab-test", "_type" => "doc", "_id" => "snippet_1", "_version" => 3, "result" => "updated", "_shards" => { "total" => 2, "successful" => 1, "failed" => 0 }, "created" => false }
end
before do
allow(subject).to receive(:elastic_reading_target).and_return(old_target)
allow(subject).to receive(:elastic_writing_targets).and_return([old_target, new_target])
end
it 'forwards methods which should touch all write targets' do
Elastic::V12p1::SnippetInstanceProxy.methods_for_all_write_targets.each do |method|
expect(new_target).to receive(method).and_return(response)
expect(old_target).to receive(method).and_return(response)
subject.public_send(method)
end
end
it 'forwards read methods to only reading target' do
expect(old_target).to receive(:as_indexed_json)
expect(new_target).not_to receive(:as_indexed_json)
subject.as_indexed_json
expect(subject).not_to respond_to(:method_missing)
end
it 'does not forward write methods which should touch specific version' do
Elastic::V12p1::SnippetInstanceProxy.methods_for_one_write_target.each do |method|
expect(subject).not_to respond_to(method)
end
end
end
end
# frozen_string_literal: true
require 'spec_helper'
describe Gitlab::Elastic::SnippetSearchResults, :elastic do
let(:snippet) { create(:personal_snippet, content: 'foo', file_name: 'foo') }
let(:results) { described_class.new(snippet.author, 'foo') }
before do
stub_ee_application_setting(elasticsearch_search: true, elasticsearch_indexing: true)
perform_enqueued_jobs { snippet }
Snippet.__elasticsearch__.refresh_index!
end
describe '#snippet_titles_count' do
it 'returns the amount of matched snippet titles' do
expect(results.snippet_titles_count).to eq(1)
end
end
describe '#snippet_blobs_count' do
it 'returns the amount of matched snippet blobs' do
expect(results.snippet_blobs_count).to eq(1)
end
end
context 'when user is not author' do
let(:results) { described_class.new(create(:user), 'foo') }
it 'returns nothing' do
expect(results.snippet_titles_count).to eq(0)
expect(results.snippet_blobs_count).to eq(0)
end
end
context 'when user is nil' do
let(:results) { described_class.new(nil, 'foo') }
it 'returns nothing' do
expect(results.snippet_titles_count).to eq(0)
expect(results.snippet_blobs_count).to eq(0)
end
context 'when snippet is public' do
let(:snippet) { create(:personal_snippet, :public, content: 'foo', file_name: 'foo') }
it 'returns public snippet' do
expect(results.snippet_titles_count).to eq(1)
expect(results.snippet_blobs_count).to eq(1)
end
end
end
context 'when user has full_private_access' do
let(:user) { create(:admin) }
let(:results) { described_class.new(user, 'foo') }
it 'returns matched snippets' do
expect(results.snippet_titles_count).to eq(1)
expect(results.snippet_blobs_count).to eq(1)
end
end
end
@@ -136,14 +136,14 @@ describe Snippet, :elastic do
'visibility_level'
).merge({ 'type' => snippet.es_type })
expect(snippet.as_indexed_json).to eq(expected_hash)
expect(snippet.__elasticsearch__.as_indexed_json).to eq(expected_hash)
end
it 'uses same index for Snippet subclasses' do
Snippet.subclasses.each do |snippet_class|
expect(snippet_class.index_name).to eq(Snippet.index_name)
expect(snippet_class.document_type).to eq(Snippet.document_type)
expect(snippet_class.mappings.to_hash).to eq(Snippet.mappings.to_hash)
expect(snippet_class.__elasticsearch__.mappings.to_hash).to eq(Snippet.__elasticsearch__.mappings.to_hash)
end
end
end