Include letters, numbers & underscore always as Elasticsearch token

Various other regexes here try to capture tokens in different contexts, but at the very least we should always greedily capture any series of letters, numbers and underscores. It's OK if another regex already covers this in some cases, since we de-duplicate tokens anyway. The test included in this change is an example where we don't correctly capture this token today; it is a common pattern in Ruby, so we should cover it.
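
To illustrate the gap, here is a minimal sketch that applies the existing file-name pattern and the new word pattern to one of the lines from the spec. It assumes Ruby's regex engine behaves like the one Elasticsearch uses for these patterns, which is only an approximation, and `capture_tokens` is a hypothetical helper that is not part of the GitLab codebase.

```ruby
# Illustrative only: Elasticsearch applies these patterns with its own regex
# engine; Ruby's engine is used here as an approximation.
FILE_NAME_PATTERN = /[\p{L}_.-]+/ # existing pattern from the diff
WORD_PATTERN      = /[\p{L}\d_]+/ # pattern added by this commit

# Hypothetical helper: collect every match of every pattern, then de-duplicate.
def capture_tokens(text, patterns)
  patterns.flat_map { |pattern| text.scan(pattern) }.uniq
end

line = 'RubyClassInvoking.ruby_method_call(with_arg)'

capture_tokens(line, [FILE_NAME_PATTERN])
# => ["RubyClassInvoking.ruby_method_call", "with_arg"]

capture_tokens(line, [FILE_NAME_PATTERN, WORD_PATTERN])
# => ["RubyClassInvoking.ruby_method_call", "with_arg", "RubyClassInvoking", "ruby_method_call"]
```

With only the file-name pattern, `ruby_method_call` never appears as a token on its own. Adding the word pattern yields it, and the overlapping `with_arg` match is dropped by de-duplication.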

Commit a0a14f6f authored Jul 08, 2020 by Dylan Griffith
4 changed files
--- a/doc/development/elasticsearch.md
+++ b/doc/development/elasticsearch.md
@@ -111,7 +111,8 @@ Patterns:
 - `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes
 - `"'((?:\\'|[^']|\\')*)'"`: same as above, for single-quotes
 - `'\.([^.]+)(?=\.|\s|\Z)'`: separate terms with periods in-between
- `'([\p{L}_.-]+)'` : some common chars in file names to keep the whole filename intact (eg. `my_file-ñame.txt`)
+- `'([\p{L}_.-]+)'`: some common chars in file names to keep the whole filename intact (eg. `my_file-ñame.txt`)
+- `'([\p{L}\d_]+)'`: letters, numbers and underscores are the most common tokens in programming. Always capture them greedily regardless of context.

 ## Gotchas


--- a/ee/changelogs/unreleased/elasticsearch-word-tokens-with-underscores.yml
+++ b/ee/changelogs/unreleased/elasticsearch-word-tokens-with-underscores.yml
+---
+title: Allow searching word tokens with letters, numbers and underscores in advanced global search
+merge_request: 36255
+author:
+type: changed
--- a/ee/lib/elastic/latest/config.rb
+++ b/ee/lib/elastic/latest/config.rb
@@ -61,7 +61,8 @@ module Elastic
                  '"((?:\\"|[^"]|\\")*)"', # capture terms inside quotes, removing the quotes
                  "'((?:\\'|[^']|\\')*)'", # same as above, for single quotes
                  '\.([^.]+)(?=\.|\s|\Z)', # separate terms on periods
-                  '([\p{L}_.-]+)' # some common chars in file names to keep the whole filename intact (eg. my_file-name.txt)
+                  '([\p{L}_.-]+)', # some common chars in file names to keep the whole filename intact (eg. my_file-name.txt)
+                  '([\p{L}\d_]+)' # letters, numbers and underscores are the most common tokens in programming. Always capture them greedily regardless of context.
                ]
              }
            },

--- a/ee/spec/lib/gitlab/elastic/search_results_spec.rb
+++ b/ee/spec/lib/gitlab/elastic/search_results_spec.rb
@@ -637,6 +637,12 @@ RSpec.describe Gitlab::Elastic::SearchResults, :elastic, :sidekiq_might_not_need
          ParenthesesBetweenTokens)tokenAfterParentheses
          a.b.c=missing_token_around_equals

+          def self.ruby_method_name(ruby_method_arg)
+          RubyClassInvoking.ruby_method_call(with_arg)
+
+          def self.ruby_method_123(ruby_another_method_arg)
+          RubyClassInvoking.ruby_call_method_123(with_arg)
+
        FILE
      end
      let(:file_name) { 'elastic_specialchars_test.md' }
@@ -703,6 +709,22 @@ RSpec.describe Gitlab::Elastic::SearchResults, :elastic, :sidekiq_might_not_need
      it 'finds a token after = without a space' do
        expect(search_for('missing_token_around_equals')).to include(file_name)
      end
+
+      it 'finds a ruby method name even if preceded with dot' do
+        expect(search_for('ruby_method_name')).to include(file_name)
+      end
+
+      it 'finds a ruby method name with numbers' do
+        expect(search_for('ruby_method_123')).to include(file_name)
+      end
+
+      it 'finds a ruby method call even if preceded with dot' do
+        expect(search_for('ruby_method_call')).to include(file_name)
+      end
+
+      it 'finds a ruby method call with numbers' do
+        expect(search_for('ruby_call_method_123')).to include(file_name)
+      end
    end
  end