Set 1s server side timeout on Elasticsearch counts

These count requests are loaded one per tab every time the search page loads. This means a single search for one type of document triggers up to 7 other searches just to get the counts for the other tabs. These tab counts are often very expensive requests, especially relative to the cheaper searches. For example, an issue search may take 1s while a blobs count takes 30s. Because of a limited thread pool on the Elasticsearch side, we regularly see these count queries causing queuing, which slows down otherwise fast searches on GitLab.com. As such, we want to set a timeout on them.

This timeout is just a server-side Elasticsearch timeout for now. It is a soft limit: because Elasticsearch is asynchronous, it may take Elasticsearch longer than the configured time to realise the query has timed out and cancel it. As such, we may see searches take a few seconds before they time out even though the timeout is 1s. This is not perfect, but benchmarking in the related issue shows it can still drastically improve throughput, and it is one of the easiest steps to take now.

One thing to note about this approach is that users will still see a count in the event of a timeout. The count may be partial and therefore lower than the true count. If they switch to the tab they will see the true count. I think this is probably still better than displaying nothing, since the main value of the tab counts is showing whether or not there are any results on that tab at all.

Later we may wish to introduce client-side timeouts on our ES client, but that is trickier to accomplish since we use a single client configuration which has a global timeout for all Elasticsearch queries. Additionally, client-side timeouts will result in errors that we may wish to handle specially to show some indicator on the tab.

Read more at https://gitlab.com/gitlab-org/gitlab/-/issues/301146

Commit a893b326 authored Feb 05, 2021 by Dylan Griffith
3 changed files
--- a/ee/changelogs/unreleased/301146-1s-server-side-timeouts-on-elasticsearch-counts.yml
+++ b/ee/changelogs/unreleased/301146-1s-server-side-timeouts-on-elasticsearch-counts.yml
+---
+title: Set 1s server side timeout on Elasticsearch counts
+merge_request: 53435
+author:
+type: performance
--- a/ee/lib/elastic/latest/application_class_proxy.rb
+++ b/ee/lib/elastic/latest/application_class_proxy.rb
@@ -10,6 +10,11 @@ module Elastic
      def search(query, search_options = {})
        es_options = routing_options(search_options)

+        # Counts need to be fast as we load one count per type of document
+        # on every page load. Fail early if they are slow since they don't
+        # need to be accurate.
+        es_options[:timeout] = '1s' if search_options[:count_only]
+
        # Calling elasticsearch-ruby method
        super(query, es_options)
      end

--- a/ee/spec/support/shared_examples/lib/gitlab/elastic/search_results_shared_examples.rb
+++ b/ee/spec/support/shared_examples/lib/gitlab/elastic/search_results_shared_examples.rb
@@ -40,6 +40,7 @@ RSpec.shared_examples 'does not load results for count only queries' do |scopes|
        expect(request.dig(:body, :size)).to eq(0)
        expect(request.dig(:body, :query, :bool, :must)).to be_blank
        expect(request[:highlight]).to be_blank
+        expect(request.dig(:params, :timeout)).to eq('1s')
      end
    end
  end
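
The commit message notes that a timed-out count may be partial. The following is a minimal sketch of both halves of that behaviour, not the code in this commit: `es_options_for` and `tab_count` are hypothetical helper names, and the response is modelled as a plain Hash. It assumes the standard Elasticsearch search response shape, where a top-level `timed_out` flag is set when the server-side timeout fired and `hits.total` is either an integer or an object with a `value` key, depending on the Elasticsearch version.

```ruby
# Hypothetical helpers for illustration; these names are not part of GitLab.
COUNT_TIMEOUT = '1s'

# Mirrors the change in the diff above: only count-only searches get the
# server-side timeout, every other search keeps its options untouched.
def es_options_for(search_options, base_options = {})
  options = base_options.dup
  options[:timeout] = COUNT_TIMEOUT if search_options[:count_only]
  options
end

# Reads a count from a search response Hash. When `timed_out` is true the
# total only covers what was collected before the timeout, so the count is
# flagged as partial instead of being treated as exact.
def tab_count(response)
  total = response.dig('hits', 'total')
  value = total.is_a?(Hash) ? total['value'] : total
  { count: value.to_i, partial: response['timed_out'] == true }
end
```

A caller could use the `partial` flag to decide whether to render the count as a lower bound, which is one possible way to surface the caveat described in the commit message.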