Commit a28844ea authored by Stan Hu

Prevent amplification of ReactiveCachingWorker jobs upon failures

When `ReactiveCachingWorker` hits an SSL or other exception that occurs
quickly and reliably, automatically rescheduling a new worker can lead to
an excessive number of jobs being scheduled. This happens because not
only does the failed job get rescheduled in a minute, but each Sidekiq
retry also adds more rescheduled jobs.

On busy instances this becomes a problem: a large number of
`ReactiveCachingWorker` jobs running concurrently can cause high rates of
`ExclusiveLease` reads and can saturate the Redis server with queries.

We now disable this automatic retry and rely on Sidekiq to perform its 3
retries with a backoff period.
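The amplification can be illustrated with a minimal simulation (the class
and method names below are hypothetical, not GitLab code): a job that
always fails is retried by Sidekiq, and when the reschedule happens in an
`ensure` block, every attempt *also* enqueues a fresh job, each of which
repeats the same cycle.

```ruby
# Simulates a failing job and its Sidekiq retries, counting how many
# extra jobs get enqueued. `reschedule_in_ensure: true` mirrors the old
# behavior of rescheduling from an `ensure` block even on failure.
class FakeQueue
  attr_reader :enqueued

  def initialize
    @enqueued = 0
  end

  def run_failing_job(retries:, reschedule_in_ensure:)
    (retries + 1).times do
      begin
        raise "SSL error" # the job body fails quickly and reliably
      rescue
        # Sidekiq swallows the error and handles the retry itself
      ensure
        # the old code path re-enqueued here regardless of success
        @enqueued += 1 if reschedule_in_ensure
      end
    end
  end
end

old_behavior = FakeQueue.new
old_behavior.run_failing_job(retries: 3, reschedule_in_ensure: true)
old_behavior.enqueued # 4 extra jobs from one failing job, and each of
                      # those repeats the cycle on its own retries

new_behavior = FakeQueue.new
new_behavior.run_failing_job(retries: 3, reschedule_in_ensure: false)
new_behavior.enqueued # 0 extra jobs; only Sidekiq's own retries remain
```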

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176
parent 6e9f8820
@@ -178,7 +178,7 @@ module ReactiveCaching
     def enqueuing_update(*args)
       yield
-    ensure
+
       ReactiveCachingWorker.perform_in(self.class.reactive_cache_refresh_interval, self.class, id, *args)
     end
   end
---
title: Prevent amplification of ReactiveCachingWorker jobs upon failures
merge_request: 30432
author:
type: performance
@@ -206,8 +206,9 @@ describe ReactiveCaching, :use_clean_rails_memory_store_caching do
       expect(read_reactive_cache(instance)).to eq("preexisting")
     end

-    it 'enqueues a repeat worker' do
-      expect_reactive_cache_update_queued(instance)
+    it 'does not enqueue a repeat worker' do
+      expect(ReactiveCachingWorker)
+        .not_to receive(:perform_in)

       expect { go! }.to raise_error("foo")
     end