Prevent amplification of ReactiveCachingWorker jobs upon failures
When `ReactiveCachingWorker` hits an SSL error or another exception that occurs quickly and reliably, automatically rescheduling a new worker can lead to an excessive number of jobs being scheduled. This happens because not only does the failed job get rescheduled in a minute, but each Sidekiq retry also adds more rescheduled jobs.

On busy instances, this can become an issue because large numbers of running `ReactiveCachingWorker` jobs can cause high rates of `ExclusiveLease` reads and possibly saturate the Redis server with queries.

We now disable this automatic rescheduling on failure and rely on Sidekiq to perform its 3 retries with a backoff period.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176
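
To illustrate the amplification described above, here is a rough model in plain Ruby. It is a sketch, not the actual worker code. It assumes that every execution fails, that each failed execution (the first attempt and each of the 3 Sidekiq retries) used to schedule one fresh job, and that a "generation" is one round of such rescheduled jobs. The method name `executions_per_generation` and the `reschedule_on_failure` flag are hypothetical and exist only for this example.

```ruby
# Simplified model of job amplification. All names here are hypothetical.
# Assumption: every execution fails, and Sidekiq retries each job 3 times.
SIDEKIQ_RETRIES = 3

# Returns the number of job executions in each generation, starting from
# a single job. With reschedule_on_failure, every failed execution
# schedules one new job for the next generation.
def executions_per_generation(generations, reschedule_on_failure:)
  jobs = 1
  Array.new(generations) do
    # Each job runs once and is then retried by Sidekiq.
    executions = jobs * (1 + SIDEKIQ_RETRIES)
    jobs = reschedule_on_failure ? executions : 0
    executions
  end
end

old_counts = executions_per_generation(3, reschedule_on_failure: true)
new_counts = executions_per_generation(3, reschedule_on_failure: false)

puts "reschedule on failure: #{old_counts.inspect}"
puts "Sidekiq retries only:  #{new_counts.inspect}"
```

Under these assumptions the count grows by a factor of 4 per generation when failures reschedule, while relying on Sidekiq retries alone caps it at 4 executions for the original job.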