changelogs/unreleased/sh-disable-reactive-caching-automatic-retries.yml · 7d840e4bbeb6ac8281167b180c88871b64e9d703 · nexedi / gitlab-ce

Prevent amplification of ReactiveCachingWorker jobs upon failures · a28844ea

Stan Hu authored Jul 06, 2019

When `ReactiveCachingWorker` hits an SSL or other exception that occurs
quickly and reliably, automatically rescheduling a new worker could lead
to an excessive number of jobs being scheduled. This happens because not
only does the failed job get rescheduled in a minute, but each Sidekiq
retry will also add even more rescheduled jobs.
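
The amplification can be illustrated with a small counting model. This is a
hypothetical sketch, not GitLab code: `jobs_per_generation` and
`reschedule_on_failure` are made-up names, every execution is assumed to fail,
and timing (the one-minute delay and the Sidekiq backoff) is ignored. Each
fresh job runs once plus 3 Sidekiq retries, and under the old behavior each of
those failed executions enqueues one more fresh job.

```ruby
# Hypothetical model, not GitLab code: every execution fails, as with a
# persistent SSL error.
SIDEKIQ_RETRIES = 3 # retry count mentioned in the commit message

# Returns the number of fresh jobs enqueued in each rescheduling generation.
def jobs_per_generation(generations, reschedule_on_failure:)
  counts = [1] # generation 0: the single original job
  generations.times do
    # Each fresh job executes once, then once more per Sidekiq retry.
    executions = counts.last * (1 + SIDEKIQ_RETRIES)
    # Old behavior: every failed execution reschedules one new job.
    counts << (reschedule_on_failure ? executions : 0)
  end
  counts
end

puts jobs_per_generation(3, reschedule_on_failure: true).inspect  # [1, 4, 16, 64]
puts jobs_per_generation(3, reschedule_on_failure: false).inspect # [1, 0, 0, 0]
```

Under these assumptions the number of scheduled jobs grows fourfold per
generation while the failure persists, whereas without rescheduling on failure
the original job simply exhausts its retries.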

On busy instances, this can become an issue because large numbers of
running `ReactiveCachingWorker` jobs can cause high rates of
`ExclusiveLease` reads and possibly saturate the Redis server with queries.

We now disable this automatic rescheduling on failure and rely on Sidekiq
to perform its 3 retries with a backoff period.

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176

sh-disable-reactive-caching-automatic-retries.yml 124 Bytes