    Prevent amplification of ReactiveCachingWorker jobs upon failures · a28844ea
    Stan Hu authored
    When `ReactiveCachingWorker` hits an SSL or other exception that occurs
    quickly and reliably, automatically rescheduling a new worker could lead
    to an excessive number of jobs being scheduled. This happens because not
    only does the failed job get rescheduled in a minute, but each Sidekiq
    retry will also add even more rescheduled jobs.
    
    On busy instances, this can become an issue because large numbers of
    running `ReactiveCachingWorker` jobs can cause high rates of
    `ExclusiveLease` reads and possibly saturate the Redis server with queries.
    
    We now disable this automatic retry and rely on Sidekiq to perform its 3
    retries with a backoff period.
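
    A rough model of the amplification described above, not the actual
    GitLab code. It assumes a job that always fails, one fresh job enqueued
    per failed attempt under the old behavior, and 3 Sidekiq retries per
    job. The method name `executions` and the `generations` bound are
    hypothetical and exist only to keep the simulation finite; timing
    (the one-minute delay and the retry backoff) is ignored.

```ruby
# Simplified model: count how many times an always-failing job executes.
# MAX_RETRIES mirrors the 3 Sidekiq retries mentioned in the commit message.
MAX_RETRIES = 3

# reschedule_on_failure: true models the old behavior, where every failed
#   attempt (the first run and each retry) also enqueues a fresh job.
# generations: how many rounds of rescheduled jobs to follow (assumed bound).
def executions(reschedule_on_failure:, generations:)
  attempts_per_job = 1 + MAX_RETRIES
  return attempts_per_job unless reschedule_on_failure

  jobs = 1
  total = 0
  (generations + 1).times do
    total += jobs * attempts_per_job
    # Each failed attempt enqueues one fresh job for the next round.
    jobs *= attempts_per_job
  end
  total
end

old_count = executions(reschedule_on_failure: true, generations: 2)
new_count = executions(reschedule_on_failure: false, generations: 2)
puts "with rescheduling: #{old_count}, retries only: #{new_count}"
```

    Under these assumptions the job count multiplies with each round of
    rescheduling, while relying on Sidekiq retries alone keeps it constant.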
    
    Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176
reactive_caching_spec.rb 6.47 KB