1. 06 Jul, 2019 1 commit
    • Prevent amplification of ReactiveCachingWorker jobs upon failures · a28844ea
      Stan Hu authored
      When `ReactiveCachingWorker` hits an SSL or other exception that occurs
      quickly and reliably, automatically rescheduling a new worker could lead
      to an excessive number of jobs being scheduled. This happens because not
      only does the failed job get rescheduled in a minute, but each Sidekiq
      retry will also add even more rescheduled jobs.
      
      On busy instances, this can become an issue because large numbers of
      running `ReactiveCachingWorker` jobs can cause high rates of
      `ExclusiveLease` reads and possibly saturate the Redis server with queries.
      
      We now disable this automatic retry and rely on Sidekiq to perform its 3
      retries with a backoff period.
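
The amplification described above can be illustrated with a small counting model. This is a hypothetical sketch, not GitLab's actual worker code: it assumes every attempt fails, that each failed attempt (the first run and each of the 3 Sidekiq retries) enqueued exactly one fresh job under the old behaviour, and it ignores timing and backoff. The names `SIDEKIQ_RETRIES` and `executions_per_generation` are made up for illustration.

```ruby
# Hypothetical model (not GitLab's code): counts worker executions per
# "generation" of jobs when every attempt fails.
SIDEKIQ_RETRIES = 3 # retries per job, as stated in the commit message

# reschedule_on_failure: true models the old behaviour, where every failed
# attempt also enqueues a fresh job; false models relying on Sidekiq retries only.
def executions_per_generation(generations:, reschedule_on_failure:)
  attempts_per_job = 1 + SIDEKIQ_RETRIES # first run plus retries
  jobs = 1
  counts = []
  generations.times do
    counts << jobs * attempts_per_job
    break unless reschedule_on_failure
    jobs *= attempts_per_job # each failed attempt enqueued one new job
  end
  counts
end

old_behaviour = executions_per_generation(generations: 4, reschedule_on_failure: true)
new_behaviour = executions_per_generation(generations: 4, reschedule_on_failure: false)

puts "old: #{old_behaviour.inspect}" # old: [4, 16, 64, 256]
puts "new: #{new_behaviour.inspect}" # new: [4]
```

Under these assumptions the old behaviour grows geometrically with each generation, while the new behaviour stops after the original job's attempts are exhausted.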
      
      Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176
  2. 05 Jul, 2019 39 commits