Commit fee9723c authored by Mike Kozono

Deduplicate Geo cronjobs

The default deduplication setting is `:until_executing`. For cronjobs
which are enqueued every minute, we can confidently take it a bit further
and use `:until_executed`.

This should be slightly more efficient, and it should help some edge
cases like https://gitlab.com/gitlab-org/gitlab/-/issues/328057.
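
To make the difference between the two strategies concrete, here is a minimal in-memory sketch. It is a toy model and not GitLab's implementation: `ToyQueue`, `simulate`, and the string key are hypothetical names invented for illustration. It assumes only that `:until_executing` stops deduplicating when a job starts running, while `:until_executed` keeps deduplicating until the job has finished.

```ruby
require 'set'

# Toy model of job deduplication (hypothetical, not GitLab's code).
class ToyQueue
  attr_reader :enqueued, :dropped

  def initialize(strategy)
    @strategy = strategy # :until_executing or :until_executed
    @locks = Set.new     # deduplication keys currently held
    @enqueued = 0
    @dropped = 0
  end

  # Returns true if the job was enqueued, false if dropped as a duplicate.
  def enqueue(key)
    if @locks.include?(key)
      @dropped += 1
      return false
    end

    @locks << key
    @enqueued += 1
    true
  end

  # The job starts running: :until_executing releases the key here.
  def start(key)
    @locks.delete(key) if @strategy == :until_executing
  end

  # The job finishes: :until_executed releases the key here.
  def finish(key)
    @locks.delete(key) if @strategy == :until_executed
  end
end

# One cron tick arrives while the previous run is still in progress.
def simulate(strategy)
  queue = ToyQueue.new(strategy)
  key = 'cron-worker'  # hypothetical deduplication key
  queue.enqueue(key)   # minute 0: cron enqueues the job
  queue.start(key)     # the job starts and runs longer than a minute
  queue.enqueue(key)   # minute 1: cron fires while the job still runs
  queue.finish(key)    # the long-running job completes
  queue
end

executing = simulate(:until_executing)
executed = simulate(:until_executed)
puts "until_executing: enqueued=#{executing.enqueued} dropped=#{executing.dropped}"
puts "until_executed:  enqueued=#{executed.enqueued} dropped=#{executed.dropped}"
```

In this model the second cron tick is accepted under `:until_executing` because the first job has already started, but dropped under `:until_executed` because the first job has not yet finished.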

The modified classes are inherited by the following classes, as shown in
grep output:

```
./ee/app/workers/geo/scheduler/primary/scheduler_worker.rb:      class SchedulerWorker < Geo::Scheduler::SchedulerWorker
./ee/app/workers/geo/scheduler/primary/per_shard_scheduler_worker.rb:      class PerShardSchedulerWorker < Geo::Scheduler::PerShardSchedulerWorker
./ee/app/workers/geo/scheduler/secondary/scheduler_worker.rb:      class SchedulerWorker < Geo::Scheduler::SchedulerWorker
./ee/app/workers/geo/scheduler/secondary/per_shard_scheduler_worker.rb:      class PerShardSchedulerWorker < Geo::Scheduler::PerShardSchedulerWorker
./ee/app/workers/geo/repository_sync_worker.rb:  class RepositorySyncWorker < Geo::Scheduler::Secondary::PerShardSchedulerWorker
./ee/app/workers/geo/registry_sync_worker.rb:  class RegistrySyncWorker < Geo::Scheduler::Secondary::SchedulerWorker
./ee/app/workers/geo/repository_verification/primary/shard_worker.rb:      class ShardWorker < Geo::Scheduler::Primary::SchedulerWorker
./ee/app/workers/geo/repository_verification/primary/batch_worker.rb:      class BatchWorker < Geo::Scheduler::Primary::PerShardSchedulerWorker
./ee/app/workers/geo/repository_verification/secondary/shard_worker.rb:      class ShardWorker < Geo::Scheduler::Secondary::SchedulerWorker
./ee/app/workers/geo/repository_verification/secondary/scheduler_worker.rb:      class SchedulerWorker < Geo::Scheduler::Secondary::PerShardSchedulerWorker
./ee/app/workers/geo/container_repository_sync_dispatch_worker.rb:  class ContainerRepositorySyncDispatchWorker < Geo::Scheduler::Secondary::SchedulerWorker
./ee/app/workers/geo/repository_shard_sync_worker.rb:  class RepositoryShardSyncWorker < Geo::Scheduler::Secondary::SchedulerWorker
./ee/app/workers/geo/file_download_dispatch_worker.rb:  class FileDownloadDispatchWorker < Geo::Scheduler::Secondary::SchedulerWorker
```
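
The grep output matters because a class-level setting declared once in a base class applies to every subclass. The sketch below illustrates that idea with a hypothetical `ToyDeduplication` module and made-up worker classes. It is an assumption about the general mechanism, not a copy of how GitLab stores worker attributes.

```ruby
# Hypothetical stand-in for a class-level `deduplicate` macro.
module ToyDeduplication
  # Record the strategy on the class that calls the macro.
  def deduplicate(strategy)
    @deduplication_strategy = strategy
  end

  # Look up the strategy, walking up the superclass chain when this
  # class did not declare one itself.
  def deduplication_strategy
    return @deduplication_strategy if instance_variable_defined?(:@deduplication_strategy)

    if superclass.respond_to?(:deduplication_strategy)
      superclass.deduplication_strategy
    else
      :until_executing # the default named in the commit message
    end
  end
end

# Toy base class that declares the strategy once.
class ToyBaseSchedulerWorker
  extend ToyDeduplication

  deduplicate :until_executed
end

# Toy subclasses that declare nothing and rely on inheritance.
class ToySecondarySchedulerWorker < ToyBaseSchedulerWorker; end
class ToyRegistrySyncWorker < ToySecondarySchedulerWorker; end

# A toy worker outside the hierarchy keeps the default.
class ToyUnrelatedWorker
  extend ToyDeduplication
end

puts ToyRegistrySyncWorker.deduplication_strategy
puts ToyUnrelatedWorker.deduplication_strategy
```

Under these assumptions, changing two base classes is enough to change the behavior of every worker listed above.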

Resolves https://gitlab.com/gitlab-org/gitlab/-/issues/328364

Changelog: performance
parent fdd8067b
@@ -13,6 +13,13 @@ module Geo
 feature_category :geo_replication
+# These workers are enqueued every minute by sidekiq-cron. If one of them
+# is already enqueued or running, then there isn't a strong case for
+# enqueuing another. And there are edge cases where enqueuing another
+# would exacerbate a problem. See
+# https://gitlab.com/gitlab-org/gitlab/-/issues/328057.
+deduplicate :until_executed
 def perform
   each_eligible_shard { |shard_name| schedule_job(shard_name) }
 end
......
@@ -10,6 +10,14 @@ module Geo
 include ::Gitlab::Utils::StrongMemoize
 include GeoBackoffDelay
+# These workers are enqueued regularly by sidekiq-cron or by a per-shard
+# worker which is enqueued by sidekiq-cron. If one of these workers is
+# already enqueued or running, then there isn't a strong case for
+# enqueuing another. And there are edge cases where enqueuing another
+# would exacerbate a problem. See
+# https://gitlab.com/gitlab-org/gitlab/-/issues/328057.
+deduplicate :until_executed
 DB_RETRIEVE_BATCH_SIZE = 1000
 LEASE_TIMEOUT = 60.minutes
 RUN_TIME = 60.minutes.to_i
......
---
title: Deduplicate Geo cronjobs
merge_request: 59799
author:
type: performance