Commit a53383fd authored by Mike Kozono

Geo: Increase backoff cap for missing on primary

This applies to legacy blobs: Job artifacts, LFS objects, and Uploads.

On staging.gitlab.com, many files are (intentionally) missing on the
primary, so geo.staging.gitlab.com attempts to sync them every hour. We
don't want to disable retries after some maximum number, because we want
the system to automatically recover if the files ever appear. But every
hour is a bit excessive, given all retries have failed up to that point.
So this commit raises the retry time cap for legacy blobs missing on
primary from 1 hour to 4 hours.
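
To illustrate the effect of the cap, here is a minimal plain-Ruby sketch of a capped backoff. It is not the GitLab implementation: the `delay` formula below is a hypothetical Sidekiq-style stand-in (the real one is not part of this diff), and the `now:` and `rng:` keyword arguments and the `HOUR` constant are added only so the sketch runs without Rails.

```ruby
# Plain-Ruby sketch of a capped backoff; no Rails or ActiveSupport required.
HOUR = 3600 # seconds

# Hypothetical stand-in for `delay`: a Sidekiq-style polynomial backoff with
# random jitter. The real formula is not shown in this commit.
def delay(retry_count, rng: Random.new)
  (retry_count**4) + 15 + (rng.rand(30) * (retry_count + 1))
end

# Same shape as the patched method: the proposed time is capped at
# `max_wait_time` from now, plus the jitter of a first retry.
def next_retry_time(retry_count, custom_max_wait_time = nil, now: Time.now, rng: Random.new)
  proposed_time = now + delay(retry_count, rng: rng)
  max_wait_time = custom_max_wait_time || HOUR
  max_future_time = now + max_wait_time + delay(1, rng: rng)
  [proposed_time, max_future_time].min
end

# Compare the default cap with the 4-hour cap used for "missing on primary".
now = Time.at(0)
[1, 5, 10, 100].each do |count|
  default_wait = next_retry_time(count, now: now) - now
  missing_wait = next_retry_time(count, 4 * HOUR, now: now) - now
  puts format('retry %3d: default %6d s, missing on primary %6d s', count, default_wait, missing_wait)
end
```

Under this assumed formula, early retries are unaffected by either cap. Only once the polynomial term exceeds the cap does the wait level off, at roughly 1 hour by default or roughly 4 hours when the file is missing on the primary.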

As an aside, resources which are replicated by the Geo framework will
soon gain the automatic verification and re-verification feature. This
will eventually resync resources which were missing on the primary and
later became available.
parent 6efb0e39
......@@ -79,9 +79,11 @@ module Geo
 retry_later = !registry.success || registry.missing_on_primary
 if retry_later
+  custom_max_wait_time = missing_on_primary ? 4.hours : nil
   # We don't limit the amount of retries
   registry.retry_count = (registry.retry_count || 0) + 1
-  registry.retry_at = next_retry_time(registry.retry_count)
+  registry.retry_at = next_retry_time(registry.retry_count, custom_max_wait_time)
 else
   registry.retry_count = 0
   registry.retry_at = nil
......
---
title: 'Geo: Increase backoff cap for Job artifacts, LFS objects, and Uploads which are missing on primary'
merge_request: 50812
author:
type: changed
......@@ -8,9 +8,10 @@ module Delay
 # To prevent the retry time from storing invalid dates in the database,
 # cap the max time to a hour plus some random jitter value.
-def next_retry_time(retry_count)
+def next_retry_time(retry_count, custom_max_wait_time = nil)
   proposed_time = Time.now + delay(retry_count).seconds
-  max_future_time = 1.hour.from_now + delay(1).seconds
+  max_wait_time = custom_max_wait_time || 1.hour
+  max_future_time = max_wait_time.from_now + delay(1).seconds
   [proposed_time, max_future_time].min
 end
......
......@@ -323,13 +323,13 @@ RSpec.describe Geo::FileDownloadService do
 end
 end
-it 'sets a retry date with a maximum of about 7 days' do
-  registry_entry.update!(retry_count: 100, retry_at: 7.days.from_now)
+it 'sets a retry date with a maximum of about 4 hours' do
+  registry_entry.update!(retry_count: 100, retry_at: 1.minute.ago)
   freeze_time do
     execute!
-    expect(registry_entry.reload.retry_at < 8.days.from_now).to be_truthy
+    expect(registry_entry.reload.retry_at).to be_within(3.minutes).of(4.hours.from_now)
   end
 end
 end
......@@ -362,13 +362,13 @@ RSpec.describe Geo::FileDownloadService do
 end
 end
-it 'sets a retry date with a maximum of about 7 days' do
-  registry_entry.update!(retry_count: 100, retry_at: 7.days.from_now)
+it 'sets a retry date with a maximum of about 1 hour' do
+  registry_entry.update!(retry_count: 100, retry_at: 1.minute.ago)
   freeze_time do
     execute!
-    expect(registry_entry.reload.retry_at < 8.days.from_now).to be_truthy
+    expect(registry_entry.reload.retry_at).to be_within(3.minutes).of(1.hour.from_now)
   end
 end
 end
......