Introduce Ci::UpdateLockedUnknownArtifactsWorker
Feature flag: ci_job_artifacts_backlog_work

Controls execution of our new artifact-backlog-work service by turning Ci::JobArtifacts::UpdateUnknownLockedStatusService into a no-op when disabled.

Feature flag: ci_job_artifacts_backlog_large_loop_limit

Controls the number of rows updated in or removed from ci_job_artifacts in a single execution by increasing the loop_limit from 100 to 500 iterations. With BATCH_SIZE = 100, this implies a record limit of 10,000 or 50,000 rows for a single worker execution.

First, we remove the DISTINCT clause from this query. DISTINCT forced the query to de-duplicate the whole table before applying the limit, making it far more expensive than necessary. It also implicitly returned the lowest job_id values first, which created an enormous pile of dead tuples at the end of the index that we were explicitly starting our query from. This is catastrophic for index performance on replicas, but not on the primary, which can use LP_DEAD hints to skip dead index tuples. On replicas, performance remains degraded until a VACUUM runs.

Second, we apply an explicit order by expiration to prevent us from accidentally querying rows in an order that correlates with job_id. By ensuring that we query rows that are well distributed across the table, we avoid scanning large blocks of dead index tuples before finding 100 live tuples to return.

Third, we destroy unlocked job artifacts without updating them first. Updating rows in ci_job_artifacts only to delete them on the next loop cycle creates twice as many dead tuples as necessary. Artifacts locked by ci_pipelines.locked are still updated and left in place, but the others are passed directly to the DestroyBatchService for removal; the dead tuples created by the intermediate update are avoided.
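The relationship between the batch size, the loop limit, and the per-execution record limit can be sketched as follows. This is a minimal illustration, not the worker's actual code; the method name record_limit is hypothetical, and only BATCH_SIZE and the 100/500 loop limits come from the description above.

```ruby
# Per-execution record limit = loop_limit * BATCH_SIZE.
# BATCH_SIZE mirrors the constant described above; record_limit is illustrative.
BATCH_SIZE = 100

def record_limit(large_loop_limit_enabled)
  # The ci_job_artifacts_backlog_large_loop_limit flag raises the
  # loop limit from 100 to 500 iterations.
  loop_limit = large_loop_limit_enabled ? 500 : 100
  loop_limit * BATCH_SIZE
end

puts record_limit(false) # => 10000
puts record_limit(true)  # => 50000
```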
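The third change, splitting a batch into locked artifacts (updated in place) and unlocked artifacts (destroyed directly), can be sketched like this. This is a hypothetical, self-contained model: Artifact and process_batch are illustrative stand-ins, not GitLab's actual classes or the DestroyBatchService API.

```ruby
# Illustrative stand-in for a ci_job_artifacts row joined to its pipeline.
Artifact = Struct.new(:id, :pipeline_locked, :locked, keyword_init: true)

# Split one batch: artifacts whose pipeline is locked get their locked
# status updated and stay in the table; unlocked artifacts are handed
# straight to destruction, so we never UPDATE a row we are about to
# DELETE (which would create an extra dead tuple per row).
def process_batch(batch)
  locked, unlocked = batch.partition(&:pipeline_locked)
  locked.each { |artifact| artifact.locked = true } # single UPDATE, rows remain
  { updated: locked.map(&:id), destroyed: unlocked.map(&:id) }
end
```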