= Context

Create a database representation, `BulkImports::Tracker`, for each BulkImports
pipeline (`include BulkImports::Pipeline`). This way, the pipeline's progress
can be tracked using the `status` column (a state machine). With this change we
can run each pipeline in its own background job
(https://gitlab.com/gitlab-org/gitlab/-/issues/323384), which makes it possible
to retry a pipeline after failures such as rate limiting. Besides that, having
each pipeline's progress in the database will be handy for building a better UI
that gives the user an accurate sense of progress in the future.

- Merge request that introduced the `status` column to `BulkImports::Tracker`:
  https://gitlab.com/gitlab-org/gitlab/-/merge_requests/5568

= The change

Before this change, `BulkImports::Tracker` only tracked the pagination state,
when required, so a record was only created when a pipeline needed pagination
handling. Now a `BulkImports::Tracker` record is created before each pipeline
runs, and when the pipeline finishes, fails, or is skipped, that status is also
recorded in the pipeline's `BulkImports::Tracker` record (see the sketch at the
end of this message). Tracking the pipeline status is required to run the
pipelines in individual background jobs because they must run in a specific
order. For instance, we cannot import Epics before importing the Group or the
Group labels.

- Related to: https://gitlab.com/gitlab-org/gitlab/-/issues/324109

= Next step

Create the `BulkImports::PipelineWorker` to run each pipeline in its own job.

- https://gitlab.com/gitlab-org/gitlab/-/issues/323384

= References

- Epic: https://gitlab.com/groups/gitlab-org/-/epics/5544
- Spike where the idea was tested: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/54970
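= Sketch

A minimal sketch of the `status` state machine described above, assuming the
`state_machines-activerecord` gem; the state values and event names here are
hypothetical, derived from the statuses mentioned in this message (created,
started, finished, failed, skipped), and are not the exact implementation.

```ruby
# Illustrative model only, not the actual GitLab code.
module BulkImports
  class Tracker < ApplicationRecord
    # Hypothetical integer-backed state machine on the `status` column.
    state_machine :status, initial: :created do
      state :created,  value: 0
      state :started,  value: 1
      state :finished, value: 2
      state :failed,   value: -1
      state :skipped,  value: -2

      event :start do
        transition created: :started
      end

      event :finish do
        transition started: :finished
      end

      event :skip do
        transition any => :skipped
      end

      # Named fail_op to avoid clashing with Kernel#fail.
      event :fail_op do
        transition any => :failed
      end
    end
  end
end
```

With a model shaped like this, the code that runs a pipeline would create the
tracker record first, call `start!` before running, and then `finish!`,
`fail_op!`, or `skip!` depending on the outcome, leaving every pipeline's
progress queryable from the database.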