@@ -480,3 +480,380 @@ it executes `occurrence.pipeline.created_at`.
When looping through the vulnerability occurrences in the Sidekiq worker, we
could try to load the corresponding pipeline and choose to skip processing that
occurrence if the pipeline is not found.
## Architecture
The loose foreign keys feature is implemented within the `LooseForeignKeys` Ruby namespace. The
code is isolated from the core application code and, theoretically, could be a standalone library.
The feature is invoked solely in the [`LooseForeignKeys::CleanupWorker`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/app/workers/loose_foreign_keys/cleanup_worker.rb) worker class. The worker is scheduled via a
cron job whose schedule depends on the configuration of the GitLab instance:
- Non-decomposed GitLab (1 database): invoked every minute.
- Decomposed GitLab (2 databases, CI and Main): invoked every minute, cleaning up one database
at a time. For example, the cleanup worker for the main database runs every two minutes.
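The alternation between databases can be pictured with a minimal sketch. It assumes the worker picks the database from the current minute. `database_for_run` and its arguments are hypothetical names used for illustration, not part of the `LooseForeignKeys` code:

```ruby
# Hypothetical helper: choose which database a given run cleans up.
# `databases` is the list of configured databases, `minute` is the
# minute in which the cron job fired. Both names are assumptions.
def database_for_run(databases, minute)
  databases[minute % databases.size]
end

single = [:main]
decomposed = [:main, :ci]

# Print the database picked in four consecutive minutes.
(0..3).each do |minute|
  puts "minute #{minute}: " \
       "single=#{database_for_run(single, minute)} " \
       "decomposed=#{database_for_run(decomposed, minute)}"
end
```

With one database every run targets it. With two databases each one is visited on every second run, which matches the "every two minutes" cadence described above.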
To avoid lock contention and the processing of the same database rows, the worker does not run
in parallel. This behavior is ensured with a Redis lock.
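The locking idea can be sketched with an in-memory stand-in rather than a real Redis client. `InMemoryLock` and `run_exclusively` are hypothetical names, and the lock helper the worker actually uses is not shown here. `try_acquire` imitates the semantics of the Redis command `SET key value NX EX ttl`: it succeeds only when no unexpired lock exists for the key.

```ruby
# In-memory stand-in for a Redis lock. Illustration only.
class InMemoryLock
  def initialize
    @held_until = {}
  end

  # Returns true only if nobody holds an unexpired lock for `key`.
  def try_acquire(key, ttl:, now: Time.now)
    expires_at = @held_until[key]
    return false if expires_at && expires_at > now

    @held_until[key] = now + ttl
    true
  end

  def release(key)
    @held_until.delete(key)
  end
end

# Runs the block only when the lock is free; otherwise skips the run.
def run_exclusively(lock, key, ttl: 60)
  return :skipped unless lock.try_acquire(key, ttl: ttl)

  begin
    yield
    :ran
  ensure
    lock.release(key)
  end
end
```

A second run that starts while the first one still holds the lock returns `:skipped` instead of touching the same rows. The TTL ensures that a crashed run cannot hold the lock forever.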
**Record cleanup procedure:**
1. Acquire the Redis lock.
1. Determine which database to clean up.
1. Collect all database tables where the deletions are tracked (parent tables).
   - This is achieved by reading the `config/gitlab_loose_foreign_keys.yml` file.
   - A table is considered "tracked" when a loose foreign key definition exists for the table and
     the `DELETE` trigger is installed.
1. Cycle through the tables with an infinite loop.
1. For each table, load a batch of deleted parent records to clean up.
1. Depending on the YAML configuration, build `DELETE` or `UPDATE` (nullify) queries for the
referenced child tables.
1. Invoke the queries.
1. Repeat until all child records are cleaned up or the maximum limit is reached.
1. Remove the deleted parent records when all child records are cleaned up.
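The core of the procedure above can be sketched against in-memory data. This is a simplified model under stated assumptions, not the worker's implementation: hashes stand in for database tables, `LOOSE_FOREIGN_KEYS` stands in for the YAML file, and the table names, the `cleanup` method, and the `max_modifications` parameter are hypothetical.

```ruby
# Illustrative stand-in for config/gitlab_loose_foreign_keys.yml.
# Table names, columns, and the :on_delete values are assumptions.
LOOSE_FOREIGN_KEYS = {
  "projects" => [
    { child: "issues", column: :project_id, on_delete: :async_delete },
    { child: "builds", column: :project_id, on_delete: :async_nullify }
  ]
}.freeze

# Processes a batch of deleted parent records against in-memory "tables"
# (a hash of table name => array of row hashes). Returns the parent records
# that still have child rows left, so a later run can pick them up again.
def cleanup(deleted_records, tables, definitions, max_modifications: 100)
  modified = 0

  deleted_records.reject do |record|
    fully_cleaned = true

    definitions.fetch(record[:table], []).each do |definition|
      rows = tables.fetch(definition[:child])
      column = definition[:column]
      matching = rows.select { |row| row[column] == record[:id] }
      # Take only as many rows as the remaining modification budget allows.
      batch = matching.first(max_modifications - modified)

      if definition[:on_delete] == :async_delete
        rows.reject! { |row| batch.any? { |hit| hit.equal?(row) } }
      else
        batch.each { |row| row[column] = nil } # nullify the reference
      end

      modified += batch.size
      fully_cleaned = false if batch.size < matching.size
    end

    fully_cleaned # only fully cleaned parent records are dropped
  end
end
```

Calling `cleanup` repeatedly with a small `max_modifications` shows why a parent record with many children can need several runs: the record keeps being returned until every child row has been deleted or nullified.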
### Database structure
The feature relies on triggers installed on the parent tables. When a parent record is deleted,
the trigger will automatically insert a new record into the `loose_foreign_keys_deleted_records`
database table.
The inserted record will store the following information about the deleted record:
- `fully_qualified_table_name`: name of the database table where the record was located.
- `primary_key_value`: the ID of the record. This value is present in the child tables as
  the foreign key value. At the moment, composite primary keys are not supported, so the parent
  table must have an `id` column.
- `status`: defaults to pending, represents the status of the cleanup process.
- `consume_after`: defaults to the current time.
- `cleanup_attempts`: defaults to 0. The number of times the worker tried to clean up this record.
  A non-zero number means that this record has many child records and cleaning it up requires
  several runs.
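To make the defaults concrete, the following sketch simulates in plain Ruby what the trigger records for one deleted parent row. The real work happens in a database trigger. `track_deletion`, the `log` array, and the symbol used for `status` are assumptions made for illustration.

```ruby
# Simulates what the DELETE trigger records for one deleted parent row.
# `log` stands in for the loose_foreign_keys_deleted_records table.
def track_deletion(log, table_name, id, now: Time.now)
  log << {
    fully_qualified_table_name: table_name,
    primary_key_value: id,
    status: :pending,    # assumption: shown as a symbol for readability
    consume_after: now,  # defaults to the current time
    cleanup_attempts: 0  # no cleanup has been attempted yet
  }
  log.last
end

log = []
record = track_deletion(log, "public.projects", 42)
puts record[:status]
puts record[:cleanup_attempts]
```

The table name `public.projects` and the ID `42` are made-up example values.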
#### Database decomposition
The `loose_foreign_keys_deleted_records` table will exist on both database servers (CI and Main)
after the [database decomposition](https://gitlab.com/groups/gitlab-org/-/epics/6168). The worker
will determine which parent tables belong to which database by reading the