Commit 12491bd1 in nexedi/gitlab-ce, authored Dec 03, 2021 by Grzegorz Bizon
Describe three parallel workstreams for CI/CD time-decay
This commit also assigns an AEC and a domain expert.
parent aea0661a
Showing 1 changed file with 108 additions and 14 deletions
doc/architecture/blueprints/ci_data_decay/index.md
@@ -48,34 +48,125 @@ the upcoming years.
CI/CD data is subject to
[time-decay](https://about.gitlab.com/company/team/structure/working-groups/database-scalability/time-decay.html)
because, usually, pipelines that are a few months old are not frequently
-accessed or relevant anymore. Restricting access to processing pipelines that
-are longer than a few months might help us to Describe our approach to
-partitioning using time-decay pattern.
+accessed or are not even relevant anymore. Restricting access to processing
+pipelines that are older than a few months might help us to move this data to
+a different storage, that is more performant and cost effective.

-It is already possible to prevent processing of builds [that have been
+It is already possible to prevent processing builds [that have been
archived](/ee/user/admin_area/settings/continuous_integration.html#archive-jobs).
When a build gets archived it will not be possible to retry it, but we do not
-remove data from the database.
+move data from the database.

In order to improve performance and make it easier to scale CI/CD data storage
-we might want to:
+we might want to follow the three tracks described below.

-1. Make it possible to move data out of PostgreSQL to a different data store
-   when a build, or a pipeline, gets archived using a retention policy.
-1. Make it possible to partition certain database tables, storing CI/CD data,
-   using the same retention policy as in the point above.
+### Move rarely accessed data

-### Data retention
+Once a build (or a pipeline) gets archived, it is no longer possible to resume
+pipeline processing in such pipeline. This means that all the metadata we store
+in PostgreSQL that is needed to efficiently and reliably process builds can be
+safely moved to a different data store.

-### Partitioning
+Currently, storing pipeline processing data is expensive as this kind of CI/CD
+data represents a significant portion of data stored in CI/CD tables. Once we
+restrict access to processing archived pipelines, we can move this metadata to
+a different place - preferably object storage - and make it accessible on
+demand, when it is really needed again (for example for compliance purposes).

-### Archived data access
+Epic: [Link]
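
The move described above can be illustrated with a minimal sketch. It assumes plain dictionaries stand in for both a PostgreSQL row and the object store; the function names, the `processing_metadata` and `metadata_ref` fields, and the key layout are hypothetical and are not taken from GitLab's code.

```python
import json


def archive_build_metadata(build: dict, object_store: dict) -> dict:
    """Move processing metadata out of the row and leave a reference behind.

    `build` stands in for a database row and `object_store` for a bucket;
    both are assumptions made for this illustration only.
    """
    key = f"ci/builds/{build['id']}/metadata.json"  # hypothetical key layout
    object_store[key] = json.dumps(build["processing_metadata"])

    # The slim row keeps everything except the bulky processing metadata.
    slim = {k: v for k, v in build.items() if k != "processing_metadata"}
    slim["archived"] = True
    slim["metadata_ref"] = key
    return slim


def restore_build_metadata(build: dict, object_store: dict) -> dict:
    """Fetch archived metadata on demand, for example for compliance."""
    return json.loads(object_store[build["metadata_ref"]])
```

The row survives the move and only shrinks, which is why the number of rows in the table stays the same.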
### Partition rarely accessed data
After we move CI/CD metadata to a different store, the problem of having
billions of rows describing pipelines, builds and artifacts remains. We still
need to keep a reference to the metadata we store in object storage and we still
need to be able to retrieve this information reliably in bulk (or search
through it).

This means that by moving data to object storage we might not be able to reduce
the number of rows in CI/CD tables. Moving data to object storage should help
with reducing the data size, but not the quantity of entries describing this
data. Because of this limitation, we still want to partition CI/CD data to
reduce the impact on the database (index size, auto-vacuum time and
frequency).
Partitioning rarely accessed data should also follow the policy defined for
builds archival, to make it consistent and reliable.
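
One way to picture partitioning that follows the archival policy is to give every partition a width equal to the archival period. The sketch below assumes a 90-day period and a fixed epoch; `ARCHIVE_AFTER`, `EPOCH`, `partition_for`, and `active_partitions` are hypothetical names chosen for illustration, and the blueprint does not prescribe this scheme.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=90)                  # assumed archival policy
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)   # assumed partitioning origin


def partition_for(created_at: datetime) -> int:
    """Return the index of the time-range partition holding this row."""
    return (created_at - EPOCH) // ARCHIVE_AFTER


def is_archived(created_at: datetime, now: datetime) -> bool:
    """A build is archived once it is older than the archival period."""
    return now - created_at > ARCHIVE_AFTER


def active_partitions(now: datetime) -> list[int]:
    """Partitions that can still contain non-archived builds."""
    return sorted({partition_for(now - ARCHIVE_AFTER), partition_for(now)})
```

Under these assumptions, all builds that can still be processed live in at most two partitions, and every older partition contains archived data only.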
Epic: [Link]
### Partition frequently used queuing tables
While working on the
[CI/CD Scale](https://docs.gitlab.com/ee/architecture/blueprints/ci_scale/)
architecture, we have introduced a
[new architecture for queuing CI/CD builds](https://gitlab.com/groups/gitlab-org/-/epics/5909#note_680407908)
for execution.
This allowed us to significantly improve performance, but we still consider
the new solution an intermediate mechanism, needed before we start working
on the next iteration, which should improve the architecture of builds queuing
even more (it might require moving off PostgreSQL fully or partially).

In the meantime we want to ship another iteration, an intermediate step towards
a more flexible and reliable solution. We want to partition the new queuing
tables to reduce the impact on the database and to improve reliability and
database health.
Partitioning of CI/CD queuing tables does not need to follow the policy defined
for builds archival. Instead we should leverage a long-standing policy saying
that builds created more than 24 hours ago need to be removed from the queue.
This business rule has been present in the product since the inception of
GitLab CI.
Epic: [Prepare queuing tables for list-style partitioning](https://gitlab.com/gitlab-org/gitlab/-/issues/347027).
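
The 24-hour rule pairs naturally with partitions keyed by a discrete value, because a whole partition can be dropped once everything in it is stale. The sketch below is an in-memory illustration only: the `PartitionedQueue` class and the choice of the UTC calendar day as the list key are assumptions and do not describe the actual queuing tables.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

QUEUE_TTL = timedelta(hours=24)  # builds older than this leave the queue


class PartitionedQueue:
    """Toy queue whose partitions are keyed by UTC calendar day."""

    def __init__(self):
        # day -> list of (build_id, created_at)
        self.partitions = defaultdict(list)

    def enqueue(self, build_id: int, created_at: datetime) -> None:
        self.partitions[created_at.date()].append((build_id, created_at))

    def drop_stale_partitions(self, now: datetime) -> list:
        """Drop partitions in which every row is older than the TTL."""
        cutoff = now - QUEUE_TTL
        # A day that ended before the cutoff day holds only stale rows.
        stale = [day for day in self.partitions if day < cutoff.date()]
        for day in stale:
            del self.partitions[day]
        return stale

    def pending(self, now: datetime) -> list:
        """Build ids that are still within the 24-hour window."""
        cutoff = now - QUEUE_TTL
        return [b for rows in self.partitions.values()
                for b, t in rows if t >= cutoff]
```

Dropping a partition replaces many row-level deletes with a single operation, which is the kind of reduced database impact the text above is after.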
## Caveats
All three tracks we will use to implement the CI/CD time-decay pattern are
associated with some challenges. The most important ones are documented below.
### Removing data
While it might be tempting to simply remove old or archived data from our
database, this should be avoided. We should not permanently remove user data
unless consent is given to do so. We can, however, move data to a different
data store, like object storage.
Archived data can still be needed sometimes (for example for compliance
reasons). We want to be able to retrieve this data if needed, as long as
permanent removal has not been requested or approved by a user.
### Accessing data
Implementing CI/CD data time-decay through partitioning might be challenging
when we still want to make it possible for users to access data stored across
many partitions.
In order to do that we will need to make sure that when archived data needs to
be accessed, users provide a time range in which the data was created. To make
this efficient, it might be necessary to prevent querying data that resides in
more than two partitions at once. We can do that by supporting time ranges
spanning a duration equal to the builds archival policy.
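
The two-partition limit can be sketched as a small validation step. This assumes each partition covers exactly one archival period, so a range no longer than that period touches at most two adjacent partitions; `ARCHIVE_AFTER`, `EPOCH`, and both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ARCHIVE_AFTER = timedelta(days=90)                  # assumed archival policy
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)   # assumed partitioning origin


def partition_for(created_at: datetime) -> int:
    """Index of the partition covering this timestamp."""
    return (created_at - EPOCH) // ARCHIVE_AFTER


def partitions_for_range(start: datetime, end: datetime) -> list[int]:
    """Validate a user-supplied time range and list the partitions to query."""
    if end < start:
        raise ValueError("time range ends before it starts")
    if end - start > ARCHIVE_AFTER:
        raise ValueError("time range is longer than the archival period")
    # A range this short starts and ends in the same or adjacent partitions.
    return sorted({partition_for(start), partition_for(end)})
```

Rejecting longer ranges up front keeps every archived-data query bounded to one or two partitions.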
#### Merge request pipelines
Once we partition CI/CD data, especially CI builds, we need to find an
efficient mechanism to present pipeline statuses in merge requests.
How exactly to do that is an implementation detail that we will need to figure
out as the work progresses. We do have many tools to achieve that, for example
data denormalization or routing reads to proper partitions based on data stored
with a merge request.
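
Both tools mentioned above can be shown in a small sketch. The `MergeRequest` fields below (a denormalized status and a stored partition key) are hypothetical and only illustrate the idea; the actual mechanism is left open by this blueprint.

```python
from dataclasses import dataclass


@dataclass
class MergeRequest:
    iid: int
    head_pipeline_id: int
    head_pipeline_partition: int  # hypothetical: partition key stored with the MR
    head_pipeline_status: str     # hypothetical: denormalized copy of the status


def status_for_widget(mr: MergeRequest) -> str:
    """Denormalization: show the status without touching CI/CD tables."""
    return mr.head_pipeline_status


def fetch_pipeline(partitions: dict, mr: MergeRequest) -> dict:
    """Routing: read from exactly one partition, using the stored key."""
    return partitions[mr.head_pipeline_partition][mr.head_pipeline_id]
```

Denormalization serves the common case of displaying a status, while the stored partition key keeps the occasional detailed read from scanning every partition.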
## Iterations

All three tracks can be worked on in parallel:

1. [Move archived CI/CD data to object storage](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/68228)
2. [Partition CI/CD tables using CI/CD data retention policy](LINK)
3. [Partition CI/CD queuing tables using list partitioning](https://gitlab.com/gitlab-org/gitlab/-/issues/347027)
## Status

-In progress.
+Request For Comments.

## Who
@@ -88,6 +179,7 @@ Proposal:
| Author | Grzegorz Bizon |
| Engineering Leader | Cheryl Li |
| Product Manager | Jackie Porter |
| Architecture Evolution Coach | Kamil Trzciński |

DRIs:
@@ -101,5 +193,7 @@ Domain experts:
| Area | Who |
|------------------------------|------------------------|
| Continuous Integration | Marius Bobin |
| PostgreSQL Database | Andreas Brandl |

<!-- vale gitlab.Spelling = YES -->