Commit c8d22d6e authored by Lin Jen-Shin's avatar Lin Jen-Shin

Merge branch '344792-consider-getting-rid-of-the-ci_pre_clone_script-variable' into 'master'

Remove the cache-repo job as it's no longer in use

See merge request gitlab-org/gitlab!73978
parents 81a14a3e afee549c
# Builds a cached .tar.gz of the $CI_DEFAULT_BRANCH branch with full history and
# uploads it to Google Cloud Storage. This archive is downloaded by a
# script defined by a CI/CD variable named CI_PRE_CLONE_SCRIPT. This has
# two benefits:
#
# 1. It speeds up builds. A 800 MB download only takes seconds.
# 2. It significantly reduces load on the file server. Smaller deltas
# means less time spent in git pack-objects.
#
# Since the destination directory of the archive depends on the project
# ID, this is only run on GitLab.com.
#
# CI_REPO_CACHE_CREDENTIALS contains the Google Cloud service account
# JSON for uploading to the gitlab-ci-git-repo-cache bucket. These
# credentials are stored in the Production vault.
#
# Note that this bucket should be located in the same continent as the
# runner, or network egress charges will apply:
# https://cloud.google.com/storage/pricing
cache-repo:
extends: .cache-repo:rules
image: gcr.io/google.com/cloudsdktool/cloud-sdk:alpine
stage: sync
variables:
GIT_STRATEGY: none
SHALLOW_CLONE_TAR_FILENAME: gitlab-master-shallow.tar
FULL_CLONE_TAR_FILENAME: gitlab-master.tar
before_script:
- '[ -z "$CI_REPO_CACHE_CREDENTIALS" ] || gcloud auth activate-service-account --key-file=$CI_REPO_CACHE_CREDENTIALS'
script:
# Enable shallow repo caching unless the $DISABLE_SHALLOW_REPO_CACHING variable exists (in the case the shallow clone caching isn't working well)
# The `git repack` call works around a Git bug with shallow clones: https://gitlab.com/gitlab-org/git/-/issues/86
- if [ -z "$DISABLE_SHALLOW_REPO_CACHING" ]; then
cd .. && rm -rf $CI_PROJECT_NAME;
today=$(date +%Y-%m-%d);
year=$(date +%Y);
last_year=`expr $year - 1`;
one_year_ago=$(echo $today | sed "s/$year/$last_year/");
echo "Cloning $CI_REPOSITORY_URL into $CI_PROJECT_NAME with commits from $one_year_ago.";
time git clone --progress --no-checkout --shallow-since=$one_year_ago $CI_REPOSITORY_URL $CI_PROJECT_NAME;
cd $CI_PROJECT_NAME;
time git repack -d;
echo "Archiving $CI_PROJECT_NAME into /tmp/$SHALLOW_CLONE_TAR_FILENAME.";
time git remote rm origin;
time tar cf /tmp/$SHALLOW_CLONE_TAR_FILENAME .;
echo "GZipping /tmp/$SHALLOW_CLONE_TAR_FILENAME.";
time gzip /tmp/$SHALLOW_CLONE_TAR_FILENAME;
[ -z "$CI_REPO_CACHE_CREDENTIALS" ] || (echo "Uploading /tmp/$SHALLOW_CLONE_TAR_FILENAME.gz to GCloud." && time gsutil cp /tmp/$SHALLOW_CLONE_TAR_FILENAME.gz gs://gitlab-ci-git-repo-cache/project-$CI_PROJECT_ID/$SHALLOW_CLONE_TAR_FILENAME.gz);
fi
# Disable the full repo caching unless the $DISABLE_SHALLOW_REPO_CACHING variable exists (in the case the shallow clone caching isn't working well)
- if [ -n "$DISABLE_SHALLOW_REPO_CACHING" ]; then
cd .. && rm -rf $CI_PROJECT_NAME;
echo "Cloning $CI_REPOSITORY_URL into $CI_PROJECT_NAME.";
time git clone --progress $CI_REPOSITORY_URL $CI_PROJECT_NAME;
cd $CI_PROJECT_NAME;
time git repack -d;
echo "Archiving $CI_PROJECT_NAME into /tmp/$FULL_CLONE_TAR_FILENAME.";
time git remote rm origin;
time tar cf /tmp/$FULL_CLONE_TAR_FILENAME .;
echo "GZipping /tmp/$FULL_CLONE_TAR_FILENAME.";
time gzip /tmp/$FULL_CLONE_TAR_FILENAME;
[ -z "$CI_REPO_CACHE_CREDENTIALS" ] || (echo "Uploading /tmp/$FULL_CLONE_TAR_FILENAME.gz to GCloud." && time gsutil cp /tmp/$FULL_CLONE_TAR_FILENAME.gz gs://gitlab-ci-git-repo-cache/project-$CI_PROJECT_ID/$FULL_CLONE_TAR_FILENAME.gz);
fi
...@@ -94,9 +94,6 @@ ...@@ -94,9 +94,6 @@
.if-dot-com-ee-nightly-schedule-child-pipeline: &if-dot-com-ee-nightly-schedule-child-pipeline .if-dot-com-ee-nightly-schedule-child-pipeline: &if-dot-com-ee-nightly-schedule-child-pipeline
if: '$CI_SERVER_HOST == "gitlab.com" && $CI_PROJECT_PATH == "gitlab-org/gitlab" && $CI_PIPELINE_SOURCE == "parent_pipeline" && $FREQUENCY == "nightly"' if: '$CI_SERVER_HOST == "gitlab.com" && $CI_PROJECT_PATH == "gitlab-org/gitlab" && $CI_PIPELINE_SOURCE == "parent_pipeline" && $FREQUENCY == "nightly"'
.if-cache-credentials-schedule: &if-cache-credentials-schedule
if: '$CI_REPO_CACHE_CREDENTIALS && $CI_PIPELINE_SOURCE == "schedule"'
.if-dot-com-gitlab-org-default-branch: &if-dot-com-gitlab-org-default-branch .if-dot-com-gitlab-org-default-branch: &if-dot-com-gitlab-org-default-branch
if: '$CI_SERVER_HOST == "gitlab.com" && $CI_PROJECT_NAMESPACE == "gitlab-org" && $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH' if: '$CI_SERVER_HOST == "gitlab.com" && $CI_PROJECT_NAMESPACE == "gitlab-org" && $CI_COMMIT_REF_NAME == $CI_DEFAULT_BRANCH'
...@@ -482,14 +479,6 @@ ...@@ -482,14 +479,6 @@
- changes: *ci-build-images-patterns - changes: *ci-build-images-patterns
- changes: *code-qa-patterns - changes: *code-qa-patterns
####################
# Cache repo rules #
####################
.cache-repo:rules:
rules:
- <<: *if-cache-credentials-schedule
allow_failure: true
############# #############
# CNG rules # # CNG rules #
############# #############
......
...@@ -259,6 +259,5 @@ For very active repositories with a large number of references and files, you ca ...@@ -259,6 +259,5 @@ For very active repositories with a large number of references and files, you ca
must be configured per-repository. The pack-objects cache also automatically works for forks. On GitLab.com, where the pack-objects cache is must be configured per-repository. The pack-objects cache also automatically works for forks. On GitLab.com, where the pack-objects cache is
enabled on all Gitaly servers, we found that we no longer need a pre-clone step for `gitlab-org/gitlab` development. enabled on all Gitaly servers, we found that we no longer need a pre-clone step for `gitlab-org/gitlab` development.
- Optimize your CI/CD jobs by seeding repository data in a pre-clone step with the - Optimize your CI/CD jobs by seeding repository data in a pre-clone step with the
[`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section) of GitLab Runner. See our [`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section) of GitLab Runner. See the
[development documentation](../../development/pipelines.md#pre-clone-step) for an overview of how we used to implement this approach on [Runner Cloud for Linux](../runners/runner_cloud/linux_runner_cloud.md#pre-clone-script) for more details.
GitLab.com for the main GitLab repository.
...@@ -52,13 +52,67 @@ can be used for: ...@@ -52,13 +52,67 @@ can be used for:
To use this feature, define a [CI/CD variable](../../../ci/variables/index.md#custom-cicd-variables) called To use this feature, define a [CI/CD variable](../../../ci/variables/index.md#custom-cicd-variables) called
`CI_PRE_CLONE_SCRIPT` that contains a bash script. `CI_PRE_CLONE_SCRIPT` that contains a bash script.
[This example](../../../development/pipelines.md#pre-clone-step)
demonstrates how you might use a pre-clone step to seed the build
directory.
NOTE: NOTE:
The `CI_PRE_CLONE_SCRIPT` variable does not work on Windows runners. The `CI_PRE_CLONE_SCRIPT` variable does not work on Windows runners.
### Pre-clone script example
This example was used in the `gitlab-org/gitlab` project until November 2021.
The project no longer uses this optimization because the [pack-objects cache](../../../administration/gitaly/configure_gitaly.md#pack-objects-cache)
lets Gitaly serve the full CI/CD fetch traffic. See [Git fetch caching](../../../development/pipelines.md#git-fetch-caching).
The `CI_PRE_CLONE_SCRIPT` was defined as a project CI/CD variable:
```shell
(
echo "Downloading archived master..."
wget -O /tmp/gitlab.tar.gz https://storage.googleapis.com/gitlab-ci-git-repo-cache/project-278964/gitlab-master-shallow.tar.gz
if [ ! -f /tmp/gitlab.tar.gz ]; then
echo "Repository cache not available, cloning a new directory..."
exit
fi
rm -rf $CI_PROJECT_DIR
echo "Extracting tarball into $CI_PROJECT_DIR..."
mkdir -p $CI_PROJECT_DIR
cd $CI_PROJECT_DIR
tar xzf /tmp/gitlab.tar.gz
rm -f /tmp/gitlab.tar.gz
chmod a+w $CI_PROJECT_DIR
)
```
The first step of the script downloads `gitlab-master.tar.gz` from Google Cloud Storage.
There was a [GitLab CI/CD job named `cache-repo`](https://gitlab.com/gitlab-org/gitlab/-/blob/5fb40526c8c8aaafc5f92eab36d5bbddaca3893d/.gitlab/ci/cache-repo.gitlab-ci.yml)
that was responsible for keeping that archive up-to-date. Every two hours on a scheduled pipeline,
it did the following:
1. Create a fresh clone of the `gitlab-org/gitlab` repository on GitLab.com.
1. Save the data as a `.tar.gz`.
1. Upload it into the Google Cloud Storage bucket.
When a job ran with this configuration, the output looked similar to:
```shell
$ eval "$CI_PRE_CLONE_SCRIPT"
Downloading archived master...
Extracting tarball into /builds/gitlab-org/gitlab...
Fetching changes...
Reinitialized existing Git repository in /builds/gitlab-org/gitlab/.git/
```
The `Reinitialized existing Git repository` message shows that
the pre-clone step worked. The runner runs `git init`, which
overwrites the Git configuration with the appropriate settings to fetch
from the GitLab repository.
`CI_REPO_CACHE_CREDENTIALS` must contain the Google Cloud service account
JSON for uploading to the `gitlab-ci-git-repo-cache` bucket.
Note that this bucket should be located in the same continent as the
runner, or [you can incur network egress charges](https://cloud.google.com/storage/pricing).
## `config.toml` ## `config.toml`
The full contents of our `config.toml` are: The full contents of our `config.toml` are:
......
...@@ -730,7 +730,6 @@ and included in `rules` definitions via [YAML anchors](../ci/yaml/yaml_optimizat ...@@ -730,7 +730,6 @@ and included in `rules` definitions via [YAML anchors](../ci/yaml/yaml_optimizat
| `if-dot-com-gitlab-org-and-security-merge-request` | Limit jobs creation to merge requests for the `gitlab-org` and `gitlab-org/security` groups on GitLab.com. | | | `if-dot-com-gitlab-org-and-security-merge-request` | Limit jobs creation to merge requests for the `gitlab-org` and `gitlab-org/security` groups on GitLab.com. | |
| `if-dot-com-gitlab-org-and-security-tag` | Limit jobs creation to tags for the `gitlab-org` and `gitlab-org/security` groups on GitLab.com. | | | `if-dot-com-gitlab-org-and-security-tag` | Limit jobs creation to tags for the `gitlab-org` and `gitlab-org/security` groups on GitLab.com. | |
| `if-dot-com-ee-schedule` | Limits jobs to scheduled pipelines for the `gitlab-org/gitlab` project on GitLab.com. | | | `if-dot-com-ee-schedule` | Limits jobs to scheduled pipelines for the `gitlab-org/gitlab` project on GitLab.com. | |
| `if-cache-credentials-schedule` | Limits jobs to scheduled pipelines with the `$CI_REPO_CACHE_CREDENTIALS` variable set. | |
| `if-security-pipeline-merge-result` | Matches if the pipeline is for a security merge request triggered by `@gitlab-release-tools-bot`. | | | `if-security-pipeline-merge-result` | Matches if the pipeline is for a security merge request triggered by `@gitlab-release-tools-bot`. | |
<!-- vale gitlab.Substitutions = YES --> <!-- vale gitlab.Substitutions = YES -->
...@@ -823,60 +822,6 @@ allows Gitaly to serve the full CI/CD fetch traffic now. See [Git fetch caching] ...@@ -823,60 +822,6 @@ allows Gitaly to serve the full CI/CD fetch traffic now. See [Git fetch caching]
The pre-clone step works by using the `CI_PRE_CLONE_SCRIPT` variable The pre-clone step works by using the `CI_PRE_CLONE_SCRIPT` variable
[defined by GitLab.com shared runners](../ci/runners/runner_cloud/linux_runner_cloud.md#pre-clone-script). [defined by GitLab.com shared runners](../ci/runners/runner_cloud/linux_runner_cloud.md#pre-clone-script).
The `CI_PRE_CLONE_SCRIPT` is defined as a project CI/CD variable:
```shell
(
echo "Downloading archived master..."
wget -O /tmp/gitlab.tar.gz https://storage.googleapis.com/gitlab-ci-git-repo-cache/project-278964/gitlab-master-shallow.tar.gz
if [ ! -f /tmp/gitlab.tar.gz ]; then
echo "Repository cache not available, cloning a new directory..."
exit
fi
rm -rf $CI_PROJECT_DIR
echo "Extracting tarball into $CI_PROJECT_DIR..."
mkdir -p $CI_PROJECT_DIR
cd $CI_PROJECT_DIR
tar xzf /tmp/gitlab.tar.gz
rm -f /tmp/gitlab.tar.gz
chmod a+w $CI_PROJECT_DIR
)
```
The first step of the script downloads `gitlab-master.tar.gz` from
Google Cloud Storage. There is a [GitLab CI job named `cache-repo`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/ci/cache-repo.gitlab-ci.yml#L5)
that is responsible for keeping that archive up-to-date. Every two hours
on a scheduled pipeline, it does the following:
1. Creates a fresh clone of the `gitlab-org/gitlab` repository on GitLab.com.
1. Saves the data as a `.tar.gz`.
1. Uploads it into the Google Cloud Storage bucket.
When a CI job runs with this configuration, the output looks something like this:
```shell
$ eval "$CI_PRE_CLONE_SCRIPT"
Downloading archived master...
Extracting tarball into /builds/group/project...
Fetching changes...
Reinitialized existing Git repository in /builds/group/project/.git/
```
Note that the `Reinitialized existing Git repository` message shows that
the pre-clone step worked. The runner runs `git init`, which
overwrites the Git configuration with the appropriate settings to fetch
from the GitLab repository.
`CI_REPO_CACHE_CREDENTIALS` contains the Google Cloud service account
JSON for uploading to the `gitlab-ci-git-repo-cache` bucket. (If you're a
GitLab Team Member, find credentials in the
[GitLab shared 1Password account](https://about.gitlab.com/handbook/security/#1password-for-teams).
Note that this bucket should be located in the same continent as the
runner, or [you can incur network egress charges](https://cloud.google.com/storage/pricing).
--- ---
[Return to Development documentation](index.md) [Return to Development documentation](index.md)
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment