Commit d093ba78 authored by Marcel Amirault, committed by Suzanne Selhorn

Revamp CI/CD cache reference docs

parent 6bc3aace
......@@ -57,55 +57,69 @@ For runners to work with caches efficiently, you must do one of the following:
- Use multiple runners that have
[distributed caching](https://docs.gitlab.com/runner/configuration/autoscale.html#distributed-runners-caching),
where the cache is stored in S3 buckets. Shared runners on GitLab.com behave this way. These runners can be in autoscale mode,
but they don't have to be.
- Use multiple runners with the same architecture and have these runners
share a common network-mounted directory to store the cache. This directory should use NFS or something similar.
These runners must be in autoscale mode.
### Share caches between jobs in the same branch
To have jobs for each branch use the same cache, define a cache with the `key: ${CI_COMMIT_REF_SLUG}`:
```yaml
cache:
  key: ${CI_COMMIT_REF_SLUG}
```
## Use multiple caches
This configuration prevents you from accidentally overwriting the cache. However, the
first pipeline for a merge request is slow. The next time a commit is pushed to the branch, the
cache is re-used and jobs run faster.
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/32814) in GitLab 13.10.
> - [Feature Flag removed](https://gitlab.com/gitlab-org/gitlab/-/issues/321877) in GitLab 13.12.
To enable per-job and per-branch caching:
You can have a maximum of four caches:
```yaml
cache:
  key: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
  script:
    - bundle install --path=vendor
    - yarn install --cache-folder .yarn-cache
    - echo Run tests...
```
To enable per-stage and per-branch caching:
If multiple caches are combined with a [Fallback cache key](#fallback-cache-key),
the fallback cache is fetched every time a cache is not found.
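As a rough sketch of that interaction, a job with two caches and a fallback key might look like the following. The job name, key names, and paths here are hypothetical; only `CACHE_FALLBACK_KEY` and the `cache` keywords come from this page.

```yaml
# Illustrative only: job name, keys, and paths are assumptions.
variables:
  CACHE_FALLBACK_KEY: fallback-key   # used when a cache key is not found

test-job:
  cache:
    - key: gems-$CI_COMMIT_REF_SLUG
      paths:
        - vendor/ruby
    - key: yarn-$CI_COMMIT_REF_SLUG
      paths:
        - .yarn-cache/
  script:
    - echo "Run tests..."
```

In this sketch, if neither cache exists yet, the fallback cache is fetched once for each missing cache.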
```yaml
cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
```
## Fallback cache key
### Share caches across jobs in different branches
> [Introduced](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/1534) in GitLab Runner 13.4.
To share a cache across all branches and all jobs, use the same key for everything:
You can use the `$CI_COMMIT_REF_SLUG` [predefined variable](../variables/predefined_variables.md)
to specify your [`cache:key`](../yaml/README.md#cachekey). For example, if your
`$CI_COMMIT_REF_SLUG` is `test`, you can set a job to download the cache that's tagged with `test`.
```yaml
cache:
  key: one-key-to-rule-them-all
```
If a cache with this tag is not found, you can use `CACHE_FALLBACK_KEY` to
specify a cache to use when none exists.
To share caches between branches, but have a unique cache for each job:
In the following example, if the `$CI_COMMIT_REF_SLUG` is not found, the job uses the key defined
by the `CACHE_FALLBACK_KEY` variable:
```yaml
cache:
  key: ${CI_JOB_NAME}
variables:
  CACHE_FALLBACK_KEY: fallback-key
job1:
  script:
    - echo
  cache:
    key: "$CI_COMMIT_REF_SLUG"
    paths:
      - binaries/
```
### Disable cache for specific jobs
## Disable cache for specific jobs
If you have defined the cache globally, it means that each job uses the
same definition. You can override this behavior per-job, and if you want to
......@@ -116,7 +130,7 @@ job:
cache: {}
```
### Inherit global configuration, but override specific settings per job
## Inherit global configuration, but override specific settings per job
You can override cache settings without overwriting the global cache by using
[anchors](../yaml/README.md#anchors). For example, if you want to override the
......@@ -124,7 +138,7 @@ You can override cache settings without overwriting the global cache by using
```yaml
cache: &global_cache
  key: ${CI_COMMIT_REF_SLUG}
  key: $CI_COMMIT_REF_SLUG
  paths:
    - node_modules/
    - public/
......@@ -150,6 +164,49 @@ PHP packages, Ruby gems, Python libraries, and others can all be cached.
For more examples, check out our [GitLab CI/CD templates](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/ci/templates).
### Share caches between jobs in the same branch
To have jobs for each branch use the same cache, define a cache with the `key: $CI_COMMIT_REF_SLUG`:
```yaml
cache:
  key: $CI_COMMIT_REF_SLUG
```
This configuration prevents you from accidentally overwriting the cache. However, the
first pipeline for a merge request is slow. The next time a commit is pushed to the branch, the
cache is re-used and jobs run faster.
To enable per-job and per-branch caching:
```yaml
cache:
  key: "$CI_JOB_NAME-$CI_COMMIT_REF_SLUG"
```
To enable per-stage and per-branch caching:
```yaml
cache:
  key: "$CI_JOB_STAGE-$CI_COMMIT_REF_SLUG"
```
### Share caches across jobs in different branches
To share a cache across all branches and all jobs, use the same key for everything:
```yaml
cache:
  key: one-key-to-rule-them-all
```
To share caches between branches, but have a unique cache for each job:
```yaml
cache:
  key: $CI_JOB_NAME
```
### Cache Node.js dependencies
If your project is using [npm](https://www.npmjs.com/) to install the Node.js
......@@ -166,7 +223,7 @@ image: node:latest
# Cache modules in between jobs
cache:
  key: ${CI_COMMIT_REF_SLUG}
  key: $CI_COMMIT_REF_SLUG
  paths:
    - .npm/
......@@ -193,7 +250,7 @@ image: php:7.2
# Cache libraries in between jobs
cache:
  key: ${CI_COMMIT_REF_SLUG}
  key: $CI_COMMIT_REF_SLUG
  paths:
    - vendor/
......@@ -262,7 +319,7 @@ image: ruby:2.6
# Cache gems in between builds
cache:
  key: ${CI_COMMIT_REF_SLUG}
  key: $CI_COMMIT_REF_SLUG
  paths:
    - vendor/ruby
......@@ -287,7 +344,7 @@ cache:
  key:
    files:
      - Gemfile.lock
    prefix: ${CI_JOB_NAME}
    prefix: $CI_JOB_NAME
  paths:
    - vendor/ruby
......
......@@ -2351,250 +2351,215 @@ as Review Apps. You can see an example that uses Review Apps at
Use `cache` to specify a list of files and directories to
cache between jobs. You can only use paths that are in the local working copy.
If `cache` is defined outside the scope of jobs, it's set
globally and all jobs use that configuration.
Caching is shared between pipelines and jobs. Caches are restored before [artifacts](#artifacts).
Read how caching works and find out some good practices in the
[caching dependencies documentation](../caching/index.md).
Learn more about caches in [Caching in GitLab CI/CD](../caching/index.md).
#### `cache:paths`
Use the `paths` directive to choose which files or directories to cache. Paths
are relative to the project directory (`$CI_PROJECT_DIR`) and can't directly link outside it.
You can use Wildcards that use [glob](https://en.wikipedia.org/wiki/Glob_(programming))
patterns and:
Use the `cache:paths` keyword to choose which files or directories to cache.
**Keyword type**: Job-specific. You can use it only as part of a job.
**Possible inputs**: An array of paths relative to the project directory (`$CI_PROJECT_DIR`).
You can use wildcards that use [glob](https://en.wikipedia.org/wiki/Glob_(programming))
patterns:
- In [GitLab Runner 13.0](https://gitlab.com/gitlab-org/gitlab-runner/-/issues/2620) and later,
[`doublestar.Glob`](https://pkg.go.dev/github.com/bmatcuk/doublestar@v1.2.2?tab=doc#Match).
- In GitLab Runner 12.10 and earlier,
[`filepath.Match`](https://pkg.go.dev/path/filepath#Match).
**Example of `cache:paths`**:
Cache all files in `binaries` that end in `.apk` and the `.config` file:
```yaml
rspec:
  script: test
  script:
    - echo "This job uses a cache."
  cache:
    key: binaries-cache
    paths:
      - binaries/*.apk
      - .config
```
Locally defined cache overrides globally defined options. The following `rspec`
job caches only `binaries/`:
```yaml
cache:
  paths:
    - my/files
rspec:
  script: test
  cache:
    key: rspec
    paths:
      - binaries/
```
**Related topics**:
The cache is shared between jobs, so if you're using different
paths for different jobs, you should also set a different `cache:key`.
Otherwise cache content can be overwritten.
- See the [common `cache` use cases](../caching/index.md#common-use-cases) for more
`cache:paths` examples.
#### `cache:key`
The `key` keyword defines the affinity of caching between jobs.
You can have a single cache for all jobs, cache per-job, cache per-branch,
or any other way that fits your workflow. You can fine tune caching,
including caching data between different jobs or even different branches.
The `cache:key` variable can use any of the
[predefined variables](../variables/README.md). The default key, if not
set, is just literal `default`, which means everything is shared between
pipelines and jobs by default.
For example, to enable per-branch caching:
```yaml
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - binaries/
```
If you use **Windows Batch** to run your shell scripts you need to replace
`$` with `%`:
Use the `cache:key` keyword to give each cache a unique identifying key. All jobs
that use the same cache key use the same cache, including in different pipelines.
```yaml
cache:
  key: "%CI_COMMIT_REF_SLUG%"
  paths:
    - binaries/
```
If not set, the default key is `default`. All jobs with the `cache:` keyword but
no `cache:key` share the `default` cache.
The `cache:key` variable can't contain the `/` character, or the equivalent
URI-encoded `%2F`. A value made only of dots (`.`, `%2E`) is also forbidden.
You can specify a [fallback cache key](#fallback-cache-key) to use if the specified `cache:key` is not found.
**Keyword type**: Job-specific. You can use it only as part of a job.
##### Multiple caches
**Possible inputs**:
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/32814) in GitLab 13.10.
> - [Feature Flag removed](https://gitlab.com/gitlab-org/gitlab/-/issues/321877) in GitLab 13.12.
- A string.
- A [predefined variable](../variables/README.md).
- A combination of both.
You can have a maximum of four caches:
**Example of `cache:key`**:
```yaml
test-job:
  stage: build
  cache:
    - key:
        files:
          - Gemfile.lock
      paths:
        - vendor/ruby
    - key:
        files:
          - yarn.lock
      paths:
        - .yarn-cache/
cache-job:
  script:
    - bundle install --path=vendor
    - yarn install --cache-folder .yarn-cache
    - echo Run tests...
    - echo "This job uses a cache."
  cache:
    key: binaries-cache-$CI_COMMIT_REF_SLUG
    paths:
      - binaries/
```
If multiple caches are combined with a [Fallback cache key](#fallback-cache-key),
the fallback is fetched multiple times if multiple caches are not found.
#### Fallback cache key
> [Introduced](https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/1534) in GitLab Runner 13.4.
**Additional details**:
You can use the `$CI_COMMIT_REF_SLUG` [variable](#variables) to specify your [`cache:key`](#cachekey).
For example, if your `$CI_COMMIT_REF_SLUG` is `test`, you can set a job
to download the cache that's tagged with `test`.
- If you use **Windows Batch** to run your shell scripts, you need to replace
`$` with `%`. For example: `key: %CI_COMMIT_REF_SLUG%`.
- The `cache:key` value can't contain:
If a cache with this tag is not found, you can use `CACHE_FALLBACK_KEY` to
specify a cache to use when none exists.
- The `/` character, or the equivalent URI-encoded `%2F`.
- Only the `.` character (any number), or the equivalent URI-encoded `%2E`.
In the following example, if the `$CI_COMMIT_REF_SLUG` is not found, the job uses the key defined
by the `CACHE_FALLBACK_KEY` variable:
- The cache is shared between jobs, so if you're using different
paths for different jobs, you should also set a different `cache:key`.
Otherwise cache content can be overwritten.
```yaml
variables:
  CACHE_FALLBACK_KEY: fallback-key
**Related topics**:
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - binaries/
```
- You can specify a [fallback cache key](../caching/index.md#fallback-cache-key)
to use if the specified `cache:key` is not found.
- You can [use multiple cache keys](../caching/index.md#use-multiple-caches) in a single job.
- See the [common `cache` use cases](../caching/index.md#common-use-cases) for more
`cache:key` examples.
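The point above about jobs with different paths needing different keys can be sketched as follows. This is a minimal illustration: the job names, key names, scripts, and paths are hypothetical and not taken from the documentation.

```yaml
# Illustrative only: job names, keys, and paths are assumptions.
build-frontend:
  script:
    - echo "Building the frontend..."
  cache:
    key: frontend-$CI_COMMIT_REF_SLUG   # key used only for this set of paths
    paths:
      - node_modules/

build-backend:
  script:
    - echo "Building the backend..."
  cache:
    key: backend-$CI_COMMIT_REF_SLUG    # different paths, so a different key
    paths:
      - vendor/ruby
```

Because each job has its own key, neither job overwrites the other's cache content.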
##### `cache:key:files`
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/18986) in GitLab v12.5.
The `cache:key:files` keyword extends the `cache:key` functionality by making it easier
to reuse some caches, and rebuild them less often, which speeds up subsequent pipeline
runs.
Use the `cache:key:files` keyword to generate a new key when one or two specific files
change. `cache:key:files` lets you reuse some caches, and rebuild them less often,
which speeds up subsequent pipeline runs.
When you include `cache:key:files`, you must also list the project files that are used to generate the key, up to a maximum of two files.
The cache `key` is a SHA checksum computed from the most recent commits (up to two, if two files are listed)
that changed the given files. If neither file is changed in any commits,
the fallback key is `default`.
**Keyword type**: Job-specific. You can use it only as part of a job.
**Possible inputs**: An array of one or two file paths.
**Example of `cache:key:files`**:
```yaml
cache:
  key:
    files:
      - Gemfile.lock
      - package.json
  paths:
    - vendor/ruby
    - node_modules
cache-job:
  script:
    - echo "This job uses a cache."
  cache:
    key:
      files:
        - Gemfile.lock
        - package.json
    paths:
      - vendor/ruby
      - node_modules
```
This example creates a cache for Ruby and Node.js dependencies that
is tied to current versions of the `Gemfile.lock` and `package.json` files. Whenever one of
This example creates a cache for Ruby and Node.js dependencies. The cache
is tied to the current versions of the `Gemfile.lock` and `package.json` files. When one of
these files changes, a new cache key is computed and a new cache is created. Any future
job runs that use the same `Gemfile.lock` and `package.json` with `cache:key:files`
use the new cache, instead of rebuilding the dependencies.
**Additional details**: The cache `key` is a SHA computed from the most recent commits
that changed each listed file. If neither file is changed in any commits, the
fallback key is `default`.
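Because `cache:key:files` accepts one or two files, a single-file variant is also possible. The following is a minimal sketch, assuming a hypothetical job name and a project that has a `yarn.lock` file:

```yaml
# Illustrative only: the job name is an assumption, and the project is
# assumed to contain a yarn.lock file.
install-node-deps:
  script:
    - yarn install --cache-folder .yarn-cache
  cache:
    key:
      files:
        - yarn.lock   # a new key is computed only when this file changes
    paths:
      - .yarn-cache/
```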
##### `cache:key:prefix`
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/18986) in GitLab v12.5.
When you want to combine a prefix with the SHA computed for `cache:key:files`,
use the `prefix` keyword with `key:files`.
For example, if you add a `prefix` of `test`, the resulting key is: `test-feef9576d21ee9b6a32e30c5c79d0a0ceb68d1e5`.
If neither file is changed in any commits, the prefix is added to `default`, so the
key in the example would be `test-default`.
Use `cache:key:prefix` to combine a prefix with the SHA computed for [`cache:key:files`](#cachekeyfiles).
Like `cache:key`, `prefix` can use any of the [predefined variables](../variables/README.md),
but cannot include:
**Keyword type**: Job-specific. You can use it only as part of a job.
- the `/` character (or the equivalent URI-encoded `%2F`)
- a value made only of `.` (or the equivalent URI-encoded `%2E`)
**Possible inputs**:
```yaml
cache:
  key:
    files:
      - Gemfile.lock
    prefix: ${CI_JOB_NAME}
  paths:
    - vendor/ruby
- A string.
- A [predefined variable](../variables/README.md).
- A combination of both.
**Example of `cache:key:prefix`**:
```yaml
rspec:
  script:
    - bundle exec rspec
    - echo "This rspec job uses a cache."
  cache:
    key:
      files:
        - Gemfile.lock
      prefix: $CI_JOB_NAME
    paths:
      - vendor/ruby
```
For example, adding a `prefix` of `$CI_JOB_NAME`
causes the key to look like: `rspec-feef9576d21ee9b6a32e30c5c79d0a0ceb68d1e5` and
the job cache is shared across different branches. If a branch changes
`Gemfile.lock`, that branch has a new SHA checksum for `cache:key:files`. A new cache key
is generated, and a new cache is created for that key.
If `Gemfile.lock` is not found, the prefix is added to
`default`, so the key in the example would be `rspec-default`.
For example, adding a `prefix` of `$CI_JOB_NAME` causes the key to look like `rspec-feef9576d21ee9b6a32e30c5c79d0a0ceb68d1e5`.
If a branch changes `Gemfile.lock`, that branch has a new SHA checksum for `cache:key:files`.
A new cache key is generated, and a new cache is created for that key. If `Gemfile.lock`
is not found, the prefix is added to `default`, so the key in the example would be `rspec-default`.
**Additional details**: If no file in `cache:key:files` is changed in any commits,
the prefix is added to the `default` key.
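Since `prefix` can combine a string with a predefined variable, a combined prefix might look like the following sketch. The literal `gems-` part is a hypothetical choice made for illustration.

```yaml
# Illustrative only: the "gems-" string in the prefix is an assumption.
rspec:
  script:
    - bundle exec rspec
  cache:
    key:
      files:
        - Gemfile.lock
      prefix: gems-$CI_JOB_NAME   # literal string combined with a predefined variable
    paths:
      - vendor/ruby
```

Following the rules described above, the resulting key would be expected to take the form `gems-rspec-<SHA>`, or `gems-rspec-default` if `Gemfile.lock` is not changed in any commits.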
#### `cache:untracked`
Set `untracked: true` to cache all files that are untracked in your Git
repository:
Use `untracked: true` to cache all files that are untracked in your Git repository:
```yaml
rspec:
  script: test
  cache:
    untracked: true
```
**Keyword type**: Job-specific. You can use it only as part of a job.
Cache all Git untracked files and files in `binaries`:
**Possible inputs**: `true` or `false` (default).
**Example of `cache:untracked`**:
```yaml
rspec:
  script: test
  cache:
    untracked: true
    paths:
      - binaries/
```
**Additional details**:
- You can combine `cache:untracked` with `cache:paths` to cache all untracked files
  as well as files in the configured paths. For example:
  ```yaml
  rspec:
    script: test
    cache:
      untracked: true
      paths:
        - binaries/
  ```
#### `cache:when`
> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/18969) in GitLab 13.5 and GitLab Runner v13.5.0.
`cache:when` defines when to save the cache, based on the status of the job. You can
set `cache:when` to:
Use `cache:when` to define when to save the cache, based on the status of the job.
**Keyword type**: Job-specific. You can use it only as part of a job.
**Possible inputs**:
- `on_success` (default): Save the cache only when the job succeeds.
- `on_failure`: Save the cache only when the job fails.
- `always`: Always save the cache.
For example, to store a cache whether or not the job fails or succeeds:
**Example of `cache:when`**:
```yaml
rspec:
......@@ -2605,32 +2570,47 @@ rspec:
    when: 'always'
```
This example stores the cache whether or not the job fails or succeeds.
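The other non-default value, `on_failure`, can be sketched in the same way. The job name, script, key, and path below are hypothetical and serve only to illustrate the keyword.

```yaml
# Illustrative only: job name, script, key, and path are assumptions.
debug-job:
  script:
    - ./run-tests.sh        # hypothetical script that may fail
  cache:
    key: debug-$CI_COMMIT_REF_SLUG
    paths:
      - tmp/
    when: on_failure        # save the cache only when the job fails
```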
#### `cache:policy`
The default behavior of a caching job is to download the files at the start of
execution, and to re-upload them at the end. Any changes made by the
job are persisted for future runs. This behavior is known as the `pull-push` cache
policy.
To change the upload and download behavior of a cache, use the `cache:policy` keyword.
By default, the job downloads the cache when the job starts, and uploads changes
to the cache when the job ends. This is the `pull-push` policy (default).
If you know the job does not alter the cached files, you can skip the upload step
by setting `policy: pull` in the job specification. You can add an ordinary cache
job at an earlier stage to ensure the cache is updated from time to time:
To set a job to only download the cache when the job starts, but never upload changes
when the job finishes, use `cache:policy:pull`.
```yaml
stages:
  - setup
  - test
To set a job to only upload a cache when the job finishes, but never download the
cache when the job starts, use `cache:policy:push`.
Use the `pull` policy when you have many jobs executing in parallel that use the same cache.
This policy speeds up job execution and reduces load on the cache server. You can
use a job with the `push` policy to build the cache.
**Keyword type**: Job-specific. You can use it only as part of a job.
**Possible inputs**:
- `pull`
- `push`
- `pull-push` (default)
prepare:
  stage: setup
**Example of `cache:policy`**:
```yaml
prepare-dependencies-job:
  stage: build
  cache:
    key: gems
    paths:
      - vendor/bundle
    policy: push
  script:
    - bundle install --deployment
    - echo "This job only downloads dependencies and builds the cache."
    - echo "Downloading dependencies..."
rspec:
faster-test-job:
  stage: test
  cache:
    key: gems
......@@ -2638,16 +2618,10 @@ rspec:
      - vendor/bundle
    policy: pull
  script:
    - bundle exec rspec ...
    - echo "This job script uses the cache, but does not update it."
    - echo "Running tests..."
```
Use the `pull` policy when you have many jobs executing in parallel that use caches. This
policy speeds up job execution and reduces load on the cache server.
If you have a job that unconditionally recreates the cache without
referring to its previous contents, you can skip the download step.
To do so, add `policy: push` to the job.
### `artifacts`
Use `artifacts` to specify a list of files and directories that are
......
......@@ -559,7 +559,7 @@ request, be sure to start the `dont-interrupt-me` job before pushing.
- `.qa-cache`
- `.yarn-cache`
- `.assets-compile-cache` (the key includes `${NODE_ENV}` so it's actually two different caches).
1. These cache definitions are composed of [multiple atomic caches](../ci/yaml/README.md#multiple-caches).
1. These cache definitions are composed of [multiple atomic caches](../ci/caching/index.md#use-multiple-caches).
1. Only 6 specific jobs, running in 2-hourly scheduled pipelines, are pushing (i.e. updating) to the caches:
- `update-setup-test-env-cache`, defined in [`.gitlab/ci/rails.gitlab-ci.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/ci/rails.gitlab-ci.yml).
- `update-gitaly-binaries-cache`, defined in [`.gitlab/ci/rails.gitlab-ci.yml`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/.gitlab/ci/rails.gitlab-ci.yml).
......