Commit e75abeb5 authored by Amy Qualls

Merge branch 'eread/refactor-reduce-repo-size-page' into 'master'

Refactor reduce repo size content

See merge request gitlab-org/gitlab!33924
parents 494e98a5 29b51562
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
type: reference, howto type: reference, howto
--- ---
# Protected Branches # Protected branches
[Permissions](../permissions.md) in GitLab are fundamentally defined around the [Permissions](../permissions.md) in GitLab are fundamentally defined around the
idea of having read or write permission to the repository and branches. To impose idea of having read or write permission to the repository and branches. To impose
......
...@@ -2,13 +2,13 @@ ...@@ -2,13 +2,13 @@
type: reference, howto type: reference, howto
--- ---
# Protected Tags # Protected tags
> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10356) in GitLab 9.1. > [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/merge_requests/10356) in GitLab 9.1.
Protected Tags allow control over who has permission to create tags as well as preventing accidental update or deletion once created. Each rule allows you to match either an individual tag name, or use wildcards to control multiple tags at once. Protected tags allow control over who has permission to create tags as well as preventing accidental update or deletion once created. Each rule allows you to match either an individual tag name, or use wildcards to control multiple tags at once.
This feature evolved out of [Protected Branches](protected_branches.md) This feature evolved out of [protected branches](protected_branches.md)
## Overview ## Overview
......
...@@ -5,32 +5,34 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -5,32 +5,34 @@ info: To determine the technical writer assigned to the Stage/Group associated w
type: howto type: howto
--- ---
# Reducing the repository size using Git # Reduce repository size
When large files are added to a Git repository this makes fetching the Git repositories become larger over time. When large files are added to a Git repository:
repository slower, because everyone will need to download the file. These files
can also take up a large amount of storage space on the server over time.
Rewriting a repository can remove unwanted history to make the repository - Fetching the repository becomes slower because everyone must download the files.
smaller. [`git filter-repo`](https://github.com/newren/git-filter-repo) is a - They take up a large amount of storage space on the server.
tool for quickly rewriting Git repository history, and is recommended over [`git - Git repository storage limits [can be reached](#storage-limits).
filter-branch`](https://git-scm.com/docs/git-filter-branch) and
[BFG](https://rtyley.github.io/bfg-repo-cleaner/). Rewriting a repository can remove unwanted history to make the repository smaller.
[`git filter-repo`](https://github.com/newren/git-filter-repo) is a tool for quickly rewriting Git
repository history, and is recommended over both:
- [`git filter-branch`](https://git-scm.com/docs/git-filter-branch).
- [BFG](https://rtyley.github.io/bfg-repo-cleaner/).
DANGER: **Danger:** DANGER: **Danger:**
Rewriting repository history is a destructive operation. Make sure to backup Rewriting repository history is a destructive operation. Make sure to backup your repository before
your repository before you begin. The best way is to [export the you begin. The best way to back up a repository is to
project](../settings/import_export.html#exporting-a-project-and-its-data). [export the project](../settings/import_export.md#exporting-a-project-and-its-data).
## Purging files from your repository history ## Purge files from repository history
To make cloning your project faster, rewrite branches and tags to remove To make cloning your project faster, rewrite branches and tags to remove unwanted files.
unwanted files.
1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md) 1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
using a supported package manager, or from source. using a supported package manager or from source.
1. Clone a fresh copy of the repository using `--bare`. 1. Clone a fresh copy of the repository using `--bare`:
```shell ```shell
git clone --bare https://example.gitlab.com/my/project.git git clone --bare https://example.gitlab.com/my/project.git
...@@ -44,84 +46,92 @@ unwanted files. ...@@ -44,84 +46,92 @@ unwanted files.
git filter-repo --strip-blobs-bigger-than 10M git filter-repo --strip-blobs-bigger-than 10M
``` ```
To purge specific large files by path, the `--path` and `--invert-paths` To purge specific large files by path, the `--path` and `--invert-paths` options can be combined:
options can be combined.
```shell ```shell
git filter-repo --path path/to/big/file.m4v --invert-paths git filter-repo --path path/to/big/file.m4v --invert-paths
``` ```
See the [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES) See the
for more examples, and the complete documentation. [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
for more examples and the complete documentation.
1. Running `git filter-repo` removes all remotes. To restore the remote for your project, run:
```shell
git remote add origin https://example.gitlab.com/<namespace>/<project_name>.git
```
1. Force push your changes to overwrite all branches on GitLab. 1. Force push your changes to overwrite all branches on GitLab:
```shell ```shell
git push origin --force --all git push origin --force --all
``` ```
[Protected Branches](../protected_branches.md) will cause this to fail. To [Protected branches](../protected_branches.md) will cause this to fail. To proceed, you must
proceed you will need to remove branch protection, push, and then remove branch protection, push, and then re-enable protected branches.
reconfigure protected branches.
1. To remove large files from tagged releases, force push your changes to all 1. To remove large files from tagged releases, force push your changes to all tags on GitLab:
tags on GitLab.
```shell ```shell
git push origin --force --tags git push origin --force --tags
``` ```
[Protected Tags](../protected_tags.md) will cause this to [Protected tags](../protected_tags.md) will cause this to fail. To proceed, you must remove tag
fail. To proceed you will need to remove tag protection, push, and then protection, push, and then re-enable protected tags.
reconfigure protected tags.
## Purge files from GitLab storage
To reduce the size of your repository in GitLab, you must remove GitLab internal references to
commits that contain large files. Before completing these steps,
[purge files from your repository history](#purge-files-from-repository-history).
## Purging files from GitLab storage As well as [branches](branches/index.md) and tags, which are a type of Git ref, GitLab automatically
creates other refs. These refs prevent dead links to commits, or missing diffs when viewing merge
requests. [Repository cleanup](#repository-cleanup) can be used to remove these from GitLab.
To reduce the size of your repository in GitLab you will need to remove GitLab The following internal refs are not advertised:
internal refs that reference commits contain large files. Before completing
these steps, first [purged files from your repository history](#purging-files-from-your-repository-history).
As well as branches and tags, which are a type of Git ref, GitLab automatically - `refs/merge-requests/*` for merge requests.
creates other refs. These refs prevent dead links to commits, or missing diffs - `refs/pipelines/*` for
when viewing merge requests. [Repository cleanup](#repository-cleanup) can be [pipelines](../../../ci/pipelines/index.md#troubleshooting-fatal-reference-is-not-a-tree).
used to remove these from GitLab. - `refs/environments/*` for environments.
The internal refs for merge requests (`refs/merge-requests/*`), This means they are not usually included when fetching, which makes fetching faster. In addition,
[pipelines](../../../ci/pipelines/index.md#troubleshooting-fatal-reference-is-not-a-tree) `refs/keep-around/*` are hidden refs to prevent commits with discussion from being deleted and
(`refs/pipelines/*`), and environments (`refs/environments/*`) are not cannot be fetched at all.
advertised, which means they are not included when fetching, which makes
fetching faster. The hidden refs to prevent commits with discussion from being However, these refs can be accessed from the Git bundle inside a project export.
deleted (`refs/keep-around/*`) cannot be fetched at all. These refs can,
however, be accessed from the Git bundle inside the project export.
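To illustrate how a bundle can carry refs that a normal fetch would not advertise, here is a minimal local sketch. It does not touch GitLab: the scratch directory, the identity, and the hand-made `refs/merge-requests/1/head` ref are assumptions that only imitate the refs GitLab creates.

```shell
# Scratch area; all paths and names here are made up for illustration.
tmp="$(mktemp -d)"
git init --quiet "$tmp/source"
git -C "$tmp/source" config user.email "demo@example.com"
git -C "$tmp/source" config user.name "Demo User"
git -C "$tmp/source" config commit.gpgsign false
git -C "$tmp/source" commit --quiet --allow-empty -m "Initial commit"

# Imitate an internal ref by creating one by hand.
git -C "$tmp/source" update-ref refs/merge-requests/1/head HEAD

# Bundle every ref, then clone the bundle as a mirror so all refs come along.
git -C "$tmp/source" bundle create "$tmp/project.bundle" --all 2>/dev/null
git clone --quiet --bare --mirror "$tmp/project.bundle" "$tmp/mirror.git"

# List the non-branch refs that survived the round trip.
internal_refs="$(git -C "$tmp/mirror.git" for-each-ref --format='%(refname)' refs/merge-requests)"
echo "$internal_refs"
```

In a real project export, the same `for-each-ref` call against the mirrored clone is one way to inspect which internal refs are present before rewriting history.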
1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md) 1. [Install `git filter-repo`](https://github.com/newren/git-filter-repo/blob/master/INSTALL.md)
using a supported package manager, or from source. using a supported package manager or from source.
1. Generate a fresh [export from the project](../settings/import_export.md#exporting-a-project-and-its-data) and 1. Generate a fresh [export from the
download to your computer. project](../settings/import_export.html#exporting-a-project-and-its-data) and download it.
1. Decompress the backup using `tar` 1. Decompress the backup using `tar`:
```shell ```shell
tar xzf project-backup.tar.gz tar xzf project-backup.tar.gz
``` ```
This will contain a `project.bundle` file, which was created by [`git bundle`](https://git-scm.com/docs/git-bundle) This will contain a `project.bundle` file, which was created by
[`git bundle`](https://git-scm.com/docs/git-bundle).
1. Clone a fresh copy of the repository from the bundle. 1. Clone a fresh copy of the repository from the bundle:
```shell ```shell
git clone --bare --mirror /path/to/project.bundle git clone --bare --mirror /path/to/project.bundle
``` ```
1. Using `git filter-repo`, purge any files from the history of your repository. 1. Using `git filter-repo`, purge any files from the history of your repository. Because we are
Because we are trying to remove internal refs, we will rely on the trying to remove internal refs, we will rely on the `commit-map` produced by each run to tell us
`commit-map` produced by each run to tell us which internal refs to remove. which internal refs to remove.
NOTE: **Note:** NOTE: **Note:**
`git filter-repo` creates a new `commit-map` file every run, and overwrites the `git filter-repo` creates a new `commit-map` file every run, and overwrites the `commit-map` from
`commit-map` from the previous run. You will need this file from **every** the previous run. You will need this file from **every** run. Do the next step every time you run
run. Do the next step every time you run `git filter-repo`. `git filter-repo`.
To purge all large files, the `--strip-blobs-bigger-than` option can be used: To purge all large files, the `--strip-blobs-bigger-than` option can be used:
...@@ -129,110 +139,106 @@ however, be accessed from the Git bundle inside the project export. ...@@ -129,110 +139,106 @@ however, be accessed from the Git bundle inside the project export.
git filter-repo --strip-blobs-bigger-than 10M git filter-repo --strip-blobs-bigger-than 10M
``` ```
To purge specific large files by path, the `--path` and `--invert-paths` To purge specific large files by path, the `--path` and `--invert-paths` options can be combined.
options can be combined.
```shell ```shell
git filter-repo --path path/to/big/file.m4v --invert-paths git filter-repo --path path/to/big/file.m4v --invert-paths
``` ```
See the [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES) See the
for more examples, and the complete documentation. [`git filter-repo` documentation](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#EXAMPLES)
for more examples and the complete documentation.
1. After running `git filter-repo`, the header and unchanged commits need to be
removed from the `commit-map` before uploading to GitLab.
```shell
tail -n +2 filter-repo/commit-map | grep -E -v '^(\w+) \1$' >> commit-map.txt
```
This command can be run after each run of `git filter-repo` to append the
output of the run to `commit-map.txt`
1. Navigate to **Project > Settings > Repository > Repository Cleanup**.
Upload the `commit-map.txt` file and press **Start cleanup**. This will 1. Run a [repository cleanup](#repository-cleanup).
remove any internal Git references to the old commits, and run `git gc`
against the repository. You will receive an email once it has completed.
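To see what the `tail` and `grep` filter from the steps above keeps, here is a minimal sketch run against a made-up `commit-map`. The hashes and the scratch directory are hypothetical, and the header layout is an assumption about how `git filter-repo` writes the file.

```shell
# Build a sample commit-map in a scratch directory (hypothetical hashes).
workdir="$(mktemp -d)"
mkdir -p "$workdir/filter-repo"
cat > "$workdir/filter-repo/commit-map" <<'EOF'
old                                      new
1111111111111111111111111111111111111111 1111111111111111111111111111111111111111
2222222222222222222222222222222222222222 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
3333333333333333333333333333333333333333 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
EOF

# Drop the header line, then drop rows where the old and new IDs are identical.
tail -n +2 "$workdir/filter-repo/commit-map" \
  | grep -E -v '^(\w+) \1$' >> "$workdir/commit-map.txt"

# Only the rewritten commits remain.
remaining="$(wc -l < "$workdir/commit-map.txt" | tr -d ' ')"
echo "$remaining"
```

Because the output is appended with `>>`, repeating the filter after each `git filter-repo` run accumulates the rewritten commits from every run in one file.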
## Repository cleanup ## Repository cleanup
> [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/19376) in GitLab 11.6. > [Introduced](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/19376) in GitLab 11.6.
Repository cleanup allows you to upload a text file of objects and GitLab will remove Repository cleanup allows you to upload a text file of objects and GitLab will remove internal Git
internal Git references to these objects. references to these objects. You can use
[`git filter-repo`](https://github.com/newren/git-filter-repo) to produce a list of objects (in a
`commit-map` file) that can be used with repository cleanup.
To clean up a repository: To clean up a repository:
1. Go to the project for the repository. 1. Go to the project for the repository.
1. Navigate to **{settings}** **Settings > Repository**. 1. Navigate to **{settings}** **Settings > Repository**.
1. Upload a list of objects. 1. Upload a list of objects. For example, a `commit-map` file.
1. Click **Start cleanup**. 1. Click **Start cleanup**.
This will remove any internal Git references to old commits, and run `git gc` This will:
against the repository. You will receive an email once it has completed.
These tools produce suitable output for purging history on the server: - Remove any internal Git references to old commits.
- Run `git gc` against the repository.
- [`git filter-repo`](https://github.com/newren/git-filter-repo): use the You will receive an email once it has completed.
`commit-map` file.
- [BFG](https://rtyley.github.io/bfg-repo-cleaner/): use the When using repository cleanup, note:
`object-id-map.old-new.txt` file.
NOTE: **Note:** - Housekeeping prunes loose objects older than 2 weeks. This means objects added in the last 2 weeks
Housekeeping prunes loose objects older than 2 weeks. This means objects added will not be removed immediately. If you have access to the
in the last 2 weeks will not be removed immediately. If you have access to the [Gitaly](../../../administration/gitaly/index.md) server, you may run `git gc --prune=now` to
Gitaly server, you may run `git gc --prune=now` to prune all loose object prune all loose objects immediately.
immediately. - This process will remove some copies of the rewritten commits from GitLab's cache and database,
but there are still numerous gaps in coverage and some of the copies may persist indefinitely.
[Clearing the instance cache](../../../administration/raketasks/maintenance.md#clear-redis-cache)
may help to remove some of them, but it should not be depended on for security purposes!
NOTE: **Note:** ## Storage limits
This process will remove some copies of the rewritten commits from GitLab's
cache and database, but there are still numerous gaps in coverage - at present, Repository size limits:
some of the copies may persist indefinitely. [Clearing the instance
cache](../../../administration/raketasks/maintenance.md#clear-redis-cache) may - Can [be set by an administrator](../../admin_area/settings/account_and_limit_settings.md#repository-size-limit-starter-only)
help to remove some of them, but it should not be depended on for security on self-managed instances. **(STARTER ONLY)**
purposes! - Are [set for GitLab.com](../../gitlab_com/index.md#repository-size-limit).
## Exceeding storage limit When a project has reached its size limit, you cannot:
A GitLab Enterprise Edition administrator can set a [repository size - Push to the project.
limit](../../admin_area/settings/account_and_limit_settings.md) which will - Create a new merge request.
prevent you from exceeding it. - Merge existing merge requests.
- Upload LFS objects.
When a project has reached its size limit, you will not be able to push to it,
create a new merge request, or merge existing ones. You will still be able to You can still:
create new issues, and clone the project though. Uploading LFS objects will
also be denied. - Create new issues.
- Clone the project.
If you exceed the repository size limit, your first thought might be to remove
some data, make a new commit and push back to the repository. Perhaps you can If you exceed the repository size limit, you might try to:
move some blobs to LFS, or remove some old dependency updates from history.
Unfortunately, it's not so easy and that workflow won't work. Deleting files in 1. Remove some data.
a commit doesn't actually reduce the size of the repo since the earlier commits 1. Make a new commit.
and blobs are still around. What you need to do is rewrite history with Git's 1. Push back to the repository.
[`filter-branch` option](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History#The-Nuclear-Option:-filter-branch),
or an open source community-maintained tool like the Perhaps you might also:
- Move some blobs to LFS.
- Remove some old dependency updates from history.
Unfortunately, this workflow won't work. Deleting files in a commit doesn't actually reduce the size
of the repository because the earlier commits and blobs still exist.
What you need to do is rewrite history. We recommend the open-source community-maintained tool
[`git filter-repo`](https://github.com/newren/git-filter-repo). [`git filter-repo`](https://github.com/newren/git-filter-repo).
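The point that deleting a file in a new commit does not shrink the repository can be checked locally. This is a minimal sketch in a throwaway repository; the path, file name, and identity are made up for illustration.

```shell
# Scratch repository; nothing here touches a real project.
demo="$(mktemp -d)"
git init --quiet "$demo"
git -C "$demo" config user.email "demo@example.com"
git -C "$demo" config user.name "Demo User"
git -C "$demo" config commit.gpgsign false

# Commit a 1 MiB file, then delete it in a follow-up commit.
head -c 1048576 /dev/urandom > "$demo/big.bin"
git -C "$demo" add big.bin
git -C "$demo" commit --quiet -m "Add large file"
blob="$(git -C "$demo" rev-parse HEAD:big.bin)"
git -C "$demo" rm --quiet big.bin
git -C "$demo" commit --quiet -m "Remove large file"

# The file is gone from the working tree, but its blob is still stored
# because the first commit still references it.
if git -C "$demo" cat-file -e "$blob"; then still_present=yes; else still_present=no; fi
blob_size="$(git -C "$demo" cat-file -s "$blob")"
echo "blob still present: $still_present ($blob_size bytes)"
```

The blob only becomes removable once no commit references it, which is why the history itself has to be rewritten.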
Note that even with that method, until `git gc` runs on the GitLab side, the NOTE: **Note:**
"removed" commits and blobs will still be around. You also need to be able to Until `git gc` runs on the GitLab side, the "removed" commits and blobs will still exist. You also
push the rewritten history to GitLab, which may be impossible if you've already must be able to push the rewritten history to GitLab, which may be impossible if you've already
exceeded the maximum size limit. exceeded the maximum size limit.
In order to lift these restrictions, the administrator of the GitLab instance In order to lift these restrictions, the administrator of the self-managed GitLab instance must
needs to increase the limit on the particular project that exceeded it, so it's increase the limit on the particular project that exceeded it. Therefore, it's always better to
always better to spot that you're approaching the limit and act proactively to proactively stay underneath the limit. If you hit the limit, and can't have it temporarily
stay underneath it. If you hit the limit, and your admin can't - or won't - increased, your only option is to:
temporarily increase it for you, your only option is to prune all the unneeded
stuff locally, and then create a new project on GitLab and start using that 1. Prune all the unneeded stuff locally.
instead. 1. Create a new project on GitLab and start using that instead.
CAUTION: **Caution:** CAUTION: **Caution:**
This process is not suitable for removing sensitive data like passwords or keys This process is not suitable for removing sensitive data like passwords or keys from your repository.
from your repository. Information about commits, including file content, is Information about commits, including file content, is cached in the database, and will remain
cached in the database, and will remain visible even after they have been visible even after they have been removed from the repository.
removed from the repository.
<!-- ## Troubleshooting <!-- ## Troubleshooting
......