Commit 3cdd002b in nexedi/gitlab-ce, authored Aug 05, 2019 by GitLab Bot

Automatic merge of gitlab-org/gitlab-ce master

Parents: bac1119b, fdb89349

Showing 2 changed files with 148 additions and 0 deletions:

- doc/topics/git/index.md (+1, -0)
- doc/topics/git/partial_clone.md (+147, -0)

doc/topics/git/index.md
@@ -72,6 +72,7 @@ The following are advanced topics for those who want to get the most out of Git:
- [Custom Git Hooks](../../administration/custom_hooks.md)
- [Git Attributes](../../user/project/git_attributes.md)
- Git Submodules: [Using Git submodules with GitLab CI](../../ci/git_submodules.md#using-git-submodules-with-gitlab-ci)
- [Partial Clone](partial_clone.md)

## API
doc/topics/git/partial_clone.md
New file (mode 0 → 100644)

# Partial Clone for Large Repositories
CAUTION: **Alpha:**
Partial Clone is an experimental feature. It significantly increases Gitaly
resource utilization when performing a partial clone, and decreases the
performance of subsequent fetch operations.

As Git repositories become very large, usability decreases as performance
decreases. One major challenge is cloning the repository, because Git downloads
the entire repository, including every commit and every version of every
object. This can be slow to transfer and requires large amounts of disk space.

Historically, performing a **shallow clone**
([`--depth`](https://www.git-scm.com/docs/git-clone#Documentation/git-clone.txt---depthltdepthgt))
has been the only way to reduce the amount of data transferred when cloning
a Git repository. This does not, however, allow filtering by sub-tree, which is
important for monolithic repositories containing many projects, or by object
size, which prevents unnecessarily large objects from being downloaded.

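To make the contrast concrete, here is a minimal sketch of a shallow clone, assuming a Git client on the `PATH` and a throwaway local repository served through a `file://` URL (all paths and file names are hypothetical). The shallow clone keeps only the most recent commit, but that commit still brings every file in its tree, which is why depth alone cannot filter by path or by size.

```shell
# Sketch: compare a full clone with a shallow (--depth 1) clone.
# The throwaway repository and file names below are hypothetical.
work=$(mktemp -d)
git init -q "$work/server"

# Create three commits so there is some history to truncate.
for i in 1 2 3; do
  echo "revision $i" > "$work/server/file.txt"
  git -C "$work/server" add file.txt
  git -C "$work/server" -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "Commit $i"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q "file://$work/server" "$work/full"
git clone -q --depth 1 "file://$work/server" "$work/shallow"

full_count=$(git -C "$work/full" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone: $full_count commits; shallow clone: $shallow_count commit"
```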
[Partial clone](https://github.com/git/git/blob/master/Documentation/technical/partial-clone.txt)
is a performance optimization that "allows Git to function without having a
complete copy of the repository. The goal of this work is to allow Git better
handle extremely large repositories."

Specifically, using partial clone, it should be possible for Git to natively
support:

- large objects, instead of using [Git LFS](https://git-lfs.github.com/)
- enormous repositories

Briefly, partial clone works by:

- excluding objects from being transferred when cloning or fetching a
  repository using a new `--filter` flag
- downloading missing objects on demand

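Both mechanisms can be observed locally with a throwaway repository. This minimal sketch assumes a Git version with partial clone support on both sides, a `file://` URL (a plain local path bypasses the transport that applies filters), and hypothetical file names. `--filter=blob:none` omits every blob from the clone, and reading one afterwards triggers an on-demand fetch.

```shell
# Sketch: partial clone of a throwaway local repository (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/server"
echo "hello" > "$work/server/greeting.txt"
git -C "$work/server" add greeting.txt
git -C "$work/server" -c user.name=Example -c user.email=example@example.com \
  -c commit.gpgsign=false commit -q -m "Add greeting"

# The serving repository must accept filters and requests for arbitrary objects.
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# --filter=blob:none excludes every blob; --no-checkout skips the checkout,
# which would otherwise fetch the blobs it needs straight away.
git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/client"

# Objects that are referenced but absent locally are printed with a leading "?".
missing_before=$(git -C "$work/client" rev-list --all --objects --missing=print | grep -c '^[?]' || true)

# Reading the file's contents downloads the missing blob on demand.
content=$(git -C "$work/client" show HEAD:greeting.txt)

missing_after=$(git -C "$work/client" rev-list --all --objects --missing=print | grep -c '^[?]' || true)
echo "missing before: $missing_before; after: $missing_after; content: $content"
```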
Follow [Git for enormous repositories](https://gitlab.com/groups/gitlab-org/-/epics/773)
for roadmap and updates.

## Enabling partial clone
GitLab 12.1 uses Git 2.21.0, which has an arbitrary file access security
vulnerability when `uploadpack.allowFilter` is enabled, so this option should
not be enabled in production environments.

A feature flag is planned to enable `uploadpack.allowFilter` and
`uploadpack.allowAnySHA1InWant` once the version of Git used by GitLab has been
updated to Git 2.22.0.

Follow [this issue](https://gitlab.com/gitlab-org/gitaly/issues/1553) for
updates.

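Because whether it is safe to enable these options depends on the Git version that serves the repository, a minimal sketch for checking an installed Git against 2.22.0 follows. The `version_at_least` helper is hypothetical and relies on `sort -V` (version sort, as provided by GNU coreutils); the script only reports the comparison and does not change any configuration.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on version sort (`sort -V`), as provided by GNU coreutils.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# `git --version` prints, for example, "git version 2.22.0"; keep the third field.
git_version=$(git --version | awk '{print $3}')

if version_at_least "$git_version" 2.22.0; then
  echo "Git $git_version is 2.22.0 or later"
else
  echo "Git $git_version is older than 2.22.0"
fi
```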
## Excluding objects by size
Partial Clone allows large objects to be stored directly in the Git repository
and excluded from clones as desired by the user. This eliminates the
error-prone process of deciding which objects should or should not be stored in
LFS. Using partial clone, all files, large or small, may be treated the same.

With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
enabled on the Git server:

```bash
# Clone the repository, excluding blobs larger than 1 megabyte
git clone --filter=blob:limit=1m <url>

# In the checkout step of the clone, and in any subsequent operations,
# any blobs that are needed will be downloaded on demand
git checkout feature-branch
```

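The size filter can be tried end to end against a throwaway local repository. In this minimal sketch (file names and sizes are hypothetical, `head -c` is assumed to be available, and a `file://` URL stands in for `<url>`), `large.bin` is committed twice, both times above the 1 MiB limit. The clone's checkout step downloads only the version the current commit needs, so the older version stays a missing object.

```shell
# Sketch: try the size filter against a throwaway local repository.
# File names and sizes are hypothetical.
work=$(mktemp -d)
git init -q "$work/server"
commit_all() {  # hypothetical helper: commit every change in the server repository
  git -C "$work/server" add -A
  git -C "$work/server" -c user.name=Example -c user.email=example@example.com \
    -c commit.gpgsign=false commit -q -m "$1"
}

echo "small text file" > "$work/server/README.md"
head -c 2097152 /dev/zero > "$work/server/large.bin"   # 2 MiB: first version
commit_all "Add README and large.bin"
head -c 3145728 /dev/zero > "$work/server/large.bin"   # 3 MiB: second version
commit_all "Grow large.bin"

git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# Omit blobs of at least 1 MiB; the checkout step of the clone then
# fetches only the large blob that the checked-out commit needs.
git clone -q --filter=blob:limit=1m "file://$work/server" "$work/client"

size=$(wc -c < "$work/client/large.bin" | tr -d ' ')
missing=$(git -C "$work/client" rev-list --all --objects --missing=print | grep -c '^[?]' || true)
echo "large.bin: $size bytes; objects still missing: $missing"
```

On a real project, the `file://` URL would be replaced by the repository's clone URL, with the server-side options enabled as described above.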
## Excluding objects by path
Partial Clone allows clones to be filtered by path using a format similar to a
`.gitignore` file stored inside the repository.

With the `uploadpack.allowFilter` and `uploadpack.allowAnySHA1InWant` options
enabled on the Git server:

1. **Create a filter spec.** For example, consider a monolithic repository with
   many applications, each in a different subdirectory in the root. Create a file
   `shiny-app/.gitfilterspec` using the GitLab web interface:

   ```.gitignore
   # Only the paths listed in the file will be downloaded when performing a
   # partial clone using `--filter=sparse:oid=master:shiny-app/.gitfilterspec`

   # Explicitly include filterspec needed to configure sparse checkout with
   # git config --local core.sparsecheckout true
   # git show origin/master:shiny-app/.gitfilterspec >> .git/info/sparse-checkout
   shiny-app/.gitfilterspec

   # Shiny App
   shiny-app/

   # Dependencies
   shimmery-app/
   shared-component-a/
   shared-component-b/
   ```

2. **Create a new Git repository and fetch.** Support for `--filter=sparse:oid`
   using the clone command is incomplete, so we will emulate the clone command
   by hand, using `git init` and `git fetch`. Follow
   [gitaly#1769](https://gitlab.com/gitlab-org/gitaly/issues/1769) for updates.

   ```bash
   # Create a new directory for the Git repository
   mkdir jumbo-repo && cd jumbo-repo

   # Initialize a new Git repository
   git init

   # Add the remote
   git remote add origin git@gitlab.com:example/jumbo-repo

   # Enable partial clone support for the remote
   git config --local extensions.partialClone origin

   # Fetch the filtered set of objects using the filterspec stored on the
   # server. WARNING: this step is slow!
   git fetch --filter=sparse:oid=master:shiny-app/.gitfilterspec origin

   # Optional: observe there are missing objects that we have not fetched
   git rev-list --all --quiet --objects --missing=print | wc -l
   ```

   CAUTION: **IDE and Shell integrations:**
   Git integrations with `bash`, `zsh`, etc. and editors that automatically
   show Git status information often run `git fetch`, which will fetch the
   entire repository. You may need to disable or reconfigure these
   integrations.

3. **Sparse checkout** must be enabled and configured to prevent objects from
   other paths being downloaded automatically when checking out branches. Follow
   [gitaly#1765](https://gitlab.com/gitlab-org/gitaly/issues/1765) for updates.

   ```bash
   # Enable sparse checkout
   git config --local core.sparsecheckout true

   # Configure sparse checkout from the fetched filterspec
   # (only the remote-tracking branch exists after `git init` + `git fetch`)
   git show origin/master:shiny-app/.gitfilterspec >> .git/info/sparse-checkout

   # Checkout master
   git checkout master
   ```

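The three steps above can be approximated on a throwaway local repository. This minimal sketch makes one substitution: it clones with `--filter=blob:none` instead of `--filter=sparse:oid=...`, because `sparse:oid` support in the clone command is described above as incomplete. It then enables sparse checkout as in step 3, so only blobs under the listed paths are downloaded. The repository layout (`shiny-app`, `shimmery-app`, and a hypothetical `other-app`) is a miniature of the example monorepo, and a `file://` URL stands in for the GitLab remote.

```shell
# Sketch: partial clone plus sparse checkout on a throwaway local repository.
# Assumption: --filter=blob:none stands in for the sparse:oid filter.
work=$(mktemp -d)
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/master   # name the branch master
for app in shiny-app shimmery-app other-app; do
  mkdir -p "$work/server/$app"
  echo "source for $app" > "$work/server/$app/main.txt"
done
# A miniature filter spec in the same gitignore-style format as step 1.
printf '%s\n' 'shiny-app/' 'shimmery-app/' > "$work/server/shiny-app/.gitfilterspec"
git -C "$work/server" add -A
git -C "$work/server" -c user.name=Example -c user.email=example@example.com \
  -c commit.gpgsign=false commit -q -m "Add example apps"
git -C "$work/server" config uploadpack.allowFilter true
git -C "$work/server" config uploadpack.allowAnySHA1InWant true

# Clone with no blobs and no checkout, so nothing is downloaded yet.
git clone -q --no-checkout --filter=blob:none "file://$work/server" "$work/client"

# Step 3: enable sparse checkout and load the patterns from the filter spec
# (git show fetches that one blob on demand; clone created a local master).
git -C "$work/client" config --local core.sparsecheckout true
mkdir -p "$work/client/.git/info"
git -C "$work/client" show master:shiny-app/.gitfilterspec >> "$work/client/.git/info/sparse-checkout"
git -C "$work/client" checkout -q master

# Only other-app/main.txt should remain unfetched.
missing=$(git -C "$work/client" rev-list --all --objects --missing=print | grep -c '^[?]' || true)
echo "objects still missing: $missing"
```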