Merge branch 'jv-enable-workhorse-sidechannel-2' into 'master'

Enable workhorse_use_sidechannel by default

See merge request gitlab-org/gitlab!73536

Commit 530df3c2 authored Nov 05, 2021 by Sean McGivern
4 changed files
--- a/config/feature_flags/development/workhorse_use_sidechannel.yml
+++ b/config/feature_flags/development/workhorse_use_sidechannel.yml
@@ -5,4 +5,4 @@ rollout_issue_url: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1
 milestone: '14.4'
 type: development
 group: 'group::scalability'
-default_enabled: false
+default_enabled: true
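The change above only flips the YAML default for the flag. The minimal Python sketch below models why the 14.5.0 workaround of disabling the flag still works after the default becomes `true`, assuming that an explicitly persisted flag state takes precedence over `default_enabled`. It is a simplified model, not GitLab's `Feature` implementation: on a real instance the state is persisted with `Feature.disable(:workhorse_use_sidechannel)` in a Rails console, and the Python names `YAML_DEFAULTS`, `persisted_state`, `feature_enabled`, `feature_disable`, and `feature_enable` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the `default_enabled` values read from
# config/feature_flags/development/<flag>.yml after this change.
YAML_DEFAULTS: Dict[str, bool] = {"workhorse_use_sidechannel": True}

# Hypothetical stand-in for flag state an administrator explicitly persisted.
# A missing key means the flag was never toggled on this instance.
persisted_state: Dict[str, bool] = {}


def feature_enabled(flag: str) -> bool:
    """Return the persisted value if one exists, else the YAML default.

    Simplification (assumption): ignores actor and percentage gates and
    treats a flag with no YAML definition as disabled.
    """
    state: Optional[bool] = persisted_state.get(flag)
    if state is not None:
        return state
    return YAML_DEFAULTS.get(flag, False)


def feature_disable(flag: str) -> None:
    """Persist an explicit 'off' that takes precedence over default_enabled."""
    persisted_state[flag] = False


def feature_enable(flag: str) -> None:
    """Persist an explicit 'on'."""
    persisted_state[flag] = True
```

In this model `feature_enabled("workhorse_use_sidechannel")` returns `True` on an untouched instance and `False` once `feature_disable` has persisted an override, which is the behavior the upgrade note relies on.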
--- a/doc/ci/large_repositories/index.md
+++ b/doc/ci/large_repositories/index.md
@@ -250,12 +250,15 @@ concurrent = 4
 This makes the cloning configuration to be part of the given runner
 and does not require us to update each `.gitlab-ci.yml`.
-## Pre-clone step
+## Git fetch caching or pre-clone step
-> [An issue exists](https://gitlab.com/groups/gitlab-com/gl-infra/-/epics/463) to remove the need for this optimization.
+For very active repositories with a large number of references and files, you can either (or both):
-For very active repositories with a large number of references and files, you can also
-optimize your CI jobs by seeding repository data with GitLab Runner's [`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section).
-See [our development documentation](../../development/pipelines.md#pre-clone-step) for
-an overview of how we implemented this approach on GitLab.com for the main GitLab repository.
+- Consider using the [Gitaly pack-objects cache](../../administration/gitaly/configure_gitaly.md#pack-objects-cache) instead of a
+  pre-clone step. This is easier to set up and it benefits all repositories on your GitLab server, unlike the pre-clone step that
+  must be configured per-repository. The pack-objects cache also automatically works for forks. For `gitlab-org/gitlab` development
+  on GitLab.com, we stopped using a pre-clone step.
+- Optimize your CI/CD jobs by seeding repository data in a pre-clone step with the
+  [`pre_clone_script`](https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section) of GitLab Runner. See our
+  [development documentation](../../development/pipelines.md#pre-clone-step) for an overview of how we used to implement this approach on
+  GitLab.com for the main GitLab repository.
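The pack-objects cache recommended in this diff deduplicates concurrent identical fetches and serves repeats from cache. The Python sketch below is a minimal in-memory model of that idea, not Gitaly's code: identical requests hash to one key, the first caller runs the expensive generator (standing in for `git pack-objects`), concurrent callers wait for that result, and later callers reuse it. The class names `PackObjectsCacheModel` and `_Entry` and the `generate` callback are hypothetical.

```python
import hashlib
import threading
from typing import Callable, Dict, Optional


class _Entry:
    """One cache slot: filled once by the first caller, read by everyone else."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.value: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class PackObjectsCacheModel:
    """Hypothetical model of a deduplicating response cache keyed by request hash."""

    def __init__(self, generate: Callable[[bytes], bytes]) -> None:
        self._generate = generate  # stand-in for running `git pack-objects`
        self._lock = threading.Lock()
        self._entries: Dict[str, _Entry] = {}

    def fetch(self, request: bytes) -> bytes:
        key = hashlib.sha256(request).hexdigest()
        # Under the lock, decide whether this caller owns the work.
        with self._lock:
            entry = self._entries.get(key)
            is_owner = entry is None
            if entry is None:
                entry = _Entry()
                self._entries[key] = entry

        if is_owner:
            try:
                entry.value = self._generate(request)
            except BaseException as exc:
                entry.error = exc
                # Drop the failed slot so a later request can retry.
                with self._lock:
                    self._entries.pop(key, None)
                raise
            finally:
                entry.done.set()  # wake every caller waiting on this key
        else:
            entry.done.wait()  # deduplicated: wait for the owner's result

        if entry.error is not None:
            raise entry.error
        assert entry.value is not None
        return entry.value
```

The real cache differs in important ways: it runs inside Gitaly, stores entries on disk (its `dir` setting), and expires them after its configured `max_age`, whereas this sketch keeps everything in memory and never expires entries.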
--- a/doc/development/pipelines.md
+++ b/doc/development/pipelines.md
@@ -793,19 +793,30 @@ request, be sure to start the `dont-interrupt-me` job before pushing.
 We limit the artifacts that are saved and retrieved by jobs to the minimum in order to reduce the upload/download time and costs, as well as the artifacts storage.
-### Pre-clone step
+### Git fetch caching
-The `gitlab-org/gitlab` project on GitLab.com uses a [pre-clone step](https://gitlab.com/gitlab-org/gitlab/-/issues/39134)
-to seed the project with a recent archive of the repository. This is done for
-several reasons:
+Because GitLab.com uses the [pack-objects cache](../administration/gitaly/configure_gitaly.md#pack-objects-cache),
+concurrent Git fetches of the same pipeline ref are deduplicated on
+the Gitaly server (always) and served from cache (when available).
+This works well for the following reasons:
-- It speeds up builds because a 800 MB download only takes seconds, as opposed to a full Git clone.
-- It significantly reduces load on the file server, as smaller deltas mean less time spent in `git pack-objects`.
+- The pack-objects cache is enabled on all Gitaly servers on GitLab.com.
+- The CI/CD [Git strategy setting](../ci/pipelines/settings.md#choose-the-default-git-strategy) for `gitlab-org/gitlab` is **Git clone**,
+  causing all jobs to fetch the same data, which maximizes the cache hit ratio.
+- We use [shallow clone](../ci/pipelines/settings.md#limit-the-number-of-changes-fetched-during-clone) to avoid downloading the full Git
+  history for every job.
+### Pre-clone step
+NOTE:
+We no longer use this optimization for `gitlab-org/gitlab` because the [pack-objects cache](../administration/gitaly/configure_gitaly.md#pack-objects-cache)
+allows Gitaly to serve the full CI/CD fetch traffic now. See [Git fetch caching](#git-fetch-caching).
 The pre-clone step works by using the `CI_PRE_CLONE_SCRIPT` variable
 [defined by GitLab.com shared runners](../ci/runners/build_cloud/linux_build_cloud.md#pre-clone-script).
-The `CI_PRE_CLONE_SCRIPT` is currently defined as a project CI/CD variable:
+The `CI_PRE_CLONE_SCRIPT` is defined as a project CI/CD variable:
 ```shell
 (

--- a/doc/update/index.md
+++ b/doc/update/index.md
@@ -305,10 +305,14 @@ and [Helm Chart deployments](https://docs.gitlab.com/charts/). They come with ap
 ### 14.5.0
-When `make` is run, Gitaly builds are now created in `_build/bin` and no longer in the root directory of the source directory. If you
+- When `make` is run, Gitaly builds are now created in `_build/bin` and no longer in the root directory of the source directory. If you
 are using a source install, update paths to these binaries in your [systemd unit files](upgrading_from_source.md#configure-systemd-units)
 or [init scripts](upgrading_from_source.md#configure-sysv-init-script) by [following the documentation](upgrading_from_source.md).
+- Connections between Workhorse and Gitaly use the Gitaly `backchannel` protocol by default. If you deployed a gRPC proxy between Workhorse and Gitaly,
+  Workhorse can no longer connect. As a workaround, [disable the temporary `workhorse_use_sidechannel`](../administration/feature_flags.md#enable-or-disable-the-feature)
+  feature flag. If you need a proxy between Workhorse and Gitaly, use a TCP proxy.
 ### 14.4.0
 Git 2.33.x and later is required. We recommend you use the