Commit 8ba36cd0 authored by Craig Norris

Merge branch 'docs-aqualls-future-tense-2' into 'master'

Clean up future tense for present tense

See merge request gitlab-org/gitlab!49290
parents 90f18823 a4c843f5
...@@ -10,7 +10,7 @@ The GitLab CI/CD pipeline includes a `danger-review` job that uses [Danger](http ...@@ -10,7 +10,7 @@ The GitLab CI/CD pipeline includes a `danger-review` job that uses [Danger](http
to perform a variety of automated checks on the code under test. to perform a variety of automated checks on the code under test.
Danger is a gem that runs in the CI environment, like any other analysis tool. Danger is a gem that runs in the CI environment, like any other analysis tool.
What sets it apart from, e.g., RuboCop, is that it's designed to allow you to What sets it apart from other tools (for example, RuboCop) is that it's designed to allow you to
easily write arbitrary code to test properties of your code or changes. To this easily write arbitrary code to test properties of your code or changes. To this
end, it provides a set of common helpers and access to information about what end, it provides a set of common helpers and access to information about what
has actually changed in your environment, then simply runs your code! has actually changed in your environment, then simply runs your code!
...@@ -32,7 +32,7 @@ from the start of the merge request. ...@@ -32,7 +32,7 @@ from the start of the merge request.
### Disadvantages ### Disadvantages
- It's not obvious Danger will update the old comment, thus you need to - It's not obvious Danger updates the old comment, thus you need to
pay attention to it if it is updated or not. pay attention to it if it is updated or not.
## Run Danger locally ## Run Danger locally
...@@ -48,13 +48,12 @@ bin/rake danger_local ...@@ -48,13 +48,12 @@ bin/rake danger_local
On startup, Danger reads a [`Dangerfile`](https://gitlab.com/gitlab-org/gitlab/blob/master/Dangerfile) On startup, Danger reads a [`Dangerfile`](https://gitlab.com/gitlab-org/gitlab/blob/master/Dangerfile)
from the project root. GitLab's Danger code is decomposed into a set of helpers from the project root. GitLab's Danger code is decomposed into a set of helpers
and plugins, all within the [`danger/`](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/danger/) and plugins, all within the [`danger/`](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/danger/)
subdirectory, so ours just tells Danger to load it all. Danger will then run subdirectory, so ours just tells Danger to load it all. Danger then runs
each plugin against the merge request, collecting the output from each. A plugin each plugin against the merge request, collecting the output from each. A plugin
may output notifications, warnings, or errors, all of which are copied to the may output notifications, warnings, or errors, all of which are copied to the
CI job's log. If an error happens, the CI job (and so the entire pipeline) will CI job's log. If an error happens, the CI job (and so the entire pipeline) fails.
be failed.
On merge requests, Danger will also copy the output to a comment on the MR On merge requests, Danger also copies the output to a comment on the MR
itself, increasing visibility. itself, increasing visibility.
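The flow described above (each plugin runs against the changes, output is collected, and any error fails the job) can be sketched in plain Ruby. This is a minimal illustration only: `Result`, `run_plugins`, `CHANGELOG_CHECK`, and `FOCUSED_SPEC_CHECK` are hypothetical names, not part of Danger's API or GitLab's plugins.

```ruby
# Hypothetical stand-in for a Danger run. None of these names come from Danger.
Result = Struct.new(:messages, :warnings, :errors) do
  # The job fails as soon as any plugin reported an error.
  def failed?
    !errors.empty?
  end
end

# Run every plugin against the list of changed files and collect the output.
def run_plugins(plugins, changed_files)
  result = Result.new([], [], [])
  plugins.each { |plugin| plugin.call(changed_files, result) }
  result
end

# Example plugin: warn when no file under changelogs/ was touched.
CHANGELOG_CHECK = lambda do |files, result|
  unless files.any? { |file| file.start_with?('changelogs/') }
    result.warnings << 'No changelog entry found'
  end
end

# Example plugin: report an error for a (made-up) focused spec naming pattern.
FOCUSED_SPEC_CHECK = lambda do |files, result|
  files.each do |file|
    result.errors << "#{file} looks like a focused spec" if file.end_with?('_focus_spec.rb')
  end
end

result = run_plugins([CHANGELOG_CHECK, FOCUSED_SPEC_CHECK], ['app/models/user.rb'])
puts result.warnings.inspect
puts result.failed?
```

Keeping each check as a small callable that only reads the changed files and appends to a result object is also what makes the logic easy to exercise in RSpec without a CI run.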
## Development guidelines ## Development guidelines
...@@ -75,17 +74,17 @@ often face similar challenges, after all. Think about how you could fulfill the ...@@ -75,17 +74,17 @@ often face similar challenges, after all. Think about how you could fulfill the
same need while ensuring everyone can benefit from the work, and do that instead same need while ensuring everyone can benefit from the work, and do that instead
if you can. if you can.
If a standard tool (e.g. `rubocop`) exists for a task, it is better to use it If a standard tool (for example, `rubocop`) exists for a task, it's better to
directly, rather than calling it via Danger. Running and debugging the results use it directly, rather than calling it by using Danger. Running and debugging
of those tools locally is easier if Danger isn't involved, and unless you're the results of those tools locally is easier if Danger isn't involved, and
using some Danger-specific functionality, there's no benefit to including it in unless you're using some Danger-specific functionality, there's no benefit to
the Danger run. including it in the Danger run.
Danger is well-suited to prototyping and rapidly iterating on solutions, so if Danger is well-suited to prototyping and rapidly iterating on solutions, so if
what we want to build is unclear, a solution in Danger can be thought of as a what we want to build is unclear, a solution in Danger can be thought of as a
trial run to gather information about a product area. If you're doing this, make trial run to gather information about a product area. If you're doing this, make
sure the problem you're trying to solve, and the outcomes of that prototyping, sure the problem you're trying to solve, and the outcomes of that prototyping,
are captured in an issue or epic as you go along. This will help us to address are captured in an issue or epic as you go along. This helps us to address
the need as part of the product in a future version of GitLab! the need as part of the product in a future version of GitLab!
### Implementation details ### Implementation details
...@@ -110,16 +109,17 @@ At present, we do this by putting the code in a module in `lib/gitlab/danger/... ...@@ -110,16 +109,17 @@ At present, we do this by putting the code in a module in `lib/gitlab/danger/...
and including it in the matching `danger/plugins/...` file. Specs can then be and including it in the matching `danger/plugins/...` file. Specs can then be
added in `spec/lib/gitlab/danger/...`. added in `spec/lib/gitlab/danger/...`.
You'll only know if your `Dangerfile` works by pushing the branch that contains To determine if your `Dangerfile` works, push the branch that contains it to
it to GitLab. This can be quite frustrating, as it significantly increases the GitLab. This can be quite frustrating, as it significantly increases the cycle
cycle time when developing a new task, or trying to debug something in an time when developing a new task, or trying to debug something in an existing
existing one. If you've followed the guidelines above, most of your code can one. If you've followed the guidelines above, most of your code can be exercised
be exercised locally in RSpec, minimizing the number of cycles you need to go locally in RSpec, minimizing the number of cycles you need to go through in CI.
through in CI. However, you can speed these cycles up somewhat by emptying the However, you can speed these cycles up somewhat by emptying the
`.gitlab/ci/rails.gitlab-ci.yml` file in your merge request. Just don't forget `.gitlab/ci/rails.gitlab-ci.yml` file in your merge request. Just don't forget
to revert the change before merging! to revert the change before merging!
To enable the Dangerfile on another existing GitLab project, run the following extra steps, based on [this procedure](https://danger.systems/guides/getting_started.html#creating-a-bot-account-for-danger-to-use): To enable the Dangerfile on another existing GitLab project, run the following
extra steps, based on [this procedure](https://danger.systems/guides/getting_started.html#creating-a-bot-account-for-danger-to-use):
1. Add `@gitlab-bot` to the project as a `reporter`. 1. Add `@gitlab-bot` to the project as a `reporter`.
1. Add the `@gitlab-bot`'s `GITLAB_API_PRIVATE_TOKEN` value as a value for a new CI/CD 1. Add the `@gitlab-bot`'s `GITLAB_API_PRIVATE_TOKEN` value as a value for a new CI/CD
...@@ -156,10 +156,10 @@ at GitLab so far: ...@@ -156,10 +156,10 @@ at GitLab so far:
To work around this, you can add an [environment To work around this, you can add an [environment
variable](../ci/variables/README.md) called variable](../ci/variables/README.md) called
`DANGER_GITLAB_API_TOKEN` with a personal API token to your `DANGER_GITLAB_API_TOKEN` with a personal API token to your
fork. That way the danger comments will be made from CI using that fork. That way the danger comments are made from CI using that
API token instead. API token instead.
Making the variable Making the variable
[masked](../ci/variables/README.md#mask-a-custom-variable) will make sure [masked](../ci/variables/README.md#mask-a-custom-variable) makes sure
it doesn't show up in the job logs. The variable cannot be it doesn't show up in the job logs. The variable cannot be
[protected](../ci/variables/README.md#protect-a-custom-variable), [protected](../ci/variables/README.md#protect-a-custom-variable),
as it needs to be present for all feature branches. as it needs to be present for all feature branches.
...@@ -146,7 +146,7 @@ Remember: ...@@ -146,7 +146,7 @@ Remember:
advance of a milestone release and for larger documentation changes. advance of a milestone release and for larger documentation changes.
- You can request a post-merge Technical Writer review of documentation if it's important to get the - You can request a post-merge Technical Writer review of documentation if it's important to get the
code with which it ships merged as soon as possible. In this case, the author of the original MR code with which it ships merged as soon as possible. In this case, the author of the original MR
will address the feedback provided by the Technical Writer in a follow-up MR. can address the feedback provided by the Technical Writer in a follow-up MR.
- The Technical Writer can also help decide that documentation can be merged without Technical - The Technical Writer can also help decide that documentation can be merged without Technical
writer review, with the review to occur soon after merge. writer review, with the review to occur soon after merge.
......
...@@ -143,7 +143,7 @@ There are a few gotchas with it: ...@@ -143,7 +143,7 @@ There are a few gotchas with it:
- you should always [`extend ::Gitlab::Utils::Override`](utilities.md#override) and use `override` to - you should always [`extend ::Gitlab::Utils::Override`](utilities.md#override) and use `override` to
guard the "overrider" method to ensure that if the method gets renamed in guard the "overrider" method to ensure that if the method gets renamed in
CE, the EE override won't be silently forgotten. CE, the EE override isn't silently forgotten.
- when the "overrider" would add a line in the middle of the CE - when the "overrider" would add a line in the middle of the CE
implementation, you should refactor the CE method and split it in implementation, you should refactor the CE method and split it in
smaller methods. Or create a "hook" method that is empty in CE, smaller methods. Or create a "hook" method that is empty in CE,
...@@ -284,7 +284,7 @@ wrap it in a self-descriptive method and use that method. ...@@ -284,7 +284,7 @@ wrap it in a self-descriptive method and use that method.
For example, in GitLab-FOSS, the only user created by the system is `User.ghost` For example, in GitLab-FOSS, the only user created by the system is `User.ghost`
but in EE there are several types of bot-users that aren't really users. It would but in EE there are several types of bot-users that aren't really users. It would
be incorrect to override the implementation of `User#ghost?`, so instead we add be incorrect to override the implementation of `User#ghost?`, so instead we add
a method `#internal?` to `app/models/user.rb`. The implementation will be: a method `#internal?` to `app/models/user.rb`. The implementation:
```ruby ```ruby
def internal? def internal?
...@@ -303,13 +303,13 @@ end ...@@ -303,13 +303,13 @@ end
### Code in `config/routes` ### Code in `config/routes`
When we add `draw :admin` in `config/routes.rb`, the application will try to When we add `draw :admin` in `config/routes.rb`, the application tries to
load the file located in `config/routes/admin.rb`, and also try to load the load the file located in `config/routes/admin.rb`, and also tries to load the
file located in `ee/config/routes/admin.rb`. file located in `ee/config/routes/admin.rb`.
In EE, it should at least load one file, at most two files. If it cannot find In EE, it should at least load one file, at most two files. If it cannot find
any files, an error will be raised. In CE, since we don't know if there will any files, an error is raised. In CE, since we don't know if
be an EE route, it will not raise any errors even if it cannot find anything. an EE route exists, it doesn't raise any errors even if it cannot find anything.
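The loading rule above can be sketched as follows. This is an illustration under stated assumptions, not the actual `draw` implementation: `route_files_to_load` and its `ee:` flag are hypothetical, and the set of existing files is passed in explicitly instead of being read from disk.

```ruby
# Hypothetical helper that mirrors the rule described above.
# existing_files stands in for what is present on disk.
def route_files_to_load(name, existing_files, ee:)
  candidates = ["config/routes/#{name}.rb"]
  candidates << "ee/config/routes/#{name}.rb" if ee

  found = candidates.select { |path| existing_files.include?(path) }

  # In EE at least one file must exist. In CE a missing file is tolerated,
  # because the route may only exist in EE.
  raise ArgumentError, "no route file found for #{name}" if ee && found.empty?

  found
end

on_disk = ['config/routes/admin.rb', 'ee/config/routes/admin.rb']
puts route_files_to_load('admin', on_disk, ee: true).inspect
puts route_files_to_load('admin', [], ee: false).inspect
```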
This means if we want to extend a particular CE route file, just add the same This means if we want to extend a particular CE route file, just add the same
file located in `ee/config/routes`. If we want to add an EE only route, we file located in `ee/config/routes`. If we want to add an EE only route, we
...@@ -467,7 +467,7 @@ end ...@@ -467,7 +467,7 @@ end
#### Using `render_if_exists` #### Using `render_if_exists`
Instead of using regular `render`, we should use `render_if_exists`, which Instead of using regular `render`, we should use `render_if_exists`, which
will not render anything if it cannot find the specific partial. We use this doesn't render anything if it cannot find the specific partial. We use this
so that we could put `render_if_exists` in CE, keeping code the same between so that we could put `render_if_exists` in CE, keeping code the same between
CE and EE. CE and EE.
...@@ -482,7 +482,7 @@ The disadvantage of this: ...@@ -482,7 +482,7 @@ The disadvantage of this:
##### Caveats ##### Caveats
The `render_if_exists` view path argument must be relative to `app/views/` and `ee/app/views`. The `render_if_exists` view path argument must be relative to `app/views/` and `ee/app/views`.
Resolving an EE template path that is relative to the CE view path will not work. Resolving an EE template path that is relative to the CE view path doesn't work.
```haml ```haml
- # app/views/projects/index.html.haml - # app/views/projects/index.html.haml
...@@ -577,7 +577,7 @@ We can define `params` and use `use` in another `params` definition to ...@@ -577,7 +577,7 @@ We can define `params` and use `use` in another `params` definition to
include parameters defined in EE. However, we need to define the "interface" first include parameters defined in EE. However, we need to define the "interface" first
in CE in order for EE to override it. We don't have to do this in other places in CE in order for EE to override it. We don't have to do this in other places
due to `prepend_if_ee`, but Grape is complex internally and we couldn't easily due to `prepend_if_ee`, but Grape is complex internally and we couldn't easily
do that, so we'll follow regular object-oriented practices that we define the do that, so we follow regular object-oriented practices that we define the
interface first here. interface first here.
For example, suppose we have a few more optional parameters for EE. We can move the For example, suppose we have a few more optional parameters for EE. We can move the
...@@ -738,7 +738,7 @@ end ...@@ -738,7 +738,7 @@ end
It's very hard to extend this in an EE module, and this is simply storing It's very hard to extend this in an EE module, and this is simply storing
some meta-data for a particular route. Given that, we could simply leave the some meta-data for a particular route. Given that, we could simply leave the
EE `route_setting` in CE as it won't hurt and we are just not going to use EE `route_setting` in CE as it doesn't hurt and we don't use
those meta-data in CE. those meta-data in CE.
We could revisit this policy when we're using `route_setting` more and whether We could revisit this policy when we're using `route_setting` more and whether
...@@ -1039,7 +1039,7 @@ export default { ...@@ -1039,7 +1039,7 @@ export default {
`import MyComponent from 'ee_else_ce/path/my_component'.vue` `import MyComponent from 'ee_else_ce/path/my_component'.vue`
- this way the correct component will be included for either the ce or ee implementation - this way the correct component is included for either the CE or EE implementation
**For EE components that need different results for the same computed values, we can pass in props to the CE wrapper as seen in the example.** **For EE components that need different results for the same computed values, we can pass in props to the CE wrapper as seen in the example.**
...@@ -1053,7 +1053,7 @@ export default { ...@@ -1053,7 +1053,7 @@ export default {
For regular JS files, the approach is similar. For regular JS files, the approach is similar.
1. We will keep using the [`ee_else_ce`](../development/ee_features.md#javascript-code-in-assetsjavascripts) helper, this means that EE only code should be inside the `ee/` folder. 1. We keep using the [`ee_else_ce`](../development/ee_features.md#javascript-code-in-assetsjavascripts) helper, which means that EE only code should be inside the `ee/` folder.
1. An EE file should be created with the EE only code, and it should extend the CE counterpart. 1. An EE file should be created with the EE only code, and it should extend the CE counterpart.
1. For code inside functions that can't be extended, the code should be moved into a new file and we should use `ee_else_ce` helper: 1. For code inside functions that can't be extended, the code should be moved into a new file and we should use `ee_else_ce` helper:
......
...@@ -93,7 +93,7 @@ All the `GitlabUploader` derived classes should comply with this path segment sc ...@@ -93,7 +93,7 @@ All the `GitlabUploader` derived classes should comply with this path segment sc
| | | `ObjectStorage::Concern#upload_path | | | | `ObjectStorage::Concern#upload_path |
``` ```
The `RecordsUploads::Concern` concern will create an `Upload` entry for every file stored by a `GitlabUploader` persisting the dynamic parts of the path using The `RecordsUploads::Concern` concern creates an `Upload` entry for every file stored by a `GitlabUploader` persisting the dynamic parts of the path using
`GitlabUploader#dynamic_path`. You may then use the `Upload#build_uploader` method to manipulate the file. `GitlabUploader#dynamic_path`. You may then use the `Upload#build_uploader` method to manipulate the file.
## Object Storage ## Object Storage
...@@ -108,9 +108,9 @@ The `CarrierWave::Uploader#store_dir` is overridden to ...@@ -108,9 +108,9 @@ The `CarrierWave::Uploader#store_dir` is overridden to
### Using `ObjectStorage::Extension::RecordsUploads` ### Using `ObjectStorage::Extension::RecordsUploads`
This concern will automatically include `RecordsUploads::Concern` if not already included. This concern includes `RecordsUploads::Concern` if not already included.
The `ObjectStorage::Concern` uploader will search for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE). The `ObjectStorage::Concern` uploader searches for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE).
```ruby ```ruby
class SongUploader < GitlabUploader class SongUploader < GitlabUploader
...@@ -130,7 +130,7 @@ end ...@@ -130,7 +130,7 @@ end
### Using a mounted uploader ### Using a mounted uploader
The `ObjectStorage::Concern` will query the `model.<mount>_store` attribute to select the correct object store. The `ObjectStorage::Concern` queries the `model.<mount>_store` attribute to select the correct object store.
This column must be present in the model schema. This column must be present in the model schema.
```ruby ```ruby
......
...@@ -14,7 +14,7 @@ might encounter or should avoid during development of GitLab CE and EE. ...@@ -14,7 +14,7 @@ might encounter or should avoid during development of GitLab CE and EE.
In GitLab 10.8 and later, Omnibus has [dropped the `app/assets` directory](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/2456), In GitLab 10.8 and later, Omnibus has [dropped the `app/assets` directory](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/2456),
after asset compilation. The `ee/app/assets`, `vendor/assets` directories are dropped as well. after asset compilation. The `ee/app/assets`, `vendor/assets` directories are dropped as well.
This means that reading files from that directory will fail in Omnibus-installed GitLab instances: This means that reading files from that directory fails in Omnibus-installed GitLab instances:
```ruby ```ruby
file = Rails.root.join('app/assets/images/logo.svg') file = Rails.root.join('app/assets/images/logo.svg')
...@@ -243,8 +243,8 @@ end ...@@ -243,8 +243,8 @@ end
In this case, if for any reason the top level `ApplicationController` In this case, if for any reason the top level `ApplicationController`
is loaded but `Projects::ApplicationController` is not, `ApplicationController` is loaded but `Projects::ApplicationController` is not, `ApplicationController`
would be resolved to `::ApplicationController` and then the `project` method will would be resolved to `::ApplicationController` and then the `project` method is
be undefined and we will get an error. undefined, causing an error.
#### Solution #### Solution
...@@ -265,7 +265,7 @@ By specifying `Projects::`, we tell Rails exactly what class we are referring ...@@ -265,7 +265,7 @@ By specifying `Projects::`, we tell Rails exactly what class we are referring
to and we would avoid the issue. to and we would avoid the issue.
NOTE: NOTE:
This problem will disappear as soon as we upgrade to Rails 6 and use the Zeitwerk autoloader. This problem disappears as soon as we upgrade to Rails 6 and use the Zeitwerk autoloader.
### Further reading ### Further reading
......
...@@ -12,16 +12,16 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -12,16 +12,16 @@ info: To determine the technical writer assigned to the Stage/Group associated w
In order to comply with the terms the libraries we use are licensed under, we have to make sure to check new gems for compatible licenses whenever they're added. To automate this process, we use the [license_finder](https://github.com/pivotal/LicenseFinder) gem by Pivotal. It runs every time a new commit is pushed and verifies that all gems and node modules in the bundle use a license that doesn't conflict with the licensing of either GitLab Community Edition or GitLab Enterprise Edition. In order to comply with the terms the libraries we use are licensed under, we have to make sure to check new gems for compatible licenses whenever they're added. To automate this process, we use the [license_finder](https://github.com/pivotal/LicenseFinder) gem by Pivotal. It runs every time a new commit is pushed and verifies that all gems and node modules in the bundle use a license that doesn't conflict with the licensing of either GitLab Community Edition or GitLab Enterprise Edition.
There are some limitations with the automated testing, however. CSS, JavaScript, or Ruby libraries which are not included by way of Bundler, NPM, or Yarn (for instance those manually copied into our source tree in the `vendor` directory), must be verified manually and independently. Take care whenever one such library is used, as automated tests won't catch problematic licenses from them. There are some limitations with the automated testing, however. CSS, JavaScript, or Ruby libraries which are not included by way of Bundler, NPM, or Yarn (for instance those manually copied into our source tree in the `vendor` directory), must be verified manually and independently. Take care whenever one such library is used, as automated tests don't catch problematic licenses from them.
Some gems may not include their license information in their `gemspec` file, and some node modules may not include their license information in their `package.json` file. These won't be detected by License Finder, and will have to be verified manually. Some gems may not include their license information in their `gemspec` file, and some node modules may not include their license information in their `package.json` file. These aren't detected by License Finder, and must be verified manually.
### License Finder commands ### License Finder commands
NOTE: NOTE:
License Finder currently uses GitLab misused terms of `whitelist` and `blacklist`. As a result, the commands below reference those terms. We've created an [issue on their project](https://github.com/pivotal/LicenseFinder/issues/745) to propose that they rename their commands. License Finder currently uses GitLab misused terms of `whitelist` and `blacklist`. As a result, the commands below reference those terms. We've created an [issue on their project](https://github.com/pivotal/LicenseFinder/issues/745) to propose that they rename their commands.
There are a few basic commands License Finder provides that you'll need in order to manage license detection. There are a few basic commands License Finder provides that you need in order to manage license detection.
To verify that the checks are passing, and/or to see what dependencies are causing the checks to fail: To verify that the checks are passing, and/or to see what dependencies are causing the checks to fail:
......
...@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Mass inserting Rails models # Mass inserting Rails models
Setting the environment variable [`MASS_INSERT=1`](rake_tasks.md#environment-variables) Setting the environment variable [`MASS_INSERT=1`](rake_tasks.md#environment-variables)
when running [`rake setup`](rake_tasks.md) will create millions of records, but these records when running [`rake setup`](rake_tasks.md) creates millions of records, but these records
aren't visible to the `root` user by default. aren't visible to the `root` user by default.
To make any number of the mass-inserted projects visible to the `root` user, run To make any number of the mass-inserted projects visible to the `root` user, run
......
...@@ -47,7 +47,7 @@ Cache Hit: ...@@ -47,7 +47,7 @@ Cache Hit:
resource. resource.
1. If the `If-None-Match` header matches the current value in Redis we know 1. If the `If-None-Match` header matches the current value in Redis we know
that the resource did not change so we can send 304 response immediately, that the resource did not change so we can send 304 response immediately,
without querying the database at all. The client's browser will use the without querying the database at all. The client's browser uses the
cached response. cached response.
1. If the `If-None-Match` header does not match the current value in Redis 1. If the `If-None-Match` header does not match the current value in Redis
we have to generate a new response, because the resource changed. we have to generate a new response, because the resource changed.
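The two cache-hit branches above can be sketched like this. It is a simplified illustration: a plain `Hash` stands in for Redis, and `respond` is a hypothetical helper, not GitLab's middleware. The block represents the expensive work of querying the database and rendering the response.

```ruby
# Hypothetical sketch of the If-None-Match comparison described above.
# store is a Hash standing in for Redis; the block builds the full response.
def respond(store, resource_key, if_none_match)
  current = store[resource_key]

  if !current.nil? && current == if_none_match
    # Header matches the stored value: the resource did not change,
    # so answer immediately without running the block.
    { status: 304, body: nil }
  else
    # Header does not match: the resource changed, generate a new response.
    { status: 200, body: yield, etag: current }
  end
end

store = { 'issues/1' => 'abc123' }
puts respond(store, 'issues/1', 'abc123') { 'full response' }[:status]
puts respond(store, 'issues/1', 'stale') { 'full response' }[:status]
```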
......
...@@ -16,7 +16,7 @@ target ID. For example, at the time of writing we have such a setup for ...@@ -16,7 +16,7 @@ target ID. For example, at the time of writing we have such a setup for
- `source_type`: a string defining the model to use, can be either `Project` or - `source_type`: a string defining the model to use, can be either `Project` or
`Namespace`. `Namespace`.
- `source_id`: the ID of the row to retrieve based on `source_type`. For - `source_id`: the ID of the row to retrieve based on `source_type`. For
example, when `source_type` is `Project` then `source_id` will contain a example, when `source_type` is `Project` then `source_id` contains a
project ID. project ID.
While such a setup may appear to be useful, it comes with many drawbacks; enough While such a setup may appear to be useful, it comes with many drawbacks; enough
...@@ -24,8 +24,8 @@ that you should avoid this at all costs. ...@@ -24,8 +24,8 @@ that you should avoid this at all costs.
## Space Wasted ## Space Wasted
Because this setup relies on string values to determine the model to use it will Because this setup relies on string values to determine the model to use, it
end up wasting a lot of space. For example, for `Project` and `Namespace` the wastes a lot of space. For example, for `Project` and `Namespace` the
maximum size is 9 bytes, plus 1 extra byte for every string when using maximum size is 9 bytes, plus 1 extra byte for every string when using
PostgreSQL. While this may only be 10 bytes per row, given enough tables and PostgreSQL. While this may only be 10 bytes per row, given enough tables and
rows using such a setup we can end up wasting quite a bit of disk space and rows using such a setup we can end up wasting quite a bit of disk space and
...@@ -84,7 +84,7 @@ Let's say you have a `members` table storing both approved and pending members, ...@@ -84,7 +84,7 @@ Let's say you have a `members` table storing both approved and pending members,
for both projects and groups, and the pending state is determined by the column for both projects and groups, and the pending state is determined by the column
`requested_at` being set or not. Schema wise such a setup can lead to various `requested_at` being set or not. Schema wise such a setup can lead to various
columns only being set for certain rows, wasting space. It's also possible that columns only being set for certain rows, wasting space. It's also possible that
certain indexes will only be set for certain rows, again wasting space. Finally, certain indexes are only set for certain rows, again wasting space. Finally,
querying such a table requires less than ideal queries. For example: querying such a table requires less than ideal queries. For example:
```sql ```sql
...@@ -121,7 +121,7 @@ WHERE group_id = 4 ...@@ -121,7 +121,7 @@ WHERE group_id = 4
``` ```
If you want to get both you can use a UNION, though you need to be explicit If you want to get both you can use a UNION, though you need to be explicit
about what columns you want to SELECT as otherwise the result set will use the about what columns you want to SELECT as otherwise the result set uses the
columns of the first query. For example: columns of the first query. For example:
```sql ```sql
...@@ -147,6 +147,6 @@ filter rows using the `IS NULL` condition. ...@@ -147,6 +147,6 @@ filter rows using the `IS NULL` condition.
To summarize: using separate tables allows us to use foreign keys effectively, To summarize: using separate tables allows us to use foreign keys effectively,
create indexes only where necessary, conserve space, query data more create indexes only where necessary, conserve space, query data more
efficiently, and scale these tables more easily (e.g. by storing them on efficiently, and scale these tables more easily (e.g. by storing them on
separate disks). A nice side effect of this is that code can also become easier separate disks). A nice side effect of this is that code can also become easier,
as you won't end up with a single model having to handle different kinds of as a single model isn't responsible for handling different kinds of
data. data.
...@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w ...@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Post deployment migrations are regular Rails migrations that can optionally be Post deployment migrations are regular Rails migrations that can optionally be
executed after a deployment. By default these migrations are executed alongside executed after a deployment. By default these migrations are executed alongside
the other migrations. To skip these migrations you will have to set the the other migrations. To skip these migrations you must set the
environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value
when running `rake db:migrate`. when running `rake db:migrate`.
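The "non-empty value" rule can be sketched as follows. This is an illustration only, not GitLab's actual configuration code: `migration_paths` is a hypothetical helper, and it assumes regular migrations live in the standard Rails `db/migrate` directory while post deployment migrations live in `db/post_migrate`.

```ruby
# Hypothetical helper: decide which migration directories to run.
# env defaults to the process environment, but any Hash-like object works.
def migration_paths(env = ENV)
  paths = ['db/migrate']

  # Any non-empty value skips post deployment migrations.
  # An unset or empty variable leaves them enabled.
  skip = env['SKIP_POST_DEPLOYMENT_MIGRATIONS'].to_s
  paths << 'db/post_migrate' if skip.empty?

  paths
end

puts migration_paths({}).inspect
puts migration_paths({ 'SKIP_POST_DEPLOYMENT_MIGRATIONS' => 'true' }).inspect
```

Note that under this rule the value itself is not interpreted, so setting the variable to `false` would still skip the migrations.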
...@@ -19,7 +19,7 @@ migrations: ...@@ -19,7 +19,7 @@ migrations:
bundle exec rake db:migrate bundle exec rake db:migrate
``` ```
This however will skip post deployment migrations: This however skips post deployment migrations:
```shell ```shell
SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
...@@ -40,7 +40,7 @@ Once all servers have been updated you can run `chef-client` again on a single ...@@ -40,7 +40,7 @@ Once all servers have been updated you can run `chef-client` again on a single
server _without_ the environment variable. server _without_ the environment variable.
The process is similar for other deployment techniques: first you would deploy The process is similar for other deployment techniques: first you would deploy
with the environment variable set, then you'll essentially re-deploy a single with the environment variable set, then you re-deploy a single
server but with the variable _unset_. server but with the variable _unset_.
## Creating Migrations ## Creating Migrations
...@@ -51,7 +51,7 @@ To create a post deployment migration you can use the following Rails generator: ...@@ -51,7 +51,7 @@ To create a post deployment migration you can use the following Rails generator:
bundle exec rails g post_deployment_migration migration_name_here bundle exec rails g post_deployment_migration migration_name_here
``` ```
This will generate the migration file in `db/post_migrate`. These migrations This generates the migration file in `db/post_migrate`. These migrations
behave exactly like regular Rails migrations. behave exactly like regular Rails migrations.
## Use Cases ## Use Cases
......
...@@ -24,7 +24,7 @@ When using the script, command-line documentation is available by passing no ...@@ -24,7 +24,7 @@ When using the script, command-line documentation is available by passing no
arguments. arguments.
When using the method in an interactive console session, any changes to the When using the method in an interactive console session, any changes to the
application code within that console session will be reflected in the profiler application code within that console session are reflected in the profiler
output. output.
For example: For example:
...@@ -37,14 +37,14 @@ Gitlab::Profiler.profile('/my-user') ...@@ -37,14 +37,14 @@ Gitlab::Profiler.profile('/my-user')
# Returns a RubyProf::Profile where 100 seconds is spent in UsersController#show # Returns a RubyProf::Profile where 100 seconds is spent in UsersController#show
``` ```
For routes that require authorization you will need to provide a user to For routes that require authorization you must provide a user to
`Gitlab::Profiler`. You can do this like so: `Gitlab::Profiler`. You can do this like so:
```ruby ```ruby
Gitlab::Profiler.profile('/gitlab-org/gitlab-test', user: User.first) Gitlab::Profiler.profile('/gitlab-org/gitlab-test', user: User.first)
``` ```
Passing a `logger:` keyword argument to `Gitlab::Profiler.profile` will send Passing a `logger:` keyword argument to `Gitlab::Profiler.profile` sends
ActiveRecord and ActionController log output to that logger. Further options are ActiveRecord and ActionController log output to that logger. Further options are
documented with the method source. documented with the method source.
...@@ -123,7 +123,7 @@ starting GitLab. For example: ...@@ -123,7 +123,7 @@ starting GitLab. For example:
ENABLE_BULLET=true bundle exec rails s ENABLE_BULLET=true bundle exec rails s
``` ```
Bullet will log query problems to both the Rails log as well as the Chrome Bullet logs query problems to both the Rails log as well as the Chrome
console. console.
As a follow up to finding `N+1` queries with Bullet, consider writing a [QueryRecorder test](query_recorder.md) to prevent a regression. As a follow up to finding `N+1` queries with Bullet, consider writing a [QueryRecorder test](query_recorder.md) to prevent a regression.
......
...@@ -101,7 +101,7 @@ format the reference as: ...@@ -101,7 +101,7 @@ format the reference as:
This default implementation is not very efficient, because we need to call This default implementation is not very efficient, because we need to call
`#find_object` for each reference, which may require issuing a DB query every `#find_object` for each reference, which may require issuing a DB query every
time. For this reason, most reference filter implementations will instead use an time. For this reason, most reference filter implementations instead use an
optimization included in `AbstractReferenceFilter`: optimization included in `AbstractReferenceFilter`:
> `AbstractReferenceFilter` provides a lazily initialized value > `AbstractReferenceFilter` provides a lazily initialized value
...@@ -140,7 +140,7 @@ We are skipping: ...@@ -140,7 +140,7 @@ We are skipping:
To avoid filtering such nodes for each `ReferenceFilter`, we do it only once and store the result in the result Hash of the pipeline as `result[:reference_filter_nodes]`. To avoid filtering such nodes for each `ReferenceFilter`, we do it only once and store the result in the result Hash of the pipeline as `result[:reference_filter_nodes]`.
Pipeline `result` is passed to each filter for modification, so every time when `ReferenceFilter` replaces text or link tag, filtered list (`reference_filter_nodes`) will be updated for the next filter to use. Pipeline `result` is passed to each filter for modification, so every time when `ReferenceFilter` replaces text or link tag, filtered list (`reference_filter_nodes`) is updated for the next filter to use.
## Reference parsers ## Reference parsers
...@@ -199,4 +199,4 @@ In practice, all reference parsers inherit from [`BaseParser`](https://gitlab.co ...@@ -199,4 +199,4 @@ In practice, all reference parsers inherit from [`BaseParser`](https://gitlab.co
- `#nodes_user_can_reference(user, nodes)` to filter nodes directly. - `#nodes_user_can_reference(user, nodes)` to filter nodes directly.
A failure to implement this class for each reference type means that the A failure to implement this class for each reference type means that the
application will raise exceptions during Markdown processing. application raises exceptions during Markdown processing.
...@@ -16,7 +16,7 @@ scalability and reliability. ...@@ -16,7 +16,7 @@ scalability and reliability.
_[diagram source - GitLab employees only](https://docs.google.com/drawings/d/1RTGtuoUrE0bDT-9smoHbFruhEMI4Ys6uNrufe5IA-VI/edit)_ _[diagram source - GitLab employees only](https://docs.google.com/drawings/d/1RTGtuoUrE0bDT-9smoHbFruhEMI4Ys6uNrufe5IA-VI/edit)_
The diagram above shows a GitLab reference architecture scaled up for 50,000 The diagram above shows a GitLab reference architecture scaled up for 50,000
users. We will discuss each component below. users. We discuss each component below.
## Components ## Components
...@@ -26,11 +26,10 @@ The PostgreSQL database holds all metadata for projects, issues, merge ...@@ -26,11 +26,10 @@ The PostgreSQL database holds all metadata for projects, issues, merge
requests, users, etc. The schema is managed by the Rails application requests, users, etc. The schema is managed by the Rails application
[db/structure.sql](https://gitlab.com/gitlab-org/gitlab/blob/master/db/structure.sql). [db/structure.sql](https://gitlab.com/gitlab-org/gitlab/blob/master/db/structure.sql).
GitLab Web/API servers and Sidekiq nodes talk directly to the database via a GitLab Web/API servers and Sidekiq nodes talk directly to the database by using a
Rails object relational model (ORM). Most SQL queries are accessed via this Rails object relational model (ORM). Most SQL queries are accessed by using this
ORM, although some custom SQL is also written for performance or for ORM, although some custom SQL is also written for performance or for
exploiting advanced PostgreSQL features (e.g. recursive CTEs, LATERAL JOINs, exploiting advanced PostgreSQL features (like recursive CTEs or LATERAL JOINs).
etc.).
The application has a tight coupling to the database schema. When the The application has a tight coupling to the database schema. When the
application starts, Rails queries the database schema, caching the tables and application starts, Rails queries the database schema, caching the tables and
...@@ -42,8 +41,8 @@ no-downtime changes](what_requires_downtime.md). ...@@ -42,8 +41,8 @@ no-downtime changes](what_requires_downtime.md).
#### Multi-tenancy #### Multi-tenancy
A single database is used to store all customer data. Each user can belong to A single database is used to store all customer data. Each user can belong to
many groups or projects, and the access level (e.g. guest, developer, many groups or projects, and the access level (including guest, developer, or
maintainer, etc.) to groups and projects determines what users can see and maintainer) to groups and projects determines what users can see and
what they can access. what they can access.
Users with admin access can access all projects and even impersonate Users with admin access can access all projects and even impersonate
...@@ -70,7 +69,7 @@ dates](https://gitlab.com/groups/gitlab-org/-/epics/2023). For example, ...@@ -70,7 +69,7 @@ dates](https://gitlab.com/groups/gitlab-org/-/epics/2023). For example,
the `events` and `audit_events` tables are natural candidates for this the `events` and `audit_events` tables are natural candidates for this
kind of partitioning. kind of partitioning.
Sharding is likely more difficult and will require significant changes Sharding is likely more difficult and requires significant changes
to the schema and application. For example, if we have to store projects to the schema and application. For example, if we have to store projects
in many different databases, we immediately run into the question, "How in many different databases, we immediately run into the question, "How
can we retrieve data across different projects?" One answer to this is can we retrieve data across different projects?" One answer to this is
...@@ -78,7 +77,7 @@ to abstract data access into API calls that abstract the database from ...@@ -78,7 +77,7 @@ to abstract data access into API calls that abstract the database from
the application, but this is a significant amount of work. the application, but this is a significant amount of work.
There are solutions that may help abstract the sharding to some extent There are solutions that may help abstract the sharding to some extent
from the application. For example, we will want to look at [Citus from the application. For example, we want to look at [Citus
Data](https://www.citusdata.com/product/community) closely. Citus Data Data](https://www.citusdata.com/product/community) closely. Citus Data
provides a Rails plugin that adds a [tenant ID to ActiveRecord provides a Rails plugin that adds a [tenant ID to ActiveRecord
models](https://www.citusdata.com/blog/2017/01/05/easily-scale-out-multi-tenant-apps/). models](https://www.citusdata.com/blog/2017/01/05/easily-scale-out-multi-tenant-apps/).
...@@ -100,17 +99,16 @@ systems. ...@@ -100,17 +99,16 @@ systems.
A recent [database checkup shows a breakdown of the table sizes on A recent [database checkup shows a breakdown of the table sizes on
GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/8022#master-1022016101-8). GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/8022#master-1022016101-8).
Since `merge_request_diff_files` contains over 1 TB of data, we will want to Since `merge_request_diff_files` contains over 1 TB of data, we want to
reduce/eliminate this table first. GitLab has support for [storing diffs in reduce/eliminate this table first. GitLab has support for [storing diffs in
object storage](../administration/merge_request_diffs.md), which we [will object storage](../administration/merge_request_diffs.md), which we [want to do on
want to do on
GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7356). GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7356).
#### High availability #### High availability
There are several strategies to provide high-availability and redundancy: There are several strategies to provide high-availability and redundancy:
- Write-ahead logs (WAL) streamed to object storage (e.g. S3, Google Cloud - Write-ahead logs (WAL) streamed to object storage (for example, S3, or Google Cloud
Storage). Storage).
- Read-replicas (hot backups). - Read-replicas (hot backups).
- Delayed replicas. - Delayed replicas.
...@@ -126,11 +124,10 @@ the read replicas. [Omnibus ships with both repmgr and Patroni](../administratio ...@@ -126,11 +124,10 @@ the read replicas. [Omnibus ships with both repmgr and Patroni](../administratio
#### Load-balancing #### Load-balancing
GitLab EE has [application support for load balancing using read GitLab EE has [application support for load balancing using read
replicas](../administration/database_load_balancing.md). This load replicas](../administration/database_load_balancing.md). This load balancer does
balancer does some smart things that are not traditionally available in some actions that aren't traditionally available in standard load balancers. For
standard load balancers. For example, the application will only consider a example, the application considers a replica only if its replication lag is low
replica if its replication lag is low (e.g. WAL data behind by < 100 (for example, WAL data behind by less than 100 MB).
megabytes).
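
The lag check described above can be sketched as a simple filter. This is a minimal illustration, not the GitLab load balancer code: the `Replica` struct, the `usable_replicas` helper, and the choice of 100 * 1024 * 1024 bytes for "100 MB" are all assumptions made for the example.

```ruby
# Hypothetical model of a replica and its replication lag in bytes of WAL.
Replica = Struct.new(:host, :lag_bytes)

# Assumption: "100 MB" is taken as 100 * 1024 * 1024 bytes.
MAX_LAG_BYTES = 100 * 1024 * 1024

# Keep only replicas whose lag is strictly below the threshold.
def usable_replicas(replicas, max_lag_bytes: MAX_LAG_BYTES)
  replicas.select { |replica| replica.lag_bytes < max_lag_bytes }
end

replicas = [
  Replica.new('replica-1', 5 * 1024 * 1024),
  Replica.new('replica-2', 250 * 1024 * 1024)
]

usable_replicas(replicas).map(&:host) # => ["replica-1"]
```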
More [details are in a blog More [details are in a blog
post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/). post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
...@@ -140,7 +137,7 @@ post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/). ...@@ -140,7 +137,7 @@ post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
As PostgreSQL forks a backend process for each request, PostgreSQL has a As PostgreSQL forks a backend process for each request, PostgreSQL has a
finite limit of connections that it can support, typically around 300 by finite limit of connections that it can support, typically around 300 by
default. Without a connection pooler like PgBouncer, it's quite possible to default. Without a connection pooler like PgBouncer, it's quite possible to
hit connection limits. Once the limits are reached, then GitLab will generate hit connection limits. Once the limits are reached, then GitLab generates
errors or slow down as it waits for a connection to be available. errors or slows down as it waits for a connection to be available.
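
To make the failure mode concrete, here is a toy model of a fixed set of connection slots. It is not PgBouncer or PostgreSQL behavior; the `TinyPool` class is hypothetical and only shows that once every slot is taken, the next caller either fails or has to wait.

```ruby
# Hypothetical fixed-size pool: each slot stands in for one database connection.
class TinyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Non-blocking checkout: raises ThreadError when no slot is free.
  def with_connection
    conn = @slots.pop(true)
    yield conn
  ensure
    @slots << conn if conn # return the slot even if the block raised
  end
end

pool = TinyPool.new(1)
pool.with_connection do |_conn|
  begin
    pool.with_connection { |_inner| }
  rescue ThreadError
    # The only slot is already in use, so the second checkout fails.
  end
end
```

A blocking checkout would wait here instead of raising, which corresponds to the slowdown mentioned above.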
#### High availability #### High availability
...@@ -151,7 +148,7 @@ background job and/or Web requests. There are two ways to address this ...@@ -151,7 +148,7 @@ background job and/or Web requests. There are two ways to address this
limitation: limitation:
- Run multiple PgBouncer instances. - Run multiple PgBouncer instances.
- Use a multi-threaded connection pooler (e.g. - Use a multi-threaded connection pooler (for example,
[Odyssey](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7776)). [Odyssey](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7776)).
On some Linux systems, it's possible to run [multiple PgBouncer instances on On some Linux systems, it's possible to run [multiple PgBouncer instances on
...@@ -192,9 +189,9 @@ connections gracefully. ...@@ -192,9 +189,9 @@ connections gracefully.
There are three ways Redis is used in GitLab: There are three ways Redis is used in GitLab:
- Queues. Sidekiq jobs marshal jobs into JSON payloads. - Queues: Sidekiq jobs marshal jobs into JSON payloads.
- Persistent state. Session data, exclusive leases, etc. - Persistent state: Session data and exclusive leases.
- Cache. Repository data (e.g. Branch and tag names), view partials, etc. - Cache: Repository data (like Branch and tag names) and view partials.
For GitLab instances running at scale, splitting Redis usage into For GitLab instances running at scale, splitting Redis usage into
separate Redis clusters helps for two reasons: separate Redis clusters helps for two reasons:
...@@ -206,8 +203,8 @@ For example, the cache instance can behave like an least-recently used ...@@ -206,8 +203,8 @@ For example, the cache instance can behave like an least-recently used
(LRU) cache by setting the `maxmemory` configuration option. That option (LRU) cache by setting the `maxmemory` configuration option. That option
should not be set for the queues or persistent clusters because data should not be set for the queues or persistent clusters because data
would be evicted from memory at random times. This would cause jobs to would be evicted from memory at random times. This would cause jobs to
be dropped on the floor, which would cause many problems (e.g. merges be dropped on the floor, which would cause many problems (like merges
not running, builds not updating, etc.). not running or builds not updating).
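
The difference between the two kinds of data can be seen in a toy least-recently-used cache. This is only an in-memory model written for illustration (the `LruCache` class is hypothetical); Redis eviction is approximate and governed by its own configuration.

```ruby
# Hypothetical LRU cache built on Ruby's insertion-ordered Hash.
class LruCache
  def initialize(max_entries)
    @max_entries = max_entries
    @data = {}
  end

  # Reading a key moves it to the "most recently used" end.
  def get(key)
    return nil unless @data.key?(key)

    value = @data.delete(key)
    @data[key] = value
  end

  # Writing may evict the least recently used entries.
  def set(key, value)
    @data.delete(key)
    @data[key] = value
    @data.shift while @data.size > @max_entries
  end

  def keys
    @data.keys
  end
end

cache = LruCache.new(2)
cache.set('a', 1)
cache.set('b', 2)
cache.get('a')    # 'a' is now the most recently used entry
cache.set('c', 3) # evicts 'b'
cache.keys        # => ["a", "c"]
```

Losing `'b'` is harmless for a cache, because it can be recomputed. If the same entry were a queued job, it would simply never run.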
Sidekiq also polls its queues quite frequently, and this activity can Sidekiq also polls its queues quite frequently, and this activity can
slow down other queries. For this reason, having a dedicated Redis slow down other queries. For this reason, having a dedicated Redis
...@@ -219,7 +216,7 @@ Redis process. ...@@ -219,7 +216,7 @@ Redis process.
Single-core: Like PgBouncer, a single Redis process can only use one Single-core: Like PgBouncer, a single Redis process can only use one
core. It does not support multi-threading. core. It does not support multi-threading.
Dumb secondaries: Redis secondaries (aka replicas) don't actually Dumb secondaries: Redis secondaries (also known as replicas) don't actually
handle any load. Unlike PostgreSQL secondaries, they don't even serve handle any load. Unlike PostgreSQL secondaries, they don't even serve
read queries. They simply replicate data from the primary and take over read queries. They simply replicate data from the primary and take over
only when the primary fails. only when the primary fails.
...@@ -236,7 +233,7 @@ election to determine a new leader. ...@@ -236,7 +233,7 @@ election to determine a new leader.
No leader: A Redis cluster can get into a mode where there are no No leader: A Redis cluster can get into a mode where there are no
primaries. For example, this can happen if Redis nodes are misconfigured primaries. For example, this can happen if Redis nodes are misconfigured
to follow the wrong node. Sometimes this requires forcing one node to to follow the wrong node. Sometimes this requires forcing one node to
become a primary via the [`REPLICAOF NO ONE` become a primary by using the [`REPLICAOF NO ONE`
command](https://redis.io/commands/replicaof). command](https://redis.io/commands/replicaof).
### Sidekiq ### Sidekiq
...@@ -260,8 +257,8 @@ directories in the GitLab code base. ...@@ -260,8 +257,8 @@ directories in the GitLab code base.
As jobs are added to the Sidekiq queue, Sidekiq worker threads need to As jobs are added to the Sidekiq queue, Sidekiq worker threads need to
pull these jobs from the queue and finish them at a rate faster than pull these jobs from the queue and finish them at a rate faster than
they are added. When an imbalance occurs (e.g. delays in the database, they are added. When an imbalance occurs (for example, delays in the database
slow jobs, etc.), Sidekiq queues can balloon and lead to runaway queues. or slow jobs), Sidekiq queues can balloon and lead to runaway queues.
In recent months, many of these queues have ballooned due to delays in In recent months, many of these queues have ballooned due to delays in
PostgreSQL, PgBouncer, and Redis. For example, PgBouncer saturation can PostgreSQL, PgBouncer, and Redis. For example, PgBouncer saturation can
...@@ -278,11 +275,11 @@ in a timely manner: ...@@ -278,11 +275,11 @@ in a timely manner:
used to process each commit message in the push, but now it farms out used to process each commit message in the push, but now it farms out
this to `ProcessCommitWorker`. this to `ProcessCommitWorker`.
- Redistribute/gerrymander Sidekiq processes by queue - Redistribute/gerrymander Sidekiq processes by queue
types. Long-running jobs (e.g. relating to project import) can often types. Long-running jobs (for example, relating to project import) can often
squeeze out jobs that run fast (e.g. delivering e-mail). [This technique squeeze out jobs that run fast (for example, delivering e-mail). [This technique
was used to optimize our existing Sidekiq deployment](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7219#note_218019483). was used to optimize our existing Sidekiq deployment](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7219#note_218019483).
- Optimize jobs. Eliminating unnecessary work, reducing network calls - Optimize jobs. Eliminating unnecessary work, reducing network calls
(e.g. SQL, Gitaly, etc.), and optimizing processor time can yield significant (including SQL and Gitaly), and optimizing processor time can yield significant
benefits. benefits.
From the Sidekiq logs, it's possible to see which jobs run the most From the Sidekiq logs, it's possible to see which jobs run the most
......
...@@ -73,7 +73,7 @@ shell check: ...@@ -73,7 +73,7 @@ shell check:
``` ```
TIP: **Tip:** TIP: **Tip:**
By default, ShellCheck will use the [shell detection](https://github.com/koalaman/shellcheck/wiki/SC2148#rationale) By default, ShellCheck uses the [shell detection](https://github.com/koalaman/shellcheck/wiki/SC2148#rationale)
to determine the shell dialect in use. If the shell file is out of your control and ShellCheck cannot to determine the shell dialect in use. If the shell file is out of your control and ShellCheck cannot
detect the dialect, use `-s` flag to specify it: `-s sh` or `-s bash`. detect the dialect, use `-s` flag to specify it: `-s sh` or `-s bash`.
...@@ -101,7 +101,7 @@ shfmt: ...@@ -101,7 +101,7 @@ shfmt:
``` ```
TIP: **Tip:** TIP: **Tip:**
By default, shfmt will use the [shell detection](https://github.com/mvdan/sh#shfmt) similar to one of ShellCheck By default, shfmt uses the [shell detection](https://github.com/mvdan/sh#shfmt) similar to one of ShellCheck
and ignore files starting with a period. To override this, use `-ln` flag to specify the shell dialect: and ignores files starting with a period. To override this, use `-ln` flag to specify the shell dialect:
`-ln posix` or `-ln bash`. `-ln posix` or `-ln bash`.
......
...@@ -46,7 +46,7 @@ We have three challenges here: performance, availability, and scalability. ...@@ -46,7 +46,7 @@ We have three challenges here: performance, availability, and scalability.
### Performance ### Performance
Rails processes are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the Ruby process will spend time on I/O operations on step 3 causing incoming requests to pile up. Rails processes are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the Ruby process spends time on I/O operations on step 3 causing incoming requests to pile up.
In order to improve this, [disk buffered upload](#disk-buffered-upload) was implemented. With this, Rails no longer deals with writing uploaded files to disk. In order to improve this, [disk buffered upload](#disk-buffered-upload) was implemented. With this, Rails no longer deals with writing uploaded files to disk.
...@@ -88,7 +88,7 @@ To address this problem an HA object storage can be used and it's supported by [ ...@@ -88,7 +88,7 @@ To address this problem an HA object storage can be used and it's supported by [
Scaling NFS is outside of our support scope, and NFS is not a part of cloud native installations. Scaling NFS is outside of our support scope, and NFS is not a part of cloud native installations.
All features that require Sidekiq and do not use direct upload won't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file will be written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation will fail to read it. All features that require Sidekiq and do not use direct upload don't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file is written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation fails to read it.
## How to select the proper level of acceleration? ## How to select the proper level of acceleration?
...@@ -96,7 +96,7 @@ Selecting the proper acceleration is a tradeoff between speed of development and ...@@ -96,7 +96,7 @@ Selecting the proper acceleration is a tradeoff between speed of development and
We can identify three major use-cases for an upload: We can identify three major use-cases for an upload:
1. **storage:** if we are uploading for storing a file (i.e. artifacts, packages, discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md). 1. **storage:** if we are uploading for storing a file (like artifacts, packages, or discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md).
1. **in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development. 1. **in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development.
1. **Sidekiq/asynchronous processing:** Asynchronous processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS. 1. **Sidekiq/asynchronous processing:** Asynchronous processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS.
...@@ -120,7 +120,7 @@ We have three kinds of file encoding in our uploads: ...@@ -120,7 +120,7 @@ We have three kinds of file encoding in our uploads:
1. <i class="fa fa-check-circle"></i> **multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request. 1. <i class="fa fa-check-circle"></i> **multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request.
1. <i class="fa fa-check-circle"></i> **body**: some APIs upload files as the whole request body. 1. <i class="fa fa-check-circle"></i> **body**: some APIs upload files as the whole request body.
1. <i class="fa fa-times-circle"></i> **JSON**: some JSON APIs upload files as base64 encoded strings. This will require a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/226). 1. <i class="fa fa-times-circle"></i> **JSON**: some JSON APIs upload files as base64 encoded strings. This requires a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/226).
## Uploading technologies ## Uploading technologies
...@@ -166,7 +166,7 @@ is replaced with the path to the corresponding file before it is forwarded to ...@@ -166,7 +166,7 @@ is replaced with the path to the corresponding file before it is forwarded to
Rails. Rails.
To prevent abuse of this feature, Workhorse signs the modified request with a To prevent abuse of this feature, Workhorse signs the modified request with a
special header, stating which entries it modified. Rails will ignore any special header, stating which entries it modified. Rails ignores any
unsigned path entries. unsigned path entries.
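
The idea of "only trust the entries the proxy vouched for" can be sketched as follows. This is not Workhorse's actual signing scheme, which is not described here: the HMAC, the shared `SECRET`, and the helper names `sign` and `trusted_path_entries` are all assumptions chosen to keep the example self-contained.

```ruby
require 'json'
require 'openssl'

# Assumption: proxy and application share this secret.
SECRET = 'example-shared-secret'

# Proxy side: sign the list of entry names it rewrote.
def sign(entry_names, secret = SECRET)
  digest = OpenSSL::Digest.new('SHA256')
  OpenSSL::HMAC.hexdigest(digest, secret, JSON.generate(entry_names.sort))
end

# Compare two digests without returning early on the first differing byte.
def same_digest?(left, right)
  return false unless left.bytesize == right.bytesize

  left.bytes.zip(right.bytes).reduce(0) { |diff, (x, y)| diff | (x ^ y) }.zero?
end

# Application side: keep only path entries covered by a valid signature.
def trusted_path_entries(path_params, signed_names, signature, secret = SECRET)
  return {} unless same_digest?(sign(signed_names, secret), signature)

  path_params.select { |name, _path| signed_names.include?(name) }
end

params = { 'file.path' => '/tmp/upload-1', 'avatar.path' => '/etc/passwd' }
signature = sign(['file.path'])
trusted_path_entries(params, ['file.path'], signature) # => {"file.path"=>"/tmp/upload-1"}
```

In this sketch, a client that injects its own `avatar.path` entry gains nothing, because that entry is not in the signed list and is dropped.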
```mermaid ```mermaid
...@@ -220,8 +220,8 @@ In this setup, an extra Rails route must be implemented in order to handle autho ...@@ -220,8 +220,8 @@ In this setup, an extra Rails route must be implemented in order to handle autho
and [its routes](https://gitlab.com/gitlab-org/gitlab/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/config/routes/git_http.rb#L31-32). and [its routes](https://gitlab.com/gitlab-org/gitlab/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/config/routes/git_http.rb#L31-32).
- [API endpoints for uploading packages](packages.md#file-uploads). - [API endpoints for uploading packages](packages.md#file-uploads).
This will fallback to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings). This falls back to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings).
The answer to the `/authorize` call will only contain a file system path. The answer to the `/authorize` call contains only a file system path.
```mermaid ```mermaid
sequenceDiagram sequenceDiagram
...@@ -272,7 +272,7 @@ sequenceDiagram ...@@ -272,7 +272,7 @@ sequenceDiagram
## How to add a new upload route ## How to add a new upload route
In this section, we'll describe how to add a new upload route [accelerated](#uploading-technologies) by Workhorse for [body and multipart](#upload-encodings) encoded uploads. In this section, we describe how to add a new upload route [accelerated](#uploading-technologies) by Workhorse for [body and multipart](#upload-encodings) encoded uploads.
Uploads routes belong to one of these categories: Uploads routes belong to one of these categories:
......
...@@ -10,7 +10,7 @@ A possible security concern when managing a public facing GitLab instance is ...@@ -10,7 +10,7 @@ A possible security concern when managing a public facing GitLab instance is
the ability to steal a user's IP address by referencing images in issues, comments, etc. the ability to steal a user's IP address by referencing images in issues, comments, etc.
For example, adding `![Example image](http://example.com/example.png)` to For example, adding `![Example image](http://example.com/example.png)` to
an issue description will cause the image to be loaded from the external an issue description causes the image to be loaded from the external
server in order to be displayed. However, this also allows the external server server in order to be displayed. However, this also allows the external server
to log the IP address of the user. to log the IP address of the user.
...@@ -51,7 +51,7 @@ To install a Camo server as an asset proxy: ...@@ -51,7 +51,7 @@ To install a Camo server as an asset proxy:
| `asset_proxy_enabled` | Enable proxying of assets. If enabled, requires `asset_proxy_url`. | | `asset_proxy_enabled` | Enable proxying of assets. If enabled, requires `asset_proxy_url`. |
| `asset_proxy_secret_key` | Shared secret with the asset proxy server. | | `asset_proxy_secret_key` | Shared secret with the asset proxy server. |
| `asset_proxy_url` | URL of the asset proxy server. | | `asset_proxy_url` | URL of the asset proxy server. |
| `asset_proxy_whitelist` | Assets that match these domain(s) will NOT be proxied. Wildcards allowed. Your GitLab installation URL is automatically whitelisted. | | `asset_proxy_whitelist` | Assets that match these domain(s) are NOT proxied. Wildcards allowed. Your GitLab installation URL is automatically whitelisted. |
1. Restart the server for the changes to take effect. Each time you change any values for the asset 1. Restart the server for the changes to take effect. Each time you change any values for the asset
proxy, you need to restart the server. proxy, you need to restart the server.
...@@ -59,7 +59,7 @@ To install a Camo server as an asset proxy: ...@@ -59,7 +59,7 @@ To install a Camo server as an asset proxy:
## Using the Camo server ## Using the Camo server
Once the Camo server is running and you've enabled the GitLab settings, any image, video, or audio that Once the Camo server is running and you've enabled the GitLab settings, any image, video, or audio that
references an external source will get proxied to the Camo server. references an external source are proxied to the Camo server.
For example, the following is a link to an image in Markdown: For example, the following is a link to an image in Markdown:
......
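As a rough illustration of how a proxied asset URL can be derived from the `asset_proxy_url` and `asset_proxy_secret_key` settings, the sketch below assumes the commonly documented Camo URL layout: an HMAC-SHA1 digest of the asset URL, followed by the hex-encoded asset URL. The `camo_url` helper, the proxy host, and the secret value are hypothetical, and the exact format may differ between Camo implementations.

```ruby
require "openssl"

# Hypothetical helper, not GitLab's implementation. Assumes the Camo layout
# "<proxy>/<hmac-sha1 hex digest>/<hex-encoded asset URL>".
def camo_url(proxy_base, secret_key, asset_url)
  # Sign the asset URL with the secret shared between GitLab and the proxy.
  digest = OpenSSL::HMAC.hexdigest("SHA1", secret_key, asset_url)
  # Hex-encode the original URL so it can travel as a path segment.
  hex_url = asset_url.unpack1("H*")
  "#{proxy_base}/#{digest}/#{hex_url}"
end

proxied = camo_url(
  "https://camo.example.com",        # assumed asset_proxy_url
  "hypothetical-shared-secret",      # assumed asset_proxy_secret_key
  "http://example.com/example.png"
)
puts proxied
```

Because the browser only ever requests the proxied URL, the external server sees the address of the proxy rather than that of the user.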
...@@ -32,10 +32,10 @@ Rack Attack disabled. ...@@ -32,10 +32,10 @@ Rack Attack disabled.
## Behavior ## Behavior
If set up as described in the [Settings](#settings) section below, two behaviors If set up as described in the [Settings](#settings) section below, two behaviors
will be enabled: are enabled:
- Protected paths will be throttled. - Protected paths are throttled.
- Failed authentications for Git and container registry requests will trigger a temporary IP ban. - Failed authentications for Git and container registry requests trigger a temporary IP ban.
### Protected paths throttle ### Protected paths throttle
...@@ -119,7 +119,7 @@ The following settings can be configured: ...@@ -119,7 +119,7 @@ The following settings can be configured:
specified time. specified time.
- `findtime`: The maximum amount of time that failed requests can count against an IP - `findtime`: The maximum amount of time that failed requests can count against an IP
before it's blacklisted (in seconds). before it's blacklisted (in seconds).
- `bantime`: The total amount of time that a blacklisted IP will be blocked (in - `bantime`: The total amount of time that a blacklisted IP is blocked (in
seconds). seconds).
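The interaction between these settings can be sketched in plain Ruby. This is a simplified model, not Rack Attack's implementation: the `FailedAuthBan` class is hypothetical, and the retry threshold is named `maxretry` here as an assumption, because the setting name is cut off in the text above.

```ruby
# Hypothetical model of a temporary IP ban driven by failed requests.
class FailedAuthBan
  def initialize(maxretry:, findtime:, bantime:)
    @maxretry = maxretry   # failures needed to trigger a ban (assumed name)
    @findtime = findtime   # seconds during which a failure still counts
    @bantime = bantime     # seconds a banned IP stays blocked
    @failures = Hash.new { |hash, ip| hash[ip] = [] }
    @banned_until = {}
  end

  # Record a failed request from `ip` at time `now` (in seconds).
  def record_failure(ip, now)
    # Keep only failures that are still inside the findtime window.
    recent = @failures[ip].select { |t| now - t < @findtime }
    recent << now
    @failures[ip] = recent
    return unless recent.size >= @maxretry

    @banned_until[ip] = now + @bantime
    @failures.delete(ip)
  end

  def banned?(ip, now)
    @banned_until.fetch(ip, 0) > now
  end
end

ban = FailedAuthBan.new(maxretry: 3, findtime: 60, bantime: 3600)
ip = "198.51.100.9"
ban.record_failure(ip, 0)
ban.record_failure(ip, 10)
after_two = ban.banned?(ip, 10)            # false: only two failures so far
ban.record_failure(ip, 20)
after_three = ban.banned?(ip, 20)          # true: third failure within findtime
after_bantime = ban.banned?(ip, 20 + 3600) # false: the ban has expired
```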
**Installations from source** **Installations from source**
...@@ -142,8 +142,8 @@ taken in order to enable protection for your GitLab instance: ...@@ -142,8 +142,8 @@ taken in order to enable protection for your GitLab instance:
If you want more restrictive/relaxed throttle rules, edit If you want more restrictive/relaxed throttle rules, edit
`config/initializers/rack_attack.rb` and change the `limit` or `period` values. `config/initializers/rack_attack.rb` and change the `limit` or `period` values.
For example, more relaxed throttle rules will be if you set For example, you can set more relaxed throttle rules with
`limit: 3` and `period: 1.seconds` (this will allow 3 requests per second). `limit: 3` and `period: 1.seconds`, allowing 3 requests per second.
You can also add other paths to the protected list by adding to `paths_to_be_protected` You can also add other paths to the protected list by adding to `paths_to_be_protected`
variable. If you change any of these settings you must restart your variable. If you change any of these settings you must restart your
GitLab instance. GitLab instance.
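To make the effect of `limit` and `period` concrete, here is a minimal plain-Ruby sketch of a fixed-window throttle. The `FixedWindowThrottle` class is hypothetical and only models the counting; it is not the code in `config/initializers/rack_attack.rb`.

```ruby
# Hypothetical fixed-window throttle: allows up to `limit` requests per
# `period` seconds for each key (for example, a client IP address).
class FixedWindowThrottle
  def initialize(limit:, period:)
    @limit = limit
    @period = period
    @counts = Hash.new(0)
  end

  # Returns true if the request is allowed, false if it is throttled.
  def allow?(key, now: Time.now.to_i)
    # All requests in the same `period`-second window share one counter.
    window = now / @period
    @counts[[key, window]] += 1
    @counts[[key, window]] <= @limit
  end
end

throttle = FixedWindowThrottle.new(limit: 3, period: 1)
# Four requests in the same second: the fourth exceeds the limit.
results = 4.times.map { throttle.allow?("203.0.113.7", now: 100) }
# A request in the next second starts a fresh window.
next_window = throttle.allow?("203.0.113.7", now: 101)
```

With `limit: 3` and a one-second `period`, the first three requests in a window pass and later ones are rejected until the window rolls over.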
...@@ -185,10 +185,10 @@ In case you want to remove a blocked IP, follow these steps: ...@@ -185,10 +185,10 @@ In case you want to remove a blocked IP, follow these steps:
### Rack attack is blacklisting the load balancer ### Rack attack is blacklisting the load balancer
Rack Attack may block your load balancer if all traffic appears to come from Rack Attack may block your load balancer if all traffic appears to come from
the load balancer. In that case, you will need to: the load balancer. In that case, you must:
1. [Configure `nginx[real_ip_trusted_addresses]`](https://docs.gitlab.com/omnibus/settings/nginx.html#configuring-gitlab-trusted_proxies-and-the-nginx-real_ip-module). 1. [Configure `nginx[real_ip_trusted_addresses]`](https://docs.gitlab.com/omnibus/settings/nginx.html#configuring-gitlab-trusted_proxies-and-the-nginx-real_ip-module).
This will keep users' IPs from being listed as the load balancer IPs. This keeps users' IPs from being listed as the load balancer IPs.
1. Whitelist the load balancer's IP address(es) in the Rack Attack [settings](#settings). 1. Whitelist the load balancer's IP address(es) in the Rack Attack [settings](#settings).
1. Reconfigure GitLab: 1. Reconfigure GitLab:
......