Commit 8ba36cd0 authored by Craig Norris's avatar Craig Norris

Merge branch 'docs-aqualls-future-tense-2' into 'master'

Clean up future tense for present tense

See merge request gitlab-org/gitlab!49290
parents 90f18823 a4c843f5
......@@ -10,7 +10,7 @@ The GitLab CI/CD pipeline includes a `danger-review` job that uses [Danger](http
to perform a variety of automated checks on the code under test.
Danger is a gem that runs in the CI environment, like any other analysis tool.
What sets it apart from, e.g., RuboCop, is that it's designed to allow you to
What sets it apart from (for example, RuboCop) is that it's designed to allow you to
easily write arbitrary code to test properties of your code or changes. To this
end, it provides a set of common helpers and access to information about what
has actually changed in your environment, then simply runs your code!
......@@ -32,7 +32,7 @@ from the start of the merge request.
### Disadvantages
- It's not obvious Danger will update the old comment, thus you need to
- It's not obvious Danger updates the old comment, thus you need to
pay attention to it if it is updated or not.
## Run Danger locally
......@@ -48,13 +48,12 @@ bin/rake danger_local
On startup, Danger reads a [`Dangerfile`](https://gitlab.com/gitlab-org/gitlab/blob/master/Dangerfile)
from the project root. GitLab's Danger code is decomposed into a set of helpers
and plugins, all within the [`danger/`](https://gitlab.com/gitlab-org/gitlab-foss/tree/master/danger/)
subdirectory, so ours just tells Danger to load it all. Danger will then run
subdirectory, so ours just tells Danger to load it all. Danger then runs
each plugin against the merge request, collecting the output from each. A plugin
may output notifications, warnings, or errors, all of which are copied to the
CI job's log. If an error happens, the CI job (and so the entire pipeline) will
be failed.
CI job's log. If an error happens, the CI job (and so the entire pipeline) fails.
On merge requests, Danger will also copy the output to a comment on the MR
On merge requests, Danger also copies the output to a comment on the MR
itself, increasing visibility.
## Development guidelines
......@@ -75,17 +74,17 @@ often face similar challenges, after all. Think about how you could fulfill the
same need while ensuring everyone can benefit from the work, and do that instead
if you can.
If a standard tool (e.g. `rubocop`) exists for a task, it is better to use it
directly, rather than calling it via Danger. Running and debugging the results
of those tools locally is easier if Danger isn't involved, and unless you're
using some Danger-specific functionality, there's no benefit to including it in
the Danger run.
If a standard tool (for example, `rubocop`) exists for a task, it's better to
use it directly, rather than calling it by using Danger. Running and debugging
the results of those tools locally is easier if Danger isn't involved, and
unless you're using some Danger-specific functionality, there's no benefit to
including it in the Danger run.
Danger is well-suited to prototyping and rapidly iterating on solutions, so if
what we want to build is unclear, a solution in Danger can be thought of as a
trial run to gather information about a product area. If you're doing this, make
sure the problem you're trying to solve, and the outcomes of that prototyping,
are captured in an issue or epic as you go along. This will help us to address
are captured in an issue or epic as you go along. This helps us to address
the need as part of the product in a future version of GitLab!
### Implementation details
......@@ -110,16 +109,17 @@ At present, we do this by putting the code in a module in `lib/gitlab/danger/...
and including it in the matching `danger/plugins/...` file. Specs can then be
added in `spec/lib/gitlab/danger/...`.
You'll only know if your `Dangerfile` works by pushing the branch that contains
it to GitLab. This can be quite frustrating, as it significantly increases the
cycle time when developing a new task, or trying to debug something in an
existing one. If you've followed the guidelines above, most of your code can
be exercised locally in RSpec, minimizing the number of cycles you need to go
through in CI. However, you can speed these cycles up somewhat by emptying the
To determine if your `Dangerfile` works, push the branch that contains it to
GitLab. This can be quite frustrating, as it significantly increases the cycle
time when developing a new task, or trying to debug something in an existing
one. If you've followed the guidelines above, most of your code can be exercised
locally in RSpec, minimizing the number of cycles you need to go through in CI.
However, you can speed these cycles up somewhat by emptying the
`.gitlab/ci/rails.gitlab-ci.yml` file in your merge request. Just don't forget
to revert the change before merging!
To enable the Dangerfile on another existing GitLab project, run the following extra steps, based on [this procedure](https://danger.systems/guides/getting_started.html#creating-a-bot-account-for-danger-to-use):
To enable the Dangerfile on another existing GitLab project, run the following
extra steps, based on [this procedure](https://danger.systems/guides/getting_started.html#creating-a-bot-account-for-danger-to-use):
1. Add `@gitlab-bot` to the project as a `reporter`.
1. Add the `@gitlab-bot`'s `GITLAB_API_PRIVATE_TOKEN` value as a value for a new CI/CD
......@@ -156,10 +156,10 @@ at GitLab so far:
To work around this, you can add an [environment
variable](../ci/variables/README.md) called
`DANGER_GITLAB_API_TOKEN` with a personal API token to your
fork. That way the danger comments will be made from CI using that
fork. That way the danger comments are made from CI using that
API token instead.
Making the variable
[masked](../ci/variables/README.md#mask-a-custom-variable) will make sure
[masked](../ci/variables/README.md#mask-a-custom-variable) makes sure
it doesn't show up in the job logs. The variable cannot be
[protected](../ci/variables/README.md#protect-a-custom-variable),
as it needs to be present for all feature branches.
......@@ -146,7 +146,7 @@ Remember:
advance of a milestone release and for larger documentation changes.
- You can request a post-merge Technical Writer review of documentation if it's important to get the
code with which it ships merged as soon as possible. In this case, the author of the original MR
will address the feedback provided by the Technical Writer in a follow-up MR.
can address the feedback provided by the Technical Writer in a follow-up MR.
- The Technical Writer can also help decide that documentation can be merged without Technical
writer review, with the review to occur soon after merge.
......
......@@ -143,7 +143,7 @@ There are a few gotchas with it:
- you should always [`extend ::Gitlab::Utils::Override`](utilities.md#override) and use `override` to
guard the "overrider" method to ensure that if the method gets renamed in
CE, the EE override won't be silently forgotten.
CE, the EE override isn't silently forgotten.
- when the "overrider" would add a line in the middle of the CE
implementation, you should refactor the CE method and split it in
smaller methods. Or create a "hook" method that is empty in CE,
......@@ -284,7 +284,7 @@ wrap it in a self-descriptive method and use that method.
For example, in GitLab-FOSS, the only user created by the system is `User.ghost`
but in EE there are several types of bot-users that aren't really users. It would
be incorrect to override the implementation of `User#ghost?`, so instead we add
a method `#internal?` to `app/models/user.rb`. The implementation will be:
a method `#internal?` to `app/models/user.rb`. The implementation:
```ruby
def internal?
......@@ -303,13 +303,13 @@ end
### Code in `config/routes`
When we add `draw :admin` in `config/routes.rb`, the application will try to
When we add `draw :admin` in `config/routes.rb`, the application tries to
load the file located in `config/routes/admin.rb`, and also try to load the
file located in `ee/config/routes/admin.rb`.
In EE, it should at least load one file, at most two files. If it cannot find
any files, an error will be raised. In CE, since we don't know if there will
be an EE route, it will not raise any errors even if it cannot find anything.
any files, an error is raised. In CE, since we don't know if an
an EE route exists, it doesn't raise any errors even if it cannot find anything.
This means if we want to extend a particular CE route file, just add the same
file located in `ee/config/routes`. If we want to add an EE only route, we
......@@ -467,7 +467,7 @@ end
#### Using `render_if_exists`
Instead of using regular `render`, we should use `render_if_exists`, which
will not render anything if it cannot find the specific partial. We use this
doesn't render anything if it cannot find the specific partial. We use this
so that we could put `render_if_exists` in CE, keeping code the same between
CE and EE.
......@@ -482,7 +482,7 @@ The disadvantage of this:
##### Caveats
The `render_if_exists` view path argument must be relative to `app/views/` and `ee/app/views`.
Resolving an EE template path that is relative to the CE view path will not work.
Resolving an EE template path that is relative to the CE view path doesn't work.
```haml
- # app/views/projects/index.html.haml
......@@ -577,7 +577,7 @@ We can define `params` and use `use` in another `params` definition to
include parameters defined in EE. However, we need to define the "interface" first
in CE in order for EE to override it. We don't have to do this in other places
due to `prepend_if_ee`, but Grape is complex internally and we couldn't easily
do that, so we'll follow regular object-oriented practices that we define the
do that, so we follow regular object-oriented practices that we define the
interface first here.
For example, suppose we have a few more optional parameters for EE. We can move the
......@@ -738,7 +738,7 @@ end
It's very hard to extend this in an EE module, and this is simply storing
some meta-data for a particular route. Given that, we could simply leave the
EE `route_setting` in CE as it won't hurt and we are just not going to use
EE `route_setting` in CE as it doesn't hurt and we don't use
those meta-data in CE.
We could revisit this policy when we're using `route_setting` more and whether
......@@ -1039,7 +1039,7 @@ export default {
`import MyComponent from 'ee_else_ce/path/my_component'.vue`
- this way the correct component will be included for either the ce or ee implementation
- this way the correct component is included for either the CE or EE implementation
**For EE components that need different results for the same computed values, we can pass in props to the CE wrapper as seen in the example.**
......@@ -1053,7 +1053,7 @@ export default {
For regular JS files, the approach is similar.
1. We will keep using the [`ee_else_ce`](../development/ee_features.md#javascript-code-in-assetsjavascripts) helper, this means that EE only code should be inside the `ee/` folder.
1. We keep using the [`ee_else_ce`](../development/ee_features.md#javascript-code-in-assetsjavascripts) helper, this means that EE only code should be inside the `ee/` folder.
1. An EE file should be created with the EE only code, and it should extend the CE counterpart.
1. For code inside functions that can't be extended, the code should be moved into a new file and we should use `ee_else_ce` helper:
......
......@@ -93,7 +93,7 @@ All the `GitlabUploader` derived classes should comply with this path segment sc
| | | `ObjectStorage::Concern#upload_path |
```
The `RecordsUploads::Concern` concern will create an `Upload` entry for every file stored by a `GitlabUploader` persisting the dynamic parts of the path using
The `RecordsUploads::Concern` concern creates an `Upload` entry for every file stored by a `GitlabUploader` persisting the dynamic parts of the path using
`GitlabUploader#dynamic_path`. You may then use the `Upload#build_uploader` method to manipulate the file.
## Object Storage
......@@ -108,9 +108,9 @@ The `CarrierWave::Uploader#store_dir` is overridden to
### Using `ObjectStorage::Extension::RecordsUploads`
This concern will automatically include `RecordsUploads::Concern` if not already included.
This concern includes `RecordsUploads::Concern` if not already included.
The `ObjectStorage::Concern` uploader will search for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE).
The `ObjectStorage::Concern` uploader searches for the matching `Upload` to select the correct object store. The `Upload` is mapped using `#store_dirs + identifier` for each store (LOCAL/REMOTE).
```ruby
class SongUploader < GitlabUploader
......@@ -130,7 +130,7 @@ end
### Using a mounted uploader
The `ObjectStorage::Concern` will query the `model.<mount>_store` attribute to select the correct object store.
The `ObjectStorage::Concern` queries the `model.<mount>_store` attribute to select the correct object store.
This column must be present in the model schema.
```ruby
......
......@@ -14,7 +14,7 @@ might encounter or should avoid during development of GitLab CE and EE.
In GitLab 10.8 and later, Omnibus has [dropped the `app/assets` directory](https://gitlab.com/gitlab-org/omnibus-gitlab/-/merge_requests/2456),
after asset compilation. The `ee/app/assets`, `vendor/assets` directories are dropped as well.
This means that reading files from that directory will fail in Omnibus-installed GitLab instances:
This means that reading files from that directory fails in Omnibus-installed GitLab instances:
```ruby
file = Rails.root.join('app/assets/images/logo.svg')
......@@ -243,8 +243,8 @@ end
In this case, if for any reason the top level `ApplicationController`
is loaded but `Projects::ApplicationController` is not, `ApplicationController`
would be resolved to `::ApplicationController` and then the `project` method will
be undefined and we will get an error.
would be resolved to `::ApplicationController` and then the `project` method is
undefined, causing an error.
#### Solution
......@@ -265,7 +265,7 @@ By specifying `Projects::`, we tell Rails exactly what class we are referring
to and we would avoid the issue.
NOTE:
This problem will disappear as soon as we upgrade to Rails 6 and use the Zeitwerk autoloader.
This problem disappears as soon as we upgrade to Rails 6 and use the Zeitwerk autoloader.
### Further reading
......
......@@ -12,16 +12,16 @@ info: To determine the technical writer assigned to the Stage/Group associated w
In order to comply with the terms the libraries we use are licensed under, we have to make sure to check new gems for compatible licenses whenever they're added. To automate this process, we use the [license_finder](https://github.com/pivotal/LicenseFinder) gem by Pivotal. It runs every time a new commit is pushed and verifies that all gems and node modules in the bundle use a license that doesn't conflict with the licensing of either GitLab Community Edition or GitLab Enterprise Edition.
There are some limitations with the automated testing, however. CSS, JavaScript, or Ruby libraries which are not included by way of Bundler, NPM, or Yarn (for instance those manually copied into our source tree in the `vendor` directory), must be verified manually and independently. Take care whenever one such library is used, as automated tests won't catch problematic licenses from them.
There are some limitations with the automated testing, however. CSS, JavaScript, or Ruby libraries which are not included by way of Bundler, NPM, or Yarn (for instance those manually copied into our source tree in the `vendor` directory), must be verified manually and independently. Take care whenever one such library is used, as automated tests don't catch problematic licenses from them.
Some gems may not include their license information in their `gemspec` file, and some node modules may not include their license information in their `package.json` file. These won't be detected by License Finder, and will have to be verified manually.
Some gems may not include their license information in their `gemspec` file, and some node modules may not include their license information in their `package.json` file. These aren't detected by License Finder, and must be verified manually.
### License Finder commands
NOTE:
License Finder currently uses GitLab misused terms of `whitelist` and `blacklist`. As a result, the commands below reference those terms. We've created an [issue on their project](https://github.com/pivotal/LicenseFinder/issues/745) to propose that they rename their commands.
There are a few basic commands License Finder provides that you'll need in order to manage license detection.
There are a few basic commands License Finder provides that you need in order to manage license detection.
To verify that the checks are passing, and/or to see what dependencies are causing the checks to fail:
......
......@@ -7,7 +7,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
# Mass inserting Rails models
Setting the environment variable [`MASS_INSERT=1`](rake_tasks.md#environment-variables)
when running [`rake setup`](rake_tasks.md) will create millions of records, but these records
when running [`rake setup`](rake_tasks.md) creates millions of records, but these records
aren't visible to the `root` user by default.
To make any number of the mass-inserted projects visible to the `root` user, run
......
......@@ -47,7 +47,7 @@ Cache Hit:
resource.
1. If the `If-None-Match` header matches the current value in Redis we know
that the resource did not change so we can send 304 response immediately,
without querying the database at all. The client's browser will use the
without querying the database at all. The client's browser uses the
cached response.
1. If the `If-None-Match` header does not match the current value in Redis
we have to generate a new response, because the resource changed.
......
......@@ -16,7 +16,7 @@ target ID. For example, at the time of writing we have such a setup for
- `source_type`: a string defining the model to use, can be either `Project` or
`Namespace`.
- `source_id`: the ID of the row to retrieve based on `source_type`. For
example, when `source_type` is `Project` then `source_id` will contain a
example, when `source_type` is `Project` then `source_id` contains a
project ID.
While such a setup may appear to be useful, it comes with many drawbacks; enough
......@@ -24,8 +24,8 @@ that you should avoid this at all costs.
## Space Wasted
Because this setup relies on string values to determine the model to use it will
end up wasting a lot of space. For example, for `Project` and `Namespace` the
Because this setup relies on string values to determine the model to use, it
wastes a lot of space. For example, for `Project` and `Namespace` the
maximum size is 9 bytes, plus 1 extra byte for every string when using
PostgreSQL. While this may only be 10 bytes per row, given enough tables and
rows using such a setup we can end up wasting quite a bit of disk space and
......@@ -84,7 +84,7 @@ Let's say you have a `members` table storing both approved and pending members,
for both projects and groups, and the pending state is determined by the column
`requested_at` being set or not. Schema wise such a setup can lead to various
columns only being set for certain rows, wasting space. It's also possible that
certain indexes will only be set for certain rows, again wasting space. Finally,
certain indexes are only set for certain rows, again wasting space. Finally,
querying such a table requires less than ideal queries. For example:
```sql
......@@ -121,7 +121,7 @@ WHERE group_id = 4
```
If you want to get both you can use a UNION, though you need to be explicit
about what columns you want to SELECT as otherwise the result set will use the
about what columns you want to SELECT as otherwise the result set uses the
columns of the first query. For example:
```sql
......@@ -147,6 +147,6 @@ filter rows using the `IS NULL` condition.
To summarize: using separate tables allows us to use foreign keys effectively,
create indexes only where necessary, conserve space, query data more
efficiently, and scale these tables more easily (e.g. by storing them on
separate disks). A nice side effect of this is that code can also become easier
as you won't end up with a single model having to handle different kinds of
separate disks). A nice side effect of this is that code can also become easier,
as a single model isn't responsible for handling different kinds of
data.
......@@ -8,7 +8,7 @@ info: To determine the technical writer assigned to the Stage/Group associated w
Post deployment migrations are regular Rails migrations that can optionally be
executed after a deployment. By default these migrations are executed alongside
the other migrations. To skip these migrations you will have to set the
the other migrations. To skip these migrations you must set the
environment variable `SKIP_POST_DEPLOYMENT_MIGRATIONS` to a non-empty value
when running `rake db:migrate`.
......@@ -19,7 +19,7 @@ migrations:
bundle exec rake db:migrate
```
This however will skip post deployment migrations:
This however skips post deployment migrations:
```shell
SKIP_POST_DEPLOYMENT_MIGRATIONS=true bundle exec rake db:migrate
......@@ -40,7 +40,7 @@ Once all servers have been updated you can run `chef-client` again on a single
server _without_ the environment variable.
The process is similar for other deployment techniques: first you would deploy
with the environment variable set, then you'll essentially re-deploy a single
with the environment variable set, then you re-deploy a single
server but with the variable _unset_.
## Creating Migrations
......@@ -51,7 +51,7 @@ To create a post deployment migration you can use the following Rails generator:
bundle exec rails g post_deployment_migration migration_name_here
```
This will generate the migration file in `db/post_migrate`. These migrations
This generates the migration file in `db/post_migrate`. These migrations
behave exactly like regular Rails migrations.
## Use Cases
......
......@@ -24,7 +24,7 @@ When using the script, command-line documentation is available by passing no
arguments.
When using the method in an interactive console session, any changes to the
application code within that console session will be reflected in the profiler
application code within that console session is reflected in the profiler
output.
For example:
......@@ -37,14 +37,14 @@ Gitlab::Profiler.profile('/my-user')
# Returns a RubyProf::Profile where 100 seconds is spent in UsersController#show
```
For routes that require authorization you will need to provide a user to
For routes that require authorization you must provide a user to
`Gitlab::Profiler`. You can do this like so:
```ruby
Gitlab::Profiler.profile('/gitlab-org/gitlab-test', user: User.first)
```
Passing a `logger:` keyword argument to `Gitlab::Profiler.profile` will send
Passing a `logger:` keyword argument to `Gitlab::Profiler.profile` sends
ActiveRecord and ActionController log output to that logger. Further options are
documented with the method source.
......@@ -123,7 +123,7 @@ starting GitLab. For example:
ENABLE_BULLET=true bundle exec rails s
```
Bullet will log query problems to both the Rails log as well as the Chrome
Bullet logs query problems to both the Rails log as well as the Chrome
console.
As a follow up to finding `N+1` queries with Bullet, consider writing a [QueryRecoder test](query_recorder.md) to prevent a regression.
......
......@@ -101,7 +101,7 @@ format the reference as:
This default implementation is not very efficient, because we need to call
`#find_object` for each reference, which may require issuing a DB query every
time. For this reason, most reference filter implementations will instead use an
time. For this reason, most reference filter implementations instead use an
optimization included in `AbstractReferenceFilter`:
> `AbstractReferenceFilter` provides a lazily initialized value
......@@ -140,7 +140,7 @@ We are skipping:
To avoid filtering such nodes for each `ReferenceFilter`, we do it only once and store the result in the result Hash of the pipeline as `result[:reference_filter_nodes]`.
Pipeline `result` is passed to each filter for modification, so every time when `ReferenceFilter` replaces text or link tag, filtered list (`reference_filter_nodes`) will be updated for the next filter to use.
Pipeline `result` is passed to each filter for modification, so every time when `ReferenceFilter` replaces text or link tag, filtered list (`reference_filter_nodes`) are updated for the next filter to use.
## Reference parsers
......@@ -199,4 +199,4 @@ In practice, all reference parsers inherit from [`BaseParser`](https://gitlab.co
- `#nodes_user_can_reference(user, nodes)` to filter nodes directly.
A failure to implement this class for each reference type means that the
application will raise exceptions during Markdown processing.
application raises exceptions during Markdown processing.
......@@ -16,7 +16,7 @@ scalability and reliability.
_[diagram source - GitLab employees only](https://docs.google.com/drawings/d/1RTGtuoUrE0bDT-9smoHbFruhEMI4Ys6uNrufe5IA-VI/edit)_
The diagram above shows a GitLab reference architecture scaled up for 50,000
users. We will discuss each component below.
users. We discuss each component below.
## Components
......@@ -26,11 +26,10 @@ The PostgreSQL database holds all metadata for projects, issues, merge
requests, users, etc. The schema is managed by the Rails application
[db/structure.sql](https://gitlab.com/gitlab-org/gitlab/blob/master/db/structure.sql).
GitLab Web/API servers and Sidekiq nodes talk directly to the database via a
Rails object relational model (ORM). Most SQL queries are accessed via this
GitLab Web/API servers and Sidekiq nodes talk directly to the database by using a
Rails object relational model (ORM). Most SQL queries are accessed by using this
ORM, although some custom SQL is also written for performance or for
exploiting advanced PostgreSQL features (e.g. recursive CTEs, LATERAL JOINs,
etc.).
exploiting advanced PostgreSQL features (like recursive CTEs or LATERAL JOINs).
The application has a tight coupling to the database schema. When the
application starts, Rails queries the database schema, caching the tables and
......@@ -42,8 +41,8 @@ no-downtime changes](what_requires_downtime.md).
#### Multi-tenancy
A single database is used to store all customer data. Each user can belong to
many groups or projects, and the access level (e.g. guest, developer,
maintainer, etc.) to groups and projects determines what users can see and
many groups or projects, and the access level (including guest, developer, or
maintainer) to groups and projects determines what users can see and
what they can access.
Users with admin access can access all projects and even impersonate
......@@ -70,7 +69,7 @@ dates](https://gitlab.com/groups/gitlab-org/-/epics/2023). For example,
the `events` and `audit_events` table are natural candidates for this
kind of partitioning.
Sharding is likely more difficult and will require significant changes
Sharding is likely more difficult and requires significant changes
to the schema and application. For example, if we have to store projects
in many different databases, we immediately run into the question, "How
can we retrieve data across different projects?" One answer to this is
......@@ -78,7 +77,7 @@ to abstract data access into API calls that abstract the database from
the application, but this is a significant amount of work.
There are solutions that may help abstract the sharding to some extent
from the application. For example, we will want to look at [Citus
from the application. For example, we want to look at [Citus
Data](https://www.citusdata.com/product/community) closely. Citus Data
provides a Rails plugin that adds a [tenant ID to ActiveRecord
models](https://www.citusdata.com/blog/2017/01/05/easily-scale-out-multi-tenant-apps/).
......@@ -100,17 +99,16 @@ systems.
A recent [database checkup shows a breakdown of the table sizes on
GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/8022#master-1022016101-8).
Since `merge_request_diff_files` contains over 1 TB of data, we will want to
Since `merge_request_diff_files` contains over 1 TB of data, we want to
reduce/eliminate this table first. GitLab has support for [storing diffs in
object storage](../administration/merge_request_diffs.md), which we [will
want to do on
object storage](../administration/merge_request_diffs.md), which we [want to do on
GitLab.com](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7356).
#### High availability
There are several strategies to provide high-availability and redundancy:
- Write-ahead logs (WAL) streamed to object storage (e.g. S3, Google Cloud
- Write-ahead logs (WAL) streamed to object storage (for example, S3, or Google Cloud
Storage).
- Read-replicas (hot backups).
- Delayed replicas.
......@@ -126,11 +124,10 @@ the read replicas. [Omnibus ships with both repmgr and Patroni](../administratio
#### Load-balancing
GitLab EE has [application support for load balancing using read
replicas](../administration/database_load_balancing.md). This load
balancer does some smart things that are not traditionally available in
standard load balancers. For example, the application will only consider a
replica if its replication lag is low (e.g. WAL data behind by < 100
megabytes).
replicas](../administration/database_load_balancing.md). This load balancer does
some actions that aren't traditionally available in standard load balancers. For
example, the application considers a replica only if its replication lag is low
(for example, WAL data behind by less than 100 MB).
More [details are in a blog
post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
......@@ -140,7 +137,7 @@ post](https://about.gitlab.com/blog/2017/10/02/scaling-the-gitlab-database/).
As PostgreSQL forks a backend process for each request, PostgreSQL has a
finite limit of connections that it can support, typically around 300 by
default. Without a connection pooler like PgBouncer, it's quite possible to
hit connection limits. Once the limits are reached, then GitLab will generate
hit connection limits. Once the limits are reached, then GitLab generates
errors or slow down as it waits for a connection to be available.
#### High availability
......@@ -151,7 +148,7 @@ background job and/or Web requests. There are two ways to address this
limitation:
- Run multiple PgBouncer instances.
- Use a multi-threaded connection pooler (e.g.
- Use a multi-threaded connection pooler (for example,
[Odyssey](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7776).
On some Linux systems, it's possible to run [multiple PgBouncer instances on
......@@ -192,9 +189,9 @@ connections gracefully.
There are three ways Redis is used in GitLab:
- Queues. Sidekiq jobs marshal jobs into JSON payloads.
- Persistent state. Session data, exclusive leases, etc.
- Cache. Repository data (e.g. Branch and tag names), view partials, etc.
- Queues: Sidekiq jobs marshal jobs into JSON payloads.
- Persistent state: Session data and exclusive leases.
- Cache: Repository data (like Branch and tag names) and view partials.
For GitLab instances running at scale, splitting Redis usage into
separate Redis clusters helps for two reasons:
......@@ -206,8 +203,8 @@ For example, the cache instance can behave like an least-recently used
(LRU) cache by setting the `maxmemory` configuration option. That option
should not be set for the queues or persistent clusters because data
would be evicted from memory at random times. This would cause jobs to
be dropped on the floor, which would cause many problems (e.g. merges
not running, builds not updating, etc.).
be dropped on the floor, which would cause many problems (like merges
not running or builds not updating).
Sidekiq also polls its queues quite frequently, and this activity can
slow down other queries. For this reason, having a dedicated Redis
......@@ -219,7 +216,7 @@ Redis process.
Single-core: Like PgBouncer, a single Redis process can only use one
core. It does not support multi-threading.
Dumb secondaries: Redis secondaries (aka replicas) don't actually
Dumb secondaries: Redis secondaries (also known as replicas) don't actually
handle any load. Unlike PostgreSQL secondaries, they don't even serve
read queries. They simply replicate data from the primary and take over
only when the primary fails.
......@@ -236,7 +233,7 @@ election to determine a new leader.
No leader: A Redis cluster can get into a mode where there are no
primaries. For example, this can happen if Redis nodes are misconfigured
to follow the wrong node. Sometimes this requires forcing one node to
become a primary via the [`REPLICAOF NO ONE`
become a primary by using the [`REPLICAOF NO ONE`
command](https://redis.io/commands/replicaof).
### Sidekiq
......@@ -260,8 +257,8 @@ directories in the GitLab code base.
As jobs are added to the Sidekiq queue, Sidekiq worker threads need to
pull these jobs from the queue and finish them at a rate faster than
they are added. When an imbalance occurs (e.g. delays in the database,
slow jobs, etc.), Sidekiq queues can balloon and lead to runaway queues.
they are added. When an imbalance occurs (for example, delays in the database
or slow jobs), Sidekiq queues can balloon and lead to runaway queues.
In recent months, many of these queues have ballooned due to delays in
PostgreSQL, PgBouncer, and Redis. For example, PgBouncer saturation can
......@@ -278,11 +275,11 @@ in a timely manner:
used to process each commit message in the push, but now it farms out
this to `ProcessCommitWorker`.
- Redistribute/gerrymander Sidekiq processes by queue
types. Long-running jobs (e.g. relating to project import) can often
squeeze out jobs that run fast (e.g. delivering e-mail). [This technique
types. Long-running jobs (for example, relating to project import) can often
squeeze out jobs that run fast (for example, delivering e-mail). [This technique
was used in to optimize our existing Sidekiq deployment](https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/7219#note_218019483).
- Optimize jobs. Eliminating unnecessary work, reducing network calls
(e.g. SQL, Gitaly, etc.), and optimizing processor time can yield significant
(including SQL and Gitaly), and optimizing processor time can yield significant
benefits.
From the Sidekiq logs, it's possible to see which jobs run the most
......
......@@ -73,7 +73,7 @@ shell check:
```
TIP: **Tip:**
By default, ShellCheck will use the [shell detection](https://github.com/koalaman/shellcheck/wiki/SC2148#rationale)
By default, ShellCheck uses the [shell detection](https://github.com/koalaman/shellcheck/wiki/SC2148#rationale)
to determine the shell dialect in use. If the shell file is out of your control and ShellCheck cannot
detect the dialect, use `-s` flag to specify it: `-s sh` or `-s bash`.
......@@ -101,7 +101,7 @@ shfmt:
```
TIP: **Tip:**
By default, shfmt will use the [shell detection](https://github.com/mvdan/sh#shfmt) similar to one of ShellCheck
By default, shfmt uses the [shell detection](https://github.com/mvdan/sh#shfmt) similar to one of ShellCheck
and ignore files starting with a period. To override this, use `-ln` flag to specify the shell dialect:
`-ln posix` or `-ln bash`.
......
......@@ -46,7 +46,7 @@ We have three challenges here: performance, availability, and scalability.
### Performance
Rails process are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the Ruby process will spend time on I/O operations on step 3 causing incoming requests to pile up.
Rails process are expensive in terms of both CPU and memory. Ruby [global interpreter lock](https://en.wikipedia.org/wiki/Global_interpreter_lock) adds to cost too because the Ruby process spends time on I/O operations on step 3 causing incoming requests to pile up.
In order to improve this, [disk buffered upload](#disk-buffered-upload) was implemented. With this, Rails no longer deals with writing uploaded files to disk.
......@@ -88,7 +88,7 @@ To address this problem an HA object storage can be used and it's supported by [
Scaling NFS is outside of our support scope, and NFS is not a part of cloud native installations.
All features that require Sidekiq and do not use direct upload won't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file will be written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation will fail to read it.
All features that require Sidekiq and do not use direct upload doesn't work without NFS. In Kubernetes, machine boundaries translate to PODs, and in this case the uploaded file is written into the POD private disk. Since Sidekiq POD cannot reach into other pods, the operation fails to read it.
## How to select the proper level of acceleration?
......@@ -96,7 +96,7 @@ Selecting the proper acceleration is a tradeoff between speed of development and
We can identify three major use-cases for an upload:
1. **storage:** if we are uploading for storing a file (i.e. artifacts, packages, discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md).
1. **storage:** if we are uploading for storing a file (like artifacts, packages, or discussion attachments). In this case [direct upload](#direct-upload) is the proper level as it's the less resource-intensive operation. Additional information can be found on [File Storage in GitLab](file_storage.md).
1. **in-controller/synchronous processing:** if we allow processing **small files** synchronously, using [disk buffered upload](#disk-buffered-upload) may speed up development.
1. **Sidekiq/asynchronous processing:** Asynchronous processing must implement [direct upload](#direct-upload), the reason being that it's the only way to support Cloud Native deployments without a shared NFS.
......@@ -120,7 +120,7 @@ We have three kinds of file encoding in our uploads:
1. <i class="fa fa-check-circle"></i> **multipart**: `multipart/form-data` is the most common, a file is encoded as a part of a multipart encoded request.
1. <i class="fa fa-check-circle"></i> **body**: some APIs uploads files as the whole request body.
1. <i class="fa fa-times-circle"></i> **JSON**: some JSON API uploads files as base64 encoded strings. This will require a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/226).
1. <i class="fa fa-times-circle"></i> **JSON**: some JSON API uploads files as base64 encoded strings. This requires a change to GitLab Workhorse, which [is planned](https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/226).
## Uploading technologies
......@@ -166,7 +166,7 @@ is replaced with the path to the corresponding file before it is forwarded to
Rails.
To prevent abuse of this feature, Workhorse signs the modified request with a
special header, stating which entries it modified. Rails will ignore any
special header, stating which entries it modified. Rails ignores any
unsigned path entries.
```mermaid
......@@ -220,8 +220,8 @@ In this setup, an extra Rails route must be implemented in order to handle autho
and [its routes](https://gitlab.com/gitlab-org/gitlab/blob/cc723071ad337573e0360a879cbf99bc4fb7adb9/config/routes/git_http.rb#L31-32).
- [API endpoints for uploading packages](packages.md#file-uploads).
This will fallback to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings).
The answer to the `/authorize` call will only contain a file system path.
This falls back to _disk buffered upload_ when `direct_upload` is disabled inside the [object storage setting](../administration/uploads.md#object-storage-settings).
The answer to the `/authorize` call contains only a file system path.
```mermaid
sequenceDiagram
......@@ -272,7 +272,7 @@ sequenceDiagram
## How to add a new upload route
In this section, we'll describe how to add a new upload route [accelerated](#uploading-technologies) by Workhorse for [body and multipart](#upload-encodings) encoded uploads.
In this section, we describe how to add a new upload route [accelerated](#uploading-technologies) by Workhorse for [body and multipart](#upload-encodings) encoded uploads.
Uploads routes belong to one of these categories:
......
......@@ -10,7 +10,7 @@ A possible security concern when managing a public facing GitLab instance is
the ability to steal a users IP address by referencing images in issues, comments, etc.
For example, adding `![Example image](http://example.com/example.png)` to
an issue description will cause the image to be loaded from the external
an issue description causes the image to be loaded from the external
server in order to be displayed. However, this also allows the external server
to log the IP address of the user.
......@@ -51,7 +51,7 @@ To install a Camo server as an asset proxy:
| `asset_proxy_enabled` | Enable proxying of assets. If enabled, requires: `asset_proxy_url`). |
| `asset_proxy_secret_key` | Shared secret with the asset proxy server. |
| `asset_proxy_url` | URL of the asset proxy server. |
| `asset_proxy_whitelist` | Assets that match these domain(s) will NOT be proxied. Wildcards allowed. Your GitLab installation URL is automatically whitelisted. |
| `asset_proxy_whitelist` | Assets that match these domain(s) are NOT proxied. Wildcards allowed. Your GitLab installation URL is automatically whitelisted. |
1. Restart the server for the changes to take effect. Each time you change any values for the asset
proxy, you need to restart the server.
......@@ -59,7 +59,7 @@ To install a Camo server as an asset proxy:
## Using the Camo server
Once the Camo server is running and you've enabled the GitLab settings, any image, video, or audio that
references an external source will get proxied to the Camo server.
references an external source are proxied to the Camo server.
For example, the following is a link to an image in Markdown:
......
......@@ -32,10 +32,10 @@ Rack Attack disabled.
## Behavior
If set up as described in the [Settings](#settings) section below, two behaviors
will be enabled:
are enabled:
- Protected paths will be throttled.
- Failed authentications for Git and container registry requests will trigger a temporary IP ban.
- Protected paths are throttled.
- Failed authentications for Git and container registry requests trigger a temporary IP ban.
### Protected paths throttle
......@@ -119,7 +119,7 @@ The following settings can be configured:
specified time.
- `findtime`: The maximum amount of time that failed requests can count against an IP
before it's blacklisted (in seconds).
- `bantime`: The total amount of time that a blacklisted IP will be blocked (in
- `bantime`: The total amount of time that a blacklisted IP is blocked (in
seconds).
**Installations from source**
......@@ -142,8 +142,8 @@ taken in order to enable protection for your GitLab instance:
If you want more restrictive/relaxed throttle rules, edit
`config/initializers/rack_attack.rb` and change the `limit` or `period` values.
For example, more relaxed throttle rules will be if you set
`limit: 3` and `period: 1.seconds` (this will allow 3 requests per second).
For example, you can set more relaxed throttle rules with
`limit: 3` and `period: 1.seconds`, allowing 3 requests per second.
You can also add other paths to the protected list by adding to `paths_to_be_protected`
variable. If you change any of these settings you must restart your
GitLab instance.
......@@ -185,10 +185,10 @@ In case you want to remove a blocked IP, follow these steps:
### Rack attack is blacklisting the load balancer
Rack Attack may block your load balancer if all traffic appears to come from
the load balancer. In that case, you will need to:
the load balancer. In that case, you must:
1. [Configure `nginx[real_ip_trusted_addresses]`](https://docs.gitlab.com/omnibus/settings/nginx.html#configuring-gitlab-trusted_proxies-and-the-nginx-real_ip-module).
This will keep users' IPs from being listed as the load balancer IPs.
This keeps users' IPs from being listed as the load balancer IPs.
1. Whitelist the load balancer's IP address(es) in the Rack Attack [settings](#settings).
1. Reconfigure GitLab:
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment