When deployed, GitLab should be considered the amalgamation of the below processes.
GitLab can be considered to have two layers from a process perspective:
- **Monitoring**: Anything from this layer is not required to deliver GitLab the application, but allows administrators more insight into their infrastructure and what the service as a whole is doing.
- **Core**: Any process that is vital for the delivery of GitLab as a platform. If any of these processes halt, a GitLab outage results. For the Core layer, you can further divide into:
  - **Processors**: These processes are responsible for actually performing operations and presenting the service.
  - **Data**: These services store/expose structured data for the GitLab service.
...
- Process: `alertmanager`
- GitLab.com: [Monitoring of GitLab.com](https://about.gitlab.com/handbook/engineering/monitoring/)
[Alert manager](https://prometheus.io/docs/alerting/latest/alertmanager/) is a tool provided by Prometheus that _"handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or Opsgenie. It also takes care of silencing and inhibition of alerts."_ You can read more in [issue #45740](https://gitlab.com/gitlab-org/gitlab-foss/-/issues/45740) about what we alert on.
#### Certificate management
...
- Process: `postgres-exporter`
- GitLab.com: [Monitoring of GitLab.com](https://about.gitlab.com/handbook/engineering/monitoring/)
[`postgres_exporter`](https://github.com/wrouesnel/postgres_exporter) is the community-provided Prometheus exporter that delivers data about PostgreSQL to Prometheus for use in Grafana dashboards.
#### Prometheus
...
The registry is what users use to store their own Docker images. The bundled
registry uses NGINX as a load balancer and GitLab as an authentication manager.
Whenever a client requests to pull or push an image from the registry, it
returns a `401` response along with a header detailing where to get an
authentication token, in this case the GitLab instance. The client then
requests a pull or push auth token from GitLab and retries the original request
to the registry. Learn more about [token authentication](https://docs.docker.com/registry/spec/auth/token/).
An external registry can also be configured to use GitLab as an auth endpoint.
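A rough sketch of that flow from the client's side (the hostnames, image path, and query string below are illustrative, not a definitive client implementation):

```ruby
require 'net/http'
require 'json'
require 'uri'

# 1. An anonymous pull is rejected: the registry answers 401 and names the
#    token endpoint (the GitLab instance) in the WWW-Authenticate header.
manifest = URI('https://registry.example.com/v2/mygroup/myimage/manifests/latest')
response = Net::HTTP.get_response(manifest)
puts response.code                 # => "401"
puts response['WWW-Authenticate']  # => 'Bearer realm="https://gitlab.example.com/jwt/auth",service="container_registry",...'

# 2. Ask GitLab for a pull token covering the requested repository scope.
auth_url = URI('https://gitlab.example.com/jwt/auth' \
               '?service=container_registry&scope=repository:mygroup/myimage:pull')
token = JSON.parse(Net::HTTP.get(auth_url))['token']

# 3. Retry the original request, this time presenting the bearer token.
retry_request = Net::HTTP::Get.new(manifest)
retry_request['Authorization'] = "Bearer #{token}"
Net::HTTP.start(manifest.host, manifest.port, use_ssl: true) { |http| http.request(retry_request) }
```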
[Puma](https://puma.io/) is a Ruby application server that is used to run the core Rails application that provides the user-facing features in GitLab. In process output, this often displays as `bundle` or `config.ru`, depending on the GitLab version.
[Unicorn](https://yhbt.net/unicorn/) is a Ruby application server that is used to run the core Rails application that provides the user-facing features in GitLab. In process output, this often displays as `bundle` or `config.ru`, depending on the GitLab version.
#### LDAP Authentication
...
It's important to understand the distinction as some processes are used in both.
### GitLab Web HTTP request cycle
When making a request to an HTTP Endpoint (think `/users/sign_in`) the request takes the following path through the GitLab Service:
- NGINX - Acts as our first line reverse proxy.
- GitLab Workhorse - This determines if it needs to go to the Rails application or somewhere else to reduce load on Puma.
- Puma - Since this is a web request, and it needs to access the application, it routes to Puma.
- PostgreSQL/Gitaly/Redis - Depending on the type of request, it may hit these services to store or retrieve data.
### GitLab Git request cycle
Below we describe the different paths that HTTP vs. SSH Git requests take. There is some overlap with the Web Request Cycle but also some differences.
...
This is a blocking operation, but it doesn't cause problems because the table is not yet used,
and therefore it does not have any records yet.
## Heavy operations in a single transaction
When using a single-transaction migration, a transaction holds a database connection
for the duration of the migration, so you must make sure the actions in the migration
do not take too much time: GitLab.com’s production database has a `15s` timeout, so
in general, the cumulative execution time in a migration should aim to fit comfortably
...
More information about PostgreSQL locks: [Explicit Locking](https://www.postgresql.org/docs/current/explicit-locking.html)
For stability reasons, GitLab.com has a specific [`statement_timeout`](../user/gitlab_com/index.md#postgresql)
set. When the migration is invoked, any database query has
a fixed time to execute. In a worst-case scenario, the request sits in the
lock queue, blocking other queries for the duration of the configured statement
timeout, and then fails with a `canceling statement due to statement timeout` error.
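To make that concrete, here is a sketch of a migration that can hit this (the table and column are illustrative):

```ruby
class AddDefaultToProjectsStarCount < ActiveRecord::Migration[6.0]
  def up
    # This ALTER TABLE needs a lock on `projects`. If another session holds
    # a conflicting lock, the statement sits in the lock queue (blocking
    # queries queued behind it) until statement_timeout expires, then fails
    # with: canceling statement due to statement timeout
    change_column_default :projects, :star_count, from: nil, to: 0
  end
end
```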
...
**Creating a new table when we have two foreign keys:**
For this, we need three migrations:
1. Create the table without foreign keys (with the indices).
1. Add the foreign key to the first table.
...
The RuboCop rule generally allows standard Rails migration methods, listed below. This example causes a RuboCop offense:
```ruby
disable_ddl_transaction!
...
```

...

In a worst-case scenario, the method:
- Executes the block a maximum of 50 times over 40 minutes.
- Most of the time is spent in a pre-configured sleep period after each iteration.
- After the 50th retry, the block is executed without `lock_timeout`, just
like a standard migration invocation.
- If a lock cannot be acquired, the migration fails with a `statement timeout` error.
The migration might fail if there is a very long-running transaction (40+ minutes)
accessing the `users` table.
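The helper being described is presumably `with_lock_retries` from `Gitlab::Database::MigrationHelpers`; a minimal usage sketch (the column is illustrative):

```ruby
class AddAdminNoteToUsers < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  def up
    # Each attempt sets a short lock_timeout, sleeps on failure, and
    # retries; only the final attempt runs without lock_timeout and can
    # therefore fail with a statement timeout.
    with_lock_retries do
      add_column :users, :admin_note, :text
    end
  end
end
```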
...
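The class under discussion presumably looks something like this sketch (assuming the `with_multiple_threads` helper from `Gitlab::Database::MigrationHelpers`):

```ruby
class MyMigration < ActiveRecord::Migration[6.0]
  include Gitlab::Database::MigrationHelpers

  def up
    with_multiple_threads(4) do
      # Runs in each worker thread, on that thread's own connection object.
      disable_statement_timeout

      # ... per-thread work ...
    end
  end
end
```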
Here the call to `disable_statement_timeout` uses the connection local to
the `with_multiple_threads` block, instead of re-using the global connection
pool. This ensures each thread has its own connection object, and doesn't time
out when trying to obtain one.
PostgreSQL has a maximum number of connections that it allows. This
...
This is **required** for all foreign keys, e.g., to support efficient cascading
deletes: when a lot of rows in a table get deleted, the referenced records need
to be deleted too. The database has to look for corresponding records in the
referenced table. Without an index, this results in a sequential scan on the
table, which can take a long time.
Here's an example where we add a new column with a foreign key
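A sketch of what such a migration looks like (the model names are illustrative):

```ruby
class AddUserFkToEmails < ActiveRecord::Migration[6.0]
  def change
    # index: true creates the index the foreign key needs for efficient
    # cascading deletes on the referenced table.
    add_reference :emails, :user, index: true, foreign_key: { on_delete: :cascade }
  end
end
```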
...
Before PostgreSQL 11, adding a column with a default was problematic as it would
have caused a full table rewrite. The corresponding helper `add_column_with_default`
has been deprecated and is scheduled to be removed in a later release.
If a backport adding a column with a default value is needed for %12.9 or earlier versions,
it should use the `add_column_with_default` helper. If a [large table](https://gitlab.com/gitlab-org/gitlab/-/blob/master/rubocop/rubocop-migrations.yml#L3)
...
## Updating an existing column
To update an existing column to a particular value, you can use
`update_column_in_batches`. This splits the updates into batches, so we
don't update too many rows in a single statement.
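A minimal sketch (assuming the `update_column_in_batches` helper from `Gitlab::Database::MigrationHelpers`; the `'hello'` condition is illustrative):

```ruby
update_column_in_batches(:projects, :foo, 10) do |table, query|
  # Narrow each batched UPDATE to the rows we want to change.
  query.where(table[:some_column].eq('hello'))
end
```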
This updates the column `foo` in the `projects` table to 10, where `some_column`
...
## Integer column type
By default, an integer column can hold up to a 4-byte (32-bit) number. That is
a max value of 2,147,483,647. Be aware of this when creating a column that
holds file sizes in byte units. If you are tracking file size in bytes, this
restricts the maximum file size to just over 2GB.
To allow an integer column to hold up to an 8-byte (64-bit) number, explicitly
set the limit to 8 bytes. This allows the column to hold a value up to
`9,223,372,036,854,775,807`.
Rails migration example:
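A sketch of what that looks like (the `foo` column is illustrative):

```ruby
# limit: 8 makes this an 8-byte integer, so the column can store values
# up to 9,223,372,036,854,775,807.
add_column(:projects, :foo, :integer, default: 10, limit: 8)
```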
...

Use one of the following column types to store timestamps with timezones:

- `datetime_with_timezone`
This ensures all timestamps have a time zone specified. This, in turn, means
existing timestamps don't suddenly use a different timezone when the system's
timezone changes. It also makes it very clear which timezone was used in the
first place.
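For instance, a sketch of a migration using that type (the column name is illustrative):

```ruby
class AddLastActivityToUsers < ActiveRecord::Migration[6.0]
  def change
    # Stored with its time zone, so the value stays unambiguous even if
    # the system timezone changes later.
    add_column :users, :last_activity_at, :datetime_with_timezone
  end
end
```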
...

Since we had to do this a few times already, there are now some helpers to help
with this.
To use this you can include `Gitlab::Database::RenameReservedPathsMigration::V1`
in your migration. This provides 3 methods to which you can pass one or more
paths that need to be rejected.
- **`rename_root_paths`**: Renames the path of all _namespaces_ with the given name that don't have a `parent_id`.
- **`rename_child_paths`**: Renames the path of all _namespaces_ with the given name that have a `parent_id`.
- **`rename_wildcard_paths`**: Renames the path of all _projects_, and all _namespaces_ that have a `project_id`.
The `path` column for these rows is renamed to its previous value followed
by an integer. For example: `users` would turn into `users0`.
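A sketch of using one of these helpers (the path value is illustrative):

```ruby
class RenameSystemNamespaces < ActiveRecord::Migration[6.0]
  include Gitlab::Database::RenameReservedPathsMigration::V1

  def up
    # Renames every top-level namespace (no parent_id) called "system";
    # its path becomes "system0", "system1", and so on.
    rename_root_paths 'system'
  end
end
```

...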
By default, this seeds an average of 10 issues per week for the last 52 weeks
per project. All issues are also randomly labeled with team, type, severity,
and priority.
#### Seeding groups with sub-groups
...
There are a few environment flags you can pass to change how projects are seeded:

- `SIZE`: defaults to `8`, max: `32`. Number of projects to create.
- `LARGE_PROJECTS`: defaults to false. If set, clones 6 large projects to help with testing.
- `FORK`: defaults to false. If set to `true`, forks `torvalds/linux` five times. Can also be set to an existing project `full_path` to fork that instead.
## Run tests
...
`bin/rake spec` takes significant time to pass.
Instead of running the full test suite locally, you can save a lot of time by running
a single test or directory related to your changes. After you submit a merge request,
CI runs the full test suite for you. Green CI status in the merge request means
the full test suite passed.
You can't run `rspec .` since this tries to run all the `_spec.rb`
files it can find, including those in `/tmp`.
You can pass RSpec command line options to the `spec:unit`,
...

[Spring](https://github.com/rails/spring) is a Rails application preloader. It
speeds up development by keeping your application running in the background so
you don't need to boot it every time you run a test, Rake task or migration.
If you want to use it, you must export the `ENABLE_SPRING` environment
variable to `1`:
```shell
export ENABLE_SPRING=1
```

...
## Bypass the shell by splitting commands into separate tokens
When we pass shell commands as a single string to Ruby, Ruby lets `/bin/sh` evaluate the entire string. Essentially, we are asking the shell to evaluate a one-line script, which creates a risk of shell injection attacks. It is better to split the shell command into tokens ourselves. Sometimes we use the scripting capabilities of the shell to change the working directory or set environment variables; all of this can also be achieved securely straight from Ruby.
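A sketch of the difference (the `url` value stands in for untrusted input):

```ruby
url = 'https://example.com/repo.git'

# Bad: the single string is handed to /bin/sh, so shell metacharacters
# in `url` could inject extra commands.
system("git ls-remote #{url}")

# Good: each token is passed straight to the new process; no shell involved.
system('git', 'ls-remote', url)

# Working directory and environment can be set from Ruby as well:
system({ 'GIT_TERMINAL_PROMPT' => '0' }, 'git', 'ls-remote', url, chdir: '/tmp')
```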