Any change submitted can have an impact not only on the application itself but
also on those maintaining it and those keeping it up and running (for example,
production engineers). As a result, you should think carefully about the impact
of your merge request not only on the application but also on the people keeping
it up and running.
...
...
1. Is there something that we can do differently to not process such a
big data set?
1. Should we build some fail-safe mechanism to contain
computational complexity? Usually it's better to degrade
the service for a single user instead of all users (a rough sketch of such a
guard follows this list).
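The sketch below is purely illustrative — `MAX_ITEMS`, `DataSetTooLargeError`, and
`process_items` are made-up names, not existing GitLab code:

```ruby
# Hypothetical fail-safe: refuse to process an unexpectedly large data set so
# that only this request degrades, not the service as a whole.
MAX_ITEMS = 10_000 # assumed threshold

class DataSetTooLargeError < StandardError; end

def process_items(items, &block)
  if items.size > MAX_ITEMS
    raise DataSetTooLargeError,
      "#{items.size} items exceed the limit of #{MAX_ITEMS}"
  end

  items.each(&block)
end
```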
## Query plans and database structure
The query plan can tell us whether we need additional
indexes, or whether we perform expensive filtering (such as sequential scans).
Each query plan should be run against a data set of substantial size.
For example, if you look for issues with specific conditions,
you should consider validating the query against
a small number (a few hundred) and a big number (100_000) of issues.
See how the query behaves when the result is a few rows
and when it is a few thousand.
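One lightweight way to look at the plan from a Rails console is
`ActiveRecord::Relation#explain`, which prints the `EXPLAIN` output for the
generated SQL. A minimal sketch, assuming an `Issue` model and made-up filter
conditions:

```ruby
# The conditions below are placeholders; use the filters your feature applies.
scope = Issue.where(state: 'opened', project_id: 42)

# Prints the query plan PostgreSQL chose for this relation. Compare the output
# for a project with a few hundred issues against one with ~100_000 issues to
# spot sequential scans or missing indexes.
puts scope.explain
```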
This is needed because we have users who use GitLab for very big projects and
in very unconventional ways. Even if it seems unlikely
that such a big data set will be used, it's still plausible that one
of our customers will encounter a problem with the feature.
Understanding ahead of time how it's going to behave at scale, even if we accept it,
is the desired outcome. We should always have a plan or an understanding of what it
will take to optimize the feature for higher usage patterns.
Every database structure should be optimized and sometimes even over-described
in preparation for easy extension. The hardest part after some point is
data migration. Migrating millions of rows will always be troublesome and
can have a negative impact on the application.
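As an illustration of keeping such migrations manageable, rows can be updated in
small batches instead of in a single statement. A sketch using Rails' `in_batches`
(the column, value, and batch size are made up; check the database guides for
GitLab's own batching and background-migration helpers before relying on this
pattern):

```ruby
# Hypothetical backfill: update rows in batches of 1,000 so every statement and
# transaction stays small instead of rewriting millions of rows at once.
Issue.where(confidential: nil).in_batches(of: 1_000) do |batch|
  batch.update_all(confidential: false)
  sleep(0.1) # small pause to reduce pressure on the primary and on replication
end
```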
To better understand how to get help with query plan reviews,
read this section on [how to prepare the merge request for a database review](https://docs.gitlab.com/ee/development/database_review.html#how-to-prepare-the-merge-request-for-a-database-review).
...
...
## Batch process
**Summary:** When a single process makes repeated calls to external services (for example, PostgreSQL, Redis, Object Storage),
the calls should be executed in a **batch style** to reduce connection overheads.
For fetching rows from various tables in a batch style, see the [Eager Loading](#eager-loading) section.
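For context, a minimal before/after sketch of batch-style row fetching (the model
and association names are only examples):

```ruby
# N+1 style: one query for the issues plus one extra query per issue.
Issue.limit(100).each { |issue| puts issue.author.username }

# Batch style: `includes` eager loads all authors in a single additional query.
Issue.includes(:author).limit(100).each { |issue| puts issue.author.username }
```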
### Example: Delete multiple files from Object Storage
When you delete multiple files from object storage, like GCS,
executing a single REST API call multiple times is quite an expensive
process. Ideally, this should be done in a batch style; for example, S3 provides
a batch deletion API, so it is worth considering such an approach.
...
One of the reasons for an increased memory footprint could be Ruby memory fragmentation.
To diagnose it, you can visualize the Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html).
To start, you want to dump the heap of the process you're investigating to a JSON file.
You need to run the command inside the process you're exploring; you can do that with `rbtrace`.
`rbtrace` is already present in the GitLab `Gemfile`; you just need to require it.
This can be achieved by running the web server or Sidekiq with the environment variable `ENABLE_RBTRACE=1` set.
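A sketch of the dump step (the file path is an example; the snippet is the Ruby you
would evaluate inside the target process, for instance via
`bundle exec rbtrace -p <PID> -e '…'`):

```ruby
# Evaluated inside the running process (for example through rbtrace).
require 'objspace'

File.open('/tmp/heap.json', 'wb') do |file|
  # Writes one JSON document per live object on the Ruby heap.
  ObjectSpace.dump_all(output: file)
end
```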
...
...
There is no clear set of steps that you can follow to determine if a certain
piece of code is worth optimizing. The only two things you can do are:
1. Think about what the code does, how it's used, how many times it's called, and
how much time is spent in it relative to the total execution time (for example, the
total time spent in a web request); see the sketch after this list for a rough way
to measure this.
1. Ask others (preferably in the form of an issue).
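A rough way to get that relative number is to time the suspect code against the
whole operation; a minimal sketch with stand-in workloads:

```ruby
require 'benchmark'

# Stand-ins for the whole operation (for example, a request) and the code you
# suspect; replace them with the real thing when measuring.
whole_operation = -> { 10_000.times { |i| Math.sqrt(i) } }
suspect_code    = -> { 100.times { |i| Math.sqrt(i) } }

total   = Benchmark.realtime(&whole_operation)
suspect = Benchmark.realtime(&suspect_code)

# If the suspect code is only a tiny fraction of the total, optimizing it is
# rarely worth the effort.
puts format('suspect code: %.1f%% of total time', (suspect / total) * 100)
```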
Some examples of changes that aren't really important/worth the effort:
- Replacing double quotes with single quotes.
- Replacing usage of Array with Set when the list of values is very small.
...
...
## Slow Operations & Sidekiq
Slow operations, like merging branches, or operations that are prone to errors
(using external APIs) should be performed in a Sidekiq worker instead of
directly in a web request as much as possible. This has numerous benefits such
as:
...
...
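As a rough illustration of moving such work out of the request cycle, a worker might
look like the sketch below. The worker name and arguments are made up, and GitLab
workers normally include `ApplicationWorker` rather than bare `Sidekiq::Worker`;
this is only a self-contained example:

```ruby
require 'sidekiq'

# Hypothetical worker that performs a slow branch merge outside the web request.
class MergeBranchesWorker
  include Sidekiq::Worker

  def perform(project_id, source_branch, target_branch)
    # Load the project and run the slow or error-prone merge here.
    # Failures are retried by Sidekiq instead of surfacing as a failed request.
  end
end

# In the controller: enqueue the job and respond to the user immediately.
MergeBranchesWorker.perform_async(42, 'feature', 'main')
```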
## Anti-Patterns
This is a collection of [anti-patterns][anti-pattern] that should be avoided
unless these changes have a measurable, significant, and positive impact on
production environments.
### Moving Allocations to Constants
...
...