Any change submitted can have an impact not only on the application itself but
also on those maintaining it and those keeping it up and running (for example,
production engineers). As a result, you should think carefully about the impact
of your merge request not only on the application but also on the people keeping
it up and running.
...
...
@@ -85,34 +85,34 @@ the following:
1. Is there something that we can do differently to avoid processing such a
big data set?
1. Should we build some fail-safe mechanism to contain
computational complexity? Usually it's better to degrade
the service for a single user instead of all users; see the sketch after this list.
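To make the fail-safe idea concrete, here is a minimal, hypothetical sketch (the model, association, limit, and `do_expensive_work` helper are all made up) that caps how much work a single request may trigger and degrades gracefully beyond that cap:

```ruby
# Hypothetical sketch: cap how many records a single request may process so a
# pathological project degrades only its own results, not the whole service.
MAX_ISSUES_TO_PROCESS = 10_000

def process_issues(project)
  # Fetch one extra record so we can detect that the cap was exceeded.
  issues = project.issues.limit(MAX_ISSUES_TO_PROCESS + 1).to_a

  if issues.size > MAX_ISSUES_TO_PROCESS
    # Degrade gracefully: process a truncated set and record that we did so.
    issues = issues.first(MAX_ISSUES_TO_PROCESS)
    Rails.logger.info("issue processing truncated for project #{project.id}")
  end

  issues.each { |issue| do_expensive_work(issue) }
end
```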
## Query plans and database structure
The query plan can tell us if we need additional
indexes, or expensive filtering (such as using sequential scans).
Each query plan should be run against a data set of substantial size.
For example, if you look for issues with specific conditions,
you should consider validating the query against
a small number (a few hundred) and a big number (100_000) of issues.
See how the query behaves when the result set contains a few rows
and a few thousand rows.
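As a sketch of how such a check could look from a Rails console (the model, columns, and project ID here are illustrative, and the seeded data volumes are an assumption):

```ruby
# Print the query plan for an illustrative issue lookup; run this against a
# database seeded with a few hundred and with ~100_000 matching issues.
puts Issue.where(project_id: 42, state: 'opened')
          .order(created_at: :desc)
          .limit(20)
          .explain
```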
This is needed because we have users who use GitLab for very big projects and
in very unconventional ways. Even if it seems unlikely
that such a big data set will be used, it's still plausible that one
of our customers will encounter a problem with the feature.
Understanding ahead of time how it's going to behave at scale, even if we accept it,
is the desired outcome. We should always have a plan or understanding of what it will take
to optimize the feature for higher usage patterns.
Every database structure should be optimized and sometimes even over-described
in preparation for easy extension. The hardest part after some point is
data migration. Migrating millions of rows will always be troublesome and
can have a negative impact on the application.
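As a rough illustration of processing such a migration in batches (a generic Rails-style sketch with a hypothetical table and column; GitLab's own batching and background-migration helpers should be preferred in practice):

```ruby
# Hypothetical backfill done in batches instead of one UPDATE over millions of rows.
class BackfillIssueWeights < ActiveRecord::Migration[7.0]
  disable_ddl_transaction!

  # A throwaway model so the migration does not depend on application code.
  class Issue < ActiveRecord::Base
    self.table_name = 'issues'
  end

  def up
    Issue.in_batches(of: 1_000) do |batch|
      batch.update_all(weight: 0)
    end
  end

  def down
    # No-op: the backfilled values are left in place.
  end
end
```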
To better understand how to get help with query plan reviews,
read this section on [how to prepare the merge request for a database review](https://docs.gitlab.com/ee/development/database_review.html#how-to-prepare-the-merge-request-for-a-database-review).
...
...
@@ -167,14 +167,14 @@ be clearly mentioned in the merge request description.
## Batch process
**Summary:** Repeated requests from a single process to external services (for example, PostgreSQL, Redis, Object Storage)
should be executed in a **batch-style** in order to reduce connection overheads.
For fetching rows from various tables in a batch-style, see the [Eager Loading](#eager-loading) section.
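For example, here is a minimal sketch of batching Redis reads with the `redis-rb` gem's pipelining (the keys and IDs are made up):

```ruby
require 'redis'

redis = Redis.new
user_ids = [1, 2, 3]

# Unbatched: one network round trip per command.
# user_ids.each { |id| redis.get("user:#{id}:status") }

# Batched: all commands are sent in a single round trip.
statuses = redis.pipelined do |pipeline|
  user_ids.each { |id| pipeline.get("user:#{id}:status") }
end
# statuses is an array of replies in the same order as the commands.
```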
### Example: Delete multiple files from Object Storage
When you delete multiple files from object storage, such as GCS,
executing a single REST API call multiple times is quite an expensive
process. Ideally, this should be done in a batch-style, for example, S3 provides
@@ -257,9 +259,9 @@ One of the reasons of the increased memory footprint could be Ruby memory fragme
To diagnose it, you can visualize the Ruby heap as described in [this post by Aaron Patterson](https://tenderlovemaking.com/2017/09/27/visualizing-your-ruby-heap.html).
To start, you want to dump the heap of the process you're investigating to a JSON file.
You need to run the command inside the process you're exploring; you may do that with `rbtrace`.
`rbtrace` is already present in the GitLab `Gemfile`, you just need to require it.
You can achieve that by running the web server or Sidekiq with the environment variable `ENABLE_RBTRACE=1` set.
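For reference, a minimal sketch of the Ruby snippet that produces such a dump (evaluate it inside the target process, for example through `rbtrace`; the output path is arbitrary):

```ruby
# Dump the heap of the current process to heap.json for later visualization.
require 'objspace'

File.open('heap.json', 'w') do |file|
  ObjectSpace.dump_all(output: file)
end
```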
...
...
@@ -274,7 +276,7 @@ Having the JSON, you finally could render a picture using the script [provided b
```shell
ruby heapviz.rb heap.json
```
A fragmented Ruby heap snapshot could look like this: