Commit f1ef05a8 authored by Achilleas Pipinellis

Merge branch 'docs/refactor-monitoring' into 'master'

Move monitoring/ to new location

From CE https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/6518

See merge request !797
parents 4dbb1f15 14f63091
@@ -242,7 +242,11 @@
 %fieldset
 %legend Metrics
 %p
-These settings require a restart to take effect.
+Setup InfluxDB to measure a wide variety of statistics like the time spent
+in running SQL queries. These settings require a
+= link_to 'restart', help_page_path('administration/restart_gitlab')
+to take effect.
+= link_to icon('question-circle'), help_page_path('administration/monitoring/performance/introduction')
 .form-group
 .col-sm-offset-2.col-sm-10
 .checkbox
...
@@ -62,8 +62,8 @@
 - [GitLab Pages configuration](pages/administration.md) Configure GitLab Pages.
 - [Elasticsearch](integration/elasticsearch.md) Enable Elasticsearch.
 - [GitLab GEO](gitlab-geo/README.md) Configure GitLab GEO, a secondary read-only GitLab instance.
-- [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics.
-- [Monitoring uptime](monitoring/health_check.md) Check the server status using the health check endpoint.
+- [GitLab Performance Monitoring](administration/monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics.
+- [Monitoring uptime](user/admin_area/monitoring/health_check.md) Check the server status using the health check endpoint.
 - [Debugging Tips](administration/troubleshooting/debug.md) Tips to debug problems when things go wrong
 - [Sidekiq Troubleshooting](administration/troubleshooting/sidekiq.md) Debug when Sidekiq appears hung and is not processing jobs.
 - [High Availability](administration/high_availability/README.md) Configure multiple servers for scaling or high availability.
...
# GitLab Configuration
GitLab Performance Monitoring is disabled by default. To enable it and change
any of its settings, go to **Settings > Metrics** in the Admin area
(`/admin/application_settings`).

The minimum required settings are the InfluxDB host and port.
Make sure _Enable InfluxDB Metrics_ is checked, then click **Save** to apply
the changes.
---
![GitLab Performance Monitoring Admin Settings](img/metrics_gitlab_configuration_settings.png)
---
Finally, a restart of all GitLab processes is required for the changes to take
effect:
```bash
# For Omnibus installations
sudo gitlab-ctl restart
# For installations from source
sudo service gitlab restart
```
## Pending Migrations
When any migrations are pending, the metrics are disabled until the migrations
have been performed.
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [InfluxDB Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
# Grafana Configuration
[Grafana](http://grafana.org/) is a tool that allows you to visualize time
series metrics through graphs and dashboards. It supports several backend
data stores, including InfluxDB. GitLab writes performance data to InfluxDB
and Grafana will allow you to query InfluxDB to display useful graphs.
For the easiest installation and configuration, install Grafana on the same
server as InfluxDB. For larger installations, you may want to split out these
services.
## Installation
Grafana supplies package repositories (Yum/Apt) for easy installation.
See [Grafana installation documentation](http://docs.grafana.org/installation/)
for detailed steps.
> **Note**: Before starting Grafana for the first time, set the admin user
and password in `/etc/grafana/grafana.ini`. Otherwise, the default password
will be `admin`.
## Configuration
Log in as the admin user. Expand the menu by clicking the Grafana logo in the
top left corner. Choose 'Data Sources' from the menu. Then, click 'Add new'
in the top bar.
![Grafana empty data source page](img/grafana_data_source_empty.png)
Fill in the configuration details for the InfluxDB data source, then click
'Save' and 'Test Connection' to ensure the configuration is correct.
- **Name**: InfluxDB
- **Default**: Checked
- **Type**: InfluxDB 0.9.x (Even if you're using InfluxDB 0.10.x)
- **Url**: https://localhost:8086 (Or the remote URL if you've installed InfluxDB
on a separate server; use `http://` instead if HTTPS is not enabled in the
InfluxDB `[http]` settings)
- **Access**: proxy
- **Database**: gitlab
- **User**: admin (Or the username configured when setting up InfluxDB)
- **Password**: The password configured when you set up InfluxDB
![Grafana data source configurations](img/grafana_data_source_configuration.png)
## Apply retention policies and create continuous queries
If you intend to import the GitLab provided Grafana dashboards, you will need to
set up the right retention policies and continuous queries. The easiest way of
doing this is by using the [influxdb-management](https://gitlab.com/gitlab-org/influxdb-management)
repository.
To use this repository you must first clone it:
```
git clone https://gitlab.com/gitlab-org/influxdb-management.git
cd influxdb-management
```
Next you must install the required dependencies:
```
gem install bundler
bundle install
```
Now you must configure the repository by first copying `.env.example` to `.env`
and then editing the `.env` file to contain the correct InfluxDB settings. Once
configured you can simply run `bundle exec rake` and the InfluxDB database will
be configured for you.
For more information see the [influxdb-management README](https://gitlab.com/gitlab-org/influxdb-management/blob/master/README.md).
## Import Dashboards
You can now import a set of default dashboards that will give you a good
start on displaying useful information. GitLab has published a set of default
[Grafana dashboards][grafana-dashboards] to get you started. Clone the
repository or download a zip/tarball, then follow these steps to import each
JSON file.
Open the dashboard dropdown menu and click 'Import'.
![Grafana dashboard dropdown](img/grafana_dashboard_dropdown.png)
Click 'Choose file' and browse to the location where you downloaded or cloned
the dashboard repository. Pick one of the JSON files to import.
![Grafana dashboard import](img/grafana_dashboard_import.png)
Once the dashboard is imported, be sure to click the save icon in the top bar.
If you do not save the dashboard after importing, it will be removed when you
navigate away.
![Grafana save icon](img/grafana_save_icon.png)
Repeat this process for each dashboard you wish to import.
Alternatively, you can automatically import all the dashboards into your Grafana
instance. See the README of the [Grafana dashboards][grafana-dashboards]
repository for more information on this process.
[grafana-dashboards]: https://gitlab.com/gitlab-org/grafana-dashboards
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Installation/Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
# InfluxDB Configuration
The default settings provided by [InfluxDB] are not sufficient for a high-traffic
GitLab environment. The settings discussed in this document are based on the
settings GitLab uses for GitLab.com; depending on your own needs, you may need to
adjust them further.
If you are intending to run InfluxDB on the same server as GitLab, make sure
you have plenty of RAM since InfluxDB can use quite a bit depending on traffic.
Unless you are going with a budget setup, it's advised to run it separately.
## Requirements
- InfluxDB 0.9.5 or newer
- A fairly modern version of Linux
- At least 4GB of RAM
- At least 10GB of storage for InfluxDB data
Note that the RAM and storage requirements can differ greatly depending on the
amount of data received/stored. To limit the amount of stored data, users can
look into [InfluxDB Retention Policies][influxdb-retention].
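As a rough sketch of what such a retention policy could look like, the helper below builds an InfluxQL `CREATE RETENTION POLICY` statement for the `gitlab` database. The policy name `gitlab_30d` and the 30 day duration are illustrative assumptions, not values GitLab requires, and a replication factor of 1 assumes a single InfluxDB node.

```bash
# Build an InfluxQL statement that creates a retention policy on the
# `gitlab` database. Name and duration are hypothetical examples.
retention_policy_query() {
  local name="$1" duration="$2"
  printf 'CREATE RETENTION POLICY "%s" ON "gitlab" DURATION %s REPLICATION 1\n' \
    "$name" "$duration"
}

# Print the statement for review. To apply it, pass it to the InfluxDB CLI:
#   influx -execute "$(retention_policy_query gitlab_30d 30d)"
retention_policy_query gitlab_30d 30d
```

Printing the statement first makes it easy to review before it is sent to a running InfluxDB instance.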
## Installation
Installing InfluxDB is out of the scope of this document. Please refer to the
[InfluxDB documentation].
## InfluxDB Server Settings
Since InfluxDB has many settings that users may wish to customize themselves
(e.g. what port to run InfluxDB on), we'll only cover the essentials.
The configuration file in question is usually located at
`/etc/influxdb/influxdb.conf`. Whenever you make a change in this file,
InfluxDB needs to be restarted.
### Storage Engine
InfluxDB comes with different storage engines and as of InfluxDB 0.9.5 a new
storage engine is available, called [TSM Tree]. All users **must** use the new
`tsm1` storage engine as this [will be the default engine][tsm1-commit] in
upcoming InfluxDB releases.
Make sure you have the following in your configuration file:
```
[data]
dir = "/var/lib/influxdb/data"
engine = "tsm1"
```
### Admin Panel
Production environments should have the InfluxDB admin panel **disabled**. This
feature can be disabled by adding the following to your InfluxDB configuration
file:
```
[admin]
enabled = false
```
### HTTP
HTTP is required when using the [InfluxDB CLI] or other tools such as Grafana,
thus it should be enabled. When enabling it, make sure to _also_ enable
authentication:
```
[http]
enabled = true
auth-enabled = true
```
_**Note:** Before you enable authentication, you might want to [create an
admin user](#create-a-new-admin-user)._
### UDP
GitLab writes data to InfluxDB via UDP and thus this must be enabled. Enabling
UDP can be done using the following settings:
```
[[udp]]
enabled = true
bind-address = ":8089"
database = "gitlab"
batch-size = 1000
batch-pending = 5
batch-timeout = "1s"
read-buffer = 209715200
```
This does the following:
1. Enable UDP and bind it to port 8089 for all addresses.
2. Store any data received in the "gitlab" database.
3. Define a batch of points to be 1000 points in size and allow a maximum of
5 batches _or_ flush them automatically after 1 second.
4. Define a UDP read buffer size of 200 MB.
One of the most important settings here is the UDP read buffer size: if this
value is set too low, packets will be dropped. You must also set the OS buffer
size to the same value, because the OS default is almost never enough.
To set the OS buffer size to 200 MB on Linux, run the following command:
```bash
sysctl -w net.core.rmem_max=209715200
```
To make this permanent, add the following to `/etc/sysctl.conf` and restart the
server:
```bash
net.core.rmem_max=209715200
```
It is **very important** to make sure the buffer sizes are large enough to
handle all data sent to InfluxDB as otherwise you _will_ lose data. The above
buffer sizes are based on the traffic for GitLab.com. Depending on the amount of
traffic, users may be able to use a smaller buffer size, but we highly recommend
using _at least_ 100 MB.
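The byte values above follow from treating 1 MB as 1024 × 1024 bytes. The sketch below shows the conversion and a small check of the kernel limit; the function names are hypothetical helpers, and the check assumes a Linux system that exposes `/proc/sys/net/core/rmem_max`.

```bash
# Convert a size in MB (1 MB = 1024 * 1024 bytes) to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Succeed when the kernel's maximum receive buffer is at least the wanted
# number of bytes. The second argument defaults to the live value on Linux.
rmem_max_at_least() {
  local wanted="$1"
  local current="${2:-$(cat /proc/sys/net/core/rmem_max)}"
  [ "$current" -ge "$wanted" ]
}

mb_to_bytes 200   # prints 209715200, the read-buffer value used above
mb_to_bytes 100   # prints 104857600, the recommended minimum
```

For example, `rmem_max_at_least "$(mb_to_bytes 200)" || echo "rmem_max is too small"` could be run after changing the sysctl setting.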
When enabling UDP, users should take care to not expose the port to the public,
as doing so will allow anybody to write data into your InfluxDB database (as
[InfluxDB's UDP protocol][udp] doesn't support authentication). We recommend either
whitelisting the allowed IP addresses/ranges, or setting up a VLAN and only
allowing traffic from members of said VLAN.
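One possible way to whitelist a source range is with `iptables`. The sketch below only prints the rules so they can be reviewed before being run as root; the CIDR range is a hypothetical example, and the port default assumes the `bind-address` shown earlier.

```bash
# Print (not run) iptables rules that accept UDP traffic to the InfluxDB
# port from one trusted source range and drop all other traffic to it.
# The CIDR passed in is a hypothetical example; the port defaults to 8089.
udp_firewall_rules() {
  local cidr="$1" port="${2:-8089}"
  echo "iptables -A INPUT -p udp --dport ${port} -s ${cidr} -j ACCEPT"
  echo "iptables -A INPUT -p udp --dport ${port} -j DROP"
}

# Review the output first, then run the printed commands as root.
udp_firewall_rules 10.0.0.0/24
```

The order matters: the ACCEPT rule must be appended before the DROP rule, otherwise traffic from the trusted range is dropped as well.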
## Create a new admin user
If you want to [enable authentication](#http), you might want to [create an
admin user][influx-admin]:
```bash
# Example only: replace the user name and password with your own values
influx -execute "CREATE USER jeff WITH PASSWORD '1234' WITH ALL PRIVILEGES"
```
## Create the `gitlab` database
Once you get InfluxDB up and running, you need to create a database for GitLab.
Make sure you have changed the [storage engine](#storage-engine) to `tsm1`
before creating a database.
_**Note:** If you [created an admin user](#create-a-new-admin-user) and enabled
[HTTP authentication](#http), remember to append the username (`-username <username>`)
and password (`-password <password>`) you set earlier to the commands below._
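To avoid repeating the credentials on every command, a small wrapper function can be used. This is a minimal sketch: `influx_auth`, `INFLUX_USER` and `INFLUX_PASSWORD` are hypothetical names, and the wrapper assumes the `influx` CLI is on your `PATH`.

```bash
# Wrap the InfluxDB CLI so every statement is sent with credentials.
# INFLUX_USER and INFLUX_PASSWORD are hypothetical variable names; set them
# to the admin user and password created earlier.
influx_auth() {
  influx -username "$INFLUX_USER" -password "$INFLUX_PASSWORD" -execute "$1"
}

# Usage (requires a running InfluxDB):
#   export INFLUX_USER=admin INFLUX_PASSWORD='<password>'
#   influx_auth 'CREATE DATABASE gitlab'
#   influx_auth 'SHOW DATABASES'
```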
Run the following command to create a database named `gitlab`:
```bash
influx -execute 'CREATE DATABASE gitlab'
```
The name **must** be `gitlab`; do not use any other name.
Next, make sure that the database was successfully created:
```bash
influx -execute 'SHOW DATABASES'
```
The output should be similar to:
```
name: databases
---------------
name
_internal
gitlab
```
That's it! Now your GitLab instance should send data to InfluxDB.
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
[influxdb-retention]: https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management
[influxdb documentation]: https://docs.influxdata.com/influxdb/v0.9/
[influxdb cli]: https://docs.influxdata.com/influxdb/v0.9/tools/shell/
[udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
[influxdb]: https://influxdata.com/time-series-platform/influxdb/
[tsm tree]: https://influxdata.com/blog/new-storage-engine-time-structured-merge-tree/
[tsm1-commit]: https://github.com/influxdata/influxdb/commit/15d723dc77651bac83e09e2b1c94be480966cb0d
[influx-admin]: https://docs.influxdata.com/influxdb/v0.9/administration/authentication_and_authorization/#create-a-new-admin-user
# InfluxDB Schema
The following measurements are currently stored in InfluxDB:
- `PROCESS_file_descriptors`
- `PROCESS_gc_statistics`
- `PROCESS_memory_usage`
- `PROCESS_method_calls`
- `PROCESS_object_counts`
- `PROCESS_transactions`
- `PROCESS_views`
- `events`
Here, `PROCESS` is replaced with either `rails` or `sidekiq` depending on the
process type. In all series, any form of duration is stored in milliseconds.
## PROCESS_file_descriptors
This measurement contains the number of open file descriptors over time. The
value field `value` contains the number of descriptors.
## PROCESS_gc_statistics
This measurement contains Ruby garbage collection statistics such as the number
of minor/major GC runs (relative to the last sampling interval), the time spent
in garbage collection cycles, and all fields/values returned by `GC.stat`.
## PROCESS_memory_usage
This measurement contains the process' memory usage (in bytes) over time. The
value field `value` contains the number of bytes.
## PROCESS_method_calls
This measurement contains the methods called during a transaction along with
their duration, and a name of the transaction action that invoked the method (if
available). The method call duration is stored in the value field `duration`,
while the method name is stored in the tag `method`. The tag `action` contains
the full name of the transaction action. Both the `method` and `action` tags
are in the following format:
```
ClassName#method_name
```
For example, a method called by the `show` method in the `UsersController` class
would have `action` set to `UsersController#show`.
## PROCESS_object_counts
This measurement is used to store the number of retained Ruby objects per
class. The number of objects is stored in the `count` value field while the
class name is stored in the `type` tag.
## PROCESS_transactions
This measurement is used to store basic transaction details such as the time it
took to complete a transaction, how much time was spent in SQL queries, etc. The
following value fields are available:
| Value | Description |
| ----- | ----------- |
| `duration` | The total duration of the transaction |
| `allocated_memory` | The number of bytes allocated while the transaction was running. This value is only reliable when using single-threaded application servers |
| `method_duration` | The total time spent in method calls |
| `sql_duration` | The total time spent in SQL queries |
| `view_duration` | The total time spent in views |
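To illustrate how the `PROCESS` prefix and the value fields combine in practice, the sketch below builds an InfluxQL query for the mean transaction duration. The function names are hypothetical, and the query assumes the measurement is reachable through the database's default retention policy.

```bash
# Expand a measurement name for a given process type (rails or sidekiq).
measurement_name() {
  local process="$1" suffix="$2"
  case "$process" in
    rails|sidekiq) echo "${process}_${suffix}" ;;
    *) echo "unknown process type: $process" >&2; return 1 ;;
  esac
}

# Build an InfluxQL query for the mean transaction duration (milliseconds)
# in five-minute buckets over the last hour.
mean_duration_query() {
  local measurement
  measurement="$(measurement_name "$1" transactions)" || return 1
  printf 'SELECT mean("duration") FROM "%s" WHERE time > now() - 1h GROUP BY time(5m)\n' \
    "$measurement"
}

mean_duration_query rails
```

The resulting statement could then be run with `influx -database gitlab -execute "$(mean_duration_query sidekiq)"`.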
## PROCESS_views
This measurement is used to store view rendering timings for a transaction. The
following value fields are available:
| Value | Description |
| ----- | ----------- |
| `duration` | The rendering time of the view |
| `view` | The path of the view, relative to the application's root directory |
The `action` tag contains the action name of the transaction that rendered the
view.
## events
This measurement is used to store generic events such as the number of Git
pushes, emails sent, etc. Each point in this measurement has a single value
field called `count`. The value of this field is simply set to `1`. Each point
also has at least one tag: `event`. This tag's value is set to the event name.
Depending on the event type additional tags may be available as well.
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Configuration](influxdb_configuration.md)
- [Grafana Install/Configuration](grafana_configuration.md)
# GitLab Performance Monitoring
GitLab comes with its own application performance measuring system as of GitLab
8.4, simply called "GitLab Performance Monitoring". GitLab Performance Monitoring is available in both the
Community and Enterprise editions.
Apart from this introduction, you are advised to read through the following
documents in order to understand and properly configure GitLab Performance Monitoring:
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Install/Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
## Introduction to GitLab Performance Monitoring
GitLab Performance Monitoring makes it possible to measure a wide variety of statistics
including (but not limited to):
- The time it took to complete a transaction (a web request or Sidekiq job).
- The time spent in running SQL queries and rendering HAML views.
- The time spent executing (instrumented) Ruby methods.
- Ruby object allocations, and retained objects in particular.
- System statistics such as the process' memory usage and open file descriptors.
- Ruby garbage collection statistics.
Metrics data is written to [InfluxDB][influxdb] over [UDP][influxdb-udp]. Stored
data can be visualized using [Grafana][grafana] or any other application that
supports reading data from InfluxDB. Alternatively data can be queried using the
InfluxDB CLI.
## Metric Types
Two types of metrics are collected:
1. Transaction specific metrics.
1. Sampled metrics, collected at a certain interval in a separate thread.
### Transaction Metrics
Transaction metrics are metrics that can be associated with a single
transaction. This includes statistics such as the transaction duration, timings
of any executed SQL queries, time spent rendering HAML views, etc. These metrics
are collected for every Rack request and Sidekiq job processed.
### Sampled Metrics
Sampled metrics are metrics that can't be associated with a single transaction.
Examples include garbage collection statistics and retained Ruby objects. These
metrics are collected at a regular interval. This interval is made up of two
parts:
1. A user-defined interval.
1. A randomly generated offset added on top of the interval; the same offset
can't be used twice in a row.
The actual interval can be anywhere between half of the defined interval and
one and a half times the defined interval. For example, for a user-defined
interval of 15 seconds, the actual interval can be anywhere between 7.5 and
22.5 seconds. The interval is
re-generated for every sampling run instead of being generated once and re-used
for the duration of the process' lifetime.
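The arithmetic behind the interval range can be sketched as follows. This is an illustration of the bounds described above, not GitLab's actual sampler: the function names are hypothetical, and the sketch does not implement the rule that the same offset can't be used twice in a row.

```bash
# Print the lower and upper bound of the actual sampling interval for a
# user-defined interval in seconds: half of it, and one and a half times it.
sampling_bounds() {
  awk -v i="$1" 'BEGIN { printf "%g %g\n", i / 2, i * 1.5 }'
}

# Draw one actual interval: a random value between the two bounds.
# The optional second argument is a seed, which makes the draw repeatable.
random_sampling_interval() {
  awk -v i="$1" -v seed="${2:-$RANDOM}" \
    'BEGIN { srand(seed); printf "%.2f\n", i / 2 + rand() * i }'
}

sampling_bounds 15            # prints "7.5 22.5"
random_sampling_interval 15   # prints a value between 7.50 and 22.50
```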
[influxdb]: https://influxdata.com/time-series-platform/influxdb/
[influxdb-udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
[grafana]: http://grafana.org/
# Health Check

This document was moved to [user/admin_area/monitoring/health_check](../user/admin_area/monitoring/health_check.md).
> [Introduced][ce-3888] in GitLab 8.8.
GitLab provides a health check endpoint for uptime monitoring on the `health_check` web
endpoint. The health check reports on the overall system status based on the status of
the database connection, the state of the database migrations, and the ability to write
and access the cache. This endpoint can be provided to uptime monitoring services like
[Pingdom][pingdom], [Nagios][nagios-health], and [NewRelic][newrelic-health].
## Access Token
An access token needs to be provided while accessing the health check endpoint. The currently
accepted token can be found on the `admin/health_check` page of your GitLab instance.
![access token](img/health_check_token.png)
The access token can be passed as a URL parameter:
```
https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN
```
or as an HTTP header:
```bash
curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
```
## Using the Endpoint
Once you have the access token, health information can be retrieved as plain text, JSON,
or XML using the `health_check` endpoint:
- `https://gitlab.example.com/health_check?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.xml?token=ACCESS_TOKEN`
You can also ask for the status of specific services:
- `https://gitlab.example.com/health_check/cache.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/database.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/migrations.json?token=ACCESS_TOKEN`
For example, the JSON output of the following health check:
```bash
curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
```
would look like:
```
{"healthy":true,"message":"success"}
```
## Status
On failure, the endpoint will return a `500` HTTP status code. On success, the endpoint
will return a successful HTTP status code and a `success` message. Ideally, your
uptime monitoring should look for the success message.
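A monitoring script that looks for the success message could be sketched like this. The function names and the `GITLAB_URL` and `HEALTH_TOKEN` variables are hypothetical; the check assumes the JSON response format shown earlier.

```bash
# Succeed only when a health_check JSON body reports success.
# Any other body, including an empty one, counts as unhealthy.
is_healthy() {
  local body="$1"
  printf '%s' "$body" | grep -Eq '"healthy"[[:space:]]*:[[:space:]]*true' &&
    printf '%s' "$body" | grep -Eq '"message"[[:space:]]*:[[:space:]]*"success"'
}

# Fetch the endpoint and evaluate the response. GITLAB_URL and HEALTH_TOKEN
# are hypothetical variable names for your instance URL and access token.
check_gitlab() {
  local body
  body="$(curl --silent --max-time 10 --header "TOKEN: ${HEALTH_TOKEN}" \
    "${GITLAB_URL}/health_check.json")" || return 1
  is_healthy "$body"
}

# Usage:
#   GITLAB_URL=https://gitlab.example.com HEALTH_TOKEN=ACCESS_TOKEN check_gitlab \
#     && echo "GitLab is healthy" || echo "GitLab is unhealthy"
```

Checking the message in addition to the status code guards against a proxy or error page that answers with a successful status but no health information.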
[ce-3888]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/3888
[pingdom]: https://www.pingdom.com
[nagios-health]: https://nagios-plugins.org/doc/man/check_http.html
[newrelic-health]: https://docs.newrelic.com/docs/alerts/alert-policies/downtime-alerts/availability-monitoring
# GitLab Configuration

This document was moved to [administration/monitoring/performance/gitlab_configuration](../administration/monitoring/performance/gitlab_configuration.md).
# Grafana Configuration

This document was moved to [administration/monitoring/performance/grafana_configuration](../administration/monitoring/performance/grafana_configuration.md).
# InfluxDB Configuration

This document was moved to [administration/monitoring/performance/influxdb_configuration](../administration/monitoring/performance/influxdb_configuration.md).
The default settings provided by [InfluxDB] are not sufficient for a high traffic
GitLab environment. The settings discussed in this document are based on the
settings GitLab uses for GitLab.com, depending on your own needs you may need to
further adjust them.
If you are intending to run InfluxDB on the same server as GitLab, make sure
you have plenty of RAM since InfluxDB can use quite a bit depending on traffic.
Unless you are going with a budget setup, it's advised to run it separately.
## Requirements
- InfluxDB 0.9.5 or newer
- A fairly modern version of Linux
- At least 4GB of RAM
- At least 10GB of storage for InfluxDB data
Note that the RAM and storage requirements can differ greatly depending on the
amount of data received/stored. To limit the amount of stored data users can
look into [InfluxDB Retention Policies][influxdb-retention].
## Installation
Installing InfluxDB is out of the scope of this document. Please refer to the
[InfluxDB documentation].
## InfluxDB Server Settings
Since InfluxDB has many settings that users may wish to customize themselves
(e.g. what port to run InfluxDB on), we'll only cover the essentials.
The configuration file in question is usually located at
`/etc/influxdb/influxdb.conf`. Whenever you make a change in this file,
InfluxDB needs to be restarted.
### Storage Engine
InfluxDB comes with different storage engines and as of InfluxDB 0.9.5 a new
storage engine is available, called [TSM Tree]. All users **must** use the new
`tsm1` storage engine as this [will be the default engine][tsm1-commit] in
upcoming InfluxDB releases.
Make sure you have the following in your configuration file:
```
[data]
dir = "/var/lib/influxdb/data"
engine = "tsm1"
```
### Admin Panel
Production environments should have the InfluxDB admin panel **disabled**. This
feature can be disabled by adding the following to your InfluxDB configuration
file:
```
[admin]
enabled = false
```
### HTTP
HTTP is required when using the [InfluxDB CLI] or other tools such as Grafana,
thus it should be enabled. When enabling make sure to _also_ enable
authentication:
```
[http]
enabled = true
auth-enabled = true
```
_**Note:** Before you enable authentication, you might want to [create an
admin user](#create-a-new-admin-user)._
### UDP
GitLab writes data to InfluxDB via UDP and thus this must be enabled. Enabling
UDP can be done using the following settings:
```
[[udp]]
enabled = true
bind-address = ":8089"
database = "gitlab"
batch-size = 1000
batch-pending = 5
batch-timeout = "1s"
read-buffer = 209715200
```
This does the following:
1. Enable UDP and bind it to port 8089 for all addresses.
2. Store any data received in the "gitlab" database.
3. Define a batch of points to be 1000 points in size and allow a maximum of
5 batches _or_ flush them automatically after 1 second.
4. Define a UDP read buffer size of 200 MB.
One of the most important settings here is the UDP read buffer size as if this
value is set too low, packets will be dropped. You must also make sure the OS
buffer size is set to the same value, the default value is almost never enough.
To set the OS buffer size to 200 MB, on Linux you can run the following command:
```bash
sysctl -w net.core.rmem_max=209715200
```
To make this permanent, add the following to `/etc/sysctl.conf` and restart the
server:
```bash
net.core.rmem_max=209715200
```
It is **very important** to make sure the buffer sizes are large enough to
handle all data sent to InfluxDB as otherwise you _will_ lose data. The above
buffer sizes are based on the traffic for GitLab.com. Depending on the amount of
traffic, users may be able to use a smaller buffer size, but we highly recommend
using _at least_ 100 MB.
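The byte values used for the buffer sizes follow from the conversion 1 MB = 1024 × 1024 bytes. As a minimal sketch (the `mb_to_bytes` helper is a hypothetical name, not part of GitLab or InfluxDB), the values for the recommended minimum and for the 200 MB buffer can be computed like this:

```shell
# Hypothetical helper: convert a size in MB to bytes (1 MB = 1024 * 1024 bytes).
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 100   # prints 104857600 (recommended minimum)
mb_to_bytes 200   # prints 209715200 (value used for read-buffer and rmem_max)

# On Linux, the currently active OS limit can be compared against these values:
#   sysctl -n net.core.rmem_max
```

Whatever size you pick, the same byte value should go into both `read-buffer` and `net.core.rmem_max`.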
When enabling UDP, take care not to expose the port to the public. Because
[InfluxDB's UDP protocol][udp] doesn't support authentication, doing so allows
anybody to write data into your InfluxDB database. We recommend either
whitelisting the allowed IP addresses/ranges, or setting up a VLAN and only
allowing traffic from members of said VLAN.
## Create a new admin user
If you want to [enable authentication](#http), you might want to [create an
admin user][influx-admin]:
```
influx -execute "CREATE USER jeff WITH PASSWORD '1234' WITH ALL PRIVILEGES"
```
## Create the `gitlab` database
Once you get InfluxDB up and running, you need to create a database for GitLab.
Make sure you have changed the [storage engine](#storage-engine) to `tsm1`
before creating a database.
_**Note:** If you [created an admin user](#create-a-new-admin-user) and enabled
[HTTP authentication](#http), remember to append the username (`-username <username>`)
and password (`-password <password>`) you set earlier to the commands below._
Run the following command to create a database named `gitlab`:
```bash
influx -execute 'CREATE DATABASE gitlab'
```
The name **must** be `gitlab`; do not use any other name.
Next, make sure that the database was successfully created:
```bash
influx -execute 'SHOW DATABASES'
```
The output should be similar to:
```
name: databases
---------------
name
_internal
gitlab
```
That's it! Now your GitLab instance should send data to InfluxDB.
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
[influxdb-retention]: https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management
[influxdb documentation]: https://docs.influxdata.com/influxdb/v0.9/
[influxdb cli]: https://docs.influxdata.com/influxdb/v0.9/tools/shell/
[udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
[influxdb]: https://influxdata.com/time-series-platform/influxdb/
[tsm tree]: https://influxdata.com/blog/new-storage-engine-time-structured-merge-tree/
[tsm1-commit]: https://github.com/influxdata/influxdb/commit/15d723dc77651bac83e09e2b1c94be480966cb0d
[influx-admin]: https://docs.influxdata.com/influxdb/v0.9/administration/authentication_and_authorization/#create-a-new-admin-user
# InfluxDB Schema
This document was moved to [administration/monitoring/performance/influxdb_schema](../administration/monitoring/performance/influxdb_schema.md).
The following measurements are currently stored in InfluxDB:
- `PROCESS_file_descriptors`
- `PROCESS_gc_statistics`
- `PROCESS_memory_usage`
- `PROCESS_method_calls`
- `PROCESS_object_counts`
- `PROCESS_transactions`
- `PROCESS_views`
- `events`
Here, `PROCESS` is replaced with either `rails` or `sidekiq` depending on the
process type. In all series, any form of duration is stored in milliseconds.
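To make the naming scheme concrete, the following sketch expands the `PROCESS` placeholder for both process types. The `list_measurements` function is a hypothetical helper written for illustration only; it just prints the measurement names implied by the list above.

```shell
# Hypothetical helper: print every measurement name implied by the list above.
list_measurements() {
  for process in rails sidekiq; do
    for name in file_descriptors gc_statistics memory_usage method_calls \
                object_counts transactions views; do
      echo "${process}_${name}"
    done
  done
  # "events" is listed without a PROCESS prefix.
  echo "events"
}

list_measurements
```

The output contains names such as `rails_transactions` and `sidekiq_method_calls`, which are the names to use when querying InfluxDB.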
## PROCESS_file_descriptors
This measurement contains the number of open file descriptors over time. The
value field `value` contains the number of descriptors.
## PROCESS_gc_statistics
This measurement contains Ruby garbage collection statistics such as the number
of minor/major GC runs (relative to the last sampling interval), the time spent
in garbage collection cycles, and all fields/values returned by `GC.stat`.
## PROCESS_memory_usage
This measurement contains the process' memory usage (in bytes) over time. The
value field `value` contains the number of bytes.
## PROCESS_method_calls
This measurement contains the methods called during a transaction along with
their duration, and a name of the transaction action that invoked the method (if
available). The method call duration is stored in the value field `duration`,
while the method name is stored in the tag `method`. The tag `action` contains
the full name of the transaction action. Both the `method` and `action` tags
are in the following format:
```
ClassName#method_name
```
For example, a method called by the `show` method in the `UsersController` class
would have `action` set to `UsersController#show`.
## PROCESS_object_counts
This measurement is used to store retained Ruby objects (per class) and the
number of retained objects. The number of objects is stored in the `count` value
field while the class name is stored in the `type` tag.
## PROCESS_transactions
This measurement is used to store basic transaction details such as the time it
took to complete a transaction, how much time was spent in SQL queries, etc. The
following value fields are available:
| Value | Description |
| ----- | ----------- |
| `duration` | The total duration of the transaction |
| `allocated_memory` | The number of bytes allocated while the transaction was running. This value is only reliable when using single-threaded application servers |
| `method_duration` | The total time spent in method calls |
| `sql_duration` | The total time spent in SQL queries |
| `view_duration` | The total time spent in views |
## PROCESS_views
This measurement is used to store view rendering timings for a transaction. The
following value fields are available:
| Value | Description |
| ----- | ----------- |
| `duration` | The rendering time of the view |
| `view` | The path of the view, relative to the application's root directory |
The `action` tag contains the action name of the transaction that rendered the
view.
## events
This measurement is used to store generic events such as the number of Git
pushes, emails sent, etc. Each point in this measurement has a single value
field called `count`. The value of this field is simply set to `1`. Each point
also has at least one tag: `event`. This tag's value is set to the event name.
Depending on the event type, additional tags may be available as well.
---
Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Configuration](influxdb_configuration.md)
- [Grafana Install/Configuration](grafana_configuration.md)
# GitLab Performance Monitoring
This document was moved to [administration/monitoring/performance/introduction](../administration/monitoring/performance/introduction.md).
GitLab comes with its own application performance measuring system as of GitLab
8.4, simply called "GitLab Performance Monitoring". GitLab Performance Monitoring is available in both the
Community and Enterprise editions.
Apart from this introduction, you are advised to read through the following
documents in order to understand and properly configure GitLab Performance Monitoring:
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Install/Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
## Introduction to GitLab Performance Monitoring
GitLab Performance Monitoring makes it possible to measure a wide variety of statistics
including (but not limited to):
- The time it took to complete a transaction (a web request or Sidekiq job).
- The time spent running SQL queries and rendering HAML views.
- The time spent executing (instrumented) Ruby methods.
- Ruby object allocations, and retained objects in particular.
- System statistics such as the process' memory usage and open file descriptors.
- Ruby garbage collection statistics.
Metrics data is written to [InfluxDB][influxdb] over [UDP][influxdb-udp]. Stored
data can be visualized using [Grafana][grafana] or any other application that
supports reading data from InfluxDB. Alternatively data can be queried using the
InfluxDB CLI.
## Metric Types
Two types of metrics are collected:
1. Transaction specific metrics.
1. Sampled metrics, collected at a certain interval in a separate thread.
### Transaction Metrics
Transaction metrics are metrics that can be associated with a single
transaction. This includes statistics such as the transaction duration, timings
of any executed SQL queries, time spent rendering HAML views, etc. These metrics
are collected for every Rack request and Sidekiq job processed.
### Sampled Metrics
Sampled metrics are metrics that can't be associated with a single transaction.
Examples include garbage collection statistics and retained Ruby objects. These
metrics are collected at a regular interval. This interval is made up of two
parts:
1. A user defined interval.
1. A randomly generated offset added on top of the interval. The same offset
can't be used twice in a row.
The actual interval can be anywhere between half of the defined interval and
one and a half times the defined interval. For example, for a user defined
interval of 15 seconds the actual interval can be anywhere between 7.5 and 22.5
seconds. The interval is re-generated for every sampling run instead of being
generated once and re-used for the duration of the process' lifetime.
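The bounds of the actual interval can be worked out for any user defined interval. The sketch below only reproduces the arithmetic described here (0.5 and 1.5 times the defined interval); `sampling_bounds` is a hypothetical helper and is not how GitLab itself generates the offset.

```shell
# Hypothetical helper: print the lower and upper bound (in seconds) of the
# actual sampling interval for the user defined interval passed as $1.
sampling_bounds() {
  awk -v interval="$1" 'BEGIN { printf "%.1f %.1f\n", interval * 0.5, interval * 1.5 }'
}

sampling_bounds 15   # prints "7.5 22.5"
```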
[influxdb]: https://influxdata.com/time-series-platform/influxdb/
[influxdb-udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
[grafana]: http://grafana.org/
# Health Check
> [Introduced][ce-3888] in GitLab 8.8.
GitLab provides a `health_check` web endpoint for uptime monitoring. The health check
reports on the overall system status based on the status of
the database connection, the state of the database migrations, and the ability to write
and access the cache. This endpoint can be provided to uptime monitoring services like
[Pingdom][pingdom], [Nagios][nagios-health], and [NewRelic][newrelic-health].
## Access Token
An access token needs to be provided while accessing the health check endpoint. The currently
accepted token can be found on the `admin/health_check` page of your GitLab instance.
![access token](img/health_check_token.png)
The access token can be passed as a URL parameter:
```
https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN
```
or as an HTTP header:
```bash
curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
```
## Using the Endpoint
Once you have the access token, health information can be retrieved as plain text, JSON,
or XML using the `health_check` endpoint:
- `https://gitlab.example.com/health_check?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.xml?token=ACCESS_TOKEN`
You can also ask for the status of specific services:
- `https://gitlab.example.com/health_check/cache.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/database.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/migrations.json?token=ACCESS_TOKEN`
For example, the JSON output of the following health check:
```bash
curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
```
would look like:
```
{"healthy":true,"message":"success"}
```
## Status
On failure, the endpoint returns a `500` HTTP status code. On success, the endpoint
returns a successful HTTP status code and a `success` message. Ideally, your
uptime monitoring should look for the success message.
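A monitoring script can combine both signals. The following is a minimal sketch, not an official GitLab tool: `is_healthy` and `check_gitlab` are hypothetical helper names, and the match assumes the JSON body is formatted exactly as in the example output shown earlier (no extra whitespace).

```shell
# Hypothetical helper: succeed only if the response body read from stdin
# contains the "success" message from the JSON example.
is_healthy() {
  grep -q '"message":"success"'
}

# Hypothetical wrapper: fetch the JSON health check and test its body.
# $1 is the base URL, $2 is the access token.
# --fail makes curl exit non-zero on HTTP error statuses such as 500,
# in which case the body is empty and is_healthy fails as well.
check_gitlab() {
  curl --silent --fail --header "TOKEN: $2" "$1/health_check.json" | is_healthy
}

# Usage (URL and token are placeholders):
#   check_gitlab https://gitlab.example.com ACCESS_TOKEN && echo "GitLab is healthy"
```

Checking the message rather than only the status code follows the recommendation above and avoids treating an unrelated successful response, such as a proxy page, as healthy.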
[ce-3888]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/3888
[pingdom]: https://www.pingdom.com
[nagios-health]: https://nagios-plugins.org/doc/man/check_http.html
[newrelic-health]: https://docs.newrelic.com/docs/alerts/alert-policies/downtime-alerts/availability-monitoring