Move monitoring/ to new location

6207dc90 · Achilleas Pipinellis · 6602b917 · 6207dc90 · 6207dc90 · 6207dc90
Commit 6207dc90 authored Sep 25, 2016 by Achilleas Pipinellis
20 changed files
--- a/doc/README.md
+++ b/doc/README.md
@@ -46,8 +46,8 @@
 - [Migrate GitLab CI to CE/EE](migrate_ci_to_ce/README.md) Follow this guide to migrate your existing GitLab CI data to GitLab CE/EE.
 - [Git LFS configuration](workflow/lfs/lfs_administration.md)
 - [Housekeeping](administration/housekeeping.md) Keep your Git repository tidy and fast.
- [GitLab Performance Monitoring](monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics.
+- [GitLab Performance Monitoring](administration/monitoring/performance/introduction.md) Configure GitLab and InfluxDB for measuring performance metrics.
- [Monitoring uptime](monitoring/health_check.md) Check the server status using the health check endpoint.
+- [Monitoring uptime](administration/monitoring/health_check.md) Check the server status using the health check endpoint.
 - [Debugging Tips](administration/troubleshooting/debug.md) Tips to debug problems when things go wrong
 - [Sidekiq Troubleshooting](administration/troubleshooting/sidekiq.md) Debug when Sidekiq appears hung and is not processing jobs.
 - [High Availability](administration/high_availability/README.md) Configure multiple servers for scaling or high availability.

--- a/doc/administration/monitoring/health_check.md
+++ b/doc/administration/monitoring/health_check.md
+# Health Check
+> [Introduced][ce-3888] in GitLab 8.8.
+GitLab provides a health check endpoint for uptime monitoring on the `health_check` web
+endpoint. The health check reports on the overall system status based on the status of
+the database connection, the state of the database migrations, and the ability to write
+and access the cache. This endpoint can be provided to uptime monitoring services like
+[Pingdom][pingdom], [Nagios][nagios-health], and [NewRelic][newrelic-health].
+## Access Token
+An access token needs to be provided while accessing the health check endpoint. The current
+accepted token can be found on the `admin/health_check` page of your GitLab instance.
+![access token](img/health_check_token.png)
+The access token can be passed as a URL parameter:
+```
+https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN
+```
+or as an HTTP header:
+```bash
+curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
+```
+## Using the Endpoint
+Once you have the access token, health information can be retrieved as plain text, JSON,
+or XML using the `health_check` endpoint:
+- `https://gitlab.example.com/health_check?token=ACCESS_TOKEN`
+- `https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN`
+- `https://gitlab.example.com/health_check.xml?token=ACCESS_TOKEN`
+You can also ask for the status of specific services:
+- `https://gitlab.example.com/health_check/cache.json?token=ACCESS_TOKEN`
+- `https://gitlab.example.com/health_check/database.json?token=ACCESS_TOKEN`
+- `https://gitlab.example.com/health_check/migrations.json?token=ACCESS_TOKEN`
+For example, the JSON output of the following health check:
+```bash
+curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
+```
+would be like:
+```
+{"healthy":true,"message":"success"}
+```
+## Status
+On failure, the endpoint will return a `500` HTTP status code. On success, the endpoint
+will return a valid successful HTTP status code, and a `success` message. Ideally your
+uptime monitoring should look for the success message.
+[ce-3888]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/3888
+[pingdom]: https://www.pingdom.com
+[nagios-health]: https://nagios-plugins.org/doc/man/check_http.html
+[newrelic-health]: https://docs.newrelic.com/docs/alerts/alert-policies/downtime-alerts/availability-monitoring
--- a/doc/monitoring/img/health_check_token.png
+++ b/doc/monitoring/img/health_check_token.png
--- a/doc/administration/monitoring/performance/gitlab_configuration.md
+++ b/doc/administration/monitoring/performance/gitlab_configuration.md
+# GitLab Configuration
+GitLab Performance Monitoring is disabled by default. To enable it and change any of its
+settings, navigate to the Admin area in **Settings > Metrics**
+(`/admin/application_settings`).
+The minimum required settings you need to set are the InfluxDB host and port.
+Make sure _Enable InfluxDB Metrics_ is checked and hit **Save** to save the
+changes.
+---
+![GitLab Performance Monitoring Admin Settings](img/metrics_gitlab_configuration_settings.png)
+---
+Finally, a restart of all GitLab processes is required for the changes to take
+effect:
+```bash
+# For Omnibus installations
+sudo gitlab-ctl restart
+# For installations from source
+sudo service gitlab restart
+```
+## Pending Migrations
+When any migrations are pending, the metrics are disabled until the migrations
+have been performed.
+---
+Read more on:
+- [Introduction to GitLab Performance Monitoring](introduction.md)
+- [InfluxDB Configuration](influxdb_configuration.md)
+- [InfluxDB Schema](influxdb_schema.md)
+- [Grafana Install/Configuration](grafana_configuration.md)
--- a/doc/administration/monitoring/performance/grafana_configuration.md
+++ b/doc/administration/monitoring/performance/grafana_configuration.md
+# Grafana Configuration
+[Grafana](http://grafana.org/) is a tool that allows you to visualize time
+series metrics through graphs and dashboards. It supports several backend
+data stores, including InfluxDB. GitLab writes performance data to InfluxDB
+and Grafana will allow you to query InfluxDB to display useful graphs.
+For the easiest installation and configuration, install Grafana on the same
+server as InfluxDB. For larger installations, you may want to split out these
+services.
+## Installation
+Grafana supplies package repositories (Yum/Apt) for easy installation.
+See [Grafana installation documentation](http://docs.grafana.org/installation/)
+for detailed steps.
+> **Note**: Before starting Grafana for the first time, set the admin user
+and password in `/etc/grafana/grafana.ini`. Otherwise, the default password
+will be `admin`.
+## Configuration
+Login as the admin user. Expand the menu by clicking the Grafana logo in the
+top left corner. Choose 'Data Sources' from the menu. Then, click 'Add new'
+in the top bar.
+![Grafana empty data source page](img/grafana_data_source_empty.png)
+Fill in the configuration details for the InfluxDB data source. Save and
+Test Connection to ensure the configuration is correct.
+- **Name**: InfluxDB
+- **Default**: Checked
+- **Type**: InfluxDB 0.9.x (Even if you're using InfluxDB 0.10.x)
+- **Url**: https://localhost:8086 (Or the remote URL if you've installed InfluxDB
+on a separate server)
+- **Access**: proxy
+- **Database**: gitlab
+- **User**: admin (Or the username configured when setting up InfluxDB)
+- **Password**: The password configured when you set up InfluxDB
+![Grafana data source configurations](img/grafana_data_source_configuration.png)
+## Apply retention policies and create continuous queries
+If you intend to import the GitLab provided Grafana dashboards, you will need to
+set up the right retention policies and continuous queries. The easiest way of
+doing this is by using the [influxdb-management](https://gitlab.com/gitlab-org/influxdb-management)
+repository.
+To use this repository you must first clone it:
+```
+git clone https://gitlab.com/gitlab-org/influxdb-management.git
+cd influxdb-management
+```
+Next you must install the required dependencies:
+```
+gem install bundler
+bundle install
+```
+Now you must configure the repository by first copying `.env.example` to `.env`
+and then editing the `.env` file to contain the correct InfluxDB settings. Once
+configured you can simply run `bundle exec rake` and the InfluxDB database will
+be configured for you.
+For more information see the [influxdb-management README](https://gitlab.com/gitlab-org/influxdb-management/blob/master/README.md).
+## Import Dashboards
+You can now import a set of default dashboards that will give you a good
+start on displaying useful information. GitLab has published a set of default
+[Grafana dashboards][grafana-dashboards] to get you started. Clone the
+repository or download a zip/tarball, then follow these steps to import each
+JSON file.
+Open the dashboard dropdown menu and click 'Import'
+![Grafana dashboard dropdown](img/grafana_dashboard_dropdown.png)
+Click 'Choose file' and browse to the location where you downloaded or cloned
+the dashboard repository. Pick one of the JSON files to import.
+![Grafana dashboard import](img/grafana_dashboard_import.png)
+Once the dashboard is imported, be sure to click save icon in the top bar. If
+you do not save the dashboard after importing it will be removed when you
+navigate away.
+![Grafana save icon](img/grafana_save_icon.png)
+Repeat this process for each dashboard you wish to import.
+Alternatively you can automatically import all the dashboards into your Grafana
+instance. See the README of the [Grafana dashboards][grafana-dashboards]
+repository for more information on this process.
+[grafana-dashboards]: https://gitlab.com/gitlab-org/grafana-dashboards
+---
+Read more on:
+- [Introduction to GitLab Performance Monitoring](introduction.md)
+- [GitLab Configuration](gitlab_configuration.md)
+- [InfluxDB Installation/Configuration](influxdb_configuration.md)
+- [InfluxDB Schema](influxdb_schema.md)
--- a/doc/administration/monitoring/performance/img/grafana_dashboard_dropdown.png
+++ b/doc/administration/monitoring/performance/img/grafana_dashboard_dropdown.png
--- a/doc/administration/monitoring/performance/img/grafana_dashboard_import.png
+++ b/doc/administration/monitoring/performance/img/grafana_dashboard_import.png
--- a/doc/administration/monitoring/performance/img/grafana_data_source_configuration.png
+++ b/doc/administration/monitoring/performance/img/grafana_data_source_configuration.png
--- a/doc/administration/monitoring/performance/img/grafana_data_source_empty.png
+++ b/doc/administration/monitoring/performance/img/grafana_data_source_empty.png
--- a/doc/administration/monitoring/performance/img/grafana_save_icon.png
+++ b/doc/administration/monitoring/performance/img/grafana_save_icon.png
--- a/doc/administration/monitoring/performance/img/metrics_gitlab_configuration_settings.png
+++ b/doc/administration/monitoring/performance/img/metrics_gitlab_configuration_settings.png
--- a/doc/administration/monitoring/performance/influxdb_configuration.md
+++ b/doc/administration/monitoring/performance/influxdb_configuration.md
+# InfluxDB Configuration
+The default settings provided by [InfluxDB] are not sufficient for a high traffic
+GitLab environment. The settings discussed in this document are based on the
+settings GitLab uses for GitLab.com, depending on your own needs you may need to
+further adjust them.
+If you are intending to run InfluxDB on the same server as GitLab, make sure
+you have plenty of RAM since InfluxDB can use quite a bit depending on traffic.
+Unless you are going with a budget setup, it's advised to run it separately.
+## Requirements
+- InfluxDB 0.9.5 or newer
+- A fairly modern version of Linux
+- At least 4GB of RAM
+- At least 10GB of storage for InfluxDB data
+Note that the RAM and storage requirements can differ greatly depending on the
+amount of data received/stored. To limit the amount of stored data users can
+look into [InfluxDB Retention Policies][influxdb-retention].
+## Installation
+Installing InfluxDB is out of the scope of this document. Please refer to the
+[InfluxDB documentation].
+## InfluxDB Server Settings
+Since InfluxDB has many settings that users may wish to customize themselves
+(e.g. what port to run InfluxDB on), we'll only cover the essentials.
+The configuration file in question is usually located at
+`/etc/influxdb/influxdb.conf`. Whenever you make a change in this file,
+InfluxDB needs to be restarted.
+### Storage Engine
+InfluxDB comes with different storage engines and as of InfluxDB 0.9.5 a new
+storage engine is available, called [TSM Tree]. All users **must** use the new
+`tsm1` storage engine as this [will be the default engine][tsm1-commit] in
+upcoming InfluxDB releases.
+Make sure you have the following in your configuration file:
+```
+[data]
+  dir = "/var/lib/influxdb/data"
+  engine = "tsm1"
+```
+### Admin Panel
+Production environments should have the InfluxDB admin panel **disabled**. This
+feature can be disabled by adding the following to your InfluxDB configuration
+file:
+```
+[admin]
+  enabled = false
+```
+### HTTP
+HTTP is required when using the [InfluxDB CLI] or other tools such as Grafana,
+thus it should be enabled. When enabling make sure to _also_ enable
+authentication:
+```
+[http]
+  enabled = true
+  auth-enabled = true
+```
+_**Note:** Before you enable authentication, you might want to [create an
+admin user](#create-a-new-admin-user)._
+### UDP
+GitLab writes data to InfluxDB via UDP and thus this must be enabled. Enabling
+UDP can be done using the following settings:
+```
+[[udp]]
+  enabled = true
+  bind-address = ":8089"
+  database = "gitlab"
+  batch-size = 1000
+  batch-pending = 5
+  batch-timeout = "1s"
+  read-buffer = 209715200
+```
+This does the following:
+1. Enable UDP and bind it to port 8089 for all addresses.
+2. Store any data received in the "gitlab" database.
+3. Define a batch of points to be 1000 points in size and allow a maximum of
+   5 batches _or_ flush them automatically after 1 second.
+4. Define a UDP read buffer size of 200 MB.
+One of the most important settings here is the UDP read buffer size as if this
+value is set too low, packets will be dropped. You must also make sure the OS
+buffer size is set to the same value, the default value is almost never enough.
+To set the OS buffer size to 200 MB, on Linux you can run the following command:
+```bash
+sysctl -w net.core.rmem_max=209715200
+```
+To make this permanent, add the following to `/etc/sysctl.conf` and restart the
+server:
+```bash
+net.core.rmem_max=209715200
+```
+It is **very important** to make sure the buffer sizes are large enough to
+handle all data sent to InfluxDB as otherwise you _will_ lose data. The above
+buffer sizes are based on the traffic for GitLab.com. Depending on the amount of
+traffic, users may be able to use a smaller buffer size, but we highly recommend
+using _at least_ 100 MB.
+When enabling UDP, users should take care to not expose the port to the public,
+as doing so will allow anybody to write data into your InfluxDB database (as
+[InfluxDB's UDP protocol][udp] doesn't support authentication). We recommend either
+whitelisting the allowed IP addresses/ranges, or setting up a VLAN and only
+allowing traffic from members of said VLAN.
+## Create a new admin user
+If you want to [enable authentication](#http), you might want to [create an
+admin user][influx-admin]:
+```
+influx -execute "CREATE USER jeff WITH PASSWORD '1234' WITH ALL PRIVILEGES"
+```
+## Create the `gitlab` database
+Once you get InfluxDB up and running, you need to create a database for GitLab.
+Make sure you have changed the [storage engine](#storage-engine) to `tsm1`
+before creating a database.
+_**Note:** If you [created an admin user](#create-a-new-admin-user) and enabled
+[HTTP authentication](#http), remember to append the username (`-username <username>`)
+and password (`-password <password>`)  you set earlier to the commands below._
+Run the following command to create a database named `gitlab`:
+```bash
+influx -execute 'CREATE DATABASE gitlab'
+```
+The name **must** be `gitlab`, do not use any other name.
+Next, make sure that the database was successfully created:
+```bash
+influx -execute 'SHOW DATABASES'
+```
+The output should be similar to:
+```
+name: databases
+---------------
+name
+_internal
+gitlab
+```
+That's it! Now your GitLab instance should send data to InfluxDB.
+---
+Read more on:
+- [Introduction to GitLab Performance Monitoring](introduction.md)
+- [GitLab Configuration](gitlab_configuration.md)
+- [InfluxDB Schema](influxdb_schema.md)
+- [Grafana Install/Configuration](grafana_configuration.md)
+[influxdb-retention]: https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management
+[influxdb documentation]: https://docs.influxdata.com/influxdb/v0.9/
+[influxdb cli]: https://docs.influxdata.com/influxdb/v0.9/tools/shell/
+[udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
+[influxdb]: https://influxdata.com/time-series-platform/influxdb/
+[tsm tree]: https://influxdata.com/blog/new-storage-engine-time-structured-merge-tree/
+[tsm1-commit]: https://github.com/influxdata/influxdb/commit/15d723dc77651bac83e09e2b1c94be480966cb0d
+[influx-admin]: https://docs.influxdata.com/influxdb/v0.9/administration/authentication_and_authorization/#create-a-new-admin-user
--- a/doc/administration/monitoring/performance/influxdb_schema.md
+++ b/doc/administration/monitoring/performance/influxdb_schema.md
+# InfluxDB Schema
+The following measurements are currently stored in InfluxDB:
+- `PROCESS_file_descriptors`
+- `PROCESS_gc_statistics`
+- `PROCESS_memory_usage`
+- `PROCESS_method_calls`
+- `PROCESS_object_counts`
+- `PROCESS_transactions`
+- `PROCESS_views`
+- `events`
+Here, `PROCESS` is replaced with either `rails` or `sidekiq` depending on the
+process type. In all series, any form of duration is stored in milliseconds.
+## PROCESS_file_descriptors
+This measurement contains the number of open file descriptors over time. The
+value field `value` contains the number of descriptors.
+## PROCESS_gc_statistics
+This measurement contains Ruby garbage collection statistics such as the amount
+of minor/major GC runs (relative to the last sampling interval), the time spent
+in garbage collection cycles, and all fields/values returned by `GC.stat`.
+## PROCESS_memory_usage
+This measurement contains the process' memory usage (in bytes) over time. The
+value field `value` contains the number of bytes.
+## PROCESS_method_calls
+This measurement contains the methods called during a transaction along with
+their duration, and a name of the transaction action that invoked the method (if
+available). The method call duration is stored in the value field `duration`,
+while the method name is stored in the tag `method`. The tag `action` contains
+the full name of the transaction action. Both the `method` and `action` fields
+are in the following format:
+```
+ClassName#method_name
+```
+For example, a method called by the `show` method in the `UsersController` class
+would have `action` set to `UsersController#show`.
+## PROCESS_object_counts
+This measurement is used to store retained Ruby objects (per class) and the
+amount of retained objects. The number of objects is stored in the `count` value
+field while the class name is stored in the `type` tag.
+## PROCESS_transactions
+This measurement is used to store basic transaction details such as the time it
+took to complete a transaction, how much time was spent in SQL queries, etc. The
+following value fields are available:
+| Value | Description |
+| ----- | ----------- |
+| `duration`  | The total duration of the transaction |
+| `allocated_memory` | The amount of bytes allocated while the transaction was running. This value is only reliable when using single-threaded application servers |
+| `method_duration` | The total time spent in method calls |
+| `sql_duration` | The total time spent in SQL queries |
+| `view_duration` | The total time spent in views |
+## PROCESS_views
+This measurement is used to store view rendering timings for a transaction. The
+following value fields are available:
+| Value | Description |
+| ----- | ----------- |
+| `duration` | The rendering time of the view |
+| `view` | The path of the view, relative to the application's root directory |
+The `action` tag contains the action name of the transaction that rendered the
+view.
+## events
+This measurement is used to store generic events such as the number of Git
+pushes, Emails sent, etc. Each point in this measurement has a single value
+field called `count`. The value of this field is simply set to `1`. Each point
+also has at least one tag: `event`. This tag's value is set to the event name.
+Depending on the event type additional tags may be available as well.
+---
+Read more on:
+- [Introduction to GitLab Performance Monitoring](introduction.md)
+- [GitLab Configuration](gitlab_configuration.md)
+- [InfluxDB Configuration](influxdb_configuration.md)
+- [Grafana Install/Configuration](grafana_configuration.md)
--- a/doc/administration/monitoring/performance/introduction.md
+++ b/doc/administration/monitoring/performance/introduction.md
+# GitLab Performance Monitoring
+GitLab comes with its own application performance measuring system as of GitLab
+8.4, simply called "GitLab Performance Monitoring". GitLab Performance Monitoring is available in both the
+Community and Enterprise editions.
+Apart from this introduction, you are advised to read through the following
+documents in order to understand and properly configure GitLab Performance Monitoring:
+- [GitLab Configuration](gitlab_configuration.md)
+- [InfluxDB Install/Configuration](influxdb_configuration.md)
+- [InfluxDB Schema](influxdb_schema.md)
+- [Grafana Install/Configuration](grafana_configuration.md)
+## Introduction to GitLab Performance Monitoring
+GitLab Performance Monitoring makes it possible to measure a wide variety of statistics
+including (but not limited to):
+- The time it took to complete a transaction (a web request or Sidekiq job).
+- The time spent in running SQL queries and rendering HAML views.
+- The time spent executing (instrumented) Ruby methods.
+- Ruby object allocations, and retained objects in particular.
+- System statistics such as the process' memory usage and open file descriptors.
+- Ruby garbage collection statistics.
+Metrics data is written to [InfluxDB][influxdb] over [UDP][influxdb-udp]. Stored
+data can be visualized using [Grafana][grafana] or any other application that
+supports reading data from InfluxDB. Alternatively data can be queried using the
+InfluxDB CLI.
+## Metric Types
+Two types of metrics are collected:
+1. Transaction specific metrics.
+1. Sampled metrics, collected at a certain interval in a separate thread.
+### Transaction Metrics
+Transaction metrics are metrics that can be associated with a single
+transaction. This includes statistics such as the transaction duration, timings
+of any executed SQL queries, time spent rendering HAML views, etc. These metrics
+are collected for every Rack request and Sidekiq job processed.
+### Sampled Metrics
+Sampled metrics are metrics that can't be associated with a single transaction.
+Examples include garbage collection statistics and retained Ruby objects. These
+metrics are collected at a regular interval. This interval is made up out of two
+parts:
+1. A user defined interval.
+1. A randomly generated offset added on top of the interval, the same offset
+   can't be used twice in a row.
+The actual interval can be anywhere between a half of the defined interval and a
+half above the interval. For example, for a user defined interval of 15 seconds
+the actual interval can be anywhere between 7.5 and 22.5. The interval is
+re-generated for every sampling run instead of being generated once and re-used
+for the duration of the process' lifetime.
+[influxdb]: https://influxdata.com/time-series-platform/influxdb/
+[influxdb-udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
+[grafana]: http://grafana.org/
--- a/doc/monitoring/health_check.md
+++ b/doc/monitoring/health_check.md
-# Health Check
+This document was moved to [administration/monitoring/health_check](../administration/monitoring/health_check.md).
-> [Introduced][ce-3888] in GitLab 8.8.
-GitLab provides a health check endpoint for uptime monitoring on the `health_check` web
-endpoint. The health check reports on the overall system status based on the status of
-the database connection, the state of the database migrations, and the ability to write
-and access the cache. This endpoint can be provided to uptime monitoring services like
-[Pingdom][pingdom], [Nagios][nagios-health], and [NewRelic][newrelic-health].
-## Access Token
-An access token needs to be provided while accessing the health check endpoint. The current
-accepted token can be found on the `admin/health_check` page of your GitLab instance.
-![access token](img/health_check_token.png)
-The access token can be passed as a URL parameter:
-```
-https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN
-```
-or as an HTTP header:
-```bash
-curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
-```
-## Using the Endpoint
-Once you have the access token, health information can be retrieved as plain text, JSON,
-or XML using the `health_check` endpoint:
- `https://gitlab.example.com/health_check?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check.xml?token=ACCESS_TOKEN`
-You can also ask for the status of specific services:
- `https://gitlab.example.com/health_check/cache.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/database.json?token=ACCESS_TOKEN`
- `https://gitlab.example.com/health_check/migrations.json?token=ACCESS_TOKEN`
-For example, the JSON output of the following health check:
-```bash
-curl --header "TOKEN: ACCESS_TOKEN" https://gitlab.example.com/health_check.json
-```
-would be like:
-```
-{"healthy":true,"message":"success"}
-```
-## Status
-On failure, the endpoint will return a `500` HTTP status code. On success, the endpoint
-will return a valid successful HTTP status code, and a `success` message. Ideally your
-uptime monitoring should look for the success message.
-[ce-3888]: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/3888
-[pingdom]: https://www.pingdom.com
-[nagios-health]: https://nagios-plugins.org/doc/man/check_http.html
-[newrelic-health]: https://docs.newrelic.com/docs/alerts/alert-policies/downtime-alerts/availability-monitoring
--- a/doc/monitoring/performance/gitlab_configuration.md
+++ b/doc/monitoring/performance/gitlab_configuration.md
-# GitLab Configuration
+This document was moved to [administration/monitoring/performance/gitlab_configuration](../administration/monitoring/performance/gitlab_configuration.md).
-GitLab Performance Monitoring is disabled by default. To enable it and change any of its
-settings, navigate to the Admin area in **Settings > Metrics**
-(`/admin/application_settings`).
-The minimum required settings you need to set are the InfluxDB host and port.
-Make sure _Enable InfluxDB Metrics_ is checked and hit **Save** to save the
-changes.
---
-![GitLab Performance Monitoring Admin Settings](img/metrics_gitlab_configuration_settings.png)
---
-Finally, a restart of all GitLab processes is required for the changes to take
-effect:
-```bash
-# For Omnibus installations
-sudo gitlab-ctl restart
-# For installations from source
-sudo service gitlab restart
-```
-## Pending Migrations
-When any migrations are pending, the metrics are disabled until the migrations
-have been performed.
---
-Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [InfluxDB Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
--- a/doc/monitoring/performance/grafana_configuration.md
+++ b/doc/monitoring/performance/grafana_configuration.md
-# Grafana Configuration
+This document was moved to [administration/monitoring/performance/grafana_configuration](../administration/monitoring/performance/grafana_configuration.md).
-[Grafana](http://grafana.org/) is a tool that allows you to visualize time
-series metrics through graphs and dashboards. It supports several backend
-data stores, including InfluxDB. GitLab writes performance data to InfluxDB
-and Grafana will allow you to query InfluxDB to display useful graphs.
-For the easiest installation and configuration, install Grafana on the same
-server as InfluxDB. For larger installations, you may want to split out these
-services.
-## Installation
-Grafana supplies package repositories (Yum/Apt) for easy installation.
-See [Grafana installation documentation](http://docs.grafana.org/installation/)
-for detailed steps.
-> **Note**: Before starting Grafana for the first time, set the admin user
-and password in `/etc/grafana/grafana.ini`. Otherwise, the default password
-will be `admin`.
-## Configuration
-Login as the admin user. Expand the menu by clicking the Grafana logo in the
-top left corner. Choose 'Data Sources' from the menu. Then, click 'Add new'
-in the top bar.
-![Grafana empty data source page](img/grafana_data_source_empty.png)
-Fill in the configuration details for the InfluxDB data source. Save and
-Test Connection to ensure the configuration is correct.
- **Name**: InfluxDB
- **Default**: Checked
- **Type**: InfluxDB 0.9.x (Even if you're using InfluxDB 0.10.x)
- **Url**: https://localhost:8086 (Or the remote URL if you've installed InfluxDB
-on a separate server)
- **Access**: proxy
- **Database**: gitlab
- **User**: admin (Or the username configured when setting up InfluxDB)
- **Password**: The password configured when you set up InfluxDB
-![Grafana data source configurations](img/grafana_data_source_configuration.png)
-## Apply retention policies and create continuous queries
-If you intend to import the GitLab provided Grafana dashboards, you will need to
-set up the right retention policies and continuous queries. The easiest way of
-doing this is by using the [influxdb-management](https://gitlab.com/gitlab-org/influxdb-management)
-repository.
-To use this repository you must first clone it:
-```
-git clone https://gitlab.com/gitlab-org/influxdb-management.git
-cd influxdb-management
-```
-Next you must install the required dependencies:
-```
-gem install bundler
-bundle install
-```
-Now you must configure the repository by first copying `.env.example` to `.env`
-and then editing the `.env` file to contain the correct InfluxDB settings. Once
-configured you can simply run `bundle exec rake` and the InfluxDB database will
-be configured for you.
-For more information see the [influxdb-management README](https://gitlab.com/gitlab-org/influxdb-management/blob/master/README.md).
-## Import Dashboards
-You can now import a set of default dashboards that will give you a good
-start on displaying useful information. GitLab has published a set of default
-[Grafana dashboards][grafana-dashboards] to get you started. Clone the
-repository or download a zip/tarball, then follow these steps to import each
-JSON file.
-Open the dashboard dropdown menu and click 'Import'
-![Grafana dashboard dropdown](img/grafana_dashboard_dropdown.png)
-Click 'Choose file' and browse to the location where you downloaded or cloned
-the dashboard repository. Pick one of the JSON files to import.
-![Grafana dashboard import](img/grafana_dashboard_import.png)
-Once the dashboard is imported, be sure to click save icon in the top bar. If
-you do not save the dashboard after importing it will be removed when you
-navigate away.
-![Grafana save icon](img/grafana_save_icon.png)
-Repeat this process for each dashboard you wish to import.
-Alternatively you can automatically import all the dashboards into your Grafana
-instance. See the README of the [Grafana dashboards][grafana-dashboards]
-repository for more information on this process.
-[grafana-dashboards]: https://gitlab.com/gitlab-org/grafana-dashboards
---
-Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Installation/Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
--- a/doc/monitoring/performance/influxdb_configuration.md
+++ b/doc/monitoring/performance/influxdb_configuration.md
-# InfluxDB Configuration
+This document was moved to [administration/monitoring/performance/influxdb_configuration](../administration/monitoring/performance/influxdb_configuration.md).
-The default settings provided by [InfluxDB] are not sufficient for a high traffic
-GitLab environment. The settings discussed in this document are based on the
-settings GitLab uses for GitLab.com, depending on your own needs you may need to
-further adjust them.
-If you are intending to run InfluxDB on the same server as GitLab, make sure
-you have plenty of RAM since InfluxDB can use quite a bit depending on traffic.
-Unless you are going with a budget setup, it's advised to run it separately.
-## Requirements
- InfluxDB 0.9.5 or newer
- A fairly modern version of Linux
- At least 4GB of RAM
- At least 10GB of storage for InfluxDB data
-Note that the RAM and storage requirements can differ greatly depending on the
-amount of data received/stored. To limit the amount of stored data users can
-look into [InfluxDB Retention Policies][influxdb-retention].
-## Installation
-Installing InfluxDB is out of the scope of this document. Please refer to the
-[InfluxDB documentation].
-## InfluxDB Server Settings
-Since InfluxDB has many settings that users may wish to customize themselves
-(e.g. what port to run InfluxDB on), we'll only cover the essentials.
-The configuration file in question is usually located at
-`/etc/influxdb/influxdb.conf`. Whenever you make a change in this file,
-InfluxDB needs to be restarted.
-### Storage Engine
-InfluxDB comes with different storage engines and as of InfluxDB 0.9.5 a new
-storage engine is available, called [TSM Tree]. All users **must** use the new
-`tsm1` storage engine as this [will be the default engine][tsm1-commit] in
-upcoming InfluxDB releases.
-Make sure you have the following in your configuration file:
-```
-[data]
-  dir = "/var/lib/influxdb/data"
-  engine = "tsm1"
-```
-### Admin Panel
-Production environments should have the InfluxDB admin panel **disabled**. This
-feature can be disabled by adding the following to your InfluxDB configuration
-file:
-```
-[admin]
-  enabled = false
-```
-### HTTP
-HTTP is required when using the [InfluxDB CLI] or other tools such as Grafana,
-thus it should be enabled. When enabling make sure to _also_ enable
-authentication:
-```
-[http]
-  enabled = true
-  auth-enabled = true
-```
-_**Note:** Before you enable authentication, you might want to [create an
-admin user](#create-a-new-admin-user)._
-### UDP
-GitLab writes data to InfluxDB via UDP and thus this must be enabled. Enabling
-UDP can be done using the following settings:
-```
-[[udp]]
-  enabled = true
-  bind-address = ":8089"
-  database = "gitlab"
-  batch-size = 1000
-  batch-pending = 5
-  batch-timeout = "1s"
-  read-buffer = 209715200
-```
-This does the following:
-1. Enable UDP and bind it to port 8089 for all addresses.
-2. Store any data received in the "gitlab" database.
-3. Define a batch of points to be 1000 points in size and allow a maximum of
-   5 batches _or_ flush them automatically after 1 second.
-4. Define a UDP read buffer size of 200 MB.
-One of the most important settings here is the UDP read buffer size as if this
-value is set too low, packets will be dropped. You must also make sure the OS
-buffer size is set to the same value, the default value is almost never enough.
-To set the OS buffer size to 200 MB, on Linux you can run the following command:
-```bash
-sysctl -w net.core.rmem_max=209715200
-```
-To make this permanent, add the following to `/etc/sysctl.conf` and restart the
-server:
-```bash
-net.core.rmem_max=209715200
-```
-It is **very important** to make sure the buffer sizes are large enough to
-handle all data sent to InfluxDB as otherwise you _will_ lose data. The above
-buffer sizes are based on the traffic for GitLab.com. Depending on the amount of
-traffic, users may be able to use a smaller buffer size, but we highly recommend
-using _at least_ 100 MB.
-When enabling UDP, users should take care to not expose the port to the public,
-as doing so will allow anybody to write data into your InfluxDB database (as
-[InfluxDB's UDP protocol][udp] doesn't support authentication). We recommend either
-whitelisting the allowed IP addresses/ranges, or setting up a VLAN and only
-allowing traffic from members of said VLAN.
-## Create a new admin user
-If you want to [enable authentication](#http), you might want to [create an
-admin user][influx-admin]:
-```
-influx -execute "CREATE USER jeff WITH PASSWORD '1234' WITH ALL PRIVILEGES"
-```
-## Create the `gitlab` database
-Once you get InfluxDB up and running, you need to create a database for GitLab.
-Make sure you have changed the [storage engine](#storage-engine) to `tsm1`
-before creating a database.
-_**Note:** If you [created an admin user](#create-a-new-admin-user) and enabled
-[HTTP authentication](#http), remember to append the username (`-username <username>`)
-and password (`-password <password>`)  you set earlier to the commands below._
-Run the following command to create a database named `gitlab`:
-```bash
-influx -execute 'CREATE DATABASE gitlab'
-```
-The name **must** be `gitlab`, do not use any other name.
-Next, make sure that the database was successfully created:
-```bash
-influx -execute 'SHOW DATABASES'
-```
-The output should be similar to:
-```
-name: databases
---------------
-name
-_internal
-gitlab
-```
-That's it! Now your GitLab instance should send data to InfluxDB.
---
-Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
-[influxdb-retention]: https://docs.influxdata.com/influxdb/v0.9/query_language/database_management/#retention-policy-management
-[influxdb documentation]: https://docs.influxdata.com/influxdb/v0.9/
-[influxdb cli]: https://docs.influxdata.com/influxdb/v0.9/tools/shell/
-[udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
-[influxdb]: https://influxdata.com/time-series-platform/influxdb/
-[tsm tree]: https://influxdata.com/blog/new-storage-engine-time-structured-merge-tree/
-[tsm1-commit]: https://github.com/influxdata/influxdb/commit/15d723dc77651bac83e09e2b1c94be480966cb0d
-[influx-admin]: https://docs.influxdata.com/influxdb/v0.9/administration/authentication_and_authorization/#create-a-new-admin-user
--- a/doc/monitoring/performance/influxdb_schema.md
+++ b/doc/monitoring/performance/influxdb_schema.md
-# InfluxDB Schema
+This document was moved to [administration/monitoring/performance/influxdb_schema](../administration/monitoring/performance/influxdb_schema.md).
-The following measurements are currently stored in InfluxDB:
- `PROCESS_file_descriptors`
- `PROCESS_gc_statistics`
- `PROCESS_memory_usage`
- `PROCESS_method_calls`
- `PROCESS_object_counts`
- `PROCESS_transactions`
- `PROCESS_views`
- `events`
-Here, `PROCESS` is replaced with either `rails` or `sidekiq` depending on the
-process type. In all series, any form of duration is stored in milliseconds.
-## PROCESS_file_descriptors
-This measurement contains the number of open file descriptors over time. The
-value field `value` contains the number of descriptors.
-## PROCESS_gc_statistics
-This measurement contains Ruby garbage collection statistics such as the amount
-of minor/major GC runs (relative to the last sampling interval), the time spent
-in garbage collection cycles, and all fields/values returned by `GC.stat`.
-## PROCESS_memory_usage
-This measurement contains the process' memory usage (in bytes) over time. The
-value field `value` contains the number of bytes.
-## PROCESS_method_calls
-This measurement contains the methods called during a transaction along with
-their duration, and a name of the transaction action that invoked the method (if
-available). The method call duration is stored in the value field `duration`,
-while the method name is stored in the tag `method`. The tag `action` contains
-the full name of the transaction action. Both the `method` and `action` fields
-are in the following format:
-```
-ClassName#method_name
-```
-For example, a method called by the `show` method in the `UsersController` class
-would have `action` set to `UsersController#show`.
-## PROCESS_object_counts
-This measurement is used to store retained Ruby objects (per class) and the
-amount of retained objects. The number of objects is stored in the `count` value
-field while the class name is stored in the `type` tag.
-## PROCESS_transactions
-This measurement is used to store basic transaction details such as the time it
-took to complete a transaction, how much time was spent in SQL queries, etc. The
-following value fields are available:
-| Value | Description |
-| ----- | ----------- |
-| `duration`  | The total duration of the transaction |
-| `allocated_memory` | The amount of bytes allocated while the transaction was running. This value is only reliable when using single-threaded application servers |
-| `method_duration` | The total time spent in method calls |
-| `sql_duration` | The total time spent in SQL queries |
-| `view_duration` | The total time spent in views |
-## PROCESS_views
-This measurement is used to store view rendering timings for a transaction. The
-following value fields are available:
-| Value | Description |
-| ----- | ----------- |
-| `duration` | The rendering time of the view |
-| `view` | The path of the view, relative to the application's root directory |
-The `action` tag contains the action name of the transaction that rendered the
-view.
-## events
-This measurement is used to store generic events such as the number of Git
-pushes, Emails sent, etc. Each point in this measurement has a single value
-field called `count`. The value of this field is simply set to `1`. Each point
-also has at least one tag: `event`. This tag's value is set to the event name.
-Depending on the event type additional tags may be available as well.
---
-Read more on:
- [Introduction to GitLab Performance Monitoring](introduction.md)
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Configuration](influxdb_configuration.md)
- [Grafana Install/Configuration](grafana_configuration.md)
--- a/doc/monitoring/performance/introduction.md
+++ b/doc/monitoring/performance/introduction.md
-# GitLab Performance Monitoring
+This document was moved to [administration/monitoring/performance/introduction](../administration/monitoring/performance/introduction.md).
-GitLab comes with its own application performance measuring system as of GitLab
-8.4, simply called "GitLab Performance Monitoring". GitLab Performance Monitoring is available in both the
-Community and Enterprise editions.
-Apart from this introduction, you are advised to read through the following
-documents in order to understand and properly configure GitLab Performance Monitoring:
- [GitLab Configuration](gitlab_configuration.md)
- [InfluxDB Install/Configuration](influxdb_configuration.md)
- [InfluxDB Schema](influxdb_schema.md)
- [Grafana Install/Configuration](grafana_configuration.md)
-## Introduction to GitLab Performance Monitoring
-GitLab Performance Monitoring makes it possible to measure a wide variety of statistics
-including (but not limited to):
- The time it took to complete a transaction (a web request or Sidekiq job).
- The time spent in running SQL queries and rendering HAML views.
- The time spent executing (instrumented) Ruby methods.
- Ruby object allocations, and retained objects in particular.
- System statistics such as the process' memory usage and open file descriptors.
- Ruby garbage collection statistics.
-Metrics data is written to [InfluxDB][influxdb] over [UDP][influxdb-udp]. Stored
-data can be visualized using [Grafana][grafana] or any other application that
-supports reading data from InfluxDB. Alternatively data can be queried using the
-InfluxDB CLI.
-## Metric Types
-Two types of metrics are collected:
-1. Transaction specific metrics.
-1. Sampled metrics, collected at a certain interval in a separate thread.
-### Transaction Metrics
-Transaction metrics are metrics that can be associated with a single
-transaction. This includes statistics such as the transaction duration, timings
-of any executed SQL queries, time spent rendering HAML views, etc. These metrics
-are collected for every Rack request and Sidekiq job processed.
-### Sampled Metrics
-Sampled metrics are metrics that can't be associated with a single transaction.
-Examples include garbage collection statistics and retained Ruby objects. These
-metrics are collected at a regular interval. This interval is made up out of two
-parts:
-1. A user defined interval.
-1. A randomly generated offset added on top of the interval, the same offset
-   can't be used twice in a row.
-The actual interval can be anywhere between a half of the defined interval and a
-half above the interval. For example, for a user defined interval of 15 seconds
-the actual interval can be anywhere between 7.5 and 22.5. The interval is
-re-generated for every sampling run instead of being generated once and re-used
-for the duration of the process' lifetime.
-[influxdb]: https://influxdata.com/time-series-platform/influxdb/
-[influxdb-udp]: https://docs.influxdata.com/influxdb/v0.9/write_protocols/udp/
-[grafana]: http://grafana.org/