Commit 3e3323ed authored by syasonik

Better document recovery alert behavior and incident management settings

parent 6b180506
@@ -267,3 +267,19 @@ any other Markdown text field in GitLab by
 You can embed both [GitLab-hosted metrics](../metrics/embed.md) and
 [Grafana metrics](../metrics/embed_grafana.md) in incidents and issue
 templates.
+### Automatically close incidents via recovery alerts
+> - [Introduced for Prometheus Integrations](https://gitlab.com/gitlab-org/gitlab/-/issues/13401) in GitLab 12.5.
+> - [Introduced for HTTP Integrations](https://gitlab.com/gitlab-org/gitlab/-/issues/13402) in GitLab 13.4.
+With Maintainer or higher [permissions](../../user/permissions.md), you can enable
+GitLab to close an incident automatically when a **Recovery Alert** is received:
+1. Navigate to **Settings > Operations > Incidents** and expand **Incidents**.
+1. Check the **Automatically close associated Incident** checkbox.
+1. Click **Save changes**.
+When GitLab receives a **Recovery Alert**, it closes the associated incident.
+This action is recorded as a system message on the incident indicating that it
+was closed automatically by the GitLab Alert bot.
@@ -97,17 +97,17 @@ to configure alerts for this integration.
 ## Customize the alert payload outside of GitLab
-For all integration types, you can customize the payload by sending the following
+For HTTP Endpoints without [custom mappings](#map-fields-in-custom-alerts), you can customize the payload by sending the following
 parameters. All fields are optional. If the incoming alert does not contain a value for the `Title` field, a default value of `New: Alert` will be applied.
 | Property | Type | Description |
 | ------------------------- | --------------- | ----------- |
-| `title` | String | The title of the incident. |
+| `title` | String | The title of the alert.|
 | `description` | String | A high-level summary of the problem. |
-| `start_time` | DateTime | The time of the incident. If none is provided, a timestamp of the issue is used. |
+| `start_time` | DateTime | The time of the alert. If none is provided, a current time is used. |
-| `end_time` | DateTime | For existing alerts only. When provided, the alert is resolved and the associated incident is closed. |
+| `end_time` | DateTime | The resolution time of the alert. If provided, the alert is resolved. |
 | `service` | String | The affected service. |
 | `monitoring_tool` | String | The name of the associated monitoring tool. |
 | `hosts` | String or Array | One or more hosts, as to where this incident occurred. |
 | `severity` | String | The severity of the alert. Case-insensitive. Can be one of: `critical`, `high`, `medium`, `low`, `info`, `unknown`. Defaults to `critical` if missing or value is not in this list. |
 | `fingerprint` | String or Array | The unique identifier of the alert. This can be used to group occurrences of the same alert. |
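The defaulting rules stated in the table above (a missing title becomes `New: Alert`; severity is case-insensitive and falls back to `critical`) can be sketched client-side as follows. This is a minimal illustration of the documented rules, not GitLab's implementation; the function names are hypothetical, and treating an empty title the same as a missing one is an assumption.

```python
# Values taken from the property table in the diff above.
ALLOWED_SEVERITIES = ("critical", "high", "medium", "low", "info", "unknown")
DEFAULT_TITLE = "New: Alert"


def normalize_severity(value):
    """Return the severity in lower case, or 'critical' when it is
    missing or not one of the documented values."""
    if isinstance(value, str) and value.lower() in ALLOWED_SEVERITIES:
        return value.lower()
    return "critical"


def apply_defaults(payload):
    """Return a copy of the payload with the documented title and
    severity defaults applied. An empty title is treated as missing
    (an assumption; the source only mentions a missing value)."""
    result = dict(payload)
    if not result.get("title"):
        result["title"] = DEFAULT_TITLE
    result["severity"] = normalize_severity(result.get("severity"))
    return result
```

A helper like this can be useful for predicting how an alert is likely to appear before it is sent; the server-side behavior remains authoritative.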
@@ -189,6 +189,17 @@ If the existing alert is already `resolved`, GitLab creates a new alert instead.
 ![Alert Management List](img/alert_list_v13_1.png)
+## Recovery alerts
+> [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/13402) in GitLab 13.4.
+The alert in GitLab will be automatically resolved when an HTTP Endpoint
+receives a payload with the end time of the alert set. For HTTP Endpoints
+without [custom mappings](#map-fields-in-custom-alerts), the expected
+field is `end_time`. With custom mappings, you can select the expected field.
+You can also configure the associated [incident to be closed automatically](../incident_management/incidents.md#automatically-close-incidents-via-recovery-alerts) when the alert resolves.
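As a rough sketch of the recovery payload described above for an HTTP Endpoint without custom mappings, the body only needs `end_time` set. The example assumes that `fingerprint` is what ties the payload to the existing alert and that an ISO 8601 UTC timestamp is an acceptable DateTime format; neither is confirmed by this diff. The helper name is hypothetical, and the endpoint URL and authorization are deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_recovery_payload(fingerprint, end_time):
    """Build a JSON-serializable body that marks an alert as resolved.

    `end_time` must be timezone-aware; it is converted to UTC and
    rendered as ISO 8601 (an assumed format, see the note above).
    """
    if end_time.tzinfo is None:
        raise ValueError("end_time must be timezone-aware")
    return {
        "fingerprint": fingerprint,
        "end_time": end_time.astimezone(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    body = build_recovery_payload(
        "db-disk-full",  # hypothetical fingerprint
        datetime(2020, 10, 1, 12, 30, tzinfo=timezone.utc),
    )
    print(json.dumps(body))
```

With custom mappings, the key would be whichever field was selected as the end time rather than `end_time`.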
 ## Link to your Opsgenie Alerts
 > [Introduced](https://gitlab.com/groups/gitlab-org/-/epics/3066) in GitLab Premium 13.2.
......
@@ -96,7 +96,6 @@ Prometheus server to use the
 ## Trigger actions from alerts **(ULTIMATE)**
 > - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/issues/4925) in [GitLab Ultimate](https://about.gitlab.com/pricing/) 11.11.
-> - [From GitLab Ultimate 12.5](https://gitlab.com/gitlab-org/gitlab/-/issues/13401), when GitLab receives a recovery alert, it automatically closes the associated issue.
 Alerts can be used to trigger actions, like opening an issue automatically
 (disabled by default since `13.1`). To configure the actions:
@@ -127,10 +126,6 @@ values extracted from the [`alerts` field in webhook payload](https://prometheus
 - **Low**: `low`, `s4`, `p4`, `warn`, `warning`
 - **Info**: `info`, `s5`, `p5`, `debug`, `information`, `notice`
-When GitLab receives a **Recovery Alert**, it closes the associated issue.
-This action is recorded as a system message on the issue indicating that it
-was closed automatically by the GitLab Alert bot.
 To further customize the issue, you can add labels, mentions, or any other supported
 [quick action](../../user/project/quick_actions.md) in the selected issue template,
 which applies to all incidents. To limit quick actions or other information to
@@ -143,3 +138,12 @@ does not yet exist, it is also created automatically.
 If the metric exceeds the threshold of the alert for over 5 minutes, GitLab sends
 an email to all [Maintainers and Owners](../../user/permissions.md#project-members-permissions)
 of the project.
+### Recovery alerts
+> - [From GitLab Ultimate 12.5](https://gitlab.com/gitlab-org/gitlab/-/issues/13401), when GitLab receives a recovery alert, it automatically closes the associated issue.
+The alert in GitLab will be automatically resolved when Prometheus
+sends a payload with the field `status` set to `resolved`.
+You can also configure the associated [incident to be closed automatically](../incident_management/incidents.md#automatically-close-incidents-via-recovery-alerts) when the alert resolves.
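To illustrate the Prometheus case above, the sketch below picks out the entries of a webhook payload whose `status` is `resolved`. It assumes the Alertmanager webhook layout, in which each item of the `alerts` list carries its own `status`; the function name and sample data are hypothetical, and this is not how GitLab itself processes the payload.

```python
def resolved_alerts(webhook_payload):
    """Return the alerts in an Alertmanager-style webhook payload
    whose per-alert `status` field equals 'resolved'."""
    return [
        alert
        for alert in webhook_payload.get("alerts", [])
        if alert.get("status") == "resolved"
    ]


if __name__ == "__main__":
    # Hypothetical sample payload, trimmed to the fields used here.
    sample = {
        "status": "resolved",
        "alerts": [
            {"status": "resolved", "labels": {"alertname": "HighCPU"}},
            {"status": "firing", "labels": {"alertname": "DiskFull"}},
        ],
    }
    for alert in resolved_alerts(sample):
        print(alert["labels"]["alertname"])
```

Under that assumption, only the entries returned here would count as recovery alerts.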
@@ -356,6 +356,24 @@ to remove a fork relationship.
 ## Operations settings
+### Alerts
+Configure [alert integrations](../../../operations/incident_management/integrations.md#configuration) to triage and manage critical problems in your application as [alerts](../../../operations/incident_management/alerts.md).
+### Incidents
+#### Alert integration
+Automatically [create](../../../operations/incident_management/incidents.md#create-incidents-automatically), [notify on](../../../operations/incident_management/paging.md#email-notifications), and [resolve](../../../operations/incident_management/incidents.md#automatically-close-incidents-via-recovery-alerts) incidents based on GitLab alerts.
+#### PagerDuty integration
+[Create incidents in GitLab for each PagerDuty incident](../../../operations/incident_management/incidents.md#create-incidents-via-the-pagerduty-webhook).
+#### Incident settings
+[Manage Service Level Agreements for incidents](../../../operations/incident_management/incidents.md#service-level-agreement-countdown-timer) with an SLA countdown timer.
 ### Error Tracking
 Configure Error Tracking to discover and view [Sentry errors within GitLab](../../../operations/error_tracking.md).
......