Commit 5f51a17c authored by Mikolaj Wawrzyniak's avatar Mikolaj Wawrzyniak

Add time frame attribute to aggregated metrics

In order to make aggregates more flexible, and as well to avoid
calculating not needed data, we should add time frame attribute
to aggregate definition.
parent 772b9df8
......@@ -934,6 +934,10 @@ To add data for aggregated metrics into Usage Ping payload you should add corres
- `operator`: Operator that defines how the aggregated metric data is counted. Available operators are:
- `OR`: Removes duplicates and counts all entries that triggered any of listed events.
- `AND`: Removes duplicates and counts all elements that were observed triggering all of following events.
- `time_frame`: One or more valid time frames. Use these to limit the data included in aggregated metric to events within a specific date-range. Valid time frames are:
- `7d`: Last seven days of data.
- `28d`: Last twenty eight days of data.
- `all`: All historical data, only available for `database` sourced aggregated metrics.
- `source`: Data source used to collect all events data included in aggregated metric. Valid data sources are:
- [`database`](#database-sourced-aggregated-metrics)
- [`redis`](#redis-sourced-aggregated-metrics)
......@@ -949,18 +953,24 @@ To add data for aggregated metrics into Usage Ping payload you should add corres
Example aggregated metric entries:
```yaml
- name: product_analytics_test_metrics_union_redis_sourced
- name: example_metrics_union
operator: OR
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
source: redis
- name: product_analytics_test_metrics_intersection_with_feautre_flag_database_sourced
time_frame:
- 7d
- 28d
- name: example_metrics_intersection
operator: AND
source: database
time_frame:
- 28d
- all
events: ['dependency_scanning_pipeline_all_time', 'container_scanning_pipeline_all_time']
feature_flag: example_aggregated_metric
```
Aggregated metrics are added under `aggregated_metrics` key in both `counts_weekly` and `counts_monthly` top level keys in Usage Ping payload.
Aggregated metrics collected in `7d` and `28d` time frames are added into Usage Ping payload under the `aggregated_metrics` sub-key in the `counts_weekly` and `counts_monthly` top level keys.
```ruby
{
......@@ -973,14 +983,35 @@ Aggregated metrics are added under `aggregated_metrics` key in both `counts_week
:project_snippets => 407,
:promoted_issues => 719,
:aggregated_metrics => {
:product_analytics_test_metrics_union => 7,
:product_analytics_test_metrics_intersection_with_feautre_flag => 2
:example_metrics_union => 7,
:example_metrics_intersection => 2
},
:snippets => 2513
}
}
```
Aggregated metrics for `all` time frame are present in the `count` top level key, with the `aggregate_` prefix added to their name.
For example:
`example_metrics_intersection`
Becomes:
`counts.aggregate_example_metrics_intersection`
```ruby
{
:counts => {
:deployments => 11003,
:successful_deployments => 178,
:failed_deployments => 1275,
:aggregate_example_metrics_intersection => 12
}
}
```
### Redis sourced aggregated metrics
> - [Introduced](https://gitlab.com/gitlab-org/gitlab/-/merge_requests/45979) in GitLab 13.6.
......@@ -992,6 +1023,7 @@ you must fulfill the following requirements:
[`known_events/*.yml`](#known-events-are-added-automatically-in-usage-data-payload) files.
1. All events listed at `events` attribute must have the same `redis_slot` attribute.
1. All events listed at `events` attribute must have the same `aggregation` attribute.
1. `time_frame` does not include `all` value, which is unavailable for Redis sourced aggregated metrics.
### Database sourced aggregated metrics
......@@ -1051,17 +1083,24 @@ end
#### Add new aggregated metric definition
After all metrics are persisted, you can add an aggregated metric definition at
[`aggregated_metrics/`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/aggregated_metrics/). When adding definitions for metrics names listed in the
`events:` attribute, use the same names you passed in the `metric_name` argument
while persisting metrics in previous step.
[`aggregated_metrics/`](https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/usage_data_counters/aggregated_metrics/).
To declare the aggregate of metrics collected with [Estimated Batch Counters](#estimated-batch-counters),
you must fulfill the following requirements:
- Metrics names listed in the `events:` attribute, have to use the same names you passed in the `metric_name` argument while persisting metrics in previous step.
- Every metric listed in the `events:` attribute, has to be persisted for **every** selected `time_frame:` value.
Example definition:
```yaml
- name: product_analytics_test_metrics_intersection_database_sourced
- name: example_metrics_intersection_database_sourced
operator: AND
source: database
events: ['dependency_scanning_pipeline', 'container_scanning_pipeline']
time_frame:
- 28d
- all
```
## Example Usage Ping payload
......
......@@ -11,6 +11,7 @@ module Gitlab
AggregatedMetricError = Class.new(StandardError)
UnknownAggregationOperator = Class.new(AggregatedMetricError)
UnknownAggregationSource = Class.new(AggregatedMetricError)
DisallowedAggregationTimeFrame = Class.new(AggregatedMetricError)
DATABASE_SOURCE = 'database'
REDIS_SOURCE = 'redis'
......@@ -30,25 +31,38 @@ module Gitlab
@recorded_at = recorded_at
end
def all_time_data
aggregated_metrics_data(start_date: nil, end_date: nil, time_frame: Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME)
end
def monthly_data
aggregated_metrics_data(**monthly_time_range)
aggregated_metrics_data(**monthly_time_range.merge(time_frame: Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME))
end
def weekly_data
aggregated_metrics_data(**weekly_time_range)
aggregated_metrics_data(**weekly_time_range.merge(time_frame: Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME))
end
private
attr_accessor :aggregated_metrics, :recorded_at
def aggregated_metrics_data(start_date:, end_date:)
def aggregated_metrics_data(start_date:, end_date:, time_frame:)
aggregated_metrics.each_with_object({}) do |aggregation, data|
next if aggregation[:feature_flag] && Feature.disabled?(aggregation[:feature_flag], default_enabled: :yaml, type: :development)
next unless aggregation[:time_frame].include?(time_frame)
case aggregation[:source]
when REDIS_SOURCE
if time_frame == Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME
data[aggregation[:name]] = Gitlab::Utils::UsageData::FALLBACK
Gitlab::ErrorTracking
.track_and_raise_for_dev_exception(
DisallowedAggregationTimeFrame.new("Aggregation time frame: 'all' is not allowed for aggregation with source: '#{REDIS_SOURCE}'")
)
else
data[aggregation[:name]] = calculate_count_for_aggregation(aggregation: aggregation, start_date: start_date, end_date: end_date)
end
when DATABASE_SOURCE
next unless Feature.enabled?('database_sourced_aggregated_metrics', default_enabled: false, type: :development)
......
......@@ -55,15 +55,15 @@ module Gitlab
end
def time_period_to_human_name(time_period)
return Gitlab::Utils::UsageData::ALL_TIME_PERIOD_HUMAN_NAME if time_period.blank?
return Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME if time_period.blank?
start_date = time_period.first.to_date
end_date = time_period.last.to_date
if (end_date - start_date).to_i > 7
Gitlab::Utils::UsageData::MONTHLY_PERIOD_HUMAN_NAME
Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME
else
Gitlab::Utils::UsageData::WEEKLY_PERIOD_HUMAN_NAME
Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME
end
end
end
......
......@@ -60,6 +60,7 @@ module Gitlab
.merge(compliance_unique_visits_data)
.merge(search_unique_visits_data)
.merge(redis_hll_counters)
.deep_merge(aggregated_metrics_data)
end
end
......@@ -224,8 +225,7 @@ module Gitlab
project_snippets: count(ProjectSnippet.where(last_28_days_time_period)),
projects_with_alerts_created: distinct_count(::AlertManagement::Alert.where(last_28_days_time_period), :project_id)
}.merge(
snowplow_event_counts(last_28_days_time_period(column: :collector_tstamp)),
aggregated_metrics_monthly
snowplow_event_counts(last_28_days_time_period(column: :collector_tstamp))
).tap do |data|
data[:snippets] = add(data[:personal_snippets], data[:project_snippets])
end
......@@ -250,10 +250,7 @@ module Gitlab
def system_usage_data_weekly
{
counts_weekly: {
}.merge(
aggregated_metrics_weekly
)
counts_weekly: {}
}
end
......@@ -713,15 +710,13 @@ module Gitlab
{ redis_hll_counters: ::Gitlab::UsageDataCounters::HLLRedisCounter.unique_events_data }
end
def aggregated_metrics_monthly
{
aggregated_metrics: aggregated_metrics.monthly_data
}
end
def aggregated_metrics_weekly
def aggregated_metrics_data
{
aggregated_metrics: aggregated_metrics.weekly_data
counts_weekly: { aggregated_metrics: aggregated_metrics.weekly_data },
counts_monthly: { aggregated_metrics: aggregated_metrics.monthly_data },
counts: aggregated_metrics
.all_time_data
.to_h { |key, value| ["aggregate_#{key}".to_sym, value.round] }
}
end
......
......@@ -11,6 +11,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_single_file_diffs',
'i_code_review_user_create_mr',
......@@ -54,6 +55,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_single_file_diffs',
'i_code_review_user_create_mr',
......@@ -96,6 +98,7 @@
operator: OR
feature_flag: usage_data_code_review_aggregation
source: redis
time_frame: [7d, 28d]
events: [
'i_code_review_user_vs_code_api_request'
]
......@@ -7,6 +7,10 @@
# source: defines which datasource will be used to locate events that should be included in aggregated metric. Valid values are:
# - database
# - redis
# time_frame: defines time frames for aggregated metrics:
# - 7d - last 7 days
# - 28d - last 28 days
# - all - all historical available data, this time frame is not available for redis source
# feature_flag: name of development feature flag that will be checked before metrics aggregation is performed.
# Corresponding feature flag should have `default_enabled` attribute set to `false`.
# This attribute is OPTIONAL and can be omitted, when `feature_flag` is missing no feature flag will be checked.
......@@ -14,18 +18,22 @@
- name: compliance_features_track_unique_visits_union
operator: OR
source: redis
time_frame: [7d, 28d]
events: ['g_compliance_audit_events', 'g_compliance_dashboard', 'i_compliance_audit_events', 'a_compliance_audit_events_api', 'i_compliance_credential_inventory']
- name: product_analytics_test_metrics_union
operator: OR
source: redis
time_frame: [7d, 28d]
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
- name: product_analytics_test_metrics_intersection
operator: AND
source: redis
time_frame: [7d, 28d]
events: ['i_search_total', 'i_search_advanced', 'i_search_paid']
- name: incident_management_alerts_total_unique_counts
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'incident_management_alert_status_changed',
'incident_management_alert_assigned',
......@@ -35,6 +43,7 @@
- name: incident_management_incidents_total_unique_counts
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'incident_management_incident_created',
'incident_management_incident_reopened',
......@@ -51,10 +60,11 @@
- name: i_testing_paid_monthly_active_user_total
operator: OR
source: redis
time_frame: [7d, 28d]
events: [
'i_testing_web_performance_widget_total',
'i_testing_full_code_quality_report_total',
'i_testing_group_code_coverage_visit_total',
'i_testing_load_performance_widget_total',
'i_testing_metrics_report_widget_total'
]
]
......@@ -40,9 +40,9 @@ module Gitlab
FALLBACK = -1
DISTRIBUTED_HLL_FALLBACK = -2
ALL_TIME_PERIOD_HUMAN_NAME = "all_time"
WEEKLY_PERIOD_HUMAN_NAME = "weekly"
MONTHLY_PERIOD_HUMAN_NAME = "monthly"
ALL_TIME_TIME_FRAME_NAME = "all"
SEVEN_DAYS_TIME_FRAME_NAME = "7d"
TWENTY_EIGHT_DAYS_TIME_FRAME_NAME = "28d"
def count(relation, column = nil, batch: true, batch_size: nil, start: nil, finish: nil)
if batch
......
......@@ -69,7 +69,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_weekly-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_7d-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......@@ -81,7 +81,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_monthly-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_28d-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......@@ -93,7 +93,7 @@ RSpec.describe Gitlab::Usage::Metrics::Aggregates::Sources::PostgresHll, :clean_
it 'persists serialized data in Redis' do
Gitlab::Redis::SharedState.with do |redis|
expect(redis).to receive(:set).with("#{metric_1}_all_time-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
expect(redis).to receive(:set).with("#{metric_1}_all-#{recorded_at.to_i}", '{"141":1,"56":1}', ex: 120.hours)
end
save_aggregated_metrics
......
......@@ -23,6 +23,22 @@ RSpec.describe 'aggregated metrics' do
end
end
RSpec::Matchers.define :have_known_time_frame do
allowed_time_frames = [
Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME,
Gitlab::Utils::UsageData::TWENTY_EIGHT_DAYS_TIME_FRAME_NAME,
Gitlab::Utils::UsageData::SEVEN_DAYS_TIME_FRAME_NAME
]
match do |aggregate|
(aggregate[:time_frame] - allowed_time_frames).empty?
end
failure_message do |aggregate|
"Aggregate with name: `#{aggregate[:name]}` uses not allowed time_frame`#{aggregate[:time_frame] - allowed_time_frames}`"
end
end
let_it_be(:known_events) do
Gitlab::UsageDataCounters::HLLRedisCounter.known_events
end
......@@ -38,10 +54,18 @@ RSpec.describe 'aggregated metrics' do
expect(aggregated_metrics).to all has_known_source
end
it 'all aggregated metrics has known source' do
expect(aggregated_metrics).to all have_known_time_frame
end
aggregated_metrics&.select { |agg| agg[:source] == Gitlab::Usage::Metrics::Aggregates::REDIS_SOURCE }&.each do |aggregate|
context "for #{aggregate[:name]} aggregate of #{aggregate[:events].join(' ')}" do
let_it_be(:events_records) { known_events.select { |event| aggregate[:events].include?(event[:name]) } }
it "does not include 'all' time frame for Redis sourced aggregate" do
expect(aggregate[:time_frame]).not_to include(Gitlab::Utils::UsageData::ALL_TIME_TIME_FRAME_NAME)
end
it "only refers to known events" do
expect(aggregate[:events]).to all be_known_event
end
......
......@@ -1375,25 +1375,20 @@ RSpec.describe Gitlab::UsageData, :aggregate_failures do
end
end
describe '.aggregated_metrics_weekly' do
subject(:aggregated_metrics_payload) { described_class.aggregated_metrics_weekly }
describe '.aggregated_metrics_data' do
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate methods', :aggregate_failures do
expected_payload = {
counts_weekly: { aggregated_metrics: { global_search_gmau: 123 } },
counts_monthly: { aggregated_metrics: { global_search_gmau: 456 } },
counts: { aggregate_global_search_gmau: 789 }
}
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate#weekly_data', :aggregate_failures do
expect_next_instance_of(::Gitlab::Usage::Metrics::Aggregates::Aggregate) do |instance|
expect(instance).to receive(:weekly_data).and_return(global_search_gmau: 123)
expect(instance).to receive(:monthly_data).and_return(global_search_gmau: 456)
expect(instance).to receive(:all_time_data).and_return(global_search_gmau: 789)
end
expect(aggregated_metrics_payload).to eq(aggregated_metrics: { global_search_gmau: 123 })
end
end
describe '.aggregated_metrics_monthly' do
subject(:aggregated_metrics_payload) { described_class.aggregated_metrics_monthly }
it 'uses ::Gitlab::Usage::Metrics::Aggregates::Aggregate#monthly_data', :aggregate_failures do
expect_next_instance_of(::Gitlab::Usage::Metrics::Aggregates::Aggregate) do |instance|
expect(instance).to receive(:monthly_data).and_return(global_search_gmau: 123)
end
expect(aggregated_metrics_payload).to eq(aggregated_metrics: { global_search_gmau: 123 })
expect(described_class.aggregated_metrics_data).to eq(expected_payload)
end
end
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment