Reduce the number of buckets in Sidekiq histograms
Because of the wide range of buckets used for these metrics and the large number of pods running, the cardinality of these series made it hard to query the Prometheus instance serving them. As a result, some of the metrics used for service monitoring and alerting were failing to record in Thanos.

By reducing the number of buckets, we hope to improve rule evaluation and prevent missing series for Sidekiq.

This brings the number of series for `sidekiq_jobs_completion_seconds` and `sidekiq_jobs_queue_duration_seconds` down from more than 8k to about 1.5k each.

This also reduces the number of buckets used for measuring the total time a job spends per resource: cpu, db, gitaly or elasticsearch.

Changelog: changed