Hello. Today we would like to talk about detecting anomalies in a microservice environment. This post is a short summary of the 40-minute talk we gave at the DevOps Live 2020 online conference. To avoid writing a longread, we decided to focus on an overview of tools that detect anomalies in the distribution of metric values, automate the monitoring of microservices, and can be adopted quickly by any team ...
Anomaly detection is a highly relevant topic right now: with the move to microservices, SRE and DevOps teams give much higher priority to turning alerts into a meaningful signal, reducing MTTD, and simplifying alert configuration when monitoring distributed environments.
, , , .
"" .
, , .
?
?
, :
- latency ;
- ;
- .
"" , - , .
, :
- ;
- , ;
- «» , .
, , , ?
:
- c ;
- APM ;
- as a Service.
.
Prometheus , time series .
recording rules, , .
, , , ( " ").
, , z- (z-score) - , , .
http_requests_total, :
# per-app request rate over a 5-minute window
- record: job:http_requests:rate5m
  expr: sum by (app) (rate(http_requests_total[5m]))
:
# average
- record: job:http_requests:rate5m:avg_over_time_1w
  expr: avg_over_time(job:http_requests:rate5m[1w])
# stddev
- record: job:http_requests:rate5m:stddev_over_time_1w
  expr: stddev_over_time(job:http_requests:rate5m[1w])
# z-score
(job:http_requests:rate5m - job:http_requests:rate5m:avg_over_time_1w) / job:http_requests:rate5m:stddev_over_time_1w
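To make the arithmetic behind the z-score expression above easier to follow, here is a minimal Python sketch of the same calculation on a small in-memory sample. The function names `z_score` and `is_anomaly`, the sample values, and the threshold of 3 standard deviations are our assumptions for illustration, not part of the Prometheus rules. We use the population standard deviation because that is what PromQL's `stddev_over_time` computes.

```python
from statistics import mean, pstdev


def z_score(current, history):
    """Return how many standard deviations `current` lies from the mean of `history`."""
    mu = mean(history)
    sigma = pstdev(history)  # population stddev, matching stddev_over_time
    if sigma == 0:
        # Assumption: a perfectly flat history is treated as "no deviation"
        # instead of dividing by zero.
        return 0.0
    return (current - mu) / sigma


def is_anomaly(current, history, threshold=3.0):
    """Flag the value when its absolute z-score exceeds the (hypothetical) threshold."""
    return abs(z_score(current, history)) > threshold


# Hypothetical request-rate samples standing in for one week of rate5m values.
history = [100.0, 102.0, 98.0, 101.0, 99.0]
print(z_score(101.0, history))
print(is_anomaly(150.0, history))  # True
```

The threshold is a tuning knob: a lower value catches smaller deviations at the cost of noisier alerts, so it is worth checking against historical data before alerting on it.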
( , latency) β , , .
β .
, .
.
, β z-.
Prometheus - PAD
Prometheus Anomaly Detector (PAD), Red Hat, , .
PAD Prometheus , PAD recording rules, , , Prophet, .
PAD Grafana .
, proof of concept.
APM
(Application Performance Monitoring) AIOps - , , .
, .
New Relic
New Relic baseline ( ) - , EUM, .
β baseline, ( , , ).
, , , , baseline.
, .
2020 - New Relic Applied Intelligence (AI).
New Relic AI KPI .
/ .
AppDynamics
AppDynamics APM baseline KPI- .
baseline , , (, ) , baseline.
, , health rule .
, baseline health rule.
Dynatrace
Dynatrace " " , .
:
- KPI
.
Instana
Instana " " 230 "" , KPI .
latency, error rate, traffic ( ).
E-Divisive with Medians (EDM).
, , baseline.
"" "" , .
baseline β .
EUM.
as a Service
APM , Prometheus , , SaaS .
Azure Metric Advisor
Microsoft - Azure Metric Advisor .
, , e-commerce.
(SQL Server, ElasticSearch, InfluxDB, MongoDB, MySQL, PostgreSQL ), Prometheus .
Anodot
- Prometheus -.
-, SRE .
e-commerce, gaming .
AnomalyIO
, , , , InfluxDB.
, InfluxDB, , .
- .
- β , .
- Prometheus - .
- APM AIOps, .
.