Everyone loves alerts.
Of course, it is much better to be notified when something breaks (or gets fixed) than to sit watching graphs and hunting for anomalies.
And a lot of tools have been created for this: Alertmanager from the Prometheus ecosystem and vmalert from the VictoriaMetrics family, Zabbix notifications and alerts in Grafana, homemade bash scripts and Telegram bots that periodically poll some URL and report if something is wrong. Plenty of everything.
At our company we also used various solutions until we ran into the difficulty, or rather the impossibility, of creating complex, composite alerts. What we wanted and what we ended up building is described below. TL;DR: this is how the open source project Balerter appeared.
For a long time we lived quite well with alerts configured in Grafana. Yes, this is not the best approach; a specialized solution such as Alertmanager is usually recommended, and more than once we considered migrating. But then, little by little, we wanted more.
, / XX% N M ? , , Grafana Alertmanager, . ( , )
, . :
Clickhouse, Postgres, .
, . / , ,
. , Prometheus, Clickhouse, Postgres
- telegram, slack ..
, ,
-
, , . - , - . .
, Balerter.
, . (, , . . )
?
Lua, ( Prometheus, Clickhouse .). - . / - . Balerter , (Email, telegram, slack ..). . … - )
Example script:
-- @interval 10s
-- @name script1
local log = require("log")
local alert = require("alert")
local ch1 = require("datasource.clickhouse.ch1")
-- Alert when the request rate drops below this value
local minRequestsRPS = 100
local res, err = ch1.query("SELECT sum(requests) AS rps FROM some_table WHERE date = now()")
if err ~= nil then
    log.error("clickhouse 'ch1' query error: " .. err)
    return
end
-- The query returns a list of rows; the first row holds the aggregate
local resultRPS = res[1].rps
if resultRPS < minRequestsRPS then
    alert.error("rps-min-limit", "Requests RPS are very small: " .. tostring(resultRPS))
else
    alert.success("rps-min-limit", "Requests RPS ok")
end
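What this script does:
It runs every 10 seconds
It queries the ClickHouse datasource named ch1
If the query fails, it logs the error and stops
It compares the returned value with the threshold
If the value is below the threshold, it raises an error alert with the ID rps-min-limit
Otherwise it reports success for the same alert ID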
. , , . .
- . v0.4.0 .
Example test:
-- @test script1
-- @name script1-test
local test = require('test')
-- Mocked datasource response: a single row with rps = 10
local resp = {
    {
        rps = 10
    }
}
test.datasource('clickhouse.ch1').on('query', 'SELECT sum(requests) AS rps FROM some_table WHERE date = now()').response(resp)
test.alert().assertCalled('error', 'rps-min-limit', 'Requests RPS are very small: 10')
test.alert().assertNotCalled('success', 'rps-min-limit', 'Requests RPS ok')
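What this test does:
It targets the script named script1
It mocks the response of the ch1 datasource for the given query
It asserts that the error alert rps-min-limit was called
It asserts that the success alert rps-min-limit was not called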
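The composite alerts that motivated the project can be sketched in plain Lua. This is only an illustration under stated assumptions: the evaluate function, the limits table, the traffic-health alert ID and the stubbed alert table are all hypothetical and not part of Balerter. In a real script the two input values would come from datasource queries and the alert calls would go to the actual alert module.

```lua
-- Hypothetical helper, not part of Balerter: combines two independent
-- measurements (for example, one per datasource) into a single decision.
local function evaluate(rps, errorRate, limits)
    local lowTraffic = rps < limits.minRPS
    local manyErrors = errorRate > limits.maxErrorRate
    if lowTraffic and manyErrors then
        return "error", "low traffic and high error rate"
    elseif lowTraffic then
        return "warn", "low traffic"
    elseif manyErrors then
        return "warn", "high error rate"
    end
    return "success", "ok"
end

-- Stub standing in for an alert module so the sketch runs in a plain
-- Lua interpreter; it only records the calls it receives.
local calls = {}
local alert = {}
for _, level in ipairs({"error", "warn", "success"}) do
    alert[level] = function(id, msg)
        calls[#calls + 1] = level .. ":" .. id .. ":" .. msg
    end
end

-- Assumed thresholds and sample values, chosen only for illustration
local limits = { minRPS = 100, maxErrorRate = 0.05 }
local level, msg = evaluate(10, 0.2, limits)
alert[level]("traffic-health", msg)
```

Keeping the decision in a pure function separates it from data access, so the same logic can be exercised with fixed inputs before it is wired to real queries.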
Balerter?
, , Balerter. https://balerter.com
clickhouse
postgres
mysql
prometheus
loki
slack
telegram
syslog
notify (UI )
email
discord
- Key/Value
Lua (- lua- json, csv)
HTTP ( , )
API ( , )
Prometheus
?
, cron. v1.0.0
. , - MongoDB. - Elastic Search. SMS / . , , , . .
- - ) , . ,
We have been using Balerter for quite some time, and dozens of scripts now guard our peace of mind. We hope this work will be useful to someone else.
Issues and PRs are welcome.