Easy work with complex alerts, or the story of how Balerter was created

Everyone loves alerts.

Of course, it is much better to be notified when something happens (or gets fixed) than to sit staring at graphs and hunting for anomalies.

Plenty of tools have been built for this: Alertmanager from the Prometheus ecosystem and vmalert from the VictoriaMetrics product family, Zabbix notifications and Grafana alerts, home-grown bash scripts and Telegram bots that periodically poll some URL and report when something is wrong. There is a lot to choose from.

Our company also used various solutions until we ran into the difficulty, or rather the impossibility, of creating complex, composite alerts. What we wanted and what we ended up building is described below. TL;DR: this is how the open-source project Balerter came about.

For a long time we got along fine with alerts configured in Grafana. Yes, this is not the best approach: the usual recommendation is a specialized tool such as Alertmanager, and more than once we considered migrating. Then, little by little, we started wanting more.

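For example, what if we want to be alerted when a value has dropped/risen by XX% over N minutes, taking M into account? Neither Grafana nor Alertmanager makes this easy.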

We also wanted to build alerts on data from other sources, such as Clickhouse, Postgres, and so on.

So we put together a list of what we needed:

  • Support for different data sources, for example Prometheus, Clickhouse, Postgres

  • Support for different notification channels: telegram, slack, etc.

  • , ,

  • -

, , . - , - . .

And so we created Balerter.

, . (, , . . )

How does it work?

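Checks are written as Lua scripts that query data sources (Prometheus, Clickhouse, etc.) and decide whether something is wrong. Balerter then sends the resulting alerts to notification channels (Email, telegram, slack, etc.).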
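Because each check is ordinary Lua, a composite condition is just code. The sketch below is a minimal plain-Lua illustration of a rule like "the value has deviated by more than XX% from a baseline in each of the last N samples". The function names are hypothetical and not part of Balerter. In a real script, the samples and the baseline would come from data source queries such as ch1.query.

```lua
-- Hypothetical helpers (not part of Balerter) for a composite rule:
-- "the metric deviated by more than thresholdPct percent from a baseline
-- in every one of the last N samples".

-- Relative change of current vs baseline, in percent; nil if baseline is 0.
local function percentChange(current, baseline)
    if baseline == 0 then
        return nil
    end
    return (current - baseline) * 100 / baseline
end

-- True only if there is at least one sample and every sample deviates from
-- the baseline by more than thresholdPct percent (in either direction).
-- A zero baseline never triggers, because the change is undefined.
local function sustainedDeviation(samples, baseline, thresholdPct)
    if #samples == 0 then
        return false
    end
    for _, value in ipairs(samples) do
        local change = percentChange(value, baseline)
        if change == nil or math.abs(change) <= thresholdPct then
            return false
        end
    end
    return true
end

-- Last N = 3 samples compared with a baseline of 100 rps
-- (for example, the value some time earlier) and a 30% threshold.
print(sustainedDeviation({ 60, 55, 65 }, 100, 30))  -- true
print(sustainedDeviation({ 60, 90, 65 }, 100, 30))  -- false
```

Requiring every sample to exceed the threshold prevents a single noisy data point from firing the alert. That is the kind of rule that is awkward to express with threshold-only alerting.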

Here is an example script:

-- @interval 10s
-- @name script1

-- Alert threshold: minimum acceptable requests per second
local minRequestsRPS = 100

local log = require("log")
local alert = require("alert")
local ch1 = require("datasource.clickhouse.ch1")

-- Query the ClickHouse data source named ch1
local res, err = ch1.query("SELECT sum(requests) AS rps FROM some_table WHERE date = now()")
if err ~= nil then
    log.error("clickhouse 'ch1' query error: " .. err)
    return
end

local resultRPS = res[1].rps

-- Raise the alert if RPS is below the threshold, otherwise mark it as resolved
if resultRPS < minRequestsRPS then
    alert.error("rps-min-limit", "Requests RPS are very small: " .. tostring(resultRPS))
else
    alert.success("rps-min-limit", "Requests RPS ok")
end

Here is what happens in this script:

  • the @interval annotation makes Balerter run the script every 10 seconds

  • @name sets the script name (used, among other places, in the API)

  • we connect to the ch1 data source

  • -

  • ( , , Postgres)

  • depending on the result, we raise or resolve the alert with ID rps-min-limit

  • ,
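In the script, the alert is identified by its ID (rps-min-limit), and alert.error / alert.success move it between an error level and a success level. The sketch below is one plain-Lua way to model such named alerts. It assumes that a notification should be sent only when an alert's level changes, so a check running every 10 seconds does not repeat the same message on every run. AlertState and its methods are hypothetical names, not Balerter's internals.

```lua
-- Hypothetical sketch of per-ID alert state (not Balerter's source code):
-- a notification is produced only when an alert's level changes,
-- not on every script run.

local AlertState = {}
AlertState.__index = AlertState

-- notify(id, level, text) is called on every level transition
function AlertState.new(notify)
    return setmetatable({ levels = {}, notify = notify }, AlertState)
end

-- Records the new level for id; returns true if a notification was sent.
function AlertState:set(id, level, text)
    local previous = self.levels[id]
    self.levels[id] = level
    -- Assumption: an alert that starts out healthy needs no notification.
    if previous == nil and level == "success" then
        return false
    end
    if previous ~= level then
        self.notify(id, level, text)
        return true
    end
    return false
end

-- Usage: three runs of the RPS check; only the two transitions notify.
local sent = {}
local state = AlertState.new(function(id, level, text)
    sent[#sent + 1] = level .. ":" .. id
end)

state:set("rps-min-limit", "error", "Requests RPS are very small: 10")
state:set("rps-min-limit", "error", "Requests RPS are very small: 12")
state:set("rps-min-limit", "success", "Requests RPS ok")

print(table.concat(sent, ", "))  -- error:rps-min-limit, success:rps-min-limit
```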

. , , . .

Since v0.4.0, scripts can also be covered by tests.

Here is a test for the script above:

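-- @test script1
-- @name script1-test

local test = require('test')

-- Canned ClickHouse response: a single row with rps = 10
local resp = {
    {
        rps = 10
    }
}

-- Return resp when script1 sends this exact query to ch1
test.datasource('clickhouse.ch1').on('query', 'SELECT sum(requests) AS rps FROM some_table WHERE date = now()').response(resp)

-- 10 is below the threshold of 100, so only the error alert must fire
test.alert().assertCalled('error', 'rps-min-limit', 'Requests RPS are very small: 10')
test.alert().assertNotCalled('success', 'rps-min-limit', 'Requests RPS ok')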

Here is what the test does:

  • @test script1 marks this file as a test for script1

  • @name sets the name of the test

  • we define the response that the ch1 data source returns for the script's query

  • we check that the error-level alert rps-min-limit was raised

  • we check that rps-min-limit was not switched to the success level
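Mocking a data source works because the check depends only on what the data source returns for a given query. The plain-Lua sketch below illustrates the idea with dependency injection: the check receives its data source and alert module as arguments, so a test can pass in fakes that return a canned response and record alert calls. checkRPS, fakeDatasource and recordingAlert are hypothetical names. This is not how Balerter's test module is implemented internally.

```lua
-- Hypothetical plain-Lua sketch of testing a check with a fake data source.
-- Names are illustrative and not part of Balerter's API.

local QUERY = "SELECT sum(requests) AS rps FROM some_table WHERE date = now()"

-- The check under test, written against injected dependencies.
local function checkRPS(ds, alert, minRPS)
    local res, err = ds.query(QUERY)
    if err ~= nil then
        return
    end
    local rps = res[1].rps
    if rps < minRPS then
        alert.error("rps-min-limit", "Requests RPS are very small: " .. tostring(rps))
    else
        alert.success("rps-min-limit", "Requests RPS ok")
    end
end

-- Fake data source: returns a canned response for one exact query string.
local function fakeDatasource(query, response)
    return {
        query = function(q)
            if q == query then
                return response, nil
            end
            return nil, "unexpected query: " .. q
        end,
    }
end

-- Fake alert module: records every call as "level|id|text".
local function recordingAlert()
    local calls = {}
    local function record(level)
        return function(id, text)
            calls[#calls + 1] = level .. "|" .. id .. "|" .. text
        end
    end
    return { error = record("error"), success = record("success"), calls = calls }
end

local alert = recordingAlert()
checkRPS(fakeDatasource(QUERY, { { rps = 10 } }), alert, 100)
print(alert.calls[1])  -- error|rps-min-limit|Requests RPS are very small: 10
print(#alert.calls)    -- 1
```

Matching on the exact query string, as the Balerter test above does, keeps the fake strict. If the script's query changes, the test fails instead of silently passing.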

What can Balerter do?

All the details are in the documentation on the Balerter website: https://balerter.com. In brief:

  • data sources:

    • clickhouse

    • postgres

    • mysql

    • prometheus

    • loki

  • notification channels:

    • slack

    • telegram

    • syslog

    • notify (desktop UI notifications)

    • email

    • discord

  • , S3 ( )

  • Key/Value storage

  • Lua modules (for example json, csv)

  • HTTP

  • API

  • Prometheus

What's next?

,  cron. v1.0.0

. , - MongoDB. - Elastic Search. SMS / . , , , . .

- - ) , . ,

We have been using Balerter for quite a while now, and dozens of scripts guard our peace of mind. We hope it will be useful to others as well.

Issues and pull requests are welcome.



