This article introduces the basics of monitoring Kubernetes, gives an overview of Prometheus, and touches on alerting.
Monitoring is a broad topic that cannot be covered in a single article. The purpose of this text is to provide an overview of the tools, concepts, and approaches.
This article is based on an open lecture from the Slurm school. For the full training, sign up for the course Monitoring and Logging Infrastructure in Kubernetes.
What is monitored in a Kubernetes cluster
Physical servers. If the Kubernetes cluster is deployed on your own servers, you need to monitor their health. Zabbix handles this task; if you already work with it, there is no need to give it up, and it will not conflict with the rest of the monitoring. In our case, it is Zabbix that monitors the state of the servers.
Let's move on to monitoring at the cluster level.
Control Plane components: the API server, the Scheduler, and others. At a minimum, you need to check that the number of running API server or etcd instances is greater than 0. etcd exposes a lot of metrics: on the disks it runs on, on the health of its cluster, and others.
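Docker. The state of Docker on the nodes needs to be monitored as well.
DNS. The cluster relies on DNS for service discovery, so DNS has to be monitored too. In Kubernetes this role is played by CoreDNS.
Ingress. The Ingress Controller needs to be monitored as well.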
Prometheus
The tool most commonly used for monitoring Kubernetes is Prometheus.
Prometheus can be installed in a cluster in two ways: with the Prometheus Helm chart or with Prometheus Operator.
With the Helm chart, the Prometheus configuration is stored in a ConfigMap.
With Prometheus Operator, the installation is managed by the operator.
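Prometheus integrates with Kubernetes: it can query the API Server.
Prometheus collects metrics from applications that expose them in its format. For software that cannot do this itself, there are exporters. For example, PostgreSQL Exporter collects metrics from PostgreSQL and exposes them in the Prometheus format, and Prometheus then scrapes them.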
Prometheus architecture
Prometheus Server is the central component of Prometheus.
It stores metrics in a time series database (TSDB). The TSDB is written in Go and is built into Prometheus.
TSDB
By default, Prometheus keeps data for 15 days.
The TSDB is not meant for long-term storage. For that, metrics are sent to an external database such as InfluxDB or ClickHouse.
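Prometheus Server uses a pull model: it goes to the targets itself and collects metrics from them. In effect, we tell it «go to the API Server», and it polls that target every n seconds.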
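To make the pull model more concrete: on each scrape, Prometheus sends an HTTP request to a target's metrics endpoint and parses the plain-text response. The Python sketch below parses a simplified version of that text format. The function name `parse_exposition` and the sample payload are illustrative assumptions, and the sketch deliberately ignores timestamps and metadata that the real format allows.

```python
def parse_exposition(text):
    """Parse simplified Prometheus text format into {series: value}.

    Assumption: every sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical payload, as a target might return it on one scrape.
payload = """\
# HELP http_requests_total Total HTTP requests.
# TYPE http_requests_total counter
http_requests_total{code="200"} 1027
process_open_fds 42
"""

samples = parse_exposition(payload)
for series, value in samples.items():
    print(series, value)
```

A real scraper would fetch this text over HTTP on a fixed interval; the parsing step is kept separate here so that it can be run without a live target.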
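For short-lived tasks (a job or a cron job), which may finish before Prometheus gets to scrape them, there is a separate component, Pushgateway. The job pushes its metrics to Pushgateway before it exits, and Prometheus then scrapes them from Pushgateway.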
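As a rough illustration of the push side, the Python sketch below builds the HTTP request a short-lived job could send to Pushgateway, which accepts metrics in the text format at `/metrics/job/<job_name>`. The host `pushgateway.example:9091`, the job name, and the metric are hypothetical; the request is only constructed, not sent.

```python
import urllib.request


def build_push_request(gateway_url, job, metrics):
    """Build (but do not send) a PUT request that pushes metrics for a job."""
    # One "<name> <value>" line per metric, terminated by a newline.
    lines = [f"{name} {value}" for name, value in metrics.items()]
    body = ("\n".join(lines) + "\n").encode("utf-8")
    url = f"{gateway_url.rstrip('/')}/metrics/job/{job}"
    return urllib.request.Request(url, data=body, method="PUT")


# Hypothetical gateway address and metric, for illustration only.
req = build_push_request(
    "http://pushgateway.example:9091",
    "nightly_backup",
    {"backup_duration_seconds": 12.5},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"), end="")
```

To actually deliver the metrics, the job would pass `req` to `urllib.request.urlopen` as its last step before exiting.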
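Another Prometheus component is Alertmanager. It works together with alerting rules. For example, a rule can raise an alert when the number of API servers drops to 0. When a rule triggers, the alert is passed to Alertmanager, which processes it and sends notifications to Slack, Telegram, email, and other channels.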
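Another important Prometheus feature is Discovering. Prometheus does not need a static list along the lines of «here is an IP address, here is a port»; it finds its targets (the endpoints it monitors) on its own. Prometheus keeps track of which targets appear and disappear.
This matters in Kubernetes, where everything changes constantly: there were 10 pods, and now there are 3. The IP addresses change too, and Discovering keeps the list of targets up to date.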
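Queries to Prometheus are written in its own language, PromQL. The basics are described in the documentation, and a few example queries follow.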
https://prometheus.io/docs/prometheus/latest/querying/basics/
container_memory_usage_bytes
container_memory_usage_bytes / 1024 / 1024
sum(container_memory_usage_bytes) / 1024 / 1024
100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100)
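To show what the last query above computes, here is a minimal Python sketch of the same arithmetic: the per-second growth of the idle CPU counter, averaged per instance, turned into a usage percentage. The sample data and the helper names are assumptions, and `simple_rate` is a simplification of PromQL's `rate()`, which also handles counter resets and extrapolation.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase between the first and last (timestamp, value) pair."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_series):
    """Mimic: 100 - avg by (instance) (rate(idle[5m]) * 100)."""
    per_instance = defaultdict(list)
    for (instance, _cpu), samples in idle_series.items():
        per_instance[instance].append(simple_rate(samples) * 100)
    return {
        instance: 100 - sum(rates) / len(rates)
        for instance, rates in per_instance.items()
    }


# Hypothetical idle-mode counters for two CPU cores over a 300 s window.
idle_series = {
    ("node-1", "0"): [(0, 1000.0), (300, 1240.0)],  # idle 240 s out of 300 s
    ("node-1", "1"): [(0, 2000.0), (300, 2180.0)],  # idle 180 s out of 300 s
}

usage = cpu_usage_percent(idle_series)
print(round(usage["node-1"], 2))
```

With these numbers the two cores are idle 80% and 60% of the time, so the average idle share is 70% and the reported usage is about 30%.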
Prometheus web interface
Prometheus has its own, fairly simple web interface. It has several tabs.
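Expression: a field for running PromQL queries.
Alerts: the list of alerting rules, each of which is in one of three states:
- inactive: the condition is not met, nothing is happening;
- pending: the condition is met, but not yet for long enough, so the alert has not been sent;
- firing: the alert is active and has been sent.
Status: information about Prometheus itself, including the list of targets it monitors.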
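The three states above can be sketched as a small state machine. The Python example below replays a series of rule evaluations and reports the state after each one, assuming a rule with a hold time (the `for` clause in a Prometheus alerting rule). The function name, the evaluation interval, and the sample history are illustrative assumptions.

```python
def alert_states(condition_history, for_seconds, step_seconds):
    """Return the alert state after each evaluation of the rule condition."""
    states = []
    active_for = None  # how long the condition has been continuously true
    for condition_met in condition_history:
        if not condition_met:
            # Condition is false: the alert resets to inactive.
            active_for = None
            states.append("inactive")
            continue
        active_for = 0 if active_for is None else active_for + step_seconds
        # Pending until the condition has held for the whole hold time.
        states.append("firing" if active_for >= for_seconds else "pending")
    return states


# Hypothetical history: evaluated every 30 s, with a hold time of 60 s.
history = [False, True, True, True, False]
states = alert_states(history, for_seconds=60, step_seconds=30)
print(states)
```

The pending stage is what filters out short spikes: a condition that is true for a single evaluation never reaches firing.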
Prometheus Kubernetes.
Grafana
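The Prometheus web interface is not well suited to building dashboards, so Prometheus is usually used together with Grafana.
Connecting Prometheus to Grafana is described in the documentation: GRAFANA SUPPORT FOR PROMETHEUS.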
: Grafana Loki .
: , Kubernetes, Southbridge, .