Monitoring a Kubernetes Cluster: An Overview and Introduction to Prometheus

This article covers the concepts behind monitoring Kubernetes, introduces the Prometheus tool, and touches on alerting.



Monitoring is a broad topic that cannot be covered fully in one article. The purpose of this text is to provide an overview of the tools, concepts, and approaches.



This article is an extract from an open lecture of the Slurm school. For the full training, sign up for the course on Monitoring and Logging Infrastructure in Kubernetes.





What is monitored in a Kubernetes cluster





Physical servers. If the Kubernetes cluster is deployed on your own servers, you need to monitor their health. Zabbix handles this task; if you already work with it, there is no need to give it up, and there will be no conflicts. In our case, it is Zabbix that monitors the state of the servers.



Let's move on to monitoring at the cluster level.



Control Plane components: API Server, Scheduler, and others. At a minimum, you need to check that the number of running API servers and etcd instances is greater than 0. etcd can expose many metrics: on the disks it runs on, on the health of the etcd cluster, and others.



Docker : . Docker, , , .



DNS. DNS, Discovery, . , , DNS . CoreDNS.



Ingress. ( Ingress Controller) .



— , .



, , , . : , ; 10, 2. . . , : -? , , — . .



, -: , .



Prometheus



Prometheus. , Prometheus . , « Kubernetes», Prometheus.



, Prometheus: Helm Prometheus Prometheus Operator.



  1. Prometheus. , ConfigMap — , , , .
  2. Prometheus Operator , , : , , .


, Prometheus. , : , . Prometheus Operator , .



Prometheus Kubernetes: API Server .



Prometheus , . , Prometheus , , . . , PostgreSQL Exporter: PostgreSQL Prometheus, Prometheus .



Prometheus





Prometheus Server — , Prometheus. .



time series database (TSDB). TSDB — , Go, Prometheus. , .



TSDB



Prometheus . 15 . , : TSDB , . Prometheus .



, , . TSDB « », . , , InfluxDB, ClickHouse . ClickHouse.

Prometheus Server pull: , . : « API Server», n- .
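
To make the pull model mentioned above more concrete, here is a minimal Python sketch of what a single scrape might look like from the collector's side: fetch a target's metrics page over HTTP and parse the plain-text samples. It is only an illustration, not how Prometheus itself is implemented. The function names are hypothetical, and the parser assumes sample lines carry no trailing timestamp.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Simplifying assumption: sample lines have no trailing timestamp.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def scrape(target_url: str) -> dict[str, float]:
    """Pull the metrics page of one target and parse it."""
    with urllib.request.urlopen(target_url, timeout=5) as response:
        return parse_metrics(response.read().decode("utf-8"))


if __name__ == "__main__":
    page = (
        "# TYPE http_requests_total counter\n"
        'http_requests_total{code="200"} 1027\n'
        'http_requests_total{code="500"} 3\n'
    )
    print(parse_metrics(page))
```

A real collector would call something like `scrape` on a fixed interval for every known target; here the parsing step is shown on an in-memory sample so the sketch runs without a live endpoint.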



(job cron job), , Pushgateway. : job , , Pushgateway . Prometheus Pushgateway.
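
For short-lived jobs such as the cron jobs mentioned above, pushing a result to Pushgateway might look roughly like the following Python sketch. The Pushgateway address, the job name, and the metric name are hypothetical placeholders, and the job name is assumed to be a plain name without slashes.

```python
import urllib.request
from urllib.parse import quote

# Hypothetical address; replace with the Pushgateway service in your cluster.
PUSHGATEWAY_URL = "http://pushgateway.example:9091"


def build_push_url(base_url: str, job: str) -> str:
    """Return the Pushgateway endpoint for a job: <base>/metrics/job/<job>."""
    return f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}"


def build_payload(metrics: dict[str, float]) -> bytes:
    """Render metrics in the Prometheus text exposition format."""
    lines = [f"{name} {value}" for name, value in sorted(metrics.items())]
    # The text format requires a newline after the last sample.
    return ("\n".join(lines) + "\n").encode("utf-8")


def push(base_url: str, job: str, metrics: dict[str, float]) -> int:
    """PUT the metrics to Pushgateway, replacing earlier metrics of the job."""
    request = urllib.request.Request(
        build_push_url(base_url, job),
        data=build_payload(metrics),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Show what would be sent, without contacting a real Pushgateway.
    print(build_push_url(PUSHGATEWAY_URL, "nightly-backup"))
    print(build_payload({"backup_duration_seconds": 12.5}).decode("utf-8"))
```

The job would call `push` once, just before it exits; Prometheus then scrapes Pushgateway like any other target.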



Prometheus — Alertmanager. — alerting rules. , alert , API 0. , alert alert manager . alert manager : - , , . Slack, Telegram, email .



- Prometheus — Discovering. Prometheus , . « IP-, — », , (targets — ). Prometheus , , , .



Kubernetes, : 10 , 3. IP- , , — Discovering .



Prometheus PromQL. , .



https://prometheus.io/docs/prometheus/latest/querying/basics/

 

    container_memory_usage_bytes

 

    container_memory_usage_bytes / 1024 / 1024

 

    sum(container_memory_usage_bytes) / 1024 / 1024

 

    100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]) * 100)
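
Queries like the ones above can also be run programmatically through the Prometheus HTTP API (`/api/v1/query`). The Python sketch below builds such a request and parses an instant-vector response. The server address is a hypothetical placeholder, and the helper names are made up for illustration.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical address; replace with your Prometheus server.
PROMETHEUS_URL = "http://prometheus.example:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict[str, float]:
    """Turn an instant-vector API response into {label set: value}."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    result: dict[str, float] = {}
    for sample in data["result"]:
        labels = sample["metric"]
        key = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        # Each sample value is a [timestamp, "value as string"] pair.
        _timestamp, value = sample["value"]
        result[key] = float(value)
    return result


def query(base_url: str, promql: str) -> dict[str, float]:
    """Run an instant query against a live Prometheus server."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as response:
        return parse_instant_vector(response.read().decode("utf-8"))
```

For example, `query(PROMETHEUS_URL, "container_memory_usage_bytes")` would return one entry per matching series, provided the server is reachable.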


- Prometheus



Prometheus , -. .





Expression PromQL.



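Alerts shows the alerting rules, each of which is in one of three states:



  1. inactive: the rule's condition is not met;
  2. pending: the condition is met, but it has not yet held for the duration configured in the rule;
  3. firing: the condition has held for the configured duration, and the alert is sent to Alertmanager.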
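
The three states listed above can be illustrated with a small Python simulation. It is a simplified model rather than Prometheus's actual implementation: the class name `AlertRule` is hypothetical, and `for_seconds` stands in for the `for` duration of an alerting rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRule:
    """Simplified model of one alerting rule with a `for` duration."""

    for_seconds: float
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return the rule's state after one evaluation at time `now`."""
        if not condition_true:
            # The condition cleared: forget the start time and reset.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        # Fire only once the condition has held for the whole duration.
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    rule = AlertRule(for_seconds=60)
    timeline = [(0, False), (15, True), (45, True), (75, True), (90, False)]
    for now, condition in timeline:
        print(now, rule.evaluate(condition, now))
```

The pending state is what keeps a brief spike from producing a notification: the alert fires only if the condition is still true on every evaluation for the whole duration.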


Status , Prometheus. (targets), .





Prometheus Kubernetes.



Grafana



- Prometheus , . , Prometheus Grafana. .





Prometheus Grafana , : GRAFANA SUPPORT FOR PROMETHEUS, .



: Grafana Loki .



: , Kubernetes, Southbridge, .



