Top 10 PromQL queries for monitoring Kubernetes

This article provides examples of popular Prometheus queries for monitoring Kubernetes.





If you are just getting started with Prometheus and find it hard to write PromQL queries, we recommend the PromQL Getting Started Guide. We'll skip the theory here and go straight to practice.





The list is based on the experience of Sysdig, which helps hundreds of customers every day set up monitoring for their clusters:





1. The number of pods in each namespace

The number of pods in each namespace helps detect anomalies in the cluster, such as too many pods in a single namespace:





sum by (namespace) (kube_pod_info)
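To surface only the busiest namespaces, the query above can be wrapped in topk. This is a sketch rather than part of the original list, and the value 5 is an arbitrary assumption:

```promql
# Top 5 namespaces by pod count (5 is an illustrative choice, not from the original)
topk(5, sum by (namespace) (kube_pod_info))
```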
      
      



2. The number of containers without CPU limits in each namespace

Setting limits properly is important for both application and cluster performance. This query finds containers without CPU limits:





count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="cpu"}))
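The same pattern should work for memory limits. The variant below is a sketch that only swaps the resource label to "memory", a value this article uses in the memory query further down:

```promql
# Containers without memory limits, counted per namespace
count by (namespace)(sum by (namespace,pod,container)(kube_pod_container_info{container!=""}) unless sum by (namespace,pod,container)(kube_pod_container_resource_limits{resource="memory"}))
```

The unless operator keeps only those containers from the left-hand side that have no matching series on the right, that is, containers for which no limit is reported.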
      
      



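3. The number of pod restarts in each namespace

This query shows how often pods in each namespace change their readiness state, which usually points to restarts. Frequent restarts can mean that a pod is stuck in CrashLoopBackOff: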





sum by (namespace)(changes(kube_pod_status_ready{condition="true"}[5m]))
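The query above counts changes in the readiness state rather than restarts themselves. As a possible alternative (an assumption, not taken from the original list), kube-state-metrics also exposes the counter kube_pod_container_status_restarts_total, which can be summed per namespace over the same 5-minute window:

```promql
# Container restarts per namespace over the last 5 minutes
# (alternative sketch; this metric is not used elsewhere in the article)
sum by (namespace)(increase(kube_pod_container_status_restarts_total[5m]))
```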
      
      



4. Pods in the Not Ready state in each namespace

This query counts the pods that are not in the Ready state, grouped by namespace:





sum by (namespace)(kube_pod_status_ready{condition="false"})
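To see which pods are affected rather than just how many, the aggregation can be dropped. A minimal sketch, assuming the series carry the usual namespace and pod labels:

```promql
# Individual pods that are currently not Ready
kube_pod_status_ready{condition="false"} == 1
```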
      
      



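5. CPU overcommit

If the sum of the CPU limits of all containers exceeds the CPU capacity of the cluster, the limits are overcommitted, and containers may not get the CPU they are allowed under load. A positive result from this query indicates overcommit: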





sum(kube_pod_container_resource_limits{resource="cpu"}) - sum(kube_node_status_capacity_cpu_cores)
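The same two sums can also be expressed as a ratio, which may be easier to compare across clusters of different sizes. This is a sketch built only from the metrics used in the query above:

```promql
# A value above 1 means the sum of CPU limits exceeds the cluster's CPU capacity
sum(kube_pod_container_resource_limits{resource="cpu"}) / sum(kube_node_status_capacity_cpu_cores)
```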
      
      



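6. Memory overcommit

If memory limits are overcommitted, nodes can run out of memory, which leads to pod eviction. This PromQL query shows by how much the sum of memory limits exceeds the memory capacity of the cluster: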





sum(kube_pod_container_resource_limits{resource="memory"}) - sum(kube_node_status_capacity_memory_bytes)
      
      



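7. The number of Ready nodes in the cluster

This query returns the number of nodes that are in the Ready state: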





sum(kube_node_status_condition{condition="Ready", status="true"}==1)
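The opposite view, the number of nodes that are not Ready, can be sketched with the same metric. The `or vector(0)` fallback is an addition of this example: without it, the query returns no data when every node is Ready.

```promql
# Nodes whose Ready condition is not true; falls back to 0 when there are none
count(kube_node_status_condition{condition="Ready", status="true"} == 0) or vector(0)
```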
      
      



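8. Nodes flapping between Ready and Not Ready

This query finds nodes that keep switching between the Ready and Not Ready states: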





sum(changes(kube_node_status_condition{status="true",condition="Ready"}[15m])) by (node) > 2
      
      



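9. Idle CPU

Resource requests that are set too high waste cluster capacity. This query shows how many CPU cores are requested by containers but not actually used: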





sum((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)
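To find out where the unused cores are, the outer sum can be grouped by namespace. This is a sketch that changes only the outer aggregation of the query above:

```promql
# Requested but unused CPU cores, per namespace
sum by (namespace)((rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[30m]) - on (namespace,pod,container) group_left avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="cpu"})) * -1 >0)
```

Because group_left keeps the labels of the left-hand side, the namespace label is still available for the outer aggregation.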
      
      



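10. Idle memory

Similarly, this query shows how much memory, in gibibytes, is requested by containers but not actually used: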





sum((container_memory_usage_bytes{container!="POD",container!=""} - on (namespace,pod,container) avg by (namespace,pod,container)(kube_pod_container_resource_requests{resource="memory"})) * -1 >0 ) / (1024*1024*1024)
      
      



What's next?

To learn more about PromQL, start with the PromQL Getting Started Guide mentioned at the beginning of this article.





Also take a look at the Awesome Prometheus alerts collection, a set of Prometheus alert rules and a good source of PromQL examples for Prometheus.







