This article will not teach experienced DevOps engineers anything new, but it will be useful for developers who want to reduce the dimension (cardinality) of their metrics without diving into the documentation, or for those who have deliberately given up on hierarchical federation, are looking for a workaround, and would rather not repeat our mistakes. We will cover:
- how to reduce the dimension of metrics in two steps using two ServiceMonitors,
- what the reference, "crutch"-free way to reduce the dimension of metrics is,
- why you shouldn't waste time on dimensionality reduction with Pushgateway.
Why we needed to reduce the dimension of metrics
Our team is responsible for one of the Mindbox products - product recommendations on the website and in newsletters. We used to collect the processing time of recommendations in real time in Influx, and to help the business evaluate product performance we also needed to calculate the Apdex (Application Performance Index). The company is gradually migrating metrics from Influx to Prometheus, so we decided to collect both metrics at once in Prometheus, using histograms.
The histogram of the metric we wanted to build to measure product performance

Our services are deployed on Kubernetes. We collect metrics in Prometheus using ServiceMonitor. In the application we use Prometheus.NET. In the default configuration, a pod label with the corresponding pod name is added to every metric that a pod exports.
Collecting metrics in Prometheus using ServiceMonitor
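For reference, a minimal sketch of such a ServiceMonitor; the names, selector labels and port here are hypothetical and must match your Service and your Prometheus Operator setup:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: recommendations            # hypothetical name
  labels:
    release: prometheus            # must match the operator's serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: recommendations         # hypothetical label on the Service
  endpoints:
    - port: metrics                # name of the port the application exposes metrics on
      path: /metrics
      interval: 30s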
To show the average processing time, percentiles (p50, p95, p99) and Apdex, we planned to use a histogram with 20 buckets. Since we wanted information on each of the 2.5 thousand recommendation mechanics, the total dimension of the metric was 50 thousand time series. Pod names change on every deployment, and the pod label is attached to every metric, so with a 30-day retention and daily deployments the dimension grows to 1.5 million. A metric of this dimension took up far more space in Prometheus than we wanted.
2,500 * 20 * 30 = 1,500,000
(number of mechanics) * (number of histogram buckets) * (retention in days, with one deployment per day) = (final dimension)
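For context, this is roughly how percentiles and Apdex can be derived from histogram buckets with recording rules. The metric name, the mechanic label and the bucket boundaries (0.5 s as the "satisfied" threshold and 2 s as the "tolerated" one) are assumptions for illustration, not our actual configuration:

groups:
  - name: recommendation-latency
    rules:
      # 95th percentile of processing time per mechanic (assumed metric name)
      - record: mechanic:recommendation_processing_seconds:p95
        expr: histogram_quantile(0.95, sum by (le, mechanic) (rate(recommendation_processing_seconds_bucket[5m])))
      # Apdex: requests under 0.5 s count as satisfied, under 2 s as tolerated;
      # the thresholds must coincide with bucket boundaries
      - record: mechanic:recommendation_processing_seconds:apdex
        expr: >
          (
            sum by (mechanic) (rate(recommendation_processing_seconds_bucket{le="0.5"}[5m]))
            + sum by (mechanic) (rate(recommendation_processing_seconds_bucket{le="2"}[5m]))
          ) / 2
          / sum by (mechanic) (rate(recommendation_processing_seconds_count[5m]))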
We decided to get rid of the pod and instance labels so that the dimension would not keep growing. At the same time, we tried to find a simple and cheap solution that could be implemented and maintained without involving a DevOps engineer.
An important assumption: the metric whose dimension we wanted to reduce is collected from only one pod at a time. The pod can change only when the application restarts, for example, during a deployment.
What solutions we considered
Let us say right away that hierarchical federation is the most suitable way to solve dimension problems; usage examples are described in detail in its documentation. We could deploy a Prometheus with low metric retention and collect the raw metrics there, then use recording rules to calculate aggregates and collect them into another Prometheus with a high data retention.
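A rough sketch of what that could look like; the metric, rule and host names are assumptions. The low-retention Prometheus would aggregate the pod and instance labels away with a recording rule:

groups:
  - name: recommendation-aggregates
    rules:
      # aggregate across pods so the per-pod dimension never reaches long-term storage
      - record: job:recommendation_processing_seconds_bucket:rate5m
        expr: sum without (pod, instance) (rate(recommendation_processing_seconds_bucket[5m]))

and the high-retention Prometheus would scrape only those aggregates through the /federate endpoint:

scrape_configs:
  - job_name: federate
    honor_labels: true
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # pull only the pre-aggregated series
    static_configs:
      - targets:
          - short-retention-prometheus:9090   # hypothetical address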
We did not pursue federation because we wanted a solution that would be easier, cheaper and faster to implement. Besides, the task was to be done by developers, not DevOps, so we wanted to use familiar tools and techniques. In hindsight, we spent enough time searching for such a solution to have built the federation instead, and our implementation turned out to be a "crutch".
We came up with two alternative solutions:
1. Deploy Pushgateway and push metrics to it without the unwanted labels. The company already had a Helm chart for the monitoring stack, including Pushgateway.
Pros: the code and charts can be reused, and metrics from the team's other servers that run outside Kubernetes and have not yet moved from Influx to Prometheus could also be sent to the same Pushgateway.
Cons: more expensive to maintain.
Collecting metrics in Prometheus using Pushgateway
2. Deploy a second ServiceMonitor and configure routing of metrics between the two. In one, remove the pod / instance labels via relabeling; in the other, leave everything as is.
Pros: cheaper (you only need to deploy a ServiceMonitor) and easier to maintain.
Cons: an ops-style implementation the developers were not familiar with.
Collecting metrics in Prometheus using a second ServiceMonitor
How the Pushgateway solution failed
First solution. To begin with, we chose the obvious implementation via Pushgateway. We deployed Pushgateway, pushed metrics to it and used a constant as the instance label. The request looked something like this:
echo 'some_metric{bar="fooo"} 3.22' | \
  curl -i -X POST --data-binary @- \
  'https://pushgateway:9091/metrics/job/some_job/instance/constant'
We handled the task quickly, and at first the result was pleasing: metrics were being collected and the dimension was not growing. But soon we began to notice large gaps in the metrics for some mechanics, and a strange pattern emerged: while the metric for some mechanics was reported correctly and continuously, for others it kept dropping out. It looked as if only one group of mechanics was reporting at a time. There were several such groups, and they replaced each other in no particular order.
Why it didn't work. Anyone familiar with how Pushgateway works probably saw right away that our solution could not work. In Pushgateway, labels are passed in two ways: via the request path or in the request body. The set of labels and their values passed through the path acts as the grouping key for the dictionary in which metrics are stored; everything passed in the request body becomes the value stored under that key. In our case, every request from a pod therefore overwrote all the metrics that had been pushed by other pods. Since Prometheus scrapes metrics at intervals, it only received the metrics of whichever pod had pushed last.
To send metrics to Pushgateway correctly, we would have had to write custom C# code. Such a solution was neither easy nor cheap, so we abandoned it.
Second solution. We decided to give Pushgateway another try: push the raw metrics with all labels, and then remove the pod label via the ServiceMonitor that collects metrics from Pushgateway. But already at the start we realized that the idea would not work.
Why it was not implemented. Pushgateway has several features that make this solution impossible. The main one is that data is not cleaned up automatically by retention, which means you have to watch the disk size and write the cleanup code yourself. Another problem is that after relabeling, metrics with the same set of labels but a different pod label conflict with each other, and only the last one in Pushgateway's ordering survives. The metrics are ordered not by the date of the last modification but alphabetically, so after a deployment the values from new pods might never make it into Prometheus.
How the solution with the second ServiceMonitor worked
We went back to our second original design and created two ServiceMonitors. In addition, in the code we put a special label (in our case, business) on the metrics whose dimension we wanted to reduce:
- in one ServiceMonitor, all metrics with the special label were dropped and the rest were left as they are;
- in the other, only metrics with the special label were kept, and the pod and instance labels were removed from them.
We did everything via relabeling. This is what we added to the configuration of the first ServiceMonitor:
metricRelabelings:
  - action: drop
    sourceLabels:
      - business
    regex: "[Tt]rue"
The following was added to the configuration of the second ServiceMonitor:
metricRelabelings:
  - action: keep
    sourceLabels:
      - business
    regex: "[Tt]rue"
  - action: labeldrop
    regex: instance|pod|business
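To show where these snippets live, here is a sketch of the complete second ServiceMonitor. As above, the names and selector are hypothetical; the metricRelabelings block is the part taken from our configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: recommendations-business     # hypothetical name
spec:
  selector:
    matchLabels:
      app: recommendations           # same Service as the first ServiceMonitor
  endpoints:
    - port: metrics
      path: /metrics
      interval: 30s
      metricRelabelings:
        - action: keep               # keep only metrics carrying the special label
          sourceLabels:
            - business
          regex: "[Tt]rue"
        - action: labeldrop          # then strip the high-cardinality labels
          regex: instance|pod|business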
What we learned from our search for a solution
- Neither pushing directly to Pushgateway nor relabeling on top of it is suitable for reducing the dimension of metrics.
- If you use relabeling, metrics with the same set of labels must not be reported from different pods at the same time.
- A second ServiceMonitor is a "crutch" that is easy and quick to implement if you don't want to spend resources on federation.
- The best solution for dimensionality reduction is federation:
  - collect raw metrics in a Prometheus with low retention,
  - calculate aggregates with recording rules,
  - send them to a Prometheus with high retention.
Yuri Sokolov, developer