Guidance for Setting Up Service Level Objectives (SLOs) in Kubernetes with Prometheus and Linkerd

In anticipation of the start of the course "Infrastructure platform based on Kubernetes", we have prepared a traditional translation of a useful article.


(SLO)

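This guide shows how to set up service level objectives (SLOs) on Kubernetes with Prometheus, an open-source monitoring system and time-series database, and Linkerd, an open-source service mesh.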

, , , SLO Kubernetes .

SLO Kubernetes

SLO, , . , Google SRE, SLO , , , .

, , , SLO : SLO , , .  Kubernetes, . , SLO , . (. SLO Kubernetes.)

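A service mesh makes SLOs on Kubernetes much easier to set up. Linkerd reports golden metrics (success rate, request rate, and latency) for every meshed service without changes to application code, and these metrics can serve as the basis for SLOs.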

(, SLO, , , . , , , , SLO .)


SLOs with Linkerd and Prometheus

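In this guide, we set up an SLO for a gRPC service running on Kubernetes.

Linkerd provides the metrics: it measures HTTP and gRPC traffic to and from the pods it is injected into and exposes those measurements to Prometheus. The Prometheus instance used here is the one installed with Linkerd.

The steps are: install Linkerd and a demo application, then use the Linkerd metrics stored in Prometheus to compute the SLO.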

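Setup: installing Linkerd on Kubernetes

This guide assumes a working Kubernetes cluster and a kubectl command configured to talk to it. The steps below install Linkerd and a demo application on that cluster.

First, install the Linkerd CLI: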

curl -sL https://run.linkerd.io/install | sh
export PATH=$PATH:$HOME/.linkerd2/bin

(Linkerd   Linkerd.)

Next, check that the Kubernetes cluster is ready for Linkerd, install Linkerd, and verify the installation:

linkerd check --pre
linkerd install | kubectl apply -f -
linkerd check

Finally, install the Emojivoto demo application with the Linkerd proxy injected:

curl -sL https://run.linkerd.io/emojivoto.yml \
  | linkerd inject - \
  | kubectl apply -f -

. SLO: .

 โ€” , , SLO. ?

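We start with a simple example: over any 7-day period, at least 80% of requests to the Voting service should succeed. That is our SLO. It has three parts: the service level indicator (SLI), which is the metric being measured; the objective, which is the threshold; and the time window. In our case:

SLI: success rate of the Voting service

Objective: 80%

Time window: 7 days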

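This SLO means that up to 20% of requests in any 7-day window may fail without violating the objective. That 20% is the error budget.

If 100% of responses over the last 7 days were successful, 100% of the error budget remains. If only 80% were successful, 0% of the error budget remains. If the success rate drops below 80%, the remaining budget is negative and the SLO is violated.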

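The remaining error budget is computed as:

Error budget = 1 - [(1 - compliance) / (1 - objective)]

Here, compliance is the value of the SLI measured over the time window, and objective is the target value of the SLI (0.8 in this example).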
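To make the arithmetic concrete, here is a minimal sketch that evaluates the formula Error budget = 1 - [(1 - compliance) / (1 - objective)] for a few success rates against the 80% objective. The function name error_budget is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# error_budget COMPLIANCE OBJECTIVE
# Hypothetical helper: prints the fraction of the error budget that remains.
# Both arguments are fractions between 0 and 1 (for example 0.80 for 80%).
error_budget() {
  awk -v c="$1" -v o="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - c) / (1 - o) }'
}

error_budget 1.00 0.80   # prints 1.0000: every request succeeded, full budget left
error_budget 0.90 0.80   # prints 0.5000: half of the budget is spent
error_budget 0.80 0.80   # prints 0.0000: the budget is exhausted
error_budget 0.70 0.80   # prints -0.5000: the SLO is violated
```

A negative result means more requests failed than the objective allows.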

Prometheus

To work with the metrics, connect to the Prometheus instance installed with Linkerd. First, find the name of its pod:

# Get the name of the prometheus pod
$ kubectl -n linkerd get pods
NAME                                      READY   STATUS    RESTARTS   AGE
..
linkerd-prometheus-54dd7dd977-zrgqw       2/2     Running   0          16h

Replace PODNAME with the suffix of your pod's name and forward the port:

kubectl -n linkerd port-forward linkerd-prometheus-PODNAME 9090:9090
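If you would rather not copy the pod name by hand, the lookup can be scripted. The sketch below assumes the pod name starts with linkerd-prometheus-, as in the listing above; the helper name prometheus_pod is hypothetical. It reads the output of kubectl get pods on standard input and prints the first matching name.

```shell
#!/bin/sh
# prometheus_pod: hypothetical helper that reads `kubectl get pods` output
# on stdin and prints the first pod whose name starts with "linkerd-prometheus-".
prometheus_pod() {
  awk '$1 ~ /^linkerd-prometheus-/ { print $1; exit }'
}

# Usage (requires access to the cluster, so it is left commented out):
#   pod=$(kubectl -n linkerd get pods | prometheus_pod)
#   kubectl -n linkerd port-forward "$pod" 9090:9090
```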

Now open localhost:9090 in a browser to run queries in PromQL, the Prometheus query language.

Prometheus dashboard


Prometheus

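The examples above used round figures such as 100% and 80%. To work with real numbers, we query Prometheus. For Emojivoto, the services we care about run in the emojivoto namespace.

First, look at the responses received by the Voting service:

Query: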

response_total{deployment="voting", direction="inbound", namespace="emojivoto"}

Result:

response_total{classification="success",deployment="voting",direction="inbound",namespace="emojivoto",..} 46499
response_total{classification="failure",deployment="voting",direction="inbound",namespace="emojivoto",..} 8652

The two series differ in one label: classification. There are 46,499 successful responses and 8,652 failed ones.
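As a quick sanity check, the success ratio can be computed by hand from these two counters. Keep in mind that these are cumulative counter values rather than a 7-day window, so the result is only a rough preview of the windowed figure. The helper name success_ratio is hypothetical.

```shell
#!/bin/sh
# success_ratio SUCCESSES FAILURES
# Hypothetical helper: prints successes / (successes + failures).
success_ratio() {
  awk -v s="$1" -v f="$2" 'BEGIN { printf "%.4f\n", s / (s + f) }'
}

# Counter values taken from the query result above.
success_ratio 46499 8652   # prints 0.8431
```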

To look only at successful responses over the last 7 days, add classification="success" and the range selector [7d]:

Query:

response_total{deployment="voting", classification="success", direction="inbound", namespace="emojivoto"}[7d]

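This returns the raw samples for the whole range. To reduce them to a single number, use the PromQL functions increase() and sum(), grouped by the labels we care about:

Query: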

sum(increase(response_total{deployment="voting", classification="success", direction="inbound", namespace="emojivoto"}[7d])) by (namespace, deployment, classification, tls)

Result:

{classification="success",deployment="voting",namespace="emojivoto",tls="true"} 26445.68142198795

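So there were about 26,445 successful requests over the last 7 days (the value is fractional because increase() extrapolates).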

To get the success rate, divide this by the total number of responses, which is the same query without classification="success":

Query:

sum(increase(response_total{deployment="voting", classification="success", direction="inbound", namespace="emojivoto"}[7d])) by (namespace, deployment, classification, tls) / ignoring(classification) sum(increase(response_total{deployment="voting", direction="inbound", namespace="emojivoto"}[7d])) by (namespace, deployment, tls)

Result:

{deployment="voting",namespace="emojivoto",tls="true"} 0.846113068695625

So over the last 7 days, 84.61% of requests succeeded.
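The same success-rate query can also be run from the command line through the Prometheus HTTP API instead of the web UI. The sketch below builds the query string and sends it to the /api/v1/query endpoint with curl. The function names and the PROM_URL variable are hypothetical, and the default address assumes the port-forward from earlier is still running.

```shell
#!/bin/sh
# success_rate_query NAMESPACE DEPLOYMENT WINDOW
# Hypothetical helper: prints the success-rate PromQL used in this article.
success_rate_query() {
  sel="deployment=\"$2\", direction=\"inbound\", namespace=\"$1\""
  printf 'sum(increase(response_total{%s, classification="success"}[%s])) by (namespace, deployment, classification, tls) / ignoring(classification) sum(increase(response_total{%s}[%s])) by (namespace, deployment, tls)\n' \
    "$sel" "$3" "$sel" "$3"
}

# query_prometheus PROMQL
# Hypothetical helper: sends an instant query to Prometheus and prints the JSON reply.
# PROM_URL is an assumed variable; it defaults to the port-forwarded address.
query_prometheus() {
  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage (requires the port-forward to be active, so it is left commented out):
#   query_prometheus "$(success_rate_query emojivoto voting 7d)"
```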

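With the success rate in hand, we can compute the error budget. Recall the formula:

Error budget = 1 - [(1 - compliance) / (1 - objective)]

Substituting the success-rate query for compliance and 80% (0.8) for the objective:

Query: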

1 - ((1 - (sum(increase(response_total{deployment="voting", classification="success", direction="inbound", namespace="emojivoto"}[7d])) by (namespace, deployment, classification, tls)) / ignoring(classification) sum(increase(response_total{deployment="voting", direction="inbound", namespace="emojivoto"}[7d])) by (namespace, deployment, tls)) / (1 - .80))

Result:

{deployment="voting",namespace="emojivoto",tls="true"} 0.2312188519042635

So 23.12% of the error budget remains.

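Grafana

Numbers are fine, but what about graphs? Linkerd installs a Grafana instance, which is accessible from the Linkerd dashboard.

To open the Linkerd dashboard, run linkerd dashboard.

In the dashboard, find the emojivoto namespace; each deployment has a link to its Grafana dashboard.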

Linkerd dashboard with Grafana integration

Open the Grafana dashboard for deploy/voting. We will add a new panel to it.

Linkerd in Grafana dashboard

Create a new panel titled 7-day error budget (success rate) and enter the error budget PromQL query from the previous section.

Error budget in Grafana with Linkerd metrics

, , , PromQL, rate(), .

, -, . (Gauge) , , .

7-day error budget (success rate) as a gauge.

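To show the error budget for every service in emojivoto, remove deployment="voting" from the query. Note that this applies the same 80% objective to all services.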
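The two variants of the error budget query, for one deployment and for all deployments in a namespace, differ only in the label selector. The sketch below builds either one; the function name error_budget_query and its argument order are hypothetical.

```shell
#!/bin/sh
# error_budget_query NAMESPACE OBJECTIVE WINDOW [DEPLOYMENT]
# Hypothetical helper: prints the error budget PromQL from this article.
# Without DEPLOYMENT, the query covers every deployment in the namespace and
# returns one series per deployment, all measured against the same objective.
error_budget_query() {
  sel="direction=\"inbound\", namespace=\"$1\""
  if [ -n "${4:-}" ]; then
    sel="deployment=\"$4\", $sel"
  fi
  printf '1 - ((1 - (sum(increase(response_total{%s, classification="success"}[%s])) by (namespace, deployment, classification, tls)) / ignoring(classification) sum(increase(response_total{%s}[%s])) by (namespace, deployment, tls)) / (1 - %s))\n' \
    "$sel" "$3" "$sel" "$3" "$2"
}

error_budget_query emojivoto 0.80 7d voting   # the Voting service only
error_budget_query emojivoto 0.80 7d          # every service in emojivoto
```

Either output can be pasted into the Prometheus UI or into a Grafana panel.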

7-day error budget (success rate) for all services.

SLO

SLO Linkerd, Grafana. !

?

, , SLO. . , .  , . SLO .

Buoyant SLO, Kubernetes. ,   Dive, SLO . Dive Linkerd , , . Dive , ,  , SLO, .

Dive dashboard showing SLO and error budget compliance over 7 days.

,  โ€” Dive SLO Linkerd Prometheus Grafana, , โ€” SLO!
