Kubernetes is designed to be robust and resilient to failures, and it can recover automatically. And it does all of this well! However, production nodes can lose their connection to the cluster or fail for various reasons. In such cases, Kubernetes must respond to the incident quickly.
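When a node fails, the pods running on it become unavailable, and requests sent to them fail until the cluster notices the problem. So how quickly does Kubernetes detect that a node has failed?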
To detect a failed node, Kubernetes relies on two components, Kubelet and Controller Manager:
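Kubelet periodically reports its status to kube-apiserver. The interval is set by --node-status-update-frequency. The default is 10 seconds.
Controller manager periodically checks the status reported by Kubelet. The interval is set by --node-monitor-period. The default is 5 seconds.
If Kubelet does not report its status within --node-monitor-grace-period, Controller manager considers the node unhealthy. The default is 40 seconds.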
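Here is how detection works:
Kubelet sends its status to kube-apiserver every --node-status-update-frequency, that is, every 10 seconds.
Suppose the node then fails.
Controller manager checks the node status reported by Kubelet every --node-monitor-period, that is, every 5 seconds.
Controller manager sees that the node has not reported its status for longer than --node-monitor-grace-period (40 seconds), and marks the node as NotReady.
The endpoints of the pods on that node are then removed from their Services, and Kube Proxy stops sending traffic to those pods.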
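Until the node is marked NotReady, its pods remain Service endpoints, so requests routed to them fail. With the default settings this can last up to 45 seconds: the 40-second grace period plus up to one 5-second monitor period.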
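The 45-second figure is simple arithmetic. The sketch below is a back-of-the-envelope model with two assumptions. First, the node fails right after its last status update, so the grace period starts at the moment of failure. Second, the Controller Manager's next check can come up to one monitor period after the grace period expires. The function name worst_case_detection is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough worst-case time (in seconds) before a failed node
# is marked NotReady. Assumes the grace period starts right at the failure and
# the next controller check lands up to one monitor period later.
worst_case_detection() {
  local monitor_period=$1  # --node-monitor-period, in seconds
  local grace_period=$2    # --node-monitor-grace-period, in seconds
  echo $(( grace_period + monitor_period ))
}

worst_case_detection 5 40  # default values: prints 45
```

With the tuned values of 1 s and 4 s, the same formula gives 5 s.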
This delay can be shortened by tuning the Kubelet and Controller Manager settings.
To make Kubernetes detect failed nodes faster, we can use the following values:
--node-status-update-frequency: 1s (default 10s)
--node-monitor-period: 1s (default 5s)
--node-monitor-grace-period: 4s (default 40s)
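One way to sanity-check the tuned values is to compare the grace period with the Kubelet update interval. Their ratio is roughly the number of consecutive status updates that may be missed before the node is marked NotReady. The kube-controller-manager flag reference describes the grace period in these terms: N times the Kubelet's nodeStatusUpdateFrequency, where N is the number of retries allowed.

The sketch below uses a hypothetical allowed_retries function with whole-second inputs. It shows that both the defaults and the tuned values keep N = 4; the tuned setup only makes each retry ten times shorter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many Kubelet status updates can be missed before
# the grace period runs out, N = grace_period / update_frequency.
# Inputs are whole seconds; integer division rounds down.
allowed_retries() {
  local update_frequency=$1  # --node-status-update-frequency, in seconds
  local grace_period=$2      # --node-monitor-grace-period, in seconds
  if (( update_frequency <= 0 )); then
    echo "update frequency must be positive" >&2
    return 1
  fi
  echo $(( grace_period / update_frequency ))
}

allowed_retries 10 40  # defaults: prints 4
allowed_retries 1 4    # tuned values: prints 4
```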
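To test these settings, let's create a Kubernetes cluster with Kind. The Kind cluster configuration below (saved as kind.yaml, the file name used by the script that follows) applies the Kubelet setting to all nodes and the Controller Manager settings to the control-plane node.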
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  apiVersion: kubelet.config.k8s.io/v1beta1
  kind: KubeletConfiguration
  nodeStatusUpdateFrequency: 1s
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        node-monitor-period: 1s
        node-monitor-grace-period: 4s
- role: worker
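Next, we create an Nginx deployment with pods on both the control-plane and the worker node. We also start an Ubuntu pod on the control-plane node. From this pod we will check Nginx availability while the worker node is down.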
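#!/bin/bash
# create a K8S cluster with Kind
kind create cluster --config kind.yaml
# create an Ubuntu pod pinned to the control-plane node
# (nodeName bypasses the scheduler, so the control-plane taint does not apply)
kubectl run ubuntu --image ubuntu --overrides='{"spec": { "nodeName": "kind-control-plane"}}' -- sleep 30d
kubectl wait --for=condition=Ready pod/ubuntu --timeout=120s
# untaint the control-plane node so that Nginx pods can be scheduled on it;
# the taint key depends on the Kubernetes version (master on older, control-plane on newer)
kubectl taint node kind-control-plane node-role.kubernetes.io/master- || true
kubectl taint node kind-control-plane node-role.kubernetes.io/control-plane- || true
# create an Nginx deployment with 2 replicas, which the scheduler normally spreads one per node
kubectl create deploy ng --image nginx
sleep 30
kubectl scale deployment ng --replicas 2
# expose the Nginx deployment so that it is reachable on port 80
kubectl expose deploy ng --port 80 --type ClusterIP
# install curl in the Ubuntu pod
kubectl exec ubuntu -- bash -c "apt update && apt install -y curl"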
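Now let's check Nginx availability by running curl in a loop from the Ubuntu pod on the control-plane node. In parallel, we watch the endpoints of the Nginx service.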
# test Nginx service access from the Ubuntu pod (run in one terminal)
kubectl exec ubuntu -- bash -c 'while true ; do echo "$(date +"%T.%3N") - Status: $(curl -s -o /dev/null -w "%{http_code}" -m 0.2 -i ng)" ; done'
# show the nodes behind the Nginx service endpoints (run in a second terminal;
# gdate is GNU date as installed on macOS, use date on Linux)
while true; do
  gdate +"%T.%3N"
  kubectl get endpoints ng -o json | jq '.subsets' | jq '.[] | .addresses' | jq '.[] | .nodeName'
  echo "------"
done
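Finally, we simulate a node failure by stopping the Docker container in which Kind runs the worker node. Then we check when the node is marked NotReady.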
#!/bin/bash
# kill Kind worker node
echo "Worker down at $(gdate +"%T.%3N")"
docker stop kind-worker > /dev/null
sleep 15
# show when the node was detected to be down
echo "Worker detected in down state by Control Plane at "
kubectl get event --field-selector reason=NodeNotReady --sort-by='.lastTimestamp' -oyaml | grep time | tail -n1
# start worker node again
docker start kind-worker > /dev/null
The output shows that the worker went down at 12:50:22, and the Controller manager detected it at 12:50:26, that is, about 4 seconds later.
Worker down at 12:50:22.285
Worker detected in down state by Control Plane at time: "12:50:26Z"
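The event timestamp has one-second resolution, so the 4 seconds is rounded. The failure happened at 12:50:22.285 and detection occurred somewhere within 12:50:26, so the real delay lies between about 3.7 and 4.7 seconds.

Below is a small sketch for computing such differences from HH:MM:SS[.mmm] clock times. The function names are hypothetical. The sketch assumes that both times fall on the same day, that the trailing Z has been stripped, and that the fractional part has exactly three digits, as printed by %3N.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "HH:MM:SS" or "HH:MM:SS.mmm" to milliseconds
# since midnight. The fractional part must have exactly three digits.
to_ms() {
  local t=$1 ms=0 h m s
  if [[ $t == *.* ]]; then
    ms=${t#*.}
  fi
  IFS=: read -r h m s <<< "${t%%.*}"
  # 10# forces base 10, so fields like "08" are not parsed as octal.
  echo $(( 10#$h * 3600000 + 10#$m * 60000 + 10#$s * 1000 + 10#$ms ))
}

# Hypothetical helper: delay in milliseconds from clock time $1 to $2 (same day).
delay_ms() {
  echo $(( $(to_ms "$2") - $(to_ms "$1") ))
}

delay_ms 12:50:22.285 12:50:26  # prints 3715
```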
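Now the curl results. Starting at 12:50:23, some requests fail (status 000), because part of the traffic is still routed to the Nginx pod on the dead worker. At 12:50:26.744 the worker's endpoint disappears from the Nginx service, and from then on requests succeed again.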
...
12:50:23.115 - Status: 200
12:50:23.141 - Status: 200
12:50:23.161 - Status: 200
12:50:23.190 - Status: 000
12:50:23.245 - Status: 200
12:50:23.269 - Status: 200
12:50:23.291 - Status: 000
12:50:23.503 - Status: 200
12:50:23.520 - Status: 000
12:50:23.738 - Status: 000
12:50:23.954 - Status: 000
12:50:24.166 - Status: 000
12:50:24.385 - Status: 200
12:50:24.407 - Status: 000
12:50:24.623 - Status: 000
12:50:24.839 - Status: 000
12:50:25.053 - Status: 000
12:50:25.276 - Status: 200
12:50:25.294 - Status: 000
12:50:25.509 - Status: 200
12:50:25.525 - Status: 200
12:50:25.541 - Status: 200
12:50:25.556 - Status: 200
12:50:25.575 - Status: 000
12:50:25.793 - Status: 200
12:50:25.809 - Status: 200
12:50:25.826 - Status: 200
12:50:25.847 - Status: 200
12:50:25.867 - Status: 200
12:50:25.890 - Status: 000
12:50:26.110 - Status: 000
12:50:26.325 - Status: 000
12:50:26.549 - Status: 000
12:50:26.604 - Status: 200
12:50:26.669 - Status: 000
12:50:27.108 - Status: 200
12:50:27.135 - Status: 200
12:50:27.162 - Status: 200
12:50:27.188 - Status: 200
...
...
------
12:50:26.523
"kind-control-plane"
"kind-worker"
------
12:50:26.618
"kind-control-plane"
"kind-worker"
------
12:50:26.744
"kind-control-plane"
------
12:50:26.878
"kind-control-plane"
------
...
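To put a number on the outage window, the probe log can be summarized with a short awk script. This sketch makes three assumptions:
the log lines use the "HH:MM:SS.mmm - Status: CODE" format printed by the curl loop above;
a status of 000 (curl got no HTTP response, here typically because of the -m 0.2 timeout) marks a failure;
the output was saved to a file, for example by piping it through tee.
summarize_probe_log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a curl probe log on stdin and print the total
# number of requests, the failed ones (status 000), and the first and last
# failure times.
summarize_probe_log() {
  awk '
    /Status:/ {
      total++
      if ($NF == "000") {
        failed++
        if (first == "") first = $1
        last = $1
      }
    }
    END {
      printf "total=%d failed=%d first_fail=%s last_fail=%s\n", total, failed + 0, first, last
    }
  '
}
```

Usage: summarize_probe_log < probe.log, where probe.log is the saved output of the curl loop.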
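As the test shows, with these settings Kubernetes detects a failed node much faster. However, keep in mind that more frequent updates increase the load on the control plane, in particular on etcd, because every node now writes its status (through kube-apiserver) to etcd once per second. For example, a cluster of 1000 nodes generates 60000 writes per minute, which can overload etcd.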
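The 60000 figure is the number of nodes multiplied by 60 divided by the update interval in seconds. Here is a sketch of that estimate, assuming each status update is one write and ignoring all other writes. status_writes_per_minute is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: node status writes per minute for a cluster,
# assuming one write per node per status update (other writes ignored).
status_writes_per_minute() {
  local nodes=$1             # number of nodes in the cluster
  local update_frequency=$2  # --node-status-update-frequency, in seconds
  echo $(( nodes * 60 / update_frequency ))
}

status_writes_per_minute 1000 1   # tuned: prints 60000
status_writes_per_minute 1000 10  # default: prints 6000
```

Going from the default 10 s to 1 s multiplies this load by ten.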
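So choose these values carefully, balancing detection speed against control-plane load for the size of your cluster.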