At some point, we at okmeter.io realized that we also needed k8s in production. We didn't even have CI/CD, but we needed to share a common server pool between applications, and a cluster makes it quite easy to add capacity. At the same time, several circumstances complicated adopting k8s:
- we care a great deal about fault tolerance (we don't bring a new technology into production until we understand it well enough);
- we have services with response times under 10 ms;
- ( 10 , 50 — ).
: 2018 , issue .
. Okmeter.io. . , . , . .
?
, , , , – .
Kubernetes.
.
Okmeter.io. :
- , , , auto discovery.
- , .
:
- , , . . , . Python Go.
- - , . Kafka, Cassandra, Elasticsearch, PostgreSQL.
- . , , latency. .
- DevOps , CI/CD, pipeline. , .
- . , ( ), . , . read , . . , , .
- , , , , – , 2 , , , .
- .
, .
Google App Engine. . , . .
, , . , latency . .
, - . . Cassandra, Elasticsearch, Go Python.
, Elasticsearch.
, Elasticsearch , , CPU. Python.
. , Go stateless, , -, , , CPU - .
, . , , .
?
- Ansible , , , , server -> roles . .
- , . Ansible , , . , , , .
- , Ansible, , , . – playbooks . , , playbooks production. , , . , .
, , instance, ?
- inventory. , - - .
- . , , . , . .
, , , .
- , Kubernetes , . , .
- , , . . , , , , , , .
- , . . Request + Limit – , .
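A simplified sketch of why requests and limits play different roles. The scheduler places a pod only if the sum of requests on the node still fits. Limits only cap usage at runtime, so the summed limits on a node can exceed its capacity (overcommit). The `Pod` type, `fits`, `limit_overcommit`, and the node size are hypothetical, and this is not kube-scheduler's actual code.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    # Hypothetical resource spec: CPU in millicores, memory in bytes.
    cpu_request: int
    mem_request: int
    cpu_limit: int
    mem_limit: int


def fits(node_cpu: int, node_mem: int, running: list[Pod], new: Pod) -> bool:
    """Simplified fit check: only requests count against the node's capacity."""
    cpu_used = sum(p.cpu_request for p in running)
    mem_used = sum(p.mem_request for p in running)
    return (cpu_used + new.cpu_request <= node_cpu
            and mem_used + new.mem_request <= node_mem)


def limit_overcommit(node_cpu: int, node_mem: int, pods: list[Pod]) -> tuple[float, float]:
    """Ratio of summed limits to capacity; values above 1.0 mean overcommit."""
    return (sum(p.cpu_limit for p in pods) / node_cpu,
            sum(p.mem_limit for p in pods) / node_mem)


GiB = 1024 ** 3
node_cpu, node_mem = 4000, 8 * GiB   # assumed 4-core, 8 GiB node
pods = [Pod(1000, 2 * GiB, 3000, 6 * GiB) for _ in range(3)]
print(fits(node_cpu, node_mem, pods[:2], pods[2]))   # True: requests reach 3000m / 6 GiB
print(limit_overcommit(node_cpu, node_mem, pods))    # (2.25, 2.25)
```

In this example all three pods are scheduled, yet their memory limits add up to 18 GiB on an 8 GiB node. If they all grew toward their limits at once, the kernel OOM killer would have to intervene.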
Kubernetes. . . . , , . , , , instances. , OOM killer, , . , , .
, , , . .
, health checks , . , , .
Ansible . Ansible? , , . , . Ansible, , .
Kubernetes, , . - . , , over kill.
, ? :
- — , , .
- – Ansible , , Kubernetes .
Kubernetes , Ansible? Kubernetes apply, , , . . -, , . , – , .
, ? - ? , . , pod. , , Ansible. , , .
? service discovery. , . nginx, upstream’. . . . . , , service discovery. . Kubernetes.
, . , , . - , . DNS, ETCD. , , . .
, , , , . , readiness/ liveness-. , , curl, , Kubernetes .
? , pull . , , . pod, IP, . pull . , . , , .
, , pod . , , . graceful shutdown . . , .
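A minimal graceful-shutdown sketch for a pod. When a pod is deleted, Kubernetes sends SIGTERM, waits up to `terminationGracePeriodSeconds` (30 s by default), and then sends SIGKILL. Removing the pod from endpoints happens concurrently with the SIGTERM, not before it. A common pattern is therefore to fail readiness immediately, keep serving for a short while as the change propagates, and only then drain. The `Server` class and the 5-second drain delay are hypothetical.

```python
import signal
import threading
import time


class Server:
    """Hypothetical app state that matters for graceful shutdown in a pod."""

    def __init__(self, drain_delay_s: float = 5.0) -> None:
        # Assumed delay; it must stay well below terminationGracePeriodSeconds.
        self.drain_delay_s = drain_delay_s
        self.stopping = threading.Event()

    def ready(self) -> bool:
        """Readiness endpoint: start failing as soon as shutdown begins."""
        return not self.stopping.is_set()

    def on_sigterm(self, signum, frame) -> None:
        # Do not exit here: other nodes may keep routing traffic to this pod
        # until the endpoint removal reaches them, so only flag the shutdown.
        self.stopping.set()

    def drain(self) -> None:
        """Wait for SIGTERM, give routing time to converge, then return so the
        caller can stop accepting connections and finish in-flight requests."""
        self.stopping.wait()
        time.sleep(self.drain_delay_s)


server = Server()
signal.signal(signal.SIGTERM, server.on_sigterm)
print(server.ready())                 # True
signal.raise_signal(signal.SIGTERM)   # stands in for the SIGTERM sent on pod deletion
print(server.ready())                 # False
```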
. , RequestID, tracing, , , . , pod – , .
– . Kubernetes, , , L2 , , .
, , ? , 20 , . ? . bgp. , bgp. bgp 10 ?
Kubernetes , service discovery iptables , . . daemon, iptables. , . , , , , .
. 20 , . iptables , .
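In iptables mode, kube-proxy spreads new connections over a Service's endpoints using a chain of rules with the `statistic` match in random mode. With n endpoints, rule i (0-based) matches with probability 1/(n - i), and the last rule catches whatever is left. The sketch below models only those probabilities, using exact fractions, and checks that the chain produces a uniform split. It generates no real rules.

```python
from fractions import Fraction


def rule_probabilities(n: int) -> list[Fraction]:
    """Per-rule match probability for n endpoints in a kube-proxy-style chain.

    Rule i (0-based) matches with probability 1/(n - i); the last rule has
    no statistic match, i.e. probability 1.
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> list[Fraction]:
    """Probability that a new connection ends up at each endpoint."""
    shares, remaining = [], Fraction(1)
    for p in rule_probabilities(n):
        shares.append(remaining * p)   # reached this rule and it matched
        remaining *= 1 - p             # fell through to the next rule
    return shares


print([str(p) for p in rule_probabilities(4)])   # ['1/4', '1/3', '1/2', '1']
print(effective_shares(4))                       # four equal shares of 1/4
```

The choice is made once per connection, and conntrack keeps the connection's later packets on the same endpoint. As a result, long-lived keep-alive connections are not rebalanced when endpoints change.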
, , . IP pod’ IP . , . SR-IOV. , 128 . switch . . , , .
, . . , -, , , . - . flannel host-gw. 24- . , , -. , .
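With flannel's host-gw backend, each node gets its own subnet (a /24 by default) carved out of the cluster network. Every node then installs a plain route to each other node's pod subnet via that node's IP, with no encapsulation. The sketch below uses the standard `ipaddress` module. The 10.244.0.0/16 network and the node IPs are assumptions for illustration, and in practice flannel performs the allocation itself.

```python
import ipaddress

CLUSTER_NET = ipaddress.ip_network("10.244.0.0/16")   # assumed flannel Network


def allocate_subnets(node_ips: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Give each node the next free /24 from the cluster network."""
    pool = CLUSTER_NET.subnets(new_prefix=24)
    return {ip: next(pool) for ip in node_ips}


def host_gw_routes(me: str, subnets: dict[str, ipaddress.IPv4Network]) -> list[str]:
    """Routes node `me` needs: every other node's pod subnet via that node's IP."""
    return [f"{net} via {ip}" for ip, net in subnets.items() if ip != me]


nodes = ["192.168.1.11", "192.168.1.12", "192.168.1.13"]   # hypothetical node IPs
subnets = allocate_subnets(nodes)
for route in host_gw_routes("192.168.1.11", subnets):
    print(route)
# 10.244.1.0/24 via 192.168.1.12
# 10.244.2.0/24 via 192.168.1.13
```

The next hop is the other node's own IP, so host-gw only works when all nodes share one L2 segment. A /16 split into /24s also caps the cluster at 256 nodes.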
iptables kube-proxy, , iptables Kubernetes. Google , . . headless services .
?
- , K8s .
- , , CI/CD.
- . . . , ? , .
- production .
, :
- , K8s. docker , , Python. Go – . , docker . .
- docker . Ansible docker: « , , ».
- , . .
. , - etcd, ConfigMap, . , . , reconfig. , .
Helm . , .
, Helm. update/ rollback pod’, immutable ConfigMap, . , , , , , rolling update, , . , . . , production , . ConfigMap, ConfigMag.
, , – .
– , . - . pod, , immutable. , . , Helm.
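One common way to make a config change trigger a rolling update, while keeping the old config available for rollback, is to treat ConfigMaps as immutable and embed a hash of their content in the name. Kustomize's `configMapGenerator` does this. Helm charts often use a `checksum/config` annotation on the pod template instead. The sketch below covers only the naming side. The `hashed_name` helper and the 10-character suffix are assumptions, not necessarily the scheme used here.

```python
import hashlib
import json


def hashed_name(base: str, data: dict[str, str]) -> str:
    """Derive an immutable ConfigMap name from its content (hypothetical scheme)."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return f"{base}-{digest[:10]}"


v1 = hashed_name("api-config", {"LOG_LEVEL": "info", "WORKERS": "8"})
v2 = hashed_name("api-config", {"WORKERS": "8", "LOG_LEVEL": "info"})
v3 = hashed_name("api-config", {"LOG_LEVEL": "debug", "WORKERS": "8"})
print(v1 == v2, v1 == v3)   # True False
```

The pod template references the new name, so any config edit changes the template and Kubernetes rolls the pods as usual. Rolling the Deployment back points the pods at the old ConfigMap, which still exists unchanged.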
. Go- , YAML . , - , -. YAML, . – .
Python Django . Settings.py – . settings . .
, , , K8s stateful . , : Cassandra, Kafka. -, , .
, , Ansible. Ansible , - K8s . ?
Kubernetes – Ansible playbook. . . playbook, , , K8s.
production - - .
. request/ limit.
, , . CPU . . , pod’ . , - , CPU.
, . , , - , OOM Killer . , 100 , .
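The request/limit pair also determines a pod's QoS class. The kubelet uses that class (via `oom_score_adj`) to steer the OOM killer under memory pressure: BestEffort pods are the first candidates and Guaranteed pods the last. A simplified sketch of the classification rules follows. It ignores init containers, applies Kubernetes' defaulting of an unset request to the limit, and the input format is hypothetical.

```python
from typing import Optional

# Hypothetical per-container resource map, e.g. {"cpu": 500, "memory": 2**28}.
Resources = dict[str, Optional[int]]


def qos_class(containers: list[tuple[Resources, Resources]]) -> str:
    """Classify a pod from (requests, limits) per container; init containers ignored."""
    any_set = False
    guaranteed = True
    for requests, limits in containers:
        for res in ("cpu", "memory"):
            lim = limits.get(res)
            req = requests.get(res, lim)   # an unset request defaults to the limit
            if req is not None or lim is not None:
                any_set = True
            if lim is None or req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([({}, {})]))                                                  # BestEffort
print(qos_class([({}, {"cpu": 500, "memory": 2**28})]))                       # Guaranteed
print(qos_class([({"cpu": 250, "memory": 2**28}, {"cpu": 500, "memory": 2**28})]))  # Burstable
```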
– deployments - . , , selector , pod deployments. OOM Killer , . . deployment , . , .
- - . , – . K8s back-off. . .
, , , …, back-off .
, rollout, . . , . , , . . . . , .
, , , iptables headless. ? selector, pod’, . pod’ readiness probe, . endpoint, , . . endpoint – pod’.
IP . IP, pod.
IP. , . , iptables , upstream, .
DNS-.
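A headless Service (`clusterIP: None`) has no virtual IP and no iptables balancing. Pods are selected by label, only ready ones are kept as endpoints, and cluster DNS returns their IPs directly as A records, leaving the client to choose among them. The sketch below is a simplified model of that selection. The pod records are made up, and only equality-based selectors are handled.

```python
from dataclasses import dataclass, field


@dataclass
class PodInfo:
    ip: str
    labels: dict[str, str] = field(default_factory=dict)
    ready: bool = True   # readiness probe result as reported by the kubelet


def headless_a_records(selector: dict[str, str], pods: list[PodInfo]) -> list[str]:
    """IPs cluster DNS would return for a headless Service (simplified)."""
    return sorted(
        p.ip for p in pods
        if p.ready and all(p.labels.get(k) == v for k, v in selector.items())
    )


pods = [
    PodInfo("10.244.1.7", {"app": "api"}),
    PodInfo("10.244.2.9", {"app": "api"}, ready=False),   # failing readiness probe
    PodInfo("10.244.3.4", {"app": "worker"}),
]
print(headless_a_records({"app": "api"}, pods))   # ['10.244.1.7']
```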
? , pod. readiness probe kubelet . . , kubelet’ apiserver.
kube-proxy . kube-proxy . , .
?
- Probes . , , .
- . 10 000 rps, , .
- . Kubelet -> apiserver -> kube-proxy-> iptables. , .
- , kubelet apiserver? kube-proxy iptables? , .
- , iptables pod, . , , . retries.
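A rough back-of-envelope for why readiness probes alone react too slowly at this traffic level. With readiness `periodSeconds` and `failureThreshold`, a dead pod is marked unready only after about `failureThreshold × periodSeconds` seconds. The change must then still travel kubelet -> apiserver -> kube-proxy -> iptables on every node. The even traffic split and the 2-second propagation delay below are assumptions, not measurements.

```python
def misrouted_requests(rps: float, pods: int, period_s: float,
                       failure_threshold: int, propagation_s: float) -> float:
    """Requests sent to a dead pod before it leaves the load-balancing rules.

    Simplified model: traffic is split evenly over `pods`, the pod dies right
    after a successful readiness probe, every later probe fails, and probe
    timeouts are ignored.
    """
    detection_s = period_s * failure_threshold   # kubelet marks the pod unready
    window_s = detection_s + propagation_s        # then apiserver -> kube-proxy -> iptables
    return rps / pods * window_s


# Kubernetes defaults: periodSeconds=10, failureThreshold=3.
# 10,000 rps over 10 pods with an assumed 2 s propagation delay:
print(misrouted_requests(10_000, 10, 10, 3, 2.0))   # 32000.0
```

Even an aggressive `periodSeconds: 1` with `failureThreshold: 1` still leaves thousands of requests in this model, which is why retries on the client side remain necessary.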
, headless service – , .
. etcd, apiserver, DNS.
envoy, L7, retry. http, retry , . , , application level . , ? Envoy.
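An L7 proxy such as envoy can retry a failed request transparently on a different upstream host instead of surfacing the error to the caller. The sketch below shows that policy in plain Python and assumes the request is idempotent. `send`, `retry_request`, `UpstreamError`, and the host list are hypothetical. This is not envoy's implementation or configuration, although envoy expresses the same idea through its route retry policy.

```python
import random
from typing import Callable, Optional


class UpstreamError(Exception):
    """Connection failure or 5xx from a single upstream host."""


def retry_request(hosts: list[str], send: Callable[[str], str],
                  max_retries: int = 2, rng: Optional[random.Random] = None) -> str:
    """Send to a random host; on failure, retry on hosts not tried yet.

    Only safe for idempotent requests: the failed host may have partially
    processed the request before erroring out.
    """
    order = list(hosts)
    (rng or random.Random()).shuffle(order)
    last_error: Optional[Exception] = None
    for host in order[: max_retries + 1]:   # first attempt plus retries
        try:
            return send(host)
        except UpstreamError as exc:
            last_error = exc                 # move on to a different host
    raise last_error if last_error else UpstreamError("no upstream hosts")


def fake_send(host: str) -> str:
    """Fake transport where one pod is already dead."""
    if host == "10.244.1.7":
        raise UpstreamError(f"connect to {host} refused")
    return f"200 OK from {host}"


print(retry_request(["10.244.1.7", "10.244.2.9"], fake_send))   # 200 OK from 10.244.2.9
```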
. envoy. DNS. K8s DNS, endpoints. , . , .
envoy DeamonSet, , sidecar container. ? - .
envoy, -, , , pod’ . . . rolling-update. - , , pod .
. . . envoy nginx -t
. . , : «, ». . , pod’ .
, sidecar . envoy , .
. envoy. , resolve , , 3 DNS. , envoy resolve. . . . , health check, retry.
, service mesh . service mesh, . . , .
- , GitHub , . envoy. , istio , . .
ingress-. IP, K8s-. K8s, , .
DaemonSet envoy, . DaemonSet – . IP , DaemonSet, IP - . . 3 - 4, 5, 10 upstream .
DaemonSet rolling .
ingress controller, , nginx - ingress K8s. , . . , , envoy, DaemonSet, ingress controller. . DaemonSet, . ingress.
, . .
Kubespray – , , 20 K8s-.
, , , , . , , .
playbook .
?
etcd c apiserver’. . full mesh.
, Kubelet …, . , , . . pod , . , .
CoreDNS. deployment. deployment, iptables, .
iptables, DNS deployment DaemonSet. , , . DNS , -, .
3 + N . , , . , Ansible. , .
Stateful- , , . .
, Kafka 4 10 . Kubelet , . , overbooked .
, , flannel . pod’. . 1/0. pod’ . . . . . , .
, , . , , , egress, , . flannel NAT. pod’ .
, . NAT.
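With IP masquerading enabled, flannel leaves pod-to-pod traffic inside the flannel network untouched. It masquerades (SNATs to the node's address) only traffic leaving that network, so external services see node IPs instead of pod IPs. The function below is a simplified model of that decision and assumes a 10.244.0.0/16 network. Real flannel implements this with iptables rules whose exact details vary by version.

```python
import ipaddress

FLANNEL_NET = ipaddress.ip_network("10.244.0.0/16")   # assumed flannel Network
MULTICAST = ipaddress.ip_network("224.0.0.0/4")


def masquerade(src: str, dst: str) -> bool:
    """True if a packet from a pod should be SNATed to the node IP (simplified)."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    if s not in FLANNEL_NET:
        return False            # not pod traffic; outside the scope of this sketch
    if d in FLANNEL_NET:
        return False            # pod-to-pod: keep the real pod IP
    return d not in MULTICAST   # leaving the cluster: hide pod IP behind node IP


print(masquerade("10.244.1.5", "10.244.2.8"))   # False
print(masquerade("10.244.1.5", "8.8.8.8"))      # True
```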
- . .
- , . , .
- - . . , , . , - , . 3 , , , , .
- , Kubernetes, , . , , .
Kubernetes .
Kubernetes .
, . !
:
, ! follower, , , , Kubernetes, - ? Kubernetes ?
, , Kubernetes . , . , . . , .
, , , , , , , , . , service discovery . , , , , .
! Stateful- , . HA ? HAProxy K8s?
Cassandra . , , . . endpoints IP , .
Postgres?
Postgres . Postgres’ , . . , . , , . Postgres - . , , -, . Postgres , .
, ! ! latency. , , latency . , ?
. . . iptables , . iptables, . , , .
20 . - ?
flannel .
, , Open vSwitch ?
I'm just afraid of those words. We built monitoring for K8s, and to demo it we had to set up a test bench on virtual machines. There, Kubespray deployed Calico by default. It works, but I don't understand how it works, and we didn't run any benchmarks on it. Nor do I know how it will break. I know how flannel breaks, and I'm ready for that. How all the other 19 plugins break, I don't know.
I'm asking from a data-protection standpoint: separating a secure network from an unsecured one.
We're fortunate in that respect: we have nothing like that.
You are very lucky.
So, thank you!