Kubernetes for those over 30. Nikolay Sivko (2018)



At some point, we at okmeter.io realized that we also needed k8s in production. We do not even have CI/CD, but we did need to share a common server pool between applications and to be able to add capacity to the cluster easily. At the same time, several circumstances complicated the adoption of k8s:



  • we care a great deal about fault tolerance (we do not bring new technologies into prod until we understand them well enough);
  • we have services with response times under 10 ms;
  • ( 10 , 50 — ).




: 2018 , issue .





. Okmeter.io. . , . , . .





?



, , , , – .



Kubernetes.





.



Okmeter.io. :



  • , , , auto discovery.
  • , .


:



  • , , . . , . Python Go.
  • - , . Kafka, Cassandra, Elasticsearch, PostgreSQL.
  • . , , latency. .
  • DevOps , CI/CD, pipeline. , .
  • . , ( ), . , . read , . . , , .
  • , , , , – , 2 , , , .
  • .




, .



Google App Engine. . , . .



, , . , latency . .



, - . . Cassandra, Elasticsearch, Go Python.





, Elasticsearch.



, Elasticsearch , , CPU. Python.





. , Go stateless, , -, , , CPU - .



, . , , .





?



  • Ansible , , , , server -> roles . .
  • , . Ansible , , . , , , .
  • , Ansible, , , . – playbooks . , , playbooks production. , , . , .




, , instance, ?



  • inventory. , - - .
  • . , , . , . .




, , , .



  • , Kubernetes , . , .
  • , , . . , , , , , , .
  • , . . Request + Limit – , .




Kubernetes. . . . , , . , , , instances. , OOM killer, , . , , .



, , , . .



, health checks , . , , .





Ansible . Ansible? , , . , . Ansible, , .



Kubernetes, , . - . , , over kill.





, ? :



  • — , , .
  • – Ansible , , Kubernetes .




Kubernetes , Ansible? Kubernetes apply, , , . . -, , . , – , .



, ? - ? , . , pod. , , Ansible. , , .





? service discovery. , . nginx, upstream’. . . . . , , service discovery. . Kubernetes.



, . , , . - , . DNS, ETCD. , , . .





, , , , . , readiness/ liveness-. , , curl, , Kubernetes .



? , pull . , , . pod, IP, . pull . , . , , .



, , pod . , , . graceful shutdown . . , .



. , RequestID, tracing, , , . , pod – , .





– . Kubernetes, , , L2 , , .





, , ? , 20 , . ? . bgp. , bgp. bgp 10 ?



Kubernetes , service discovery iptables , . . daemon, iptables. , . , , , , .





. 20 , . iptables , .



, , . IP pod’ IP . , . SR-IOV. , 128 . switch . . , , .



, . . , -, , , . - . flannel host-gw. 24- . , , -. , .





iptables kube-proxy, , iptables Kubernetes. Google , . . headless services .





?



  • , K8s .
  • , , CI/CD.
  • . . . , ? , .
  • production .




, :



  • , K8s. docker , , Python. Go – . , docker . .
  • docker . Ansible docker: « , , ».
  • , . .




. , - etcd, ConfigMap, . , . , reconfig. , .



Helm . , .





, Helm. update/ rollback pod’, immutable ConfigMap, . , , , , , rolling update, , . , . . , production , . ConfigMap, ConfigMap.





, , – .



– , . - . pod, , immutable. , . , Helm.





. Go- , YAML . , - , -. YAML, . – .





Python Django . Settings.py – . settings . .





, , , K8s stateful . , : Cassandra, Kafka. -, , .



, , Ansible. Ansible , - K8s . ?





Kubernetes – Ansible playbook. . . playbook, , , K8s.





production - - .



. request/ limit.



, , . CPU . . , pod’ . , - , CPU.





, . , , - , OOM Killer . , 100 , .





– deployments - . , , selector , pod deployments. OOM Killer , . . deployment , . , .





- - . , – . K8s back-off. . .





, , , …, back-off .



, rollout, . . , . , , . . . . , .





, , , iptables headless. ? selector, pod’, . pod’ readiness probe, . endpoint, , . . endpoint – pod’.



IP . IP, pod.



IP. , . , iptables , upstream, .



DNS-.





? , pod. readiness probe kubelet . . , kubelet’ apiserver.



kube-proxy . kube-proxy . , .





?



  • Probes . , , .
  • . 10 000 rps, , .
  • . Kubelet -> apiserver -> kube-proxy-> iptables. , .
  • , kubelet apiserver? kube-proxy iptables? , .
  • , iptables pod, . , , . retries.




, headless service – , .



. etcd, apiserver, DNS.





envoy, L7, retry. http, retry , . , , application level . , ? Envoy.





. envoy. DNS. K8s DNS, endpoints. , . , .



envoy DaemonSet, , sidecar container. ? - .



envoy, -, , , pod’ . . . rolling-update. - , , pod .





. . . envoy nginx -t. . , : «, ». . , pod’ .



, sidecar . envoy , .





. envoy. , resolve , , 3 DNS. , envoy resolve. . . . , health check, retry.





, service mesh . service mesh, . . , .



- , GitHub , . envoy. , istio , . .





ingress-. IP, K8s-. K8s, , .



DaemonSet envoy, . DaemonSet – . IP , DaemonSet, IP - . . 3 - 4, 5, 10 upstream .



DaemonSet rolling .





ingress controller, , nginx - ingress K8s. , . . , , envoy, DaemonSet, ingress controller. . DaemonSet, . ingress.





, . .



Kubespray – , , 20 K8s-.



, , , , . , , .



playbook .





?



etcd c apiserver’. . full mesh.





, Kubelet …, . , , . . pod , . , .





CoreDNS. deployment. deployment, iptables, .



iptables, DNS deployment DaemonSet. , , . DNS , -, .





3 + N . , , . , Ansible. , .



Stateful- , , . .



, Kafka 4 10 . Kubelet , . , overbooked .





, , flannel . pod’. . 1/0. pod’ . . . . . , .





, , . , , , egress, , . flannel NAT. pod’ .



, . NAT.





  • . .
  • , . , .
  • - . . , , . , - , . 3 , , , , .
  • , Kubernetes, , . , , .




Kubernetes .





Kubernetes .



, . !



:



, ! follower, , , , Kubernetes, - ? Kubernetes ?



, , Kubernetes . , . , . . , .



, , , , , , , , . , service discovery . , , , , .



! Stateful- , . HA ? HAProxy K8s?



Cassandra . , , . . endpoints IP , .



Postgres?



Postgres . Postgres’ , . . , . , , . Postgres - . , , -, . Postgres , .



, ! ! latency. , , latency . , ?



. . . iptables , . iptables, . , , .



20 . - ?



flannel .



, , Open vSwitch ?



Those words alone scare me. When we built our K8s monitoring, we had to set up a test bench on virtual machines to demo it. Kubespray deployed Calico there by default. It works, but I don't understand how. We ran no benchmarks on it, and I don't know how it will break. I know how flannel will break, and I'm ready for that. How the other 19 plugins will break, I don't know.



The question is interesting from a data-protection standpoint: separating a secure network from an unsecured one.



We are blessed in this regard: we have nothing like that.



You are very lucky.



So thank you!



