Three years with Kubernetes in production: here's what we learned

Translator's note: in another article in the "lessons learned" category, a DevOps engineer at an Australian company shares the main conclusions from long-term use of Kubernetes in production for high-load services. The author covers Java, CI/CD, networking, and the overall complexity of K8s.



We started building our first Kubernetes cluster in 2017 (starting from K8s 1.9.4). We had two clusters: one ran on RHEL virtual machines on our own bare metal, the other in the AWS EC2 cloud.



Today, our infrastructure has over 400 virtual machines scattered across several data centers. The platform serves as the basis for highly available mission-critical applications and systems that drive a huge network of nearly 4 million active devices.



Ultimately, Kubernetes made our lives easier, but the path there was thorny and required a complete paradigm shift. It meant a total transformation not only of our skill set and tooling, but also of our approach to design and our way of thinking. We had to master many new technologies and invest heavily in building up both the infrastructure and the team.



Here are the key lessons we've learned from using Kubernetes in production over three years.



1. The curious case of Java applications



When it comes to microservices and containerization, engineers tend to shy away from Java, primarily because of its notoriously imperfect memory management. However, the situation has changed: Java's compatibility with containers has improved considerably in recent years. After all, even popular systems like Apache Kafka and Elasticsearch run on Java.



In 2017-2018, some of our applications ran on Java 8. They often refused to function in containerized environments like Docker and crashed because of heap-memory problems and inadequate garbage collection. As it turned out, the cause was the JVM's inability to take Linux containerization mechanisms (cgroups and namespaces) into account: it sized its heap from the host's memory rather than from the container's limits.



Since then, Oracle has made significant efforts to improve Java's compatibility with the container world. Already in Java 8, experimental JVM flags appeared to address these issues: -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap.
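For context, here is a minimal sketch of how those flags could be passed to a Java 8 container (the pod name, image, and env-var approach are illustrative, not our exact setup):

# Hypothetical pod spec: the HotSpot JVM picks up extra flags from JAVA_TOOL_OPTIONS
apiVersion: v1
kind: Pod
metadata:
  name: java8-app             # illustrative name
spec:
  containers:
    - name: app
      image: openjdk:8-jre    # any Java 8 runtime image
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"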



But despite all the improvements, few would dispute that Java still has a reputation for being memory-hungry and slower to start than Python or Go. This is primarily due to the specifics of memory management in the JVM and of the ClassLoader.



Today, if we have to work with Java, we at least try to use version 11 or higher. And our Kubernetes memory limits are set 1 GB above the maximum JVM heap size (-Xmx), just in case. That is, if the JVM uses 8 GB for heap memory, the Kubernetes memory limit for the application is set to 9 GB. Thanks to these measures and improvements, life has become a little easier.
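A sketch of this rule of thumb as a container spec fragment (names are hypothetical; the numbers follow the 8 GB heap / 9 GB limit example above, approximated as Gi):

# Hypothetical container spec: JVM heap capped at 8 GB, Kubernetes memory
# limit set roughly 1 GB higher to leave headroom for non-heap memory
containers:
  - name: java-service
    image: registry.example.com/java-service:1.2.0   # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        value: "-Xms8g -Xmx8g"
    resources:
      requests:
        memory: "9Gi"
      limits:
        memory: "9Gi"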



2. Updates related to the Kubernetes lifecycle



Kubernetes lifecycle management (upgrades, updates) is a cumbersome and difficult affair, especially if the cluster runs on bare metal or virtual machines. It turned out that to move to a new version, it is much easier to bring up a new cluster and then migrate workloads to it. Upgrading an existing cluster in place is simply not feasible for us, as it involves significant effort and careful planning.



This is because Kubernetes has too many moving parts that have to be taken into account during an upgrade. For the cluster to work, you have to fit all these components together, from Docker to CNI plugins like Calico or Flannel. Projects like Kubespray, KubeOne, kops, and kube-aws simplify the process somewhat, but they are not without drawbacks.



We deployed our clusters on RHEL virtual machines using Kubespray. It has proven itself excellent: Kubespray has playbooks for creating a cluster, adding or removing nodes, upgrading a version, and just about everything you need to run Kubernetes in production. That said, the upgrade playbook came with the caveat that even minor versions must not be skipped. In other words, to get to the desired version, you have to install all the intermediate ones.
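For reference, a minor-by-minor Kubespray upgrade might look roughly like this (a sketch only: the inventory path and target versions are placeholders, and playbook details vary between Kubespray releases):

# Hypothetical upgrade run: step through every minor version, never skipping one
$ ansible-playbook -i inventory/mycluster/hosts.yaml -b upgrade-cluster.yml -e kube_version=v1.17.17
$ ansible-playbook -i inventory/mycluster/hosts.yaml -b upgrade-cluster.yml -e kube_version=v1.18.20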



The main takeaway here is that if you plan to use or are already using Kubernetes, think through your K8s lifecycle steps and how it fits into your solution. It is often easier to create and run a cluster than to keep it up to date.



3. Build and deploy



Be prepared for the fact that you will have to revise the build and deployment pipelines. With the transition to Kubernetes, we have undergone a radical transformation of these processes. We not only restructured Jenkins pipelines, but with the help of tools such as Helm, we developed new strategies for building and working with Git, tagging Docker images, and versioning Helm charts.



You will need a single strategy to maintain your code, Kubernetes deployment files, Dockerfiles, Docker images, Helm charts, and a way to tie it all together.



After several iterations, we settled on the following approach:



  • The application code and its Helm charts live in separate repositories. This allows us to version them independently of each other (using semantic versioning).
  • Then we keep a mapping of which chart version goes with which application version: for example, app-1.2.0 is deployed with charts-1.1.0. If only the Helm values change, only the patch version of the chart changes (for example, from 1.1.0 to 1.1.1). All of these versions are recorded in the release notes (RELEASE.txt) of each repository (see the sketch just after this list).
  • For system applications such as Apache Kafka and Redis, whose code we neither write nor modify, the workflow is different: there are no two Git repositories, because the Docker tag is simply part of the Helm chart's versioning. If the Docker tag changes for an upgrade, we bump the major version of the chart.
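As a minimal sketch of the chart/app version mapping described above (field values are purely illustrative, not our actual charts), Helm's Chart.yaml already provides separate fields for the chart version and the application version:

# Hypothetical Chart.yaml for the example above
apiVersion: v2            # Helm 3 chart format (Helm 2 uses v1)
name: app                 # illustrative chart name
version: 1.1.0            # chart version; bumped to 1.1.1 for values-only changes
appVersion: "1.2.0"       # application version this chart deploys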


(Translator's note: the Open Source utility werf for delivering applications to Kubernetes addresses exactly these build-and-deploy tasks.)



4. Liveness and readiness probes (use with caution)



Kubernetes liveness and readiness checks are great for autonomously dealing with system problems. They can restart containers on failures and redirect traffic from "unhealthy" instances. But in some circumstances, these checks can turn into a double-edged sword and affect application startup and recovery (this is especially true for stateful applications such as messaging platforms or databases).



Our Kafka fell victim to this. We had a StatefulSet of 3 brokers and 3 Zookeepers with replicationFactor=3 and minInSyncReplica=2. The problem occurred when Kafka was restarted after unexpected crashes or failures. At startup, Kafka ran additional scripts to repair corrupted indexes, which took 10 to 30 minutes depending on the severity of the problem. This delay made the liveness probes fail continuously, causing Kubernetes to kill and restart Kafka. As a result, Kafka not only could not repair its indexes, it could not even start.



The only solution at the time was to adjust initialDelaySeconds in the liveness probe settings so that checks began only after the container had had time to launch. The main challenge, of course, is deciding how long the delay should be. Individual starts after a failure can take up to an hour, and this must be taken into account. On the other hand, the higher initialDelaySeconds is, the slower Kubernetes reacts to failures during container startup.



In this case, the sweet spot is an initialDelaySeconds value that best matches your resiliency requirements while still giving the application enough time to start successfully in all failure scenarios (disk failures, network problems, system crashes, and so on).
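For illustration, a liveness probe tuned this way might look roughly as follows (the probe type, port, and timings are placeholders, not the exact values we used for Kafka):

# Hypothetical liveness probe: a long initialDelaySeconds gives the broker
# time to repair its indexes before Kubernetes starts health-checking it
livenessProbe:
  tcpSocket:
    port: 9092                # illustrative broker port
  initialDelaySeconds: 1800   # chosen to cover worst-case recovery time
  periodSeconds: 10
  failureThreshold: 3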



Update: recent versions of Kubernetes have a third type of probe, the startup probe. It has been available as an alpha feature since the 1.16 release and as a beta feature since 1.18.



The startup probe solves the problem described above by holding back readiness and liveness checks until the container has finished starting up, thereby allowing the application to start normally.
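With a startup probe, the same slack can be expressed without inflating initialDelaySeconds: liveness checks only begin once the startup probe has succeeded, and the startup probe gets a budget of failureThreshold * periodSeconds to do so. A sketch with illustrative numbers:

# Hypothetical probe pair: the startup probe allows up to 60 * 30 s = 30 minutes
# for the first start; only then does the stricter liveness probe take over
startupProbe:
  tcpSocket:
    port: 9092
  periodSeconds: 30
  failureThreshold: 60
livenessProbe:
  tcpSocket:
    port: 9092
  periodSeconds: 10
  failureThreshold: 3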


5. Working with external IPs



As it turns out, using static external IPs to access services puts significant pressure on the kernel's connection tracking mechanism. If you don't think it over carefully, it can "break".



In our cluster, we use Calico as the CNI and BGP as the routing protocol, as well as for interaction with border routers. Kube-proxy runs in iptables mode. We expose a very busy service in Kubernetes (it handles millions of connections every day) through an external IP. Because of the SNAT and masquerading performed by the software-defined networking layer, Kubernetes needs a mechanism to keep track of all these logical flows. For this, K8s uses the kernel facilities conntrack and netfilter. With their help, it manages external connections to the static IP, which is then translated to the internal IP of the service and finally to the IP address of the pod. And all of this is done with the conntrack table and iptables.



However, the capacity of the conntrack table is not unlimited. When the limit is reached, the Kubernetes cluster (more precisely, the OS kernel underneath it) is no longer able to accept new connections. In RHEL, this limit can be checked as follows:



$ sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_count = 167012
net.netfilter.nf_conntrack_max = 262144


One way to get around this limitation is to combine multiple nodes with the edge routers so that incoming connections to the static IP are distributed across the entire cluster. If you have a large fleet of machines in the cluster, this approach cumulatively gives you a much larger conntrack table for handling a very large number of incoming connections.
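The per-node ceiling itself can also be raised, at the cost of kernel memory for the larger table. A hedged sketch (the value is purely illustrative and depends on the node's RAM):

# Illustrative only: raise the per-node conntrack limit at runtime...
$ sudo sysctl -w net.netfilter.nf_conntrack_max=524288
# ...and persist the setting across reboots
$ echo 'net.netfilter.nf_conntrack_max = 524288' | sudo tee /etc/sysctl.d/90-conntrack.conf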



This completely baffled us when we were just starting out in 2017. However, relatively recently (in April 2019) the Calico project published a detailed study under the apt title "Why conntrack is no longer your friend" (it has also been translated into Russian - translator's note).



Do you really need Kubernetes?



Three years have passed, but we still keep discovering and learning something new every day. Kubernetes is a complex platform with its own set of challenges, especially around standing up the environment and keeping it running. It will change your thinking, your architecture, and your attitude to design. You will also have to deal with scaling up and upskilling your teams.



On the other hand, working in the cloud and the ability to use Kubernetes as a service will save you most of the worries associated with maintaining the platform (like extending the CIDR of the internal network and updating Kubernetes).



Today we have come to understand that the main question to ask yourself is: do you really need Kubernetes? It helps you assess how big your problem actually is and whether Kubernetes is the right way to cope with it.



The thing is, moving to Kubernetes is expensive. So the upsides of your use case (how much it leverages the platform, and in what way) should justify the price you pay. If they do, Kubernetes can significantly improve your productivity.



Remember that technology for technology's sake is meaningless.


