Google recognized the complexity of Kubernetes, so it developed the "Autopilot" mode.

New GKE mode is more expensive and less flexible, but easier and safer





GKE Autopilot Manages Pods for You



Two things are well known about Kubernetes. First, it is widely regarded as the best tool for the mission-critical task of container orchestration. Second, its complexity is a barrier to adoption and a common cause of errors. Even Google, the inventor and main promoter of Kubernetes, admits this.



To simplify the deployment and management of clusters, the company has given all GKE customers access to Autopilot, a service that Google has long used in its own Borg clusters. It configures resources automatically using machine learning.



“Despite 6 years of progress, Kubernetes is still incredibly complex,” Drew Bradstock, head of product for Google Kubernetes Engine (GKE), told The Register. “In recent years, we've seen many companies adopt Kubernetes, but then run into difficulties.”



GKE is a Kubernetes platform that runs primarily on the Google Cloud Platform (GCP). It is also available on other clouds or on-premises as part of Anthos.



Autopilot is a new GKE operating mode. It is more automated and pre-configured in order to reduce the operational cost of managing clusters and to optimize them for production and high availability.





Using Autopilot in Google's own infrastructure, source



Kubernetes has the concepts of clusters (a collection of physical or virtual servers), nodes (individual servers), pods (the basic scheduling unit, wrapping one or more containers on a node), and the containers themselves. Standard GKE is fully managed at the cluster level. Autopilot extends this management to nodes and pods.
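The hierarchy described above can be pictured as nested data structures. The following is only an illustrative sketch, not the Kubernetes API: the class names, the `container_count` helper and the example image names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Container:
    name: str
    image: str


@dataclass
class Pod:
    # A pod wraps one or more containers that are scheduled together.
    name: str
    containers: List[Container] = field(default_factory=list)


@dataclass
class Node:
    # A node is a single (physical or virtual) server that hosts pods.
    name: str
    pods: List[Pod] = field(default_factory=list)


@dataclass
class Cluster:
    # A cluster is a collection of nodes.
    name: str
    nodes: List[Node] = field(default_factory=list)

    def container_count(self) -> int:
        # Walk cluster -> nodes -> pods -> containers.
        return sum(len(pod.containers) for node in self.nodes for pod in node.pods)


# Hypothetical example: one node running one pod with two containers.
web_pod = Pod("web", [Container("app", "example/app:1.0"),
                      Container("sidecar", "example/proxy:1.0")])
cluster = Cluster("demo", [Node("node-1", [web_pod])])
print(cluster.container_count())
```

In these terms, standard GKE manages the `Cluster` level for you, while Autopilot also takes over the `Node` and `Pod` levels.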



The easiest way to understand the features and limitations of Autopilot is to read the system description. Note the "pre-configured" parameters that cannot be changed.



Comparison of Autopilot and Standard modes


Basically, this is another way of provisioning and managing GKE resources, one that sacrifices flexibility for convenience. Since Google manages most of the configuration, it guarantees a higher uptime of 99.9% for Autopilot pods deployed across multiple zones (see the SLA).



In Google Cloud, regions are made up of three or more zones. Placing all resources in one zone is less reliable than spreading them over several zones, and the highest fault tolerance comes from spreading them across several regions. Autopilot clusters are always regional rather than zonal: this is more reliable, but more expensive.



Another limitation of Autopilot is the pre-installed "container-optimized" Linux operating system with containerd. There is no option to use Linux with Docker or Windows Server. The maximum number of pods per node is 32, not 110 as on standard GKE.
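To get a feel for what the lower pod limit means, here is a rough back-of-the-envelope sketch. It uses only the two per-node limits quoted above; the function name and the workload size of 500 pods are hypothetical, and the result is a lower bound that ignores CPU and memory constraints.

```python
import math

# Per-node pod limits quoted in the article.
AUTOPILOT_MAX_PODS_PER_NODE = 32
STANDARD_MAX_PODS_PER_NODE = 110


def min_nodes_for_pods(pod_count: int, max_pods_per_node: int) -> int:
    """Lower bound on the node count implied by the pod-per-node limit alone."""
    if pod_count < 0 or max_pods_per_node <= 0:
        raise ValueError("pod_count must be >= 0 and max_pods_per_node > 0")
    return math.ceil(pod_count / max_pods_per_node)


# Hypothetical workload of 500 pods.
print(min_nodes_for_pods(500, AUTOPILOT_MAX_PODS_PER_NODE))  # 16
print(min_nodes_for_pods(500, STANDARD_MAX_PODS_PER_NODE))   # 5
```

On Autopilot the nodes are provisioned by Google rather than by the user, so this mainly affects how densely pods are packed behind the scenes.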



There is no SSH access to nodes: Autopilot nodes are locked down. GPU and TPU (Tensor Processing Unit) support is not available, although it is planned for the future. “Ditching SSH was a tough decision,” says Bradstock. This does limit the control that operators have, but Bradstock said the decision was based on research that showed a high rate of critical errors in cluster configuration.



Money



The pricing model is different too. You are charged not for compute instances (virtual machines), but for the actual use of CPU, memory and storage by all pods, plus $0.10 per hour for each Autopilot cluster, the same fee as on standard GKE.
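The per-pod billing described above can be sketched as a simple sum. Only the $0.10 hourly cluster fee comes from the article; the three unit prices below are made-up placeholders, not Google's actual rates, and the `PodUsage` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

CLUSTER_FEE_PER_HOUR = 0.10  # per-cluster fee quoted in the article

# Hypothetical unit prices for illustration only; NOT Google's real rates.
PRICE_PER_VCPU_HOUR = 0.05
PRICE_PER_GIB_MEMORY_HOUR = 0.005
PRICE_PER_GIB_STORAGE_HOUR = 0.0001


@dataclass
class PodUsage:
    vcpu: float         # vCPUs billed to the pod
    memory_gib: float   # memory billed to the pod, in GiB
    storage_gib: float  # storage billed to the pod, in GiB


def autopilot_hourly_cost(pods: Iterable[PodUsage]) -> float:
    """Hourly cost: per-pod resource charges plus the flat cluster fee."""
    resource_cost = sum(
        pod.vcpu * PRICE_PER_VCPU_HOUR
        + pod.memory_gib * PRICE_PER_GIB_MEMORY_HOUR
        + pod.storage_gib * PRICE_PER_GIB_STORAGE_HOUR
        for pod in pods
    )
    return CLUSTER_FEE_PER_HOUR + resource_cost


# Two identical hypothetical pods with 1 vCPU and 2 GiB of memory each.
example_pods = [PodUsage(vcpu=1, memory_gib=2, storage_gib=0)] * 2
print(round(autopilot_hourly_cost(example_pods), 4))
```

Note that the node count does not appear anywhere in the calculation: under this model, the virtual machines behind the pods are not billed as such.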



The obvious question is which will be more expensive, a standard cluster or Autopilot. There is no simple answer. Since Autopilot is in some ways a premium service, it is more expensive than a carefully optimized standard GKE deployment. "There is a premium over a regular GKE," Bradstock said, "because we provide not only functionality, but full SRE (Site Reliability Engineering) support and SLA guarantees."



However, Autopilot can be cheaper than a poorly configured GKE deployment that is underutilized, because it is difficult to choose the right specification for compute instances.

Cumulative distribution function (CDF) of unused memory and occupied machines for 5000 tasks after turning on Autopilot in Google's own infrastructure, source

Reduction in out-of-memory (OOM) errors and in the share of unused memory for 500 tasks after turning on Autopilot in Google's infrastructure, source
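The trade-off between a premium per-pod price and an underutilized standard cluster can be reduced to a break-even utilization. The sketch below is a deliberate simplification that looks at a single resource (vCPU) and assumes that a standard cluster pays for full node capacity while Autopilot pays only for what pods use. The prices and function names are hypothetical, and the flat cluster fee is left out because it is the same in both modes.

```python
def break_even_utilization(standard_price: float, autopilot_price: float) -> float:
    """Utilization at which both modes cost the same.

    Standard cost  = capacity * standard_price
    Autopilot cost = capacity * utilization * autopilot_price
    Setting them equal gives utilization = standard_price / autopilot_price.
    """
    if standard_price <= 0 or autopilot_price <= 0:
        raise ValueError("prices must be positive")
    return standard_price / autopilot_price


def cheaper_mode(utilization: float, standard_price: float, autopilot_price: float) -> str:
    """Return which mode is cheaper at a given utilization (a tie goes to standard)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    threshold = break_even_utilization(standard_price, autopilot_price)
    return "autopilot" if utilization < threshold else "standard"


# Hypothetical prices per vCPU-hour; NOT Google's real rates.
STANDARD_PRICE = 0.03
AUTOPILOT_PRICE = 0.05

print(cheaper_mode(0.4, STANDARD_PRICE, AUTOPILOT_PRICE))  # autopilot
print(cheaper_mode(0.9, STANDARD_PRICE, AUTOPILOT_PRICE))  # standard
```

With these placeholder numbers, a standard cluster that keeps its nodes less than 60% busy would cost more than Autopilot, while a well-packed one would cost less, which matches the argument in the text.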















Why not just use Cloud Run, which runs container workloads without any cluster, node or pod configuration, even on GKE? “Cloud Run is a great environment for developers. One application can go from zero to 1000 instances and back down to zero; that's what the cloud was made for,” explains Bradstock. “Autopilot makes life easier for people who want to use Kubernetes, want to see and control everything, want to use third-party scripts, want to build their own platform.”



A separate issue is compatibility with existing add-ons and third-party tools for Kubernetes. Some of them are not yet compatible with Autopilot, but others already work, such as Datadog monitoring. DaemonSets are also supported; many tools use this feature to run daemons on all nodes.



The fixed configuration for storage, compute and networking has meant giving up some flexibility and dropping some integrations. “But we definitely want a third-party ecosystem to run on [Autopilot],” Bradstock says.



With the launch of Autopilot, the range of options for running Kubernetes in the Google cloud expands. The trade-offs are not only higher cost and less flexibility, but also the potential to confuse DevOps teams in enterprises. However, the underlying logic is that businesses are better off focusing on their core business rather than on services that a contractor can perform.



Google's engineering has a much better reputation than its customer service. Developer Kevin Lin recently described what the startup bonus schemes at AWS and Google look like.



Google proved to be a slow and ineffective organization that ended up referring the client to a third-party partner. “The first conversation was all about how much money I plan to spend on Google (as opposed to calling Amazon where they wanted to help me get the service up and running). Google Cloud has really good ergonomics and world-class engineers, but a terrible reputation for customer service,” he said.



This is further proof that good engineers are not the only important factor in choosing a cloud.


