Today I want to talk about a tool that turns the approval and provisioning of virtual machines into full self-service, while preserving quota logic and adding the ability to forecast resource utilization.
As a former infrastructure engineer, I know what using several private or public clouds in a large team turns into. It usually goes one of two ways. Either resource allocation becomes too bureaucratic: teams wait a week for virtual machines and then keep them "alive" as long as possible so they don't have to go through the process again. Or real chaos ensues: no one knows which team is consuming how many resources, or what the hundreds and thousands of dollars spent on AWS every month are going to. One way to simplify the situation is to move developers to cloud self-service using a product from our partners, CloudMaster.
What is the problem
When a company employs hundreds of developers and DevOps engineers, and each team has its own project and its own need for experiments, you cannot rely on every individual to be responsible about the capacity the company allocates. To give a sense of the scale of the problem, here are statistics from one CloudMaster customer that uses public clouds alongside its own OpenStack-based virtualization.
Developing several hundred projects in parallel, this customer consumes about half a million dollars' worth of AWS, Azure and GCP resources per month, and creates and deletes about 350 thousand virtual machines per year.
If a process of this scale is left to run its course, real anarchy ensues. Hundreds of virtual machines sit idle. Without knowing who launched a particular VM and why, it is extremely difficult to tell whether it is needed now or will be needed later. The result is unnecessary spending on rented cloud resources, and under these conditions it is practically impossible to analyze what is happening or to forecast load.
A logical way to avoid this is to put a formal resource approval process in place. Of course, this complicates the developer's path to a virtual machine: you have to fill out a request and send it to the person in charge. But if reports are collected correctly, the manager gets a complete picture of who requested what capacity and when. With some delay, the manager can even analyze the teams' needs for virtual capacity. Still, this is no panacea either. With enough requests, teams and their projects get stuck waiting for the person in charge to review the next request. And the larger the company, the longer the wait.
On top of that, virtual machines created this way have no "expiration date", so someone has to make sure that all allocated resources are shut down at the agreed time and that doing so does not affect the projects.
Self-service through cloud management platforms (CMPs)
Cloud Management Platform solutions help bring order to an environment that uses several clouds, private or public. I want to talk about a Russian alternative from our partners: the CloudMaster platform, which focuses on connecting to Azure, AWS and Google Cloud, as well as private regions running vCloud Director, vSphere and OpenStack.
From a developer's point of view, CloudMaster is a self-service portal where you can get resources in the corporate cloud or data center through a single interface and without bureaucracy: via the browser UI, a mobile application, or console commands (Python scripts). For the infrastructure, it is an additional layer of abstraction between cloud platforms and end users that preserves selective sharing of resources, security policies, standard configurations, and other necessary tools such as machine images and Terraform templates.
The bulk of CloudMaster is written in Java: the server side is based on the Spring frameworks, and the Android application uses Dagger.
Architecturally, CloudMaster is built for large teams and significant message volumes: RabbitMQ is used for queuing, MongoDB for data storage, and Nginx for load balancing.
The tool has been in development since 2012, and since 2014 it has been used by a large software developer.
CloudMaster logic
From a developer's perspective, CloudMaster is a "single window" for quickly launching virtual machines across all available clouds and regions. Instead of waiting for approvals, you get a resource here and now.
Registering on the portal is enough to get access to this "single window". And if CloudMaster is integrated with the corporate AD, employees' roles in the company and in their projects are loaded into the tool, which automatically determines the projects and resources available to them.
Virtual machine launch window
If you have the appropriate rights, a single command can launch up to 10 virtual machines of standard "shapes". In CloudMaster terms, shapes are standard configurations that are mapped to the typical resource offerings of each cloud (and can be customized for the task).
Common "templates" for different clouds
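To make the idea of shapes more concrete, here is a minimal Python sketch of how a cloud-neutral shape could be resolved to a provider-specific instance type. This is not CloudMaster's API: the names `SHAPE_CATALOG`, `LaunchRequest` and `resolve_instance_types`, as well as the particular shape-to-instance pairs, are assumptions made up for illustration. Only the limit of 10 machines per command comes from the text above.

```python
from dataclasses import dataclass

# Hypothetical catalog: a cloud-neutral shape name mapped to an instance
# type in each cloud. The pairing is illustrative, not an official equivalence.
SHAPE_CATALOG = {
    "small": {"aws": "t3.small", "azure": "Standard_B1ms", "gcp": "e2-small"},
    "medium": {"aws": "t3.medium", "azure": "Standard_B2s", "gcp": "e2-medium"},
}

# The article mentions up to 10 virtual machines per command.
MAX_VMS_PER_REQUEST = 10


@dataclass(frozen=True)
class LaunchRequest:
    cloud: str   # e.g. "aws"
    shape: str   # e.g. "small"
    count: int   # number of identical VMs to launch


def resolve_instance_types(request: LaunchRequest) -> list[str]:
    """Return one provider-specific instance type per requested VM."""
    if not 1 <= request.count <= MAX_VMS_PER_REQUEST:
        raise ValueError(f"count must be between 1 and {MAX_VMS_PER_REQUEST}")
    try:
        instance_type = SHAPE_CATALOG[request.shape][request.cloud]
    except KeyError:
        raise ValueError(
            f"no mapping for shape {request.shape!r} in cloud {request.cloud!r}"
        ) from None
    return [instance_type] * request.count


if __name__ == "__main__":
    print(resolve_instance_types(LaunchRequest(cloud="aws", shape="small", count=3)))
```

The point of such a table is that a developer only picks "small" or "medium", while the platform decides what that means in each cloud.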
You can create images from existing machines, use ready-made infrastructure-as-code templates, or upload your own (Terraform and CloudFormation).
Templates
VMs can be created with no time limit or run on a specified schedule. This gives a certain freedom in how resources are used. For example, a company might allow developers to use the corporate cloud for personal experiments and comparisons, but only for one day. This is, incidentally, what one client of the platform, a custom software developer, does. All virtual machines created this way are deleted automatically once the specified period expires.
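The "one day only" rule can be sketched as a simple time-to-live check. The code below is a hypothetical illustration in Python, not CloudMaster's implementation: the `VirtualMachine` record, its `ttl` field and the `expired_vms` helper are assumed names, and the actual deletion call is deliberately left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VirtualMachine:
    name: str
    created_at: datetime
    ttl: Optional[timedelta]  # None means the VM has no time limit


def expired_vms(vms: list[VirtualMachine], now: datetime) -> list[VirtualMachine]:
    """Return the VMs whose lifetime has run out at the moment `now`."""
    return [
        vm for vm in vms
        if vm.ttl is not None and now >= vm.created_at + vm.ttl
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    fleet = [
        VirtualMachine("sandbox", datetime(2024, 1, 1, 9, 0), timedelta(days=1)),
        VirtualMachine("build", datetime(2024, 1, 2, 9, 0), timedelta(days=1)),
        VirtualMachine("database", datetime(2023, 6, 1), None),
    ]
    for vm in expired_vms(fleet, now):
        # A real platform would call the cloud API here; this sketch only reports.
        print(f"{vm.name} is past its lifetime and can be deleted")
```

A periodic job that runs such a check is enough to keep experimental machines from outliving their purpose.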
From a manager's point of view, the most useful thing about CloudMaster is that it accounts for every running virtual machine. Managers get sections with complete information about the VMs in a selected cloud or region, including those created from specific templates, along with billing data from cloud providers and resource consumption metrics for individual projects, which make it possible to identify unused or underutilized capacity.
List of resources
List of virtual machines
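As a rough illustration of how underutilized capacity might be spotted from consumption metrics, here is a small Python sketch. It assumes that average CPU load is the metric of interest and that 10% is the cutoff. Both are assumptions for the example, as is the `underutilized` function itself; the article does not say which metrics or thresholds CloudMaster uses.

```python
from statistics import mean


def underutilized(
    cpu_samples_by_vm: dict[str, list[float]],
    threshold_percent: float = 10.0,
) -> list[str]:
    """Return the names of VMs whose mean CPU load is below the threshold.

    VMs with no samples are skipped: without data they cannot be judged.
    """
    return sorted(
        name
        for name, samples in cpu_samples_by_vm.items()
        if samples and mean(samples) < threshold_percent
    )


if __name__ == "__main__":
    metrics = {
        "ci-runner": [55.0, 62.5, 48.0],
        "old-demo": [1.5, 2.0, 0.5],
        "new-vm": [],
    }
    print(underutilized(metrics))
```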
In addition to displaying information in the interface, CloudMaster generates about 60 types of notifications, including those related to finance.
Incoming notifications
And the text of one of the notifications
The service is built so that every VM has an owner: the person who created the virtual machine, or someone to whom ownership was transferred. The owner receives all notifications about resource utilization and changes in the VM's state, and is also responsible for the costs. In this sense, CloudMaster helps instill a culture of controlling capacity utilization and taking responsibility for abandoned "zombie" machines.
Restrictions on creating new VMs are governed by access rights and quotas. Any workflow can be customized here, up to giving client representatives access to the cloud. You can set quotas for teams and define various actions to take when a quota is reached or usage approaches a certain threshold (say, 70% of the quota).
Billing from cloud providers
Quota management window
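The quota logic with a warning threshold can be sketched as follows. This is a hypothetical Python example, not CloudMaster code: `QuotaAction` and `check_quota` are invented names, the quota is counted in abstract units (for instance vCPUs), and the three possible outcomes are an assumption about what "various actions" might look like. The 70% default is the example value from the text.

```python
from enum import Enum


class QuotaAction(Enum):
    ALLOW = "allow"  # request fits comfortably within the quota
    WARN = "warn"    # request fits, but usage reaches the warning threshold
    DENY = "deny"    # request would exceed the quota


def check_quota(used: int, requested: int, quota: int, warn_percent: int = 70) -> QuotaAction:
    """Decide what to do with a request for `requested` more units."""
    if quota <= 0:
        raise ValueError("quota must be positive")
    projected = used + requested
    if projected > quota:
        return QuotaAction.DENY
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if projected * 100 >= warn_percent * quota:
        return QuotaAction.WARN
    return QuotaAction.ALLOW


if __name__ == "__main__":
    print(check_quota(used=50, requested=10, quota=100))
    print(check_quota(used=60, requested=15, quota=100))
    print(check_quota(used=95, requested=10, quota=100))
```

In a real workflow the WARN outcome would typically trigger a notification to the team lead, while DENY would route the request to manual approval.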
For private clouds (OpenStack and VMware), CloudMaster supports a kind of marketplace: an estimate of the cost of running virtual machines, which you can use to choose a more cost-effective way to use resources. Colleagues say a similar feature may appear for public clouds in the future.
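How such an estimate could be used to compare options is shown in the Python sketch below. The pricing model (a flat hourly rate per vCPU and per gigabyte of RAM), the rates themselves and the `cheapest_region` function are all assumptions for illustration; the article does not describe how CloudMaster actually calculates costs.

```python
def cheapest_region(
    hourly_rates: dict[str, tuple[float, float]],
    vcpus: int,
    ram_gb: int,
    hours: float,
) -> tuple[str, dict[str, float]]:
    """Estimate the cost of one VM in each region and pick the cheapest.

    `hourly_rates` maps a region name to a pair:
    (price per vCPU per hour, price per GB of RAM per hour).
    Returns the cheapest region and the full cost table.
    """
    if not hourly_rates:
        raise ValueError("at least one region is required")
    costs = {
        region: (cpu_rate * vcpus + ram_rate * ram_gb) * hours
        for region, (cpu_rate, ram_rate) in hourly_rates.items()
    }
    best = min(costs, key=costs.get)
    return best, costs


if __name__ == "__main__":
    # Made-up internal rates for two private regions.
    rates = {"openstack": (0.02, 0.005), "vmware": (0.03, 0.002)}
    best, costs = cheapest_region(rates, vcpus=4, ram_gb=16, hours=100)
    for region, cost in sorted(costs.items()):
        print(f"{region}: {cost:.2f}")
    print("cheapest:", best)
```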
In this system, the role of the infrastructure engineer is the one closest to me, so I left it for last. For DevOps engineers this is, of course, one more tool to learn, but on the other hand it makes it possible to control what happens to cloud resources through it alone. Popular configuration, monitoring, and development tools such as Chef and Ansible can be deployed faster and more easily.
A Java SDK is available for administrators and developers if needed.
Most importantly, CloudMaster, like other CMPs, lets you move from routine manual resource allocation to more interesting tasks, such as developing automation based on infrastructure as code.
In my experience, introducing such a tool is justified if the company runs at least fifty virtual machines and has at least fifteen active users of different clouds. On the one hand, it adds some complexity to the infrastructure. On the other, it brings heterogeneous clouds, each with its own management tools, to a common denominator, with a guarantee that internal corporate standards are not violated. At the same time, the tool pushes responsibility for resource utilization and budget planning down to the level of project managers, which is the right place for it.