7 things to work out before launching OpenShift in production

The explosive growth of container adoption in the enterprise is impressive. Containers neatly match the expectations and needs of those looking to reduce costs, expand their technical capabilities, and move forward on the agile and DevOps journey. The container revolution also opens up new opportunities for those who are late in modernizing their IT systems. Containers and Kubernetes represent a completely and fundamentally new way of managing applications and IT infrastructure.







Unlike the previous and equally revolutionary transition from bare metal to virtual machines, containers dramatically reduce the redundancy of the software stack and change the very nature of operating systems management in the enterprise.



Many are choosing to accelerate their migration to containers with Red Hat OpenShift Container Platform, the industry-leading Kubernetes platform for the enterprise. This solution automatically takes on many day-one tasks and offers the best of the Kubernetes ecosystem on a single, rigorously tested and highly secure platform. It is the most comprehensive and functional solution for enterprises, containing everything you need to get started and removing many of the technical barriers and complexities of building a Kubernetes platform.



However, OpenShift is not a magic wand that solves every problem on its own. Yes, the platform brings its customers plenty of benefits and pays for itself quickly, provided you have a well-thought-out plan in place by the time you launch it. To be successful, there are seven areas that must be carefully considered before moving any workloads to OpenShift.



1. Standardization of naming rules and metadata



There are only two hard things in computer science: cache invalidation and naming things.

- Phil Karlton


Every entity in OpenShift and Kubernetes has its own name. Every service must have its own DNS name, with the only limitation being the DNS naming rules. Now imagine that a monolithic application has been decomposed into dozens or hundreds of separate microservices, each with its own database. And in OpenShift everything is either hierarchical, related, or must follow a pattern, so a huge number of things will need to be named. If you do not prepare standards in advance, you will end up with a real Wild West.



Have you already planned out how services will be laid out? Let's say it will be one big namespace, for example "databases", where everyone hosts their databases. OK, and even if everyone does, at some point they start hosting their Kafka clusters in their own namespaces. Should you then create a "middleware" namespace? Or would "messaging" be a better name? As usual, at some point there will be teams who go their own way, consider themselves special, and say they need their own namespaces. And hold on, the organization has 17 departments, so maybe we should prefix all namespaces with our standard department codes?



Work out naming and grouping standards before putting anything into production; you will save a great deal of time and effort if you do this ahead of time. Set standards for everything. What matters here is not so much how good they are, but that they exist, are consistent, and are enforced.



Another hugely useful thing is metadata. Standardize which assets you want to track and make sure the right metadata is set on the right resources. Start with the Kubernetes Recommended Labels. For example, a "support_email" annotation in the namespace metadata can save valuable time when you need to reach tier-2 support during a major failure. In addition, metadata can be used to keep resource names to a meaningful length, rather than cramming every piece of necessary information into hyphen-separated names. Engage everyone from application architects to IT operators, brainstorm, and figure out what it will take to have solid standards in place by the time OpenShift launches.
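As a rough illustration, here is what such standardized metadata might look like on a namespace. This is only a sketch: the namespace name, label values and the support_email annotation are hypothetical and simply show the level of detail worth agreeing on.

```yaml
# Hypothetical example: a namespace carrying the Kubernetes Recommended Labels
# plus a support_email annotation agreed on by the organization.
apiVersion: v1
kind: Namespace
metadata:
  name: payments-messaging            # e.g. <department>-<function>, per your naming standard
  labels:
    app.kubernetes.io/name: payments-messaging
    app.kubernetes.io/part-of: payments
    app.kubernetes.io/managed-by: gitops
  annotations:
    support_email: payments-oncall@yourcompany.io
```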



2. Standardization of corporate base images



One of the key features of containers is the ability to mix and match all the pieces of the software stack. You can, of course, pick your favorite flavor of OS and build everything on top of it, but by doing so the organization misses out on huge opportunities. After all, what is really great about container images? Layering. You can take many critical tasks off developers' shoulders and solve them by standardizing images.



Take a basic Java application as an example. Your developers are unlikely to go wrong with OpenJDK, but with vulnerability management, library updates and other matters of IT hygiene they easily can. We all know that business problems are often solved at the cost of technical compromises, such as deliberately sticking to older versions of Java. Fortunately, these tasks are easily automated and managed at the enterprise level. You can still use the vendor's base images, while setting and controlling your own update cycles by creating corporate base images.



Going back to the example above, let's say developers want Java 11 and you want them to always use its latest version. You then create a corporate base image (registry.yourcompany.io/java11) using the OS vendor's base image (registry.redhat.io/ubi8/openjdk-11) as the starting point. When that vendor image is updated, you automatically help developers pick up the latest updates. In addition, this provides an abstraction layer that lets you seamlessly extend the standard image with required libraries or Linux packages.
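One possible way to express this in OpenShift terms is an ImageStream plus a BuildConfig that layers corporate additions onto the vendor image. This is only a sketch; the "base-images" namespace, the resource names and the extra layers are assumptions, not something prescribed by the article.

```yaml
# Sketch: an internal "java11" base image built on top of the vendor image.
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: java11
  namespace: base-images               # hypothetical namespace for corporate base images
---
apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  name: corporate-java11
  namespace: base-images
spec:
  source:
    dockerfile: |
      FROM registry.redhat.io/ubi8/openjdk-11:latest
      # add corporate CA certificates, shared libraries, monitoring agents, etc.
  strategy:
    type: Docker
    dockerStrategy: {}
  output:
    to:
      kind: ImageStreamTag
      name: java11:latest              # consumed by teams as registry.yourcompany.io/java11
```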



3. Standardization of health and readiness checks



Health monitoring is needed almost everywhere. For a person, an annual medical check-up is considered sufficient. The health of applications must, of course, be checked much more often, and two key things should be monitored:



  • Whether the application is running (health check).
  • Whether the application is ready (readiness check).


There are plenty of other metrics that make applications easier to monitor, but these two are the foundation not only of monitoring but also of scaling. Health is typically determined by network connectivity and the ability of the host running the application to respond to a request. As for readiness, each application answers according to its own standards. For example, an application with very strict latency requirements may need a lengthy cache refresh or JVM warm-up at startup, so the pause between "started" and "ready" can reach several minutes. For a stateless REST API backed by a relational database, on the other hand, those two answers arrive at the same time.
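In Kubernetes and OpenShift terms these two checks map to liveness and readiness probes. A minimal sketch follows, assuming the application exposes /health and /ready HTTP endpoints; the paths, port, image and timings are illustrative assumptions.

```yaml
# Fragment of a Deployment pod spec; endpoints and timings are assumptions.
containers:
  - name: my-app
    image: registry.yourcompany.io/my-app:1.0
    livenessProbe:                # "is the application running?"
      httpGet:
        path: /health
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:               # "is the application ready to take traffic?"
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 30     # allow time for cache refresh or JVM warm-up
      periodSeconds: 5
```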



The most important thing about these checks is not to deviate from purely binary logic. Started means started, without any "sort of started". Ready means ready, with no gradations like "ready for these requests, but not for those". The principle is simple: all or nothing.



The second aspect of such checks is standardization. How do you check readiness? Without standards, even such a simple question can turn into a real monitoring nightmare; just compare how the Quarkus conventions and the Spring Boot conventions have diverged. Nobody wanted that, but standards emerge one way or another; the only difference is that now your organization has the power to define and enforce them.

A note in the margin: don't invent your own standards. Find an existing one and use it.



4. Standardization of logs



Continuing the monitoring theme, note that the combination of inexpensive storage and big data solutions has spawned a new monster in the enterprise: logging everything. Previously, logs were unstructured, archaic console output that did not live long and was produced only occasionally. Now the ambition is to log everything and apply data science and machine learning to optimize operations and monitoring in the most revolutionary way. Alas, we have to state the obvious: any attempt to start collecting logs from hundreds of applications without any standards whatsoever, without even thinking about them, invariably leads to pointless and exorbitant spending on log management and data transformation tooling just to get started - even before you realize that messages like "Transition completed" or "This block triggered" are unlikely to mean anything for your operations.



The structure needs to be standardized. Again, the consistency of the standards matters more than their correctness. It should be possible to write a log parser for every application in the enterprise. Yes, those parsers will be one-off, non-reusable things. Yes, there will be a pile of exceptions you cannot control, especially for off-the-shelf applications. But don't throw the baby out with the bathwater; pay attention to the details: for example, the timestamp in every log record must follow the relevant ISO standard; the output must be in UTC with five decimal places of sub-second precision (2018-11-07T00:25:00.07387Z). Log levels should be written in capitals and limited to TRACE, DEBUG, INFO, WARN and ERROR. In general, set the structure first, and only then go into the details.
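For example, a standardized structured log record might look like the sketch below. Only the timestamp and level rules come from the text above; the remaining field names and values are hypothetical.

```json
{
  "timestamp": "2018-11-07T00:25:00.07387Z",
  "level": "ERROR",
  "logger": "com.yourcompany.payments.TransferService",
  "message": "Transfer rejected: insufficient funds",
  "trace_id": "4bf92f3577b34da6"
}
```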



Structural standardization forces everyone to follow the same rules and use the same architectural patterns. This is true for both application and platform logs. And don't deviate from the out-of-the-box solution unless absolutely necessary. The EFK stack (Elasticsearch, Fluentd and Kibana) that ships with OpenShift should be able to handle all of your scenarios. It is part of the platform for a reason, and when the platform is upgraded it is one more thing you don't need to worry about.



5. Switching to GitOps



One of the great things about OpenShift is that everything here - literally everything - is ultimately either configuration or code, which means it can be managed through a version control system. This lets you revolutionize delivery and get rid of the bureaucracy around pushing changes to production.



In particular, the traditional ticket-based scheme can be completely replaced with a model built on git pull requests. Let's say the application owner wants to adjust the resources allocated to the application after new functionality has been added, for example to increase memory from 8 GB to 16 GB. In the traditional scheme, a developer creates a ticket and waits for someone else to perform the corresponding task. That someone else is most often an IT operator, who only introduces a tangible delay into the change process without adding any value to it, or worse, adds unnecessary extra cycles to it. Indeed, the operator has two options. First, they review the request and decide to carry it out, entering the production environment, making the requested change manually and restarting the application.

Besides the time it takes to do the work itself, there is an extra delay here, since the operator as a rule already has a whole queue of requests waiting. On top of that, there is the risk of human error, such as typing 160 GB instead of 16 GB. The second option: the operator questions the request and thereby sets off a chain reaction of clarifying the reasons for and consequences of the requested change, sometimes to the point where management has to step in.



Now let's see how this works with GitOps. The change is made against the git repository and takes the form of a pull request. The developer can then submit that pull request (especially if it touches the production environment) for approval by the parties involved. That way security specialists can get involved at an early stage, and the sequence of changes can always be traced. Standards in this area can be enforced programmatically with the appropriate tools in the CI/CD tool chain. Once approved, the pull request is versioned and easy to audit. It can also be tested in a pre-production environment as part of the standard process, completely eliminating the risk of human error.
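In the 8 GB to 16 GB example, the pull request might be nothing more than a small change to the resources section of the Deployment manifest stored in git. A sketch, with the exact field layout being illustrative:

```yaml
# Before the pull request both values were 8Gi; the reviewed change bumps them to 16Gi.
resources:
  requests:
    memory: 16Gi
  limits:
    memory: 16Gi
```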



As you can see, the changes are radical. But they will be new not so much for developers, who are no strangers to version control systems, as for system administrators and security specialists. Once they dig into the new paradigm and appreciate its power and simplicity, the idea will take off.



6. Blueprints



The move from monolithic applications to microservices raises the importance of application design patterns. A typical monolithic application is hard to classify: it usually contains REST APIs, batch processing and event-driven pieces all at once. HTTP, FTP, Kafka, JMS and Infinispan? Sure, and it also talks to three different databases at the same time. How are you supposed to create a blueprint when a whole pile of enterprise application integration patterns is mixed together? No way.



But if you decompose such a monolithic application into separate parts, the patterns emerge much more clearly and easily. Let's say there are now four separate applications, and they use the following patterns:



  • A REST API for managing data in a database.
  • A batch process that checks an FTP server for updates and pushes them to a Kafka topic.
  • A Camel adapter that takes data from that Kafka topic and sends it to the REST API.
  • A REST API that exposes aggregated information collected from a Data Grid acting as a state machine.


So now we have blueprints, and blueprints can be standardized: REST APIs must conform to the OpenAPI standards, batch jobs will be managed as OpenShift batch jobs, integrations will use Camel. You can create blueprints for APIs, for batch jobs, for AI/ML, for multicast applications, or whatever else you need, and then decide how those blueprints are deployed, how they are configured and which templates they use. With such standards in place you don't have to reinvent the wheel every time and can focus on the really important work, like building new business functionality. Working out the blueprints may feel like a waste of time, but the effort will pay itself back a hundredfold later.
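For instance, the batch-processing blueprint above could be standardized as a Kubernetes CronJob run by OpenShift. The sketch below uses hypothetical names, image and schedule and only illustrates the shape such a blueprint might take.

```yaml
# Sketch of the "batch processing" blueprint as a scheduled job.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ftp-to-kafka-poller
spec:
  schedule: "*/15 * * * *"          # poll the FTP server every 15 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: poller
              image: registry.yourcompany.io/batch/ftp-to-kafka:1.0
          restartPolicy: Never
```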



7. Preparing for APIs



APIs come hand in hand with a microservices architecture. They will have to be managed too, and it is better to prepare for that in advance.



First, standards are needed here again. You can take the OpenAPI standards as a starting point, but you will have to dig deeper into the weeds. It is important to strike a balance and not slide into over-regulation with a pile of restrictions. The level of detail you need is roughly this: when a new entity is created via POST, should we return 201 or 200? Is it acceptable to update entities with POST rather than PUT? What is the difference between 400 and 500 responses?
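That level of detail is easy to capture directly in an OpenAPI definition. A hypothetical fragment, with the path and descriptions invented for illustration:

```yaml
# Illustrative OpenAPI 3 fragment: POST creates an entity and returns 201.
paths:
  /accounts:
    post:
      summary: Create an account
      responses:
        "201":
          description: Account created
        "400":
          description: Malformed request (client error)
        "500":
          description: Unexpected server error
```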



Second, you need a service mesh. This is a really powerful thing that will become an integral part of Kubernetes over time. Why? Because traffic will sooner or later become a problem, and you will want to manage it both inside the data center (so-called "east-west" traffic) and between the data center and the outside world ("north-south"). You will want to pull authentication and authorization out of applications and move them to the platform level. You will need Kiali's ability to visualize traffic inside the service mesh, as well as blue-green and canary deployment schemes and, for example, dynamic traffic control. In short, the service mesh unquestionably falls into the category of day-one tasks.
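OpenShift Service Mesh is based on Istio, so a canary rollout like the one mentioned above comes down to a weighted route. A sketch, assuming a DestinationRule already defines the stable and canary subsets; all names and weights are illustrative.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-api
spec:
  hosts:
    - my-api
  http:
    - route:
        - destination:
            host: my-api
            subset: stable
          weight: 90                 # 90% of traffic stays on the stable version
        - destination:
            host: my-api
            subset: canary
          weight: 10                 # 10% goes to the canary
```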



Third, you will need a centralized API management solution. You will want a single place for finding and reusing APIs. Developers need to be able to go to an API store, find the API they need, and get documentation on how to use it. You will want to manage versions and deprecations consistently. If you are building APIs for external consumers, such a solution can become the north-south endpoint for security and load management. 3scale can even help with API monetization. And sooner or later your management will want a report that answers the question "What APIs do we have?"



In conclusion, note that while identifying areas for standardization and documenting corporate standards can be daunting in itself, the lion's share of the effort goes not into that but into monitoring and enforcing compliance with the standards. A powerful blend of organizational entropy and an entirely natural reluctance to clash with colleagues works against standards from the very beginning. The fight breaks up into countless tiny, sometimes invisible battles: a required label is missing here, and that name, while not fully compliant, is still close enough to the standard. Standards usually die a death of a thousand cuts, and few in the organization, if anyone, even notice. In a sense, standards are like exercise: nobody wants to sweat and strain, but everyone knows a long and healthy life is impossible without it.



However, there is hope, and it lies in automation. Any of the standards above can be implemented through automation. The GitOps process can verify that all required labels and annotations are present in all relevant YAML files. The CI/CD process can enforce the standards for corporate images. Everything can be codified, tested and kept consistent. Automation can also be extended when you introduce new standards or change existing ones. The undeniable advantage of standardization through automation is that the computer does not avoid conflict; it simply states the facts. So with enough sophistication and investment in automation, the platform you are investing so much in today can bring a much greater return in the future in the form of improved performance and stability.
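As one illustration (the article does not prescribe a specific tool), a policy engine such as Kyverno can reject resources that are missing the required labels. A hypothetical sketch:

```yaml
# Hypothetical Kyverno policy: Deployments must carry the standard name label.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-standard-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-recommended-labels
      match:
        any:
          - resources:
              kinds:
                - Deployment
      validate:
        message: "The app.kubernetes.io/name label is required."
        pattern:
          metadata:
            labels:
              app.kubernetes.io/name: "?*"
```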


