How to Build a Hybrid Cloud with Kubernetes to Replace DBaaS

My name is Petr Zaitsev. I am the CEO and founder of Percona, and I want to tell you:

how we got from open source solutions to Database as a Service;
what approaches there are to deploying databases in the cloud;
how Kubernetes can replace DBaaS, removing vendor lock-in while keeping the simplicity of a database as a service.

This article is based on a talk given at @Databases Meetup by Mail.ru Cloud Solutions & Tarantool.

How open source led to Database as a Service in the cloud

I have been working with open source since the late 90s. Twenty years ago, using open source software such as databases was not easy: you had to download the sources, patch them, compile them, and only then use them.

Then open source went through a series of simplifications:

tar.gz archives with INSTALL instructions, where the sources had to be compiled;
packages with dependencies like .deb and .rpm, where you only need to install a set of packages;
package repositories like APT and YUM, which install packages and their dependencies automatically;
solutions such as Docker and Snap, which ship software with its dependencies bundled, so nothing else has to be installed.

As a result, open source software has become easier to use, and the barrier to entry for developing such applications has dropped.

At the same time, unlike 20 years ago, when everyone was an expert at building software, most developers today cannot build the tools they use from source.

In fact, this is not bad because:

We can use more complex but more convenient software. A browser, for example, is convenient to use, but it includes many open source components and is inconvenient to build from scratch.
More people can become developers of open source and other software, businesses use more software, and demand for it is higher.

The downside is that the next step in simplification is cloud solutions, and that brings vendor lock-in, that is, being tied to a single provider. We use simple solutions, and providers build them from open source components, but in practice they are tied to one of the large clouds. In other words, the easiest and fastest way to deploy open source (and compatible software) is in the cloud, through a proprietary API.

When it comes to databases in the cloud, there are two approaches:

Build the database infrastructure as in a regular data center: take the standard building blocks (compute, storage, and so on), install Linux and a database on them, and configure everything.
Use Database as a Service, where the provider offers a ready-made database inside the cloud.

DBaaS is now a rapidly growing market, because such a service lets developers work with databases directly and minimizes routine work. The provider takes responsibility for high availability, easy scaling, database patching, backups, and performance tuning.

Two types of Database as a Service based on open source and an alternative in the form of Kubernetes

There are two types of Database as a Service for open source databases:

A standard open source product packaged into an administration backend for easy deployment and management.
An advanced commercial solution with various add-ons, compatible with open source.

Both options reduce the ability to migrate between clouds and reduce the portability of data and applications. For example, even though different clouds offer essentially the same standard MySQL, there are significant differences between them in operation, performance, backup, and so on. Migrating from one cloud to another can be challenging, especially for complex applications.

And here the question arises: is it possible to get the convenience of Database as a Service, but as a simple open source solution?

The bad news is that, unfortunately, there are no such solutions on the market yet. The good news is there is Kubernetes that allows you to implement such solutions.

Kubernetes is, in effect, an operating system for the cloud or data center: it lets you deploy and manage an application across many servers in a cluster rather than on a single host.

Kubernetes is now the leader in this category of software. There were many different solutions for such tasks, but it was Kubernetes that became the standard. Many companies that previously worked on alternative solutions are now focusing on adapting their products to support Kubernetes.

In addition, Kubernetes is a universal solution that is supported in the private, public, and hybrid clouds of many vendors, for example AWS, Google Cloud, Microsoft Azure, and Mail.ru Cloud Solutions.

How Kubernetes works with databases

Kubernetes was originally designed for stateless applications that process data but do not store anything, such as microservices or web applications. Databases are at the other end of the spectrum: they are stateful applications, and Kubernetes was not originally designed for them.

However, features have appeared in Kubernetes that make it possible to run databases and other stateful applications:

StatefulSet - a whole set of primitives for handling pod shutdown events and performing a graceful shutdown (a predictable shutdown of the application).
Persistent Volumes - data stores attached to pods, the basic objects that Kubernetes manages.
Operator Framework - the ability to create components that manage databases and other stateful applications distributed across many nodes.
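
To make these primitives more concrete, here is a minimal sketch of a StatefulSet with one volume claim per pod. It is not how the Percona operators are implemented: the name demo-db, the percona:8.0 image, the password, and the sizes are all placeholder assumptions, and the three pods it would start are independent MySQL instances, not a replicated cluster. The snippet only writes the manifest to a file; applying it needs a running cluster.

```shell
# Write a minimal StatefulSet manifest to a file.
# All names and values below are placeholders chosen for illustration.
cat > statefulset-sketch.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-db
spec:
  serviceName: demo-db            # assumes a headless Service named demo-db exists
  replicas: 3
  selector:
    matchLabels:
      app: demo-db
  template:
    metadata:
      labels:
        app: demo-db
    spec:
      terminationGracePeriodSeconds: 60   # time allowed for a graceful shutdown
      containers:
        - name: mysql
          image: percona:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: change-me            # use a Secret for anything real
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:                   # one PersistentVolumeClaim per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF

# To try it on a real cluster:
#   kubectl apply -f statefulset-sketch.yaml
```

Each pod gets its own claim (data-demo-db-0, data-demo-db-1, and so on) that survives pod restarts. Replication, failover, and backups are what an operator adds on top of these primitives.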

There are already large Database as a Service offerings in public clouds that run Kubernetes on the backend, for example CockroachCloud, InfluxDB, and PlanetScale. So a database on Kubernetes is not just theoretically possible; it works in practice.

Percona has two open source Kubernetes solutions:

Kubernetes Operator for Percona Server for MongoDB.
Kubernetes Operator for Percona XtraDB Cluster, a MySQL-compatible service that provides high availability and consistency. You can also run a single node if high availability is not needed, for example for a dev database.

Kubernetes users fall into two groups. Some use Kubernetes Operators directly; these are mostly advanced users who understand well how the technology works. Others run Kubernetes on the backend and want something like Database as a Service without delving into the nuances of Kubernetes. For this second group we have another open source solution, the Percona DBaaS CLI Tool. It is an experimental tool for those who want an open source DBaaS based on Kubernetes without a deep understanding of the technology.

How to run DBaaS from Percona on Google Kubernetes Engine

Google Kubernetes Engine is, in my opinion, one of the most full-featured implementations of Kubernetes. It is available in many regions of the world and has a simple, convenient command-line tool (the SDK) that lets you script the platform rather than operate it by hand.

In order for our DBaaS to work, we need the following components:

Kubectl.
Google Cloud SDK.
Percona DBaaS CLI.

Install kubectl

Install the package for your operating system; we will use Ubuntu as the example. More details are in the official kubectl installation documentation.

sudo apt-get update && sudo apt-get install -y apt-transport-https gnupg2
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubectl

Install Google Cloud SDK

Install the software package in the same way. More details are in the official Google Cloud SDK documentation.

# Add the Cloud SDK distribution URI as a package source
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] http://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

# Import the Google Cloud Platform public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

# Update the package list and install the Cloud SDK
sudo apt-get update && sudo apt-get install google-cloud-sdk

Install Percona DBaaS CLI

Install it from the Percona repositories. The Percona DBaaS CLI Tool is still an experimental product, so it lives in the experimental repository, which must be enabled separately even if you already have the Percona repositories installed.

More details are in the Percona documentation.

Installation steps:

Configure Percona repositories using the percona-release tool. First you need to download and install the official percona-release package from Percona:
```
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
sudo dpkg -i percona-release_latest.generic_all.deb
```
Enable the experimental tool repository component as follows:
```
sudo percona-release enable tools experimental
```

Install the percona-dbaas-cli package:

sudo apt-get update
sudo apt-get install percona-dbaas-cli
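
Before moving on to configuration, it may help to confirm that all three tools ended up on the PATH. The check below is only a sketch: missing_tools is a hypothetical helper name, and it assumes the binaries are called kubectl, gcloud, and percona-dbaas, as in the commands used in this article.

```shell
# Print the name of every given tool that is not found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the three components this walkthrough relies on.
missing=$(missing_tools kubectl gcloud percona-dbaas)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined with spaces.
  echo "Not found on PATH:" $missing >&2
fi
```

If nothing is printed, all three commands can be found and the setup steps can proceed.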

Setting up the components

The documentation for each tool describes these settings in more detail.

First, log in to your Google account. Google Cloud lets one user have many independent projects, so you also need to select the working project by its project ID:

gcloud auth login
gcloud config set project hidden-brace-236921

Next, we create a cluster. For the demo, I created a Kubernetes cluster of only three nodes, which is the minimum required for high availability:

gcloud container clusters create --zone us-central1-a your-cluster-name --cluster-version 1.15 --num-nodes=3
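
Once the cluster is created, it may be worth confirming that all three nodes have joined and are Ready. The helper below is a small sketch, not part of any of the tools: count_ready_nodes is a hypothetical name, and it assumes the usual column layout of kubectl get nodes --no-headers, where the second column is the node status.

```shell
# Count nodes whose STATUS column is exactly "Ready".
# Expects the output of `kubectl get nodes --no-headers` on stdin.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Real usage (needs access to the cluster created above):
#   kubectl get nodes --no-headers | count_ready_nodes
```

For the demo cluster the expected count is 3. Nodes reported as NotReady or Ready,SchedulingDisabled are deliberately not counted.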

The following kubectl command gives the required privileges to our current user:

kubectl create clusterrolebinding cluster-admin-binding-$USER \
  --clusterrole=cluster-admin --user=$(gcloud config get-value core/account)

Then we create a namespace and make it active. A namespace is, roughly speaking, also something like a project or environment, but inside the Kubernetes cluster. It is independent of Google Cloud projects:

kubectl create namespace my-namespace
kubectl config set-context --current --namespace=my-namespace

Starting the cluster

After these few steps, we can start a three-node database cluster with one simple command:

# percona-dbaas mysql create-db example
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     example
Resource Endpoint: example-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              Nt9YZquajW7nfVXTTrP
Status:            ready

How to connect to a cluster

By default, the database is only reachable from inside Kubernetes. That is, it is not reachable from the machine where you ran the create-db command. To make it reachable, for example to test with a local client, you need to forward the port with kubectl port-forward:

kubectl port-forward svc/example-proxysql 3306:3306 &

Then connect with your MySQL client:

mysql -h 127.0.0.1 -P 3306 -uroot -pNt9YZquajW7nfVXTTrP
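
Copying the password by hand is error-prone, so here is a sketch of pulling individual fields out of the connection details. It assumes the human-readable output has the "Label: value" layout shown above and that you saved it to a file; conn_field is a hypothetical helper, not part of the Percona DBaaS CLI.

```shell
# Print the value of one "Label: value" line from the connection details on stdin.
# $1 is the label without the trailing colon, e.g. "Pass" or "Resource Endpoint".
conn_field() {
  awk -v label="$1:" 'index($0, label) == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Demo on a sample copied from the output above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Resource Endpoint: example-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              Nt9YZquajW7nfVXTTrP
EOF
conn_field "Resource Endpoint" < "$sample"   # prints example-proxysql.my-namespace.pxc.svc.local
conn_field "Pass" < "$sample"                # prints Nt9YZquajW7nfVXTTrP
rm -f "$sample"

# Real usage, assuming the create-db output was saved to example.out:
#   pass=$(conn_field "Pass" < example.out)
#   mysql -h 127.0.0.1 -P 3306 -uroot -p"$pass"
```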

Advanced cluster management commands

Database with a public IP

If you want a more permanent way to reach the cluster, you can get an external IP address. The database will then be accessible from anywhere, which is less secure but often more convenient. To get an external IP, use the following command:

# percona-dbaas mysql create-db exposed \
  --options="proxysql.serviceType=LoadBalancer"
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     exposed
Resource Endpoint: 104.154.133.197
Port:              3306
User:              root
Pass:              k0QVxTr8EVfgyCLYse
Status:            ready

To access database please run the following command:
mysql -h 104.154.133.197 -P 3306 -uroot -pk0QVxTr8EVfgyCLYse

Explicitly set the password

Instead of the system randomly generating the password, you can explicitly set the password:

# percona-dbaas mysql create-db withpw --password=mypassword
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     withpw
Resource Endpoint: withpw-proxysql.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              mypassword
Status:            ready
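
A fixed value like mypassword is fine for a demo, but for anything longer-lived you would probably generate the password yourself and pass it in. A minimal sketch, assuming /dev/urandom and the base64 utility are available; gen_password is a hypothetical helper name:

```shell
# Generate a random alphanumeric password of the given length (default 20).
gen_password() {
  length="${1:-20}"
  # 64 random bytes give about 86 base64 characters, plenty for lengths up to ~40.
  head -c 64 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | cut -c "1-$length"
}

DB_PASSWORD=$(gen_password 24)

# Real usage (needs the cluster and namespace set up above):
#   percona-dbaas mysql create-db withpw --password="$DB_PASSWORD"
```

Restricting the password to letters and digits avoids quoting problems when it is later passed to the mysql client on the command line.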

The output here is shown in human-readable form, but JSON output is also supported.

Turn off high availability

You can turn off high availability and deploy a single node with the following command:

# percona-dbaas mysql create-db singlenode \
  --options="proxysql.enabled=false, allowUnsafeConfigurations=true,pxc.size=1"
Starting ......................................... [done]
Database started successfully, connection details are below:
Provider:          k8s
Engine:            pxc
Resource Name:     singlenode
Resource Endpoint: singlenode-pxc.my-namespace.pxc.svc.local
Port:              3306
User:              root
Pass:              22VqFD96mvRnmPMGg
Status:            ready

This option is meant for testing: you can quickly and easily bring up MySQL, test it, and then tear it down, or use it for development.

The Percona DBaaS CLI tool helps you get a DBaaS-like solution on Kubernetes. At the same time, we continue to work on its functionality and usability.

This talk was first presented at @Databases Meetup by Mail.ru Cloud Solutions & Tarantool. Watch videos of other talks and subscribe to event announcements in the Telegram channel Around Kubernetes in Mail.ru Group.

