Distributed transactions in the context of a microservice architecture

Hello. In September, OTUS opens enrollment for a new group of the "Highload Architect" course. In this regard, I am continuing a series of publications written specifically for this course, and I also invite you to my free webinar, where I will talk in detail about the course program and the training format at OTUS. You can sign up for the webinar here.








Introduction



As you know, the transition from a monolith to a microservice architecture brings a number of difficulties, related both to the technical side of the project and to the human factor. One of the hardest technical challenges is ensuring consistency in a distributed system.



Consistency



A rather subtle point is that consistency in the context of distributed systems differs from consistency in the context of databases. In what follows, by consistency we will mean the former: an incomplete (failed) operation has no effects and does not change the data; under concurrent access, all operations appear atomic (you cannot observe an operation's intermediate result); and if the data has multiple copies (replication), the sequence in which operations are applied is the same on all copies. In effect, we want an ACID transaction, only a distributed one.
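To make the first property concrete, here is a minimal illustration (in Python, purely hypothetical account stores) of what it rules out: a transfer across two independent stores that is not atomic, so a failure between the two writes does change the data.

```python
# Illustrative only: a transfer across two independent stores that is NOT
# atomic. A crash between the two writes leaves a partial effect behind --
# exactly what the first consistency property above forbids.

def transfer(src, dst, amount, crash_in_middle=False):
    src["balance"] -= amount          # first write succeeds
    if crash_in_middle:
        raise RuntimeError("service crashed between the two writes")
    dst["balance"] += amount          # second write never happens

accounts_a = {"balance": 100}
accounts_b = {"balance": 0}
try:
    transfer(accounts_a, accounts_b, 30, crash_in_middle=True)
except RuntimeError:
    pass
# The debit was applied without the credit: 70 + 0 instead of 100 in total.
```

A consistent system must either apply both writes or neither; the mechanisms discussed below exist to provide exactly that guarantee across services.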



The cause of the problem



Why is it difficult to maintain consistency in a microservice architecture? The point is that this architectural style often involves the database-per-service pattern. Let me remind you that in this pattern each microservice has its own independent database, or even several (several, because besides the primary data store a cache, for example, may be used). This approach, on the one hand, avoids implicit data-format coupling between microservices (they interact only explicitly, through the API); on the other hand, it lets us take full advantage of the technology-agnostic nature of a microservice architecture (we can choose the storage technology best suited to the particular load on a microservice). But along the way we lost the guarantee of data consistency. Judge for yourself: the monolith talked to one large database, which made ACID transactions possible. Now there are many databases, and instead of one big ACID transaction we have many small ACID transactions. Our task will be to combine all of them into one distributed transaction.



Optimistic consistency



The first thing that comes to mind is the concept of optimistic consistency: we commit as many transactions as we like to as many storage engines as needed. We expect that everything will be fine, and if it is not, we say that everything will be fine in the end. If it is still bad in the end, we say: "Yes, this happens, but with extremely low probability."



Joking aside, neglecting consistency when it is not business-critical is a good idea, especially considering how much effort maintaining it will cost us (which, I hope, you will appreciate by the end).



Consistency options



If consistency is critical to the business, there are several ways to try to achieve it. If the data is updated by a single service (for example, database replication takes place), standard consensus algorithms such as Paxos or Raft can be applied. Such transactions are called homogeneous. If the data is updated by several services (that is, a heterogeneous transaction takes place), then the complexity we talked about above begins.



On the one hand, we can still avoid the need for a distributed transaction by striving toward a service-based architecture (combining services in such a way that every transaction stays homogeneous). Such a solution is not very canonical from the standpoint of microservice principles, but it is technically much simpler, which is why it is often used in practice. On the other hand, we can keep canonical microservices and apply one of the mechanisms for ensuring distributed transactions: two-phase commit or a saga. This article explores the first mechanism; the second will be discussed next time.



Two-phase commit



The mechanism is extremely simple: there is a transaction manager that orchestrates the transaction. In the first phase (prepare), the transaction manager issues a command to the resource managers, telling them to write to their logs the data they are about to commit. After receiving confirmation from all resource managers that the first phase completed successfully, the transaction manager starts the second phase and issues the next command (commit), telling the resource managers to apply the previously staged changes.



Despite its apparent simplicity, this approach has a number of disadvantages. First, if even one resource manager fails in the second phase, the entire transaction must be rolled back. This violates one of the principles of microservice architecture: resilience to failures (in moving to a distributed system, we assumed from the start that failure is the norm, not an exceptional situation). Moreover, if failures are frequent (and they will be), the process of aborting transactions will have to be automated (including writing transactions that roll back other transactions). Second, the transaction manager itself is a single point of failure, and it must be able to issue transaction ids transactionally. Third, since special commands are issued to the storage, it is logical to expect the storage to support them, that is, to comply with the XA standard, and not all modern technologies do (brokers such as Kafka and RabbitMQ, and NoSQL solutions such as MongoDB and Cassandra, do not support two-phase commit).



The conclusion that suggests itself from all these factors was neatly formulated by Chris Richardson: "2PC not an option."



Conclusion



We figured out why distributed transactions are the main technical pain of a microservice architecture, looked at various ways of approaching the problem, and discussed the two-phase commit mechanism in detail.






I invite everyone to sign up for my webinar about the course, where I will describe the training format in detail and walk you through the program.





