Distributed transaction processing in a microservice architecture

Hello, Habr!



Today we present a short article on microservices and distributed architecture. In particular, it touches on Martin Fowler's idea that a new system should start as a monolith, and that even in a mature microservice architecture it is advisable to keep a large monolithic core.



Enjoy reading!



Today, everyone thinks and writes about microservices, and I am no exception. Given the basic principles of microservices and the context in which they run, it is clear that a microservice application is a distributed system.



What is a distributed transaction?



Transactions that span multiple physical systems or computers on a network are referred to simply as distributed transactions. In the world of microservices, a transaction is split across multiple services that are called in a sequence to complete the entire transaction.



Here is a monolithic online store system that uses transactions:







Figure 1: Transaction in a monolith



If, in the system above, a user sends an order request (Checkout) to the platform, the platform creates a local transaction in the database. This transaction spans several database tables in order to process the order (Process) and reserve the goods in the warehouse (Reserve). If any of these steps fails, the transaction can be rolled back, which cancels both the order itself and the reservation of the goods. This set of guarantees is called ACID (Atomicity, Consistency, Isolation, Durability) and is provided at the level of the database system.
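
As a rough illustration of such a local transaction, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the columns, and the checkout function are assumptions made up for this example and do not come from the system in Figure 1.

```python
import sqlite3


def checkout(conn, item, qty):
    """Create an order and reserve stock inside one local transaction."""
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount == 0:
                # Reserve failed: raising here also undoes the order insert.
                raise ValueError("not enough stock")
        return True
    except ValueError:
        return False


# Hypothetical schema: both tables live in the same database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('book', 1)")
conn.commit()
```

With one book in stock, the first call to checkout(conn, "book", 1) succeeds. A second call fails at the reserve step, and the order row it inserted is rolled back together with it.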



Here is a decomposition of an online store system built from microservices:







Figure 2: Transactions in a microservice



After decomposing this system, we have two microservices, OrderMicroservice and InventoryMicroservice, each with its own database. When an order request (Checkout) comes from a user, both microservices are called, and each of them makes changes to its own database. Since the transaction is now spread across multiple databases on multiple systems, it is considered distributed.



What is the problem when committing distributed transactions in microservices?



With the move to a microservice architecture, the system as a whole loses the ACID guarantees that a single database provided. Because a transaction may now be spread across many microservices and therefore many databases, the following key problems have to be dealt with:



How to maintain transaction atomicity?



Atomicity means that in any transaction either all steps complete or none of them do. If, in the example above, the 'reserve items' operation in InventoryMicroservice fails, how do we roll back the 'process order' changes that OrderMicroservice has already applied?
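
The problem can be reproduced with a small sketch in which each microservice is represented by its own in-memory SQLite database. The function names (process_order, reserve_items, naive_checkout) and the schema are hypothetical and serve only to show how a partial failure leaves the two databases out of step.

```python
import sqlite3

# Each connection stands in for the private database of one microservice.
order_db = sqlite3.connect(":memory:")      # owned by OrderMicroservice
inventory_db = sqlite3.connect(":memory:")  # owned by InventoryMicroservice
order_db.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
inventory_db.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
inventory_db.execute("INSERT INTO inventory VALUES ('book', 0)")  # out of stock
inventory_db.commit()


def process_order(item, qty):
    # Local transaction in the order database only.
    with order_db:
        order_db.execute("INSERT INTO orders VALUES (?, ?)", (item, qty))


def reserve_items(item, qty):
    # Local transaction in the inventory database only.
    with inventory_db:
        cur = inventory_db.execute(
            "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
            (qty, item, qty),
        )
        if cur.rowcount == 0:
            raise RuntimeError("not enough stock")


def naive_checkout(item, qty):
    process_order(item, qty)  # already committed when this call returns
    try:
        reserve_items(item, qty)
    except RuntimeError:
        # The inventory rollback cannot reach the order database.
        return False
    return True
```

After naive_checkout("book", 1) returns False, the order row is still present in order_db: the checkout as a whole was not atomic.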



How to handle concurrent requests?



Suppose one of the microservices is persisting an object to the database while another request reads that same object. Which data should the service return, the old or the new? In the example above, when OrderMicroservice has already finished its work and InventoryMicroservice is still updating, should the current order be counted among the orders placed by the user?



Modern systems are designed with potential failures in mind, and one of the main problems in distributed transaction processing is well articulated by Pat Helland:

As a rule, developers simply do not make large scalable applications that would involve working with distributed transactions.



Possible solutions



The two problems above are critical when designing and building microservice-based applications. The following two approaches are used to solve them:



  • Two-phase commit
  • Eventual consistency and compensation / SAGA


1. Two-phase commit



As the name implies, this way of processing transactions involves two stages: a prepare phase and a commit phase. The transaction coordinator plays a key role here, managing the life cycle of the transaction.



How it works



In the prepare phase, all participating microservices prepare to commit and notify the coordinator that they are ready to complete the transaction. In the commit phase, the transaction coordinator then instructs all microservices either to commit or to roll back.



Consider again an online store system as an example:







Figure 3: Successful two-phase commit in a microservice system



In the example above (Figure 3), when a user submits an order request, the coordinator TransactionCoordinator first starts a global transaction with full context information. It sends the prepare command to OrderMicroservice to create the order, and then sends the prepare command to InventoryMicroservice to reserve the items. When both services are ready to make their changes, they lock the affected objects against further changes and notify TransactionCoordinator. Once TransactionCoordinator has confirmed that all microservices are ready to apply their changes, it instructs them to save those changes by requesting a commit of the transaction. At that point, all objects are unlocked.





Figure 4: Failed two-phase commit when working with microservices



In the failure scenario (Figure 4), if at any point a single microservice fails to prepare, TransactionCoordinator cancels the transaction and starts the rollback process. In the diagram, OrderMicroservice was for some reason unable to create the order, while InventoryMicroservice responded that it was prepared. The coordinator TransactionCoordinator then requests an abort on InventoryMicroservice, after which the service rolls back all the changes it made and unlocks the database objects.
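
Both scenarios can be sketched in a few lines of Python. This is a simplified in-process model, not a real protocol implementation: the Participant class, its can_prepare flag, and the state names are assumptions for illustration, and a real coordinator would also have to handle timeouts, crashes, and durable logging.

```python
class Participant:
    """Hypothetical stand-in for a microservice taking part in the protocol."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare  # simulates success or failure of prepare
        self.state = "idle"
        self.locked = False

    def prepare(self):
        if not self.can_prepare:
            self.state = "failed"
            return False
        self.locked = True  # objects stay locked until commit or rollback
        self.state = "prepared"
        return True

    def commit(self):
        self.state = "committed"
        self.locked = False

    def rollback(self):
        self.state = "rolled_back"
        self.locked = False


class TransactionCoordinator:
    def __init__(self, participants):
        self.participants = participants

    def execute(self):
        # Phase 1: ask every participant to prepare and collect the votes.
        votes = [(p, p.prepare()) for p in self.participants]
        if all(ok for _, ok in votes):
            # Phase 2a: everyone is ready, so commit everywhere.
            for p, _ in votes:
                p.commit()
            return True
        # Phase 2b: at least one failure, so roll back whoever prepared.
        for p, ok in votes:
            if ok:
                p.rollback()
        return False
```

For example, TransactionCoordinator([Participant("OrderMicroservice", can_prepare=False), Participant("InventoryMicroservice")]).execute() returns False and leaves the inventory participant rolled back and unlocked, which corresponds to Figure 4.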



Benefits



  • This approach guarantees the atomicity of the transaction: either both microservices succeed, or neither makes any changes.
  • This approach isolates reads from writes, since changes to objects are not visible until the transaction coordinator commits them.
  • This approach is a synchronous call, so the client is notified of success or failure.


Disadvantages



  • Nothing is perfect: two-phase commits are quite slow compared to operations on a single microservice. They depend heavily on the transaction coordinator, which can significantly slow down the system during periods of high load.
  • Another major drawback is database row locking. Locks can become a performance bottleneck, and deadlocks can occur, in which two transactions block each other.


2. Eventual consistency and compensation / SAGA



One of the best definitions of eventual consistency is given at microservices.io: each service publishes an event whenever its data is updated. Other services subscribe to events. When an event is received, the service updates its data.



With this approach, a distributed transaction is executed as a collection of asynchronous local transactions on the corresponding microservices. Microservices exchange information via the event bus.



How it works



Again, let's take the online store system as an example:







Figure 5: Eventual consistency / SAGA, successful outcome



In the example above (Figure 5), the client asks the system to process an order. On this request, Choreographer raises the Create Order event, which starts the transaction. OrderMicroservice listens for this event and creates the order; if the operation succeeds, it raises the Order Created event. The coordinator Choreographer listens for this event and proceeds to reserve the items by raising the Reserve Items event. InventoryMicroservice listens for this event and reserves the items; if the operation succeeds, it raises the Items Reserved event. In this example, that marks the end of the transaction.



All event-based communication between microservices goes through the event bus, and a separate system is responsible for organizing it (choreography); this is how the problem of unnecessary complexity is addressed.







Figure 6: Eventual consistency / SAGA, failed outcome



If, for some reason, InventoryMicroservice fails to reserve the items (Figure 6), it raises the Failed to Reserve Items event. The coordinator Choreographer listens for this event and starts a compensating transaction by raising the Delete Order event. OrderMicroservice listens for this event and deletes the previously created order.
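
The event flow of Figures 5 and 6 can be sketched with a toy in-process event bus. The event names follow the text; everything else (the EventBus class, the build_saga function, the payload fields) is a hypothetical simplification. In particular, this bus delivers events synchronously, whereas a real event bus would be asynchronous.

```python
from collections import defaultdict


class EventBus:
    """Toy synchronous bus; records every published event name in self.log."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.log = []

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, payload):
        self.log.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


def build_saga(bus, orders, stock):
    # OrderMicroservice: local transaction plus its compensating action.
    def create_order(p):
        orders[p["order_id"]] = p
        bus.publish("Order Created", p)

    def delete_order(p):
        orders.pop(p["order_id"], None)

    # InventoryMicroservice: reserve the items or report the failure.
    def reserve_items(p):
        if stock.get(p["item"], 0) >= p["qty"]:
            stock[p["item"]] -= p["qty"]
            bus.publish("Items Reserved", p)
        else:
            bus.publish("Failed to Reserve Items", p)

    bus.subscribe("Create Order", create_order)
    bus.subscribe("Reserve Items", reserve_items)
    bus.subscribe("Delete Order", delete_order)
    # Choreographer: reacts to outcomes and raises the next event.
    bus.subscribe("Order Created", lambda p: bus.publish("Reserve Items", p))
    bus.subscribe("Failed to Reserve Items", lambda p: bus.publish("Delete Order", p))
```

With stock = {"book": 1}, publishing "Create Order" once ends with "Items Reserved". Publishing it a second time ends with "Delete Order", and the second order is removed again by the compensating step.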



Benefits



A major advantage of this approach is that each microservice focuses only on its own atomic transaction. Microservices are not blocked when another service takes a relatively long time to run, which also means that no database locks are needed. This approach scales well under high load, since it is asynchronous and event-based.



Disadvantages



The main disadvantage of this approach is that it does not provide read isolation. In the example above, the customer will see that the order has been created, but a second later the order will be deleted by the compensating transaction. In addition, as the number of microservices grows, the system becomes harder to debug and maintain.



Conclusion



The first alternative to the approaches described above is to abandon distributed transactions altogether. If you are building a new application, start with a monolithic architecture, as described in MonolithFirst by Martin Fowler.

If you need to update data in two places at once as a result of a single event, the eventual consistency / SAGA approach is preferable to two-phase commit for processing distributed transactions. The main reason is that two-phase commit does not scale in a distributed environment. Eventual consistency also raises its own set of problems, such as how to atomically update the database and publish an event. Adopting this development philosophy requires a change of mindset for both developers and testers.


