Distributed locks using Redis

Hello, Habr!



Today we bring to your attention a translation of a fairly involved article on implementing distributed locks with Redis, and we invite you to discuss the prospects of Redis as a topic. An analysis of the Redlock algorithm discussed below, written by Martin Kleppmann, author of the book "Designing Data-Intensive Applications", is available here.





Distributed locks are a very useful primitive used in many environments where different processes must work on shared resources in a mutually exclusive manner.



There are a number of libraries and posts out there describing how to implement a DLM (Distributed Locking Manager) with Redis, but each library takes a different approach and the guarantees provided are weak compared to what is achievable with a little more complex design.



In this article, we will try to describe what could be considered a canonical algorithm for implementing distributed locks with Redis. We will talk about an algorithm called Redlock; it implements a distributed lock manager and, in our opinion, it is safer than the conventional single-instance approach. We hope that the community will analyze it, provide feedback, and use it as a starting point for more complex or alternative designs.



Implementations



Before proceeding to the description of the algorithm, here are a few links to ready-made implementations. They can be used for reference.







Safety and Liveness Guarantees



We are going to model our design with just three properties that, in our opinion, provide the minimum guarantees needed to use distributed locks effectively.



  1. Safety property: mutual exclusion. At any given moment, only one client can hold a lock.
  2. Liveness property A: no deadlocks. Eventually it is always possible to acquire a lock, even if the client that locked the resource crashes or gets cut off by a network partition.
  3. Liveness property B: fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.




Why a failover-based implementation is not enough in this case

To understand what we are going to improve, let's analyze the current state of affairs with most of the Redis-based distributed locking libraries.



The simplest way to lock a resource with Redis is to create a key on a single instance. The key is usually created with a limited time to live, using the expires feature provided by Redis, so sooner or later the key is released (property 2 in our list). When the client needs to free the resource, it deletes the key.



At first glance, this solution works well, but there is a problem: it has a single point of failure. What happens if the Redis master instance fails? Let's add a replica, then, and use it if the master becomes unavailable. Unfortunately, this option is not viable. By doing this, we will not be able to correctly implement the mutual exclusion property that we need for safety, because replication in Redis is asynchronous.



Obviously, a race condition occurs in such a model:

  1. Client A acquires a lock on the master.
  2. The master fails before the write of the key is propagated to the replica.
  3. The replica is promoted to master.
  4. Client B acquires a lock on the same resource that is already locked by A. SAFETY VIOLATION!




Sometimes it is perfectly normal that in special circumstances, such as a failure, multiple clients can hold a lock at the same time. In such cases, you can apply a replication-based solution. Otherwise, we recommend the solution described in this article.



Correct Single-Instance Implementation



Before trying to overcome the shortcomings of the single-instance configuration described above, let's figure out how to handle this simple case correctly: it is actually acceptable in applications where occasional race conditions are tolerable, and locking on a single instance is the building block for the distributed algorithm described here.



To acquire a lock, let's do this:



SET resource_name my_random_value NX PX 30000



This command sets the key only if it does not already exist (the NX option), with an expiry of 30,000 milliseconds (the PX option). The key is set to the value "my_random_value". This value must be unique across all clients and across all lock requests.
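For illustration, here is a minimal sketch of acquiring such a single-instance lock from client code. It assumes the redis-py client (any client that supports the NX and PX options of SET would do); the host name and the helper name are just placeholders.

import secrets

import redis  # assumes the redis-py client is installed (pip install redis)

client = redis.Redis(host="localhost", port=6379)

def acquire_lock(resource_name, ttl_ms=30000):
    # Generate a value that is unique across clients and lock requests.
    token = secrets.token_hex(20)
    # Equivalent to: SET resource_name <token> NX PX <ttl_ms>
    if client.set(resource_name, token, nx=True, px=ttl_ms):
        return token  # keep the token: it is needed to release the lock safely
    return None       # someone else holds the lock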

Basically, the random value is used to safely release the lock, with a script telling Redis to only remove the key if it exists and the value stored in it is exactly what was expected. This is accomplished with the following Lua script:



if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end




This is important in order to prevent the release of a lock by another client. For example, a client might acquire a lock, then block in an operation that lasts longer than the first lock (so that the key has time to expire), and later remove the lock that some other client has placed.

It is unsafe to use a simple DEL because a client might remove a lock held by another client. Instead, when using the script above, each lock is "signed" with a random string, so only the client that previously set it can remove it.
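Continuing the previous sketch (and still assuming redis-py), releasing the lock safely means sending that Lua script with EVAL, so that the comparison and the deletion happen atomically on the server side:

RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def release_lock(resource_name, token):
    # `client` is the redis-py connection from the acquisition sketch above.
    # Returns 1 if our key was deleted, 0 if it had already expired or was
    # overwritten by another client.
    return client.eval(RELEASE_SCRIPT, 1, resource_name, token)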



What should this random string be? I suppose it should be 20 bytes from /dev/urandom, but you can find less expensive ways to make the string unique enough for your purposes. For example, it would be fine to seed RC4 with /dev/urandom and generate a pseudo-random stream from it. A simpler solution combines microsecond unix time with the client ID; it is not as safe, but is probably up to the task in most contexts.
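As a hypothetical illustration of both options, again in Python:

import os
import time
import uuid

# Option 1: 20 bytes straight from the OS CSPRNG (/dev/urandom on Linux).
token_strong = os.urandom(20).hex()

# Option 2 (cheaper, weaker): microsecond unix time combined with a client ID.
CLIENT_ID = uuid.uuid4().hex  # assumed per-process identifier
token_cheap = "%d:%s" % (time.time_ns() // 1000, CLIENT_ID)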



The time we use as the key's time to live is called the "lock validity time". It is both the time after which the lock is automatically released and the time the client has to complete its operation before another client can in turn lock the resource, without actually violating the mutual exclusion guarantee. Such a guarantee is limited only to a certain window of time, which begins at the moment the lock is acquired.



So, we've discussed a good way to acquire and release a lock. The system (if we are talking about a non-distributed system consisting of a single, always available instance) is safe. Let's extend this concept to a distributed system in which we don't have such guarantees.



Redlock Algorithm



The distributed version of the algorithm assumes that we have N Redis masters. These nodes are completely independent of each other, so we don't use replication or any other implicit coordination system. We've already covered how to safely acquire and release a lock on a single instance, and we take it for granted that the algorithm uses this method when working with a single instance. In our examples, we set N to 5, which is a perfectly reasonable value. Thus, we will need to run 5 Redis masters on different computers or virtual machines to ensure that they fail largely independently of each other.



To acquire a lock, the client performs the following operations:



  1. Gets the current time in milliseconds.
  2. Tries to acquire the lock on all N instances sequentially, using the same key name and the same random value on every instance. At step 2, when setting the lock on each instance, the client uses a timeout that is small compared to the total lock auto-release time. For example, if the auto-release time is 10 seconds, the timeout could be in the ~5-50 millisecond range. This prevents the client from staying blocked for a long time trying to talk to a Redis node that is down: if an instance is unavailable, we should move on to the next one as soon as possible.
  3. Computes how much time elapsed while acquiring the lock, by subtracting the timestamp obtained in step 1 from the current time. If and only if the client was able to acquire the lock on the majority of instances (at least 3), and the total time spent acquiring the lock is less than the lock validity time, the lock is considered acquired.
  4. If the lock was acquired, its validity time is taken to be the initial validity time minus the elapsed time computed in step 3.
  5. If the client failed to acquire the lock for some reason (either it could not lock N/2+1 instances, or the remaining validity time is negative), it tries to unlock all the instances (even those it believes it was not able to lock).
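To make the steps more concrete, here is a rough sketch of the acquisition procedure, again using the redis-py client. The server addresses, the drift allowance, and the RELEASE_SCRIPT reused from the earlier sketch are assumptions for illustration, not a fixed part of the algorithm:

import secrets
import time

import redis  # redis-py, assumed

# Five independent Redis masters (placeholder addresses); the small
# socket_timeout is the per-instance timeout mentioned in step 2.
SERVERS = [redis.Redis(host="redis%d.example.internal" % i, port=6379,
                       socket_timeout=0.05)
           for i in range(1, 6)]

TTL_MS = 10_000                     # lock auto-release time
CLOCK_DRIFT_MS = TTL_MS * 0.01 + 2  # drift allowance used by common implementations

def redlock_acquire(resource):
    token = secrets.token_hex(20)
    start = time.monotonic()                                  # step 1
    acquired = 0
    for server in SERVERS:                                    # step 2
        try:
            if server.set(resource, token, nx=True, px=TTL_MS):
                acquired += 1
        except redis.RedisError:
            pass                                              # node down: move on quickly
    elapsed_ms = (time.monotonic() - start) * 1000            # step 3
    validity_ms = TTL_MS - elapsed_ms - CLOCK_DRIFT_MS        # step 4
    if acquired >= len(SERVERS) // 2 + 1 and validity_ms > 0:
        return token, validity_ms
    redlock_release(resource, token)                          # step 5
    return None, 0

def redlock_release(resource, token):
    # Unlock every instance, even those we do not think we managed to lock.
    for server in SERVERS:
        try:
            server.eval(RELEASE_SCRIPT, 1, resource, token)
        except redis.RedisError:
            pass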




Is the algorithm asynchronous?



This algorithm is based on the assumption that, although there is no synchronized clock on which all processes would work, the local time in each process still flows at approximately the same rate, and the error is small compared to the total time after which the lock is automatically released. This assumption is very similar to the situation typical for ordinary computers: each computer has a local clock, and usually we can count on the fact that the time difference on different computers is small.



At this stage, we should formulate our mutual exclusion rule more carefully: mutual exclusion is guaranteed only if the client holding the lock completes its work within the lock validity time (the value obtained in step 3), minus some additional time (just a few milliseconds, to compensate for clock drift between processes).



The following interesting article tells more about such systems that require a timing gap: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency .



Retry on failure



When the client fails to acquire the lock, it should try again after a random delay; this is done to desynchronize multiple clients that simultaneously try to acquire a lock on the same resource (which can lead to a split-brain situation in which no one wins). In addition, the faster a client tries to acquire a lock on the majority of Redis instances, the narrower the window in which a split-brain situation can occur (and the less retrying is required). Therefore, ideally, the client should try to send the SET commands to the N instances at the same time using multiplexing.
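A minimal sketch of such a retry loop with a random delay, building on the redlock_acquire sketch above (the attempt count and the maximum delay are arbitrary choices):

import random
import time

def acquire_with_retry(resource, attempts=3, max_delay_ms=200):
    for _ in range(attempts):
        token, validity_ms = redlock_acquire(resource)
        if token is not None:
            return token, validity_ms
        # Sleep a random amount so competing clients do not retry in lockstep.
        time.sleep(random.uniform(0, max_delay_ms) / 1000)
    return None, 0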



It is worth emphasizing how important it is for clients that failed to acquire the majority of locks to release the (partially) acquired locks right away, so that there is no need to wait for the keys to expire before the lock on the resource can be acquired again (although if a network partition occurs and the client loses contact with the Redis instances, an availability penalty has to be paid while the keys expire).



Releasing a Lock



Releasing a lock is a simple operation: just release the lock on all instances, regardless of whether the client believes it managed to successfully lock a particular instance.



Safety considerations



Is the algorithm safe? Let's try to imagine what happens in different scenarios.



First, let's assume that the client was able to acquire the lock on the majority of instances. Each of the instances will contain a key with the same time to live. However, each of these keys was set at its own moment in time, so they will expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server), and the last key was set at worst at time T2 (the time we get the reply from the last server), then we can be sure that the first key in the set to expire will live for at least MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT. All other keys will expire later, so we can be sure that all keys will be valid simultaneously for at least this time.
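As a purely hypothetical numeric example: with a 10-second TTL, 220 ms spent contacting the instances, and a drift allowance of about 100 ms, the formula gives:

TTL_MS = 10_000          # lock auto-release time
T2_MINUS_T1_MS = 220     # time spent acquiring the lock across all instances
CLOCK_DRIFT_MS = 102     # assumed drift allowance (e.g. 1% of TTL plus 2 ms)

MIN_VALIDITY_MS = TTL_MS - T2_MINUS_T1_MS - CLOCK_DRIFT_MS
print(MIN_VALIDITY_MS)   # 9678 ms: all keys are guaranteed to coexist at least this long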



During the time when most keys remain valid, another client will not be able to acquire the lock, since N/2+1 SET NX operations cannot succeed if N/2+1 keys already exist. Therefore, if the lock was acquired, then it is impossible to re-acquire it at the same moment (this would violate the property of mutual exclusion).

However, we want to make sure that many clients simultaneously trying to acquire a lock cannot succeed at the same time.



If the client locked the majority of instances while spending about as much as, or more than, the maximum lock validity time, it considers the lock invalid and unlocks the instances. Therefore, we only have to consider the case in which the client was able to lock the majority of instances in less than the validity time. In this case, by the argument above, no client should be able to re-acquire the lock for MIN_VALIDITY time. Therefore, multiple clients would be able to lock N/2+1 instances within the same time window (ending at the end of step 2) only if the time to lock the majority was greater than the TTL, which makes the lock invalid.



Can you provide a formal proof of safety, point out existing similar algorithms, or find a bug in the reasoning above?



Availability Considerations



System availability depends on three main characteristics:



  1. Automatic release of locks (as keys expire): Eventually, keys will be available again to be used for locks.
  2. The fact that clients usually cooperate, removing locks when the desired lock was not acquired, or when it was acquired and the work is complete; therefore, we will likely not have to wait for the keys to expire to re-acquire the lock.
  3. The fact that when a client needs to retry acquiring a lock, it waits for a time comparably greater than the time it takes to acquire the majority of locks. This reduces the likelihood of a split-brain situation when competing for a resource.




However, we pay a penalty of reduced availability equal to the TTL time on network partitions, so if partitions occur back to back, this penalty can last indefinitely. This happens whenever a client acquires a lock and is then partitioned away from the instances before it can release the lock.



In principle, given unbounded contiguous network partitions, the system can remain unavailable for an unbounded period of time.



Performance, failover, and fsync



Many people use Redis because they need high performance from the lock server, both in terms of the latency required to acquire and release locks and the number of such acquire/release operations that can be performed per second. To meet this requirement, the strategy for talking to the N Redis servers to reduce latency is multiplexing (or "poor man's multiplexing": putting the socket in non-blocking mode, sending all the commands, and reading all the replies later, assuming that the round-trip time between the client and each instance is similar).



True, there is also a persistence consideration to take into account if we want to build a model with reliable recovery from failures.



Basically, to see the problem, let's assume we configure Redis with no persistence at all. A client manages to lock 3 out of 5 instances. One of the instances the client managed to lock is then restarted, and at this moment there are again 3 instances that can be locked for the same resource, so another client can lock them in turn, violating the safety property of lock exclusivity.



If you enable persistence (AOF), the situation improves slightly. For example, you can upgrade a server by sending it the SHUTDOWN command and restarting it. Since expiration in Redis is semantically implemented so that time continues to flow even while the server is off, all our requirements are fine here, as long as a clean shutdown is assured. What about power outages? If Redis is configured, as it is by default, to fsync to disk every second, then it is possible that after a restart our key will be missing. In theory, if we want to guarantee lock safety across any kind of instance restart, we must enable fsync=always in the persistence settings. This will completely kill performance, down to the level of CP systems traditionally used to implement distributed locks safely.



But the situation is better than it seems at first glance. In principle, the algorithm remains safe because, when an instance is restarted after a failure, it no longer participates in any lock that is currently active.



To ensure this, we just need to ensure that after a failure, the instance remains unavailable for a period of time slightly exceeding the maximum TTL we are using. So we will wait for the expiration and automatic release of all keys that were active at the time of the failure.



By using delayed restarts, it is in principle possible to achieve safety without any persistence in Redis. Note, however, that this can turn into an availability penalty. For example, if the majority of instances fail, the system becomes globally unavailable for the TTL time (and no resource can be locked during this time).



Increasing the availability of the algorithm: extending the lock



If the work done by clients consists of small steps, it is possible to use a shorter lock validity time by default and implement a lock extension mechanism. Basically, if the client is in the middle of a computation and the lock validity time is getting dangerously low, it can send all instances a Lua script that extends the TTL of the key, provided the key still exists and its value is still the random value obtained when the lock was acquired.
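A sketch of what such an extension could look like, mirroring the release script shown earlier and reusing the SERVERS list from the Redlock sketch above (the script and helper are illustrative, not a fixed part of the algorithm):

EXTEND_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("pexpire", KEYS[1], ARGV[2])
else
    return 0
end
"""

def redlock_extend(resource, token, ttl_ms=10_000):
    extended = 0
    for server in SERVERS:
        try:
            if server.eval(EXTEND_SCRIPT, 1, resource, token, ttl_ms) == 1:
                extended += 1
        except redis.RedisError:
            pass
    # Count the lock as re-acquired only if the majority accepted the extension.
    return extended >= len(SERVERS) // 2 + 1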



A client should consider the lock re-acquired only if it managed to extend the lock on the majority of instances within the validity time.



However, this does not technically change the algorithm, so the maximum number of repeated attempts to acquire the lock should be limited, otherwise one of the liveness properties will be violated.


