Background
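The article Distributed locks with Redis – Redis first describes how to correctly implement a distributed lock with a single instance, and then introduces the distributed version of the algorithm. I have several questions about the distributed version.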
First, here is what the article says:

In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.

In order to acquire the lock, the client performs the following operations:

1. It gets the current time in milliseconds.
2. It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
3. The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
4. If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
5. If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).
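
To make the quoted steps easier to discuss, here is a minimal Python sketch of how I read them. It is not the article's code: `FakeInstance`, `set_nx_px`, `release`, and `acquire` are hypothetical names, and the instances are in-memory stand-ins rather than real Redis masters. It uses a monotonic clock instead of wall-clock time, and it passes the same `ttl_ms` to every instance, which is an assumption on my part (the quoted text only says that the key name and random value are the same). The per-request timeout from step 2 is not modeled; an unreachable instance simply raises immediately.

```python
import time
import uuid


class FakeInstance:
    """Hypothetical in-memory stand-in for one independent Redis master."""

    def __init__(self, available=True):
        self.available = available
        self.store = {}  # key -> (value, expiry in monotonic milliseconds)

    def set_nx_px(self, key, value, ttl_ms):
        """Mimics SET key value NX PX ttl_ms; raises if the node is down."""
        if not self.available:
            raise ConnectionError("instance unreachable")
        now = time.monotonic() * 1000
        current = self.store.get(key)
        if current is not None and current[1] > now:
            return False  # key exists and has not expired yet
        self.store[key] = (value, now + ttl_ms)
        return True

    def release(self, key, value):
        """Deletes the key only if it still holds our random value."""
        if not self.available:
            return
        current = self.store.get(key)
        if current is not None and current[0] == value:
            del self.store[key]


def acquire(instances, key, ttl_ms):
    """Returns (value, validity_ms) on success, or None on failure."""
    value = uuid.uuid4().hex
    start = time.monotonic() * 1000             # step 1
    locked = 0
    for inst in instances:                      # step 2: one instance at a time
        try:
            if inst.set_nx_px(key, value, ttl_ms):
                locked += 1
        except ConnectionError:
            continue                            # try the next instance
    elapsed = time.monotonic() * 1000 - start   # step 3
    validity = ttl_ms - elapsed                 # step 4
    quorum = len(instances) // 2 + 1
    if locked >= quorum and validity > 0:
        return value, validity
    for inst in instances:                      # step 5: unlock everywhere
        inst.release(key, value)
    return None
```

With five instances of which two are down, `acquire(nodes, "resource", 10_000)` in this sketch succeeds with three locks and returns a validity slightly below 10000 ms; a second call with a different random value fails and leaves the first lock in place.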
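My questions

1. Why should the client try to acquire the lock on all instances sequentially? What problems would arise if it tried all of them at the same time?
2. Is the auto-release time that the Redlock algorithm mentions the same thing as the ttl in SET resource_name my_random_value NX PX {ttl}, from the "Correct implementation with a single instance" section of Distributed locks with Redis – Redis? In other words, is it what I call TTL below?
3. In step 2, the client tries to acquire the lock on every instance, and what it does on each one is the same as in the single-instance case, that is, SET resource_name my_random_value NX PX {ttl}. How is that ttl computed? I think the ttl differs between instances, because the acquisition attempt reaches each instance at a different time. To ensure that the same key is deleted on all instances at the same moment, I believe the ttl set on each instance is "TTL - (time of the acquisition attempt on that instance - time obtained in step 1)". Is that right? (Here TTL means the logical TTL, not the actual ttl set on any particular instance; in other words, the same key on all instances would be deleted at "time obtained in step 1 + TTL".)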
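
To make the formula in the last question concrete, here is a small Python sketch with made-up numbers. It only illustrates my hypothesis; whether Redlock actually sets the ttl this way is exactly what I am asking. `per_instance_ttl` and all timestamps are hypothetical.

```python
def per_instance_ttl(logical_ttl_ms, start_ms, attempt_ms):
    """Hypothesized formula: TTL - (attempt time - time obtained in step 1)."""
    return logical_ttl_ms - (attempt_ms - start_ms)


start_ms = 1_000                                    # hypothetical step 1 timestamp
attempts_ms = [1_000, 1_004, 1_009, 1_015, 1_022]   # hypothetical attempt times
ttls = [per_instance_ttl(10_000, start_ms, t) for t in attempts_ms]

# Under this hypothesis every key expires at the same instant: start_ms + TTL.
expiries = [t + ttl for t, ttl in zip(attempts_ms, ttls)]
```

Here `ttls` comes out as 10000, 9996, 9991, 9985, and 9978 ms, and every entry of `expiries` equals 11000, that is, "time obtained in step 1 + TTL".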