Mutex Violations Using ServiceStack Redis for Distributed Locking

I'm trying to implement DLM using the locking mechanisms provided by the ServiceStack-Redis library and described here , but I find that the API seems to represent a race condition that sometimes exposes the same lock to multiple clients.

BasicRedisClientManager mgr = new BasicRedisClientManager(redisConnStr);

using(var client = mgr.GetClient())
{
    client.Remove("touchcount");
    client.Increment("touchcount", 0);
}

Random rng = new Random();

Action<object> simulatedDistributedClientCode = (clientId) => {

    using(var redisClient = mgr.GetClient())
    {
        using(var mylock = redisClient.AcquireLock("mutex", TimeSpan.FromSeconds(2)))
        {
            long touches = redisClient.Get<long>("touchcount");
            Debug.WriteLine("client{0}: I acquired the lock! (touched: {1}x)", clientId, touches);
            if(touches > 0) {
                Debug.WriteLine("client{0}: Oh, but I see you've already been here. I'll release it.", clientId);
                return;
            }
            int arbitraryDurationOfExecutingCode = rng.Next(100, 2500);
            Thread.Sleep(arbitraryDurationOfExecutingCode); // do some work of arbitrary duration
            redisClient.Increment("touchcount", 1);
        }
        Debug.WriteLine("client{0}: Okay, I released my lock, your turn now.", clientId);
    }
};
Action<Task> exceptionWriter = (t) => {if(t.IsFaulted) Debug.WriteLine(t.Exception.InnerExceptions.First());};

int arbitraryDelayBetweenClients = rng.Next(5, 500);
var clientWorker1 = new Task(simulatedDistributedClientCode, 1);
var clientWorker2 = new Task(simulatedDistributedClientCode, 2);

clientWorker1.Start();
Thread.Sleep(arbitraryDelayBetweenClients);
clientWorker2.Start();

Task.WaitAll(
    clientWorker1.ContinueWith(exceptionWriter),
    clientWorker2.ContinueWith(exceptionWriter)
    );

using(var client = mgr.GetClient())
{
    var finaltouch = client.Get<long>("touchcount");
    Console.WriteLine("Touched a total of {0}x.", finaltouch);
}

mgr.Dispose();

      

When running the above code to simulate two clients trying to perform the same operation in short sequences from each other, there are three possible exits. The first is the optimal case when Mutex is working correctly and clients are working in the correct order. The second case is when the second client timed out the lock; also an acceptable result. The problem, however, is that as the arbitraryDurationOfExecutingCode

timeout for acquiring a lock approaches or exceeds, it is fairly easy to reproduce the situation where the 2nd client is granted the lock BEFORE the 1st client releasing it, generating output like this:

client1: I purchased a castle! (touched: 0x)
client2: I purchased a lock! (touched: 0x)
client1: Ok, I released my lock, now it's your turn.
client2: Ok, I released my lock, now it's your turn.
Placed a total of 2x.

My understanding of the API and its documentation is that the argument timeOut

when acquiring a lock is for exactly that - a timeout to acquire a lock. If I need to guess a value timeOut

that is large enough to always be longer than the duration of my execution code to prevent this condition, it seems rather error prone. Does anyone have a work around other than passing null to wait on locks forever? I definitely don't want to do this, or I know what I will find with ghost castles from broken workers.

+3


source to share


2 answers


The answer from mythz (thanks for the quick answer!) Confirms that the built-in method AcquireLock

in ServiceStack.Redis does not distinguish between the lock commit period and the lock expiration period. For our purposes, we have existing code that assumes that the locking mechanism of the lock can quickly fail if the lock has been taken, but allow long running processes within the lock scope. To meet these requirements, I got this option on ServiceStack RedisLock which allows me to differentiate between the two.

// based on ServiceStack.Redis.RedisLock
// https://github.com/ServiceStack/ServiceStack.Redis/blob/master/src/ServiceStack.Redis/RedisLock.cs
internal class RedisDlmLock : IDisposable
{
    public static readonly TimeSpan DefaultLockAcquisitionTimeout = TimeSpan.FromSeconds(30);
    public static readonly TimeSpan DefaultLockMaxAge = TimeSpan.FromHours(2);
    public const string LockPrefix = "";    // namespace lock keys if desired

    private readonly IRedisClient _client; // note that the held reference to client means lock scope should always be within client scope

    private readonly string _lockKey;
    private string _lockValue;

    /// <summary>
    /// Acquires a distributed lock on the specified key.
    /// </summary>
    /// <param name="redisClient">The client to use to acquire the lock.</param>
    /// <param name="key">The key to acquire the lock on.</param>
    /// <param name="acquisitionTimeOut">The amount of time to wait while trying to acquire the lock. Defaults to <see cref="DefaultLockAcquisitionTimeout"/>.</param>
    /// <param name="lockMaxAge">After this amount of time expires, the lock will be invalidated and other clients will be allowed to establish a new lock on the same key. Deafults to <see cref="DefaultLockMaxAge"/>.</param>
    public RedisDlmLock(IRedisClient redisClient, string key, TimeSpan? acquisitionTimeOut = null, TimeSpan? lockMaxAge = null)
    {
        _client = redisClient;
        _lockKey = LockPrefix + key;

        ExecExtensions.RetryUntilTrue(
            () =>
            {
                //Modified from ServiceStack.Redis.RedisLock
                //This pattern is taken from the redis command for SETNX http://redis.io/commands/setnx
                //Calculate a unix time for when the lock should expire

                lockMaxAge = lockMaxAge ?? DefaultLockMaxAge; // hold the lock for the default amount of time if not specified.
                DateTime expireTime = DateTime.UtcNow.Add(lockMaxAge.Value);
                _lockValue = (expireTime.ToUnixTimeMs() + 1).ToString(CultureInfo.InvariantCulture);

                //Try to set the lock, if it does not exist this will succeed and the lock is obtained
                var nx = redisClient.SetEntryIfNotExists(_lockKey, _lockValue);
                if (nx)
                    return true;

                //If we've gotten here then a key for the lock is present. This could be because the lock is
                //correctly acquired or it could be because a client that had acquired the lock crashed (or didn't release it properly).
                //Therefore we need to get the value of the lock to see when it should expire
                string existingLockValue = redisClient.Get<string>(_lockKey);
                long lockExpireTime;
                if (!long.TryParse(existingLockValue, out lockExpireTime))
                    return false;
                //If the expire time is greater than the current time then we can't let the lock go yet
                if (lockExpireTime > DateTime.UtcNow.ToUnixTimeMs())
                    return false;

                //If the expire time is less than the current time then it wasn't released properly and we can attempt to 
                //acquire the lock. This is done by setting the lock to our timeout string AND checking to make sure
                //that what is returned is the old timeout string in order to account for a possible race condition.
                return redisClient.GetAndSetEntry(_lockKey, _lockValue) == existingLockValue;
            },
            acquisitionTimeOut ?? DefaultLockAcquisitionTimeout // loop attempting to get the lock for this amount of time.
            );
    }

    public override string ToString()
    {
        return String.Format("RedisDlmLock:{0}:{1}", _lockKey, _lockValue);
    }

    public void Dispose()
    {
        try
        {
            // only remove the entry if it still contains OUR value
            _client.Watch(_lockKey);
            var currentValue = _client.Get<string>(_lockKey);
            if (currentValue != _lockValue)
            {
                _client.UnWatch();
                return;
            }

            using (var tx = _client.CreateTransaction())
            {
                tx.QueueCommand(r => r.Remove(_lockKey));
                tx.Commit();
            }
        }
        catch (Exception ex)
        {
            // log but don't throw
        }
    }
}

      



To keep it as easy to use as possible, I also expose some extension methods for IRedisClient

for the parallel method AcquireLock

in the following lines:

internal static class RedisClientLockExtensions
{
    public static IDisposable AcquireDlmLock(this IRedisClient client, string key, TimeSpan timeOut, TimeSpan maxAge)
    {
        return new RedisDlmLock(client, key, timeOut, maxAge);
    }
}

      

+3


source


Your question highlights the distributed locking behavior in ServiceStack.Redis, if the specified timeout is exceeded, clients with the timeout treat it as an invalid lock and will try to automatically recover the lock. If there were no automatic recovery, the emergency client never released the lock, and further operations waiting for that lock would be allowed through.

The locking behavior for is AcquireLock

wrapped in the RedisLock class :

public IDisposable AcquireLock(string key, TimeSpan timeOut)
{
    return new RedisLock(this, key, timeOut);
}

      



What you can copy and change according to the behavior you prefer:

using (new MyRedisLock(client, key, timeout))
{
    //...
}

      

+2


source







All Articles