StackExchange.Redis on Azure is throwing timeout performing GET and no connection available exceptions

I recently switched an MVC application that serves data feeds and dynamically generated images (6000 RPM throughput) from the ServiceStack.Redis v3.9.67 client to the latest StackExchange.Redis client (v1.0.450) and I see slower performance and some new exceptions.

Our Redis instance is tier S4 (13 GB), the CPU shows a fairly consistent 45% or so, and network bandwidth usage is quite low. I'm not really sure how to interpret the gets/sets graph in our Azure portal, but it shows roughly 1M gets and 100k sets (it looks like it may be in 5-minute increments).

The client library switch was straightforward, and we are still using the ServiceStack v3.9 JSON serializer, so the Redis client library was the only thing that changed.

Our external New Relic monitoring shows our average response time increasing from about 200 ms with the ServiceStack library to about 280 ms with StackExchange.Redis (StackExchange being the slower one), with no other changes.

We have recorded a series of exceptions with messages along the lines of:

Timeout performing GET feed-channels:ag177kxj_egeo-_nek0cew, inst: 12, mgr: Inactive, queue: 30, qu = 0, qs = 30, qc = 0, wr = 0/0, in = 0/0

I understand this to mean that there are multiple commands dispatched and awaiting a response from Redis, and that this can be caused by long-running commands exceeding the timeout. These errors appeared during a period when the SQL database behind one of our data services was backed up, so perhaps that was the cause? After scaling that database out to reduce load we haven't seen much of this error, but the DB query happens in .NET, and I don't see how it would delay the Redis command or connection.

We also recorded a large batch of errors this morning over a short period (a couple of minutes) with messages such as:

No connection is available to service this operation: SETEX feed-channels:vleggqikrugmxeprwhwc2a:last-retry

We were used to transient connection errors with the ServiceStack library, and those exception messages usually looked like this:

Unable to connect: sPort: 63980

I am under the impression that SE.Redis should be retrying connections and commands in the background for me. Do I still need to wrap our SE.Redis calls in our own retry policy? Perhaps different timeout values would be more appropriate (although I'm not sure which values to use)?
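
If we did end up adding our own retry layer, I imagine it would be a thin wrapper along these lines (a hypothetical sketch, not code we run today; the helper name, attempt count, and back-off are assumptions):

// Hypothetical caller-side retry helper (illustrative only).
// Requires System, System.Threading and StackExchange.Redis.
public static class RedisRetry
{
    public static T Execute<T>(Func<T> action, int maxAttempts = 3)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (RedisConnectionException)
            {
                if (attempt >= maxAttempts) throw;
            }
            catch (TimeoutException)
            {
                if (attempt >= maxAttempts) throw;
            }

            // brief back-off before the next attempt
            Thread.Sleep(TimeSpan.FromMilliseconds(100 * attempt));
        }
    }
}

// e.g. var feed = RedisRetry.Execute(() => cache.GetOrSet(key, duration, LoadFeed));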

Our Redis connection string sets these parameters: abortConnect=false,syncTimeout=2000,ssl=true. We are using a single shared ConnectionMultiplexer instance and transient IDatabase instances.
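
For completeness, the shared multiplexer is created roughly like this (a simplified sketch; the class name, host, and lazy wrapper are illustrative, but the connection string options match the ones above):

// Simplified sketch of the shared multiplexer setup (names and host are illustrative).
public static class RedisConnection
{
    private static readonly Lazy<ConnectionMultiplexer> _multiplexer =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(
            "ourcache.redis.cache.windows.net,password=...,abortConnect=false,syncTimeout=2000,ssl=true"));

    // one ConnectionMultiplexer per process; IDatabase handles are cheap and transient
    public static IDatabase GetDatabase()
    {
        return _multiplexer.Value.GetDatabase();
    }
}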

The vast majority of our Redis use goes through the Cache class below. The important implementation bits are included in case we are doing something stupid that is causing the problems.

Our keys are typically 10-30 characters long. Values are mostly scalars or reasonably small serialized sets of objects (a few hundred bytes to a few kB in total), although we also store jpg images in the cache, so a chunk of the data runs from a couple of hundred kB to a couple of MB.

Perhaps we should use separate multiplexers for small and large values, possibly with longer timeouts for the larger values? Or a couple/several multiplexers in case one gets stuck?

public class Cache : ICache
{
    private readonly IDatabase _redis;

    public Cache(IDatabase redis)
    {
        _redis = redis;
    }

    // storing this placeholder value allows us to distinguish between a stored null and a non-existent key
    // while only making a single call to redis. see Exists method.
    static readonly string NULL_PLACEHOLDER = "$NULL_VALUE$";

    // this is a dictionary of https://github.com/StephenCleary/AsyncEx/wiki/AsyncLock
    private static readonly ILockCache _locks = new LockCache();

    public T GetOrSet<T>(string key, TimeSpan cacheDuration, Func<T> refresh) {
        T val;
        if (!Exists(key, out val)) {
            using (_locks[key].Lock()) {
                if (!Exists(key, out val)) {
                    val = refresh();
                    Set(key, val, cacheDuration);
                }
            }
        }
        return val;
    }

    private bool Exists<T>(string key, out T value) {
        value = default(T);
        var redisValue = _redis.StringGet(key);

        if (redisValue.IsNull)
            return false;

        if (redisValue == NULL_PLACEHOLDER)
            return true;

        value = typeof(T) == typeof(byte[])
            ? (T)(object)(byte[])redisValue
            : JsonSerializer.DeserializeFromString<T>(redisValue);

        return true;
    }

    public void Set<T>(string key, T value, TimeSpan cacheDuration)
    {
        if (value.IsDefaultForType())
            _redis.StringSet(key, NULL_PLACEHOLDER, cacheDuration);
        else if (typeof (T) == typeof (byte[]))
            _redis.StringSet(key, (byte[])(object)value, cacheDuration);
        else
            _redis.StringSet(key, JsonSerializer.SerializeToString(value), cacheDuration);
    }


    public async Task<T> GetOrSetAsync<T>(string key, Func<T, TimeSpan> getSoftExpire, TimeSpan additionalHardExpire, TimeSpan retryInterval, Func<Task<T>> refreshAsync) {
        var softExpireKey = key + ":soft-expire";
        var lastRetryKey = key + ":last-retry";

        T val;
        if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val)) 
            return val;

        using (await _locks[key].LockAsync()) {
            if (ShouldReturnNow(key, softExpireKey, lastRetryKey, retryInterval, out val))
                return val;

            Set(lastRetryKey, DateTime.UtcNow, additionalHardExpire);

            try {
                var newVal = await refreshAsync();
                var softExpire = getSoftExpire(newVal);
                var hardExpire = softExpire + additionalHardExpire;

                if (softExpire > TimeSpan.Zero) {
                    Set(key, newVal, hardExpire);
                    Set(softExpireKey, DateTime.UtcNow + softExpire, hardExpire);
                }
                val = newVal;
            }
            catch (Exception ex) {
                // refresh failed; rethrow unless we already have a (stale) cached value to fall back on
                if (val == null)
                    throw;
            }
        }

        return val;
    }

    private bool ShouldReturnNow<T>(string valKey, string softExpireKey, string lastRetryKey, TimeSpan retryInterval, out T val) {
        if (!Exists(valKey, out val))
            return false;

        var softExpireDate = Get<DateTime?>(softExpireKey);
        if (softExpireDate == null)
            return true;

        // value is in the cache and not yet soft-expired
        if (softExpireDate.Value >= DateTime.UtcNow)
            return true;

        var lastRetryDate = Get<DateTime?>(lastRetryKey);

        // value is in the cache, it has soft-expired, but it is too soon to try again
        if (lastRetryDate != null && DateTime.UtcNow - lastRetryDate.Value < retryInterval) {
            return true;
        }

        return false;
    }
}
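
For context, a typical call into the soft-expire path looks roughly like this (an illustrative call site, not lifted from the application; the key and loader are made up). This is the path that writes the ":last-retry" keys that show up in the SETEX errors above:

// Illustrative call site (not from the real application).
var channels = await _cache.GetOrSetAsync(
    "feed-channels:" + feedId,                      // hypothetical key
    val => TimeSpan.FromMinutes(5),                 // soft expiry derived from the value
    TimeSpan.FromHours(1),                          // additional hard expiry
    TimeSpan.FromSeconds(30),                       // minimum interval between refresh retries
    () => _feedService.LoadChannelsAsync(feedId));  // refresh delegate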

      

1 answer


Several recommendations:

- You can use different multiplexers with different timeout values for different key/value types: http://azure.microsoft.com/en-us/documentation/articles/cache-faq/
- Make sure you are not network-bound on either the client or the server. If you are bound on the server side, upgrade to a higher SKU, which has more bandwidth.

Please read this post for more details: http://azure.microsoft.com/blog/2015/02/10/investigating-timeout-exceptions-in-stackexchange-redis-for-azure-redis-cache/
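
To illustrate the first point, two multiplexers with different sync timeouts might be configured along these lines (the host name, password placeholder, and timeout values are only examples):

// Example only: separate multiplexers for small and large values,
// giving the one used for multi-MB blobs a more generous syncTimeout.
var smallOptions = ConfigurationOptions.Parse(
    "ourcache.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");
smallOptions.SyncTimeout = 2000;    // small scalar/JSON values

var largeOptions = ConfigurationOptions.Parse(
    "ourcache.redis.cache.windows.net,abortConnect=false,ssl=true,password=...");
largeOptions.SyncTimeout = 10000;   // jpg blobs up to a couple of MB

var smallMultiplexer = ConnectionMultiplexer.Connect(smallOptions);
var largeMultiplexer = ConnectionMultiplexer.Connect(largeOptions);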


