Azure Redis cache - timeouts on GET calls

We have multiple web and worker roles in Azure that connect to our Azure Redis cache through the StackExchange.Redis library and we get regular timeouts that cause our end-to-end solution to stop. An example of one of them is given below:

System.TimeoutException: Timeout performing GET thread:459, inst: 4, mgr: Inactive, queue: 12, qu=0, qs=12, qc=0, wr=0/0, in=65536/0
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 1785
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\RedisBase.cs
   at StackExchange.Redis.RedisDatabase.StringGet(RedisKey key, CommandFlags flags) in c:\TeamCity\buildAgent\work\58bc9a6df18a3782\StackExchange.Redis\StackExchange\Redis\RedisDatabase.cs:line 1346
   at OptiRTC.Cache.RedisCacheActions.<>c__DisplayClass4`1.<Get>b__3() in c:\dev\OptiRTCAzure\OptiRTC.Cache\RedisCacheActions.cs:line 104
   at Polly.Retry.RetryPolicy.Implementation(Action action, IEnumerable`1 shouldRetryPredicates, Func`1 policyStateFactory)
   at OptiRTC.Cache.RedisCacheActions.Get[T](String key, Boolean allowDirtyRead) in c:\dev\OptiRTCAzure\OptiRTC.Cache\RedisCacheActions.cs:line 107
   at OptiRTC.Cache.RedisCacheAccess.d__e4.MoveNext() in c:\dev\OptiRTCAzure\OptiRTC.Cache\RedisCacheAccess.cs:line 1196; TraceSource 'WaWorkerHost.exe' event

The timeouts all have different queue and qs numbers, but the rest of each message is consistent. These StringGet calls are for various keys in the cache. In each of our services, we use a single cache access class with a single ConnectionMultiplexer, which is registered in our IoC container when the web or worker role starts:

        container.RegisterInstance<ICacheAccess>(cacheAccess);

      

In our ICacheAccess implementation, we create a multiplexer as follows:

            ConfigurationOptions options = new ConfigurationOptions();
            options.EndPoints.Add(serverAddress);
            options.Ssl = true;
            options.Password = accessKey;                    
            options.ConnectTimeout = 1000;
            options.SyncTimeout = 2500;

            redis = ConnectionMultiplexer.Connect(options);

      

where the redis object is used throughout the instance. We have about 20 instances of web and worker roles connecting to the cache using this ICacheAccess implementation, but the management console shows an average of 200 concurrent cache connections.

I've seen other posts that refer to version 1.0.333 of StackExchange.Redis, which is the version we get via NuGet, but when I look at the actual version of the referenced StackExchange.Redis.dll, it shows 1.0.316.0. We tried removing and re-adding the NuGet reference, and also adding it to a new project, and we always get the version mismatch.

Any insight will be greatly appreciated. Thanks.

Additional Information:

We've updated to 1.0.371. We have two services that access the same cached object at different intervals: one edits it and occasionally reads it, and the other reads it several times per second. Both services are deployed with the same caching code and the same version of the StackExchange.Redis library. I almost never see timeouts on the service that edits the object, but I get timeouts between 50 and 75% of the time on the service that reads it. The timeouts have the same format as above, and they continue to occur after wrapping the call to db.StringGet in a Polly retry block that handles both RedisException and System.TimeoutException and retries once after 500 ms.
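For reference, a retry wrapper of the kind described above might look roughly like the following. This is only a sketch, not our actual code: the `RedisRetry` class and `ExecuteWithRetry` method are hypothetical names, and it assumes the Polly and StackExchange.Redis packages are referenced.

```csharp
using System;
using Polly;
using StackExchange.Redis;

public static class RedisRetry
{
    // Hypothetical helper (not from the original code base).
    // Runs the read; if it throws RedisException or TimeoutException,
    // waits 500 ms and tries exactly once more. A second failure propagates.
    public static T ExecuteWithRetry<T>(Func<T> read)
    {
        return Policy
            .Handle<RedisException>()
            .Or<TimeoutException>()
            .WaitAndRetry(1, attempt => TimeSpan.FromMilliseconds(500))
            .Execute(read);
    }
}
```

A call site would then look something like `RedisRetry.ExecuteWithRetry(() => db.StringGet(key))`. Note that a retry like this only hides an occasional timeout; it does not address whatever is causing them.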

We contacted Microsoft regarding this issue, and they confirmed that they are not seeing anything in the Redis logs that indicates a problem on the Redis service side. Our cache misses are extremely low on the Redis server, but we keep getting these timeouts, which severely hampers our application's functionality.

In response to the comments: yes, we always have a number in qs and never in qc. We always have a number in the first part of in and never in the second.

Additional Information:

When I run a service on fewer instances at higher CPU usage, I get significantly more of these timeout errors than when the instances run at lower CPU usage. More specifically, I pulled some numbers from our services this morning. When they were running at about 30% CPU, I saw very few timeouts: only 42 in 30 minutes. When I removed half of the instances and the remaining ones started running at around 60-65% CPU, the rate increased more than 10x, to 536 in 30 minutes.

2 answers


I know this topic is months old, but I think my own experience might add some value here. I had the same Azure Redis cache issue (timeouts on GETs), but I realized that it happened almost exclusively on GETs where the string value was relatively large (> 250K in length). I implemented gzip compression for both GETs and SETs (when the string value is large), and now I almost never get a timeout.



Even if that doesn't solve your particular problem, it is probably good practice to compress values in general to reduce costs and improve performance.
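A minimal sketch of that idea, using only System.IO.Compression, is shown below. The `CacheCompression` class, its method names, and the 100,000-character threshold are assumptions for illustration, not the exact code I use; pick a threshold that suits your data. Compressed payloads are recognized on read by the two-byte gzip header (0x1F 0x8B), which valid UTF-8 text cannot start with.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CacheCompression
{
    // Assumed cut-off: strings shorter than this are stored as plain UTF-8.
    public const int ThresholdChars = 100000;

    // Converts a string to the bytes to store, gzip-compressing large values.
    public static byte[] Pack(string value)
    {
        byte[] raw = Encoding.UTF8.GetBytes(value);
        if (value.Length < ThresholdChars) return raw;

        using (var output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final compressed block.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Pack: decompresses if the gzip header is present.
    public static string Unpack(byte[] stored)
    {
        bool isGzip = stored.Length >= 2 && stored[0] == 0x1F && stored[1] == 0x8B;
        if (!isGzip) return Encoding.UTF8.GetString(stored);

        using (var input = new MemoryStream(stored))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With StackExchange.Redis, the packed bytes can be passed to `db.StringSet(key, CacheCompression.Pack(value))`, and the result of `db.StringGet(key)` can be converted to `byte[]` and passed to `Unpack` (a null check for missing keys is omitted here).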


In terms of version numbers, it seems that AssemblyVersion has been locked at 1.0.316 for the last few releases, but AssemblyFileVersion has been updated to match the version of the NuGet package. For now, I recommend ignoring AssemblyVersion and just using AssemblyFileVersion to make sure you have the correct binary.
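If it helps, both values can be read at runtime to check which binary is actually loaded. The sketch below uses only standard .NET APIs; the `AssemblyVersions` class and `Describe` method are hypothetical names chosen for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class AssemblyVersions
{
    // Returns both the AssemblyVersion and the file version of an assembly,
    // so the two can be compared side by side.
    public static string Describe(Assembly assembly)
    {
        Version assemblyVersion = assembly.GetName().Version;
        // Reads the file version from the DLL on disk.
        string fileVersion = FileVersionInfo.GetVersionInfo(assembly.Location).FileVersion;
        return string.Format("AssemblyVersion={0}; FileVersion={1}", assemblyVersion, fileVersion);
    }
}
```

For example, `AssemblyVersions.Describe(typeof(StackExchange.Redis.ConnectionMultiplexer).Assembly)` should report the file version that matches the NuGet package, even when the AssemblyVersion still reads 1.0.316.0.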



Please contact us at AzureCache@microsoft.com if you are still seeing timeouts when using Azure Redis Cache.
