Using Redis for a very large in-memory cache

I am considering Redis for caching a large amount of data. I currently keep it in my own cache, written in Java. My use case is described below.

I receive data from a source every 15 minutes and need to aggregate it hourly. So for a given object A, I get 4 values every hour and need to reduce them to a single value using a max / min / sum formula.

The key I am planning to use consists of:

a) the object id, a long

b) the time, a long

c) the property id, an int (each object can have many properties, and I need to aggregate each property separately)

Thus, the final key will look like this:

objectid_time_propertyid

Every 15 minutes I may receive 50 to 60 million keys. Each time, I need to fetch the key, convert the property value to a double, apply the formula (max / min / sum etc.), convert the result back to a String, and save it back. So for each key I have one read and one write, plus a conversion in each case.

My questions are as follows.

  • Is it recommended to use Redis for such a use case? In the future I may also aggregate hourly data into daily, daily into weekly, etc.
  • What will the cache read and write performance be? (I ran a benchmark on Windows with 100K keys; it took 30-40 seconds to read and write, which is not very good, but that was on Windows and I ultimately need to run on Linux.)
  • I want to use the Redis persistence feature. What are the pros and cons of this?

If anyone has real experience using Redis as a memory cache that needs frequent updates, please share your suggestions.



1 answer


  • Is it recommended to use Redis for such a use case? In the future I may also aggregate hourly data into daily, daily into weekly, etc.

Whether it is recommended depends on who you ask, but I certainly feel that Redis will work. If one server isn't enough, your description suggests that the dataset can be partitioned easily, which would let you scale out with a cluster.

I would advise, however, storing your data a little differently. First, every key in Redis carries an overhead, so the more keys you have, the more RAM you will need. So instead of storing one key per object, time and property, I recommend using hashes to group related values together. For example, you can use a key of the form object_id:timestamp and store property_id: value pairs underneath it.
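As a rough illustration of that regrouping, the sketch below folds flat keys of the form objectid_time_propertyid into one hash per object_id:timestamp. A plain Python dict stands in for Redis here, and the function name and sample values are made up for the example.

```python
from collections import defaultdict


def group_into_hashes(flat_entries):
    """Fold flat 'objectid_time_propertyid' keys into one hash per object and time.

    A dict of dicts stands in for Redis hashes; this is only a sketch.
    """
    hashes = defaultdict(dict)
    for flat_key, value in flat_entries.items():
        object_id, timestamp, property_id = flat_key.split("_")
        hashes[f"{object_id}:{timestamp}"][property_id] = value
    return dict(hashes)


# Hypothetical sample: one object, one timestamp, three properties.
flat = {
    "1001_1400000400_1": "3.5",
    "1001_1400000400_2": "7.0",
    "1001_1400000400_3": "1.25",
}
grouped = group_into_hashes(flat)
print(len(flat), "flat keys ->", len(grouped), "hash key(s)")
```

The more properties each object has, the fewer top-level keys this layout needs compared with one key per property.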

Also, instead of storing 4 discrete measurements for each object property per timestamp and recalculating your aggregates, I suggest that you store only the aggregates and update them as new measurements arrive. So you basically have an object_id hash with the following structure:

object_id:hourtimestamp  ->  property_id1:max = x
                             property_id1:min = y
                             property_id1:sum = z

      

When getting new data - d - for an object property, just recalculate the aggregates:

property_id1:max = max(x, d)
property_id1:min = min(y, d)
property_id1:sum = z + d
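A minimal sketch of that update rule, assuming a plain dict stands in for the hash at object_id:hourtimestamp. The function name and the readings are hypothetical, and values are kept as strings because Redis stores hash values as strings.

```python
def update_aggregates(fields, property_id, d):
    """Fold one new measurement d into the max/min/sum fields of a hash.

    `fields` is a plain dict standing in for one Redis hash.
    """
    d = float(d)
    max_f = f"{property_id}:max"
    min_f = f"{property_id}:min"
    sum_f = f"{property_id}:sum"
    if max_f not in fields:
        # First measurement in this bucket: it is max, min and sum at once.
        fields[max_f] = fields[min_f] = fields[sum_f] = repr(d)
    else:
        fields[max_f] = repr(max(float(fields[max_f]), d))
        fields[min_f] = repr(min(float(fields[min_f]), d))
        fields[sum_f] = repr(float(fields[sum_f]) + d)
    return fields


# Four hypothetical 15-minute readings for one property in one hour.
hour_hash = {}
for reading in (2.0, 5.5, 1.5, 4.0):
    update_aggregates(hour_hash, "property_id1", reading)
print(hour_hash)
```

Against a real server this read-modify-write would have to be made atomic if several writers can touch the same hash, for example with a Lua script or a MULTI/WATCH transaction. The sum alone could use the HINCRBYFLOAT command.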

      



Repeat the same for each required resolution, e.g. use object_id:daytimestamp to keep daily-level aggregates.
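One way to derive the hour and day keys from a raw measurement time is to truncate the timestamp to the start of its bucket. The sketch below assumes timestamps are epoch seconds and that day buckets are aligned to UTC; the helper name and sample values are hypothetical.

```python
HOUR = 60 * 60
DAY = 24 * HOUR


def bucket_key(object_id, epoch_seconds, resolution_seconds):
    """Build 'object_id:bucket_start' for the bucket containing epoch_seconds."""
    bucket_start = epoch_seconds - (epoch_seconds % resolution_seconds)
    return f"{object_id}:{bucket_start}"


# Hypothetical reading taken 45 minutes into an hour.
ts = 1400003100
print(bucket_key(1001, ts, HOUR))
print(bucket_key(1001, ts, DAY))
```

Each incoming measurement would then update both the hourly and the daily hash, so no resolution has to be recomputed from a finer one.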

Finally, remember to expire your keys after they are no longer required (i.e. set a 24 hour TTL for hour counters, etc.).

There are other possible approaches, mostly using Sorted Sets, that can be applied to answer your queries (remember that storing data is the easy part - retrieving it is usually harder ;)).

  • What will the cache read and write performance be? (I ran a benchmark on Windows with 100K keys; it took 30-40 seconds to read and write, which is not very good, but that was on Windows and I ultimately need to run on Linux.)

Redis, running in a virtual machine on my Linux laptop, does over 500K reads and writes per second. Performance depends heavily on how you use the Redis data types and API. Given your throughput of 60 million values every 15 minutes, or ~70K writes/sec of small data, Redis is more than capable of handling this.
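For reference, the throughput estimate works out as follows, taking the upper bound of 60 million keys per 15-minute batch from the question (the variable names are only for illustration):

```python
keys_per_batch = 60_000_000      # upper bound stated in the question
batch_seconds = 15 * 60          # one batch every 15 minutes

writes_per_second = keys_per_batch / batch_seconds
print(round(writes_per_second))  # about 67K, rounded up to ~70K in the text
```

This counts one write per key; if each update also needs a separate read, the number of operations per second doubles.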

  • I want to use the Redis persistence feature. What are the pros and cons of this?

This is a very well documented subject - see http://redis.io/topics/persistence and http://oldblog.antirez.com/post/redis-persistence-demystified.html for a start.
