Storing binary blobs in Cassandra

I am building a simple HTTP service that stores arbitrary binary blobs. The service is backed by Cassandra. It is a simplified version of Amazon S3. The system must withstand a heavy write load and must be highly available on both the write and the read path.

The stored data is immutable. It can be deleted, but it cannot be updated. Therefore, data inconsistency is not an issue. The data store must be able to efficiently expunge expired data.

The service uses Netflix's Astyanax library, which provides a recipe for storing (large) binary objects in Cassandra.
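For illustration, this is roughly how the recipe is used. It is a minimal sketch following the recipe's documented write/read/delete calls; the keyspace setup is omitted, and the column family name `blob_chunks`, the object name, the chunk size, and the TTL are placeholder values:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ChunkedStorage;
import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
import com.netflix.astyanax.recipes.storage.ObjectMetadata;

public class BlobStoreSketch {

    void storeAndFetch(Keyspace keyspace, byte[] payload) throws Exception {
        // All chunks go into one column family; "blob_chunks" is a placeholder name.
        ChunkedStorageProvider provider =
                new CassandraChunkedStorageProvider(keyspace, "blob_chunks");

        // Write: the recipe splits the stream into chunks under a single object name.
        ObjectMetadata meta = ChunkedStorage
                .newWriter(provider, "my-object", new ByteArrayInputStream(payload))
                .withChunkSize(0x10000)   // 64 KiB chunks
                .withTtl(7 * 24 * 3600)   // expire every chunk after 7 days
                .call();

        // Read: stream the chunks back in order into an output stream.
        ByteArrayOutputStream out =
                new ByteArrayOutputStream(meta.getObjectSize().intValue());
        ChunkedStorage.newReader(provider, "my-object", out).call();

        // Explicit delete is also possible; otherwise the TTL takes care of expiry.
        ChunkedStorage.newDeleter(provider, "my-object").call();
    }
}
```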

I see two solutions to the problem, each with its own pros and cons. It is hard for me to judge which fits Cassandra better.

Single table with TTL

Astyanax automatically chunks large objects into small pieces and stores them in a single table. Each chunk is assigned a TTL so that it expires after a given amount of time. A compaction run removes the chunks once their TTL has expired.

This solution works and is fairly easy to implement. I started out with SizeTieredCompactionStrategy, but I think DateTieredCompactionStrategy might be the better choice when dealing with TTL-based data.
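A sketch of how the chunk table could be created with DateTieredCompactionStrategy through Astyanax. The property keys follow the Thrift CfDef attribute names and are worth verifying against your Astyanax and Cassandra versions; `base_time_seconds` and `max_sstable_age_days` are real DTCS options, but the values below are illustrative, not tuned. Note that DTCS requires Cassandra 2.0.11 / 2.1.1 or later:

```java
import com.google.common.collect.ImmutableMap;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class CompactionSetupSketch {

    // Row key = object name, column name = chunk id (both strings here for simplicity).
    static final ColumnFamily<String, String> CF_CHUNKS = ColumnFamily.newColumnFamily(
            "blob_chunks", StringSerializer.get(), StringSerializer.get());

    static void createChunkTable(Keyspace keyspace) throws ConnectionException {
        keyspace.createColumnFamily(CF_CHUNKS, ImmutableMap.<String, Object>builder()
                .put("compaction_strategy", "DateTieredCompactionStrategy")
                .put("compaction_strategy_options", ImmutableMap.of(
                        "base_time_seconds", "3600",     // size of the first time window
                        "max_sstable_age_days", "8"))    // stop compacting older SSTables
                .build());
    }
}
```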

My main concern is: can Cassandra's compaction keep up? Has anyone come across a similar use case?

Sharding the data by time

Another approach would be to shard the data by time. I could create a table for each day and store the chunks in that table. In this case, I can drop the complete table to get rid of expired data.

This solution requires a little more implementation effort, but it simplifies, and probably speeds up, the removal of stale data.
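A sketch of the sharding I have in mind. The per-day naming scheme `blob_chunks_yyyyMMdd` and the cleanup job are hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;

public class TimeShardSketch {

    // Hypothetical naming scheme: one chunk table per day, e.g. blob_chunks_20150401.
    static String tableFor(Date day) {
        return "blob_chunks_" + new SimpleDateFormat("yyyyMMdd").format(day);
    }

    // Cleanup job: drop the shard that has just fallen out of the retention window.
    // Dropping removes whole SSTables at once instead of writing per-cell tombstones.
    static void dropExpiredShard(Keyspace keyspace, Date today, int retentionDays)
            throws ConnectionException {
        Calendar cal = Calendar.getInstance();
        cal.setTime(today);
        cal.add(Calendar.DAY_OF_MONTH, -retentionDays);
        keyspace.dropColumnFamily(tableFor(cal.getTime()));
    }
}
```

One caveat: with the default `auto_snapshot` setting, Cassandra snapshots a table when it is dropped, so disk space is only reclaimed once the snapshot is cleared.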

How performant is Cassandra at dropping tables?


1 answer


The right option for your scenario is DateTieredCompactionStrategy combined with assigning a TTL to each blob.



See: http://www.datastax.com/dev/blog/datetieredcompactionstrategy
