Effective key design in Cassandra

I have a question about the optimal design of a Cassandra database: is it more efficient to have one table with a lot of skinny rows, or a keyspace with many tables?

Context: I am trying to store data from multiple sensors. One approach would be to have a single table that stores data from all sensors. Another approach would be to have one table per sensor. Which one is better?

Please advise.

+3




2 answers


I would have fewer tables for several reasons:



  • As Andy Tolbert said in his answer, each table carries some overhead, which accumulates to a significant amount when you have 10 or 100 thousand tables. Think of it as worsening the overhead-to-data ratio as the table count grows.
  • If you are dealing with a large number of tables, chances are you will be creating some of them dynamically during normal application runtime. If so, you may run into bugs in Cassandra, which can fail to propagate the schemas of newly created tables across the cluster when it is under pressure. I've seen this in C* 2.0, but I'm not sure whether it is still a problem in recent versions.
  • Most of the benefits of a multi-table schema can be gained by putting extra thought into single-table data modeling. That said, there are times when segregating data into discrete tables really is the most appropriate solution. One example is some multi-tenant systems, where data for different tenants must be physically separated and backed up in isolation for regulatory reasons.
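The single-table modeling mentioned above usually means moving the per-sensor distinction out of the table name and into the partition key. A minimal sketch in CQL (table and column names are illustrative, not from the question; the daily time bucket is one common way to keep partitions bounded):

```cql
-- Hypothetical single-table design: one table serves all sensors.
-- The sensor id plus a coarse time bucket form the partition key,
-- so each sensor's readings stay together without its own table.
CREATE TABLE sensor_data (
    sensor_id text,
    day       date,        -- time bucket keeps any one partition from growing unbounded
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```

Adding a new sensor is then just a new partition key value, with none of the per-table overhead described above.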
+3




It is much better and more idiomatic to have 1 table for all sensors. Each table carries some overhead (MBeans for metrics, files on disk, etc.), so you don't want to have too many.



When you say "a lot of skinny rows", I don't expect that to be a problem: you can have many unique keys / partitions (some crazy large number).
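Because each unique partition key maps to its own partition, a read for one sensor only touches that sensor's data no matter how many other sensors share the table. For example, with a hypothetical partition key of (sensor_id, day), the query below scans a single partition (table and names are illustrative):

```cql
-- Fetch one sensor's readings for one day: a single-partition read,
-- unaffected by the total number of sensors stored in the table.
SELECT ts, value
FROM sensor_data
WHERE sensor_id = 'gauge-17' AND day = '2020-01-15';
```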

+2








