Effective key design in Cassandra

I have a question about the optimal design of a Cassandra database: is it more efficient to have one table with a lot of skinny rows, or a keyspace with many tables?

Context: I am trying to store data from multiple sensors. One approach would be to have a single table that stores data from all sensors. Another approach would be to have one table per sensor. Which one is better?

Please advise.

+3




2 answers


I would have fewer tables for several reasons:



  • As Andy Tolbert said in his answer, each table carries some overhead, which accumulates to a significant amount when you have 10 or 100 thousand tables. Think of it as worsening the overhead-to-data ratio as the table count grows.
  • If you are dealing with a large number of tables, chances are you will be creating some of them dynamically during normal application runtime. If so, you may run into bugs in Cassandra, which can fail to propagate the schemas of newly created tables across the cluster when it is under pressure. I've seen this in C* 2.0, but I'm not sure whether it is still a problem in recent versions.
  • Most of the benefits of a multi-table schema can be gained by putting extra thought into single-table data modeling. That said, there are times when segregating data into discrete tables really is the most appropriate solution. One example is some multi-tenant systems, where data for different tenants must be physically separated and backed up in isolation for regulatory reasons.
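The single-table modeling mentioned above usually means moving the per-sensor distinction out of the table name and into the partition key. A minimal sketch in CQL (table and column names are illustrative, not from the question; the daily time bucket is one common way to keep partitions bounded):

```cql
-- Hypothetical single-table design: one table serves all sensors.
-- The sensor id plus a coarse time bucket form the partition key,
-- so each sensor's readings stay together without its own table.
CREATE TABLE sensor_data (
    sensor_id text,
    day       date,        -- time bucket keeps any one partition from growing unbounded
    ts        timestamp,
    value     double,
    PRIMARY KEY ((sensor_id, day), ts)
) WITH CLUSTERING ORDER BY (ts DESC);
```

Adding a new sensor is then just a new partition key value, with none of the per-table overhead described above.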
+3




It is much better and more idiomatic to have 1 table for all sensors. Each table carries some overhead (MBeans for metrics, files on disk, etc.), so you don't want to have too many.



When you say "a lot of skinny rows", I don't expect that to be a problem: you can have many unique keys / partitions (some crazy large number).
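Because each unique partition key maps to its own partition, a read for one sensor only touches that sensor's data no matter how many other sensors share the table. For example, with a hypothetical partition key of (sensor_id, day), the query below scans a single partition (table and names are illustrative):

```cql
-- Fetch one sensor's readings for one day: a single-partition read,
-- unaffected by the total number of sensors stored in the table.
SELECT ts, value
FROM sensor_data
WHERE sensor_id = 'gauge-17' AND day = '2020-01-15';
```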

+2








