Cassandra performance: SELECT by id or SELECT by nothing

Question

I am wondering whether the speed of C* SELECTs depends on how we select entire tables.

For example, we have this table

id | value
A  | x
A  | xx
B  | xx
C  | xxx
B  | xx

Would it be faster to get all the results if we do

SELECT * FROM Y WHERE id='A'

SELECT * FROM Y WHERE id='B'

SELECT * FROM Y WHERE id='C'

or would it be faster if we do

SELECT * FROM Y WHERE 1

or maybe if we do

SELECT * FROM Y WHERE id IN ('A', 'B', 'C')

Or would they all be equally fast (ignoring connection time)?

cassandra cql cql3

M. Hirn 24 nov. 14 at 22:38

1 answer

Aaron · Accepted Answer · 2014-11-24T23:32:27+0000

I'm not sure what your column family (table) definition looks like, but your sample data could never exist like that in Cassandra. Primary keys are unique, and if id is your primary key, the last write wins. Basically, the table would look something like this:

id | value
A  | xx
C  | xxx
B  | xx
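
As a rough illustration of the last-write-wins behavior described above, here is a toy Python model, assuming id is the only primary key column. The dict and the apply_writes helper are hypothetical stand-ins for illustration, not how Cassandra actually stores data.

```python
def apply_writes(writes):
    """Toy model of upsert semantics: one row per primary key."""
    table = {}
    for row_id, value in writes:
        # A later write to the same primary key replaces the earlier one.
        table[row_id] = value
    return table


# The five rows from the question, in the order they were listed.
writes = [("A", "x"), ("A", "xx"), ("B", "xx"), ("C", "xxx"), ("B", "xx")]
table = apply_writes(writes)

for row_id, value in table.items():
    print(row_id, "|", value)
```

Only three rows survive, one per id. The order in which a real cluster returns them depends on the partitioner, so the print order here should not be read as meaningful.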

As for your individual requests ...

SELECT * FROM Y WHERE 1

This might work well with 3 rows, but it won't when you have 3 million, all spread across multiple nodes.

SELECT * FROM Y WHERE id IN ('A', 'B', 'C')

It's definitely not faster. See my answer here on why relying on IN for anything other than occasional OLAP usage is not a good idea.

SELECT * FROM Y WHERE id='A'
SELECT * FROM Y WHERE id='B'
SELECT * FROM Y WHERE id='C'

This is definitely the best way. Cassandra is designed to be queried by a specific, unique partition key. Even if you wanted to query every row in a column family (table), you would still be giving it a specific partition key each time. That helps your driver quickly determine which node(s) to send the query to.
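
To make the routing point concrete, here is a minimal sketch of why a query with a key can be sent to a specific node while a query without one cannot. Everything in it is an assumption for illustration: the three node names are hypothetical, and an MD5 hash with a modulo stands in for the real partitioner (Cassandra's default is Murmur3 over a token ring, with replication, none of which is modeled here).

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical three-node cluster


def owner(partition_key, nodes=NODES):
    """Map a key to one node via a hash (stand-in for the real partitioner)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return nodes[token % len(nodes)]


def nodes_for_key_query(partition_key):
    # With a key in the WHERE clause, the target node can be computed up front.
    return {owner(partition_key)}


def nodes_for_full_scan():
    # Without a key, any node may hold matching rows, so all must be consulted.
    return set(NODES)


for key in ("A", "B", "C"):
    print(key, "->", sorted(nodes_for_key_query(key)))
print("no key ->", sorted(nodes_for_full_scan()))
```

Each keyed query resolves to exactly one node in this model, while the unrestricted query has to involve every node.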

Now, let's say you have 3 million rows. For your application, is it faster to query each one individually, or to just do a SELECT *? It might be faster from a query perspective, but you will still have to iterate through each row (client side). That means managing all of them within the constraints of your JVM's available memory (which probably means paging them to some extent). But this is a bad (extreme) example, because you should never want to send your client application 3 million rows to deal with.
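
The paging idea mentioned above can be sketched as follows. This is only a model of the concept, assuming rows arrive as a stream and are consumed one fixed-size page at a time; fetch_pages and count_rows_paged are hypothetical helpers, not a driver API.

```python
from itertools import islice


def fetch_pages(row_source, page_size):
    """Yield lists of at most page_size rows from an iterable of rows."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    it = iter(row_source)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page


def count_rows_paged(row_source, page_size):
    """Walk every row while holding only one page in memory at a time."""
    total = 0
    largest_page = 0
    for page in fetch_pages(row_source, page_size):
        total += len(page)
        largest_page = max(largest_page, len(page))
    return total, largest_page


# A generator stands in for a large result set; nothing is materialized up front.
rows = (("id%d" % i, "value") for i in range(10_000))
print(count_rows_paged(rows, page_size=1_000))
```

The client still touches every row, but its memory use is bounded by the page size rather than by the size of the table.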

The bottom line is that you will have to work through these trade-offs yourself, within the specifications of your application. But in terms of performance, I've noticed that appropriate query-based data modeling tends to outweigh query strategy or syntax tricks.
