Cassandra CLUSTERING ORDER BY doesn't work and shows correct results

Hi I have created a table to store data like this

CREATE TABLE keyspace.test (
name text,
date text,
time double,
entry text,
details text,
PRIMARY KEY ((name, date), time)
) WITH CLUSTERING ORDER BY (time DESC);

      

And the inserted data into the table. But such a query gives an unordered result.

SELECT * FROM keyspace.test where device_id   name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;

      

Are there any problems with the design of the table.

+3


source to share


2 answers


I think you are underestimating the clustered clustered key order. Cassandra Sort data with cluster key within one partition.

This refers to your cassandra data to be sorted with clustering key times within the same name and date.

Example: let's insert some data

INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 1, 'a');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 2, 'b');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-01', 3, 'c');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 0, 'nil');
INSERT INTO test (name , date , time , entry ) VALUES ('anand', '2017-04-02', 4, 'd');

      

If we select data with your request:

SELECT * FROM test where name ='anand' and date in ('2017-04-01','2017-04-02','2017-04-03','2017-04-05') ;

      

Output:



 name  | date       | time | details | entry
-------+------------+------+---------+-------
 anand | 2017-04-01 |    3 |    null |     c
 anand | 2017-04-01 |    2 |    null |     b
 anand | 2017-04-01 |    1 |    null |     a
 anand | 2017-04-02 |    4 |    null |     d
 anand | 2017-04-02 |    0 |    null |   nil

      

You can see that the times 3,2,1

are within the same section anand:2017-04-01

are sorted in desc AND the times 4,0

are within the same section anand:2017-04-02

are sorted in desc. Cassandra will not be sorting between different sections.

Here is the doc:

In a table definition, a clustering column is a column that is part of a composite primary key definition, but not the first column, which is a position reserved for a partition key. Columns are grouped into multiple rows in a single section. The clustering order is determined by the position of the columns in the composite primary key definition.

Source: http://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html

By the way, why is your data field of type text

and time

is it double

type?
The field date

can be used as a type date

and time

as timestamp

.

+2


source


The query you're using is ok but it probably doesn't behave the way you expect because the coordinator won't sort the results based on sections. I also run into this problem a couple of times.

The solution for it is very simple, basically. It is much better to run the 4 separate queries you need on the client and then combine the results there. In short, the IN operator puts a lot of pressure on the coordinator node in the cluster, there it reads well on the matter:



https://lostechies.com/ryansvihla/2014/09/22/cassandra-query-patterns-not-using-the-in-query-for-multiple-partitions/

+2


source







All Articles