How to build a sorted list of ranks from a Cassandra table?

I store my data in a single Cassandra 2.0.10 table. It has one column, named score, of integer type, which can take any value. I need to write a background job that assigns a value to another column, rank, giving 1 to the row with the highest score, 2 to the row with the next highest score, and so on. The row with the lowest score should receive the total row count as its rank. The table is currently defined in CQL as

CREATE TABLE players
    (user int, rank int, score int, details blob, PRIMARY KEY(user))

      

With something like PostgreSQL, I would do something like

select id, rank from players order by score desc offset A limit 100;

      

using increasing values for A and thus iterating over the database in pages of 100 rows. This would give me the top 100 players in the first query, players 100 to 200 in the second, and so on. I could then run update statements by id, one at a time or in batches.

When I try to do the same in Cassandra CQL, it turns out that much of the required functionality is not supported (no ordering, no offset, no clear way to visit all rows). I tried creating an index on the score column, but it didn't help.

This rank assignment is an auxiliary task. It is not a problem if it takes days or even weeks. It is okay if the result is a little inconsistent, as the scores can change while the job runs. This is not the main feature of the application. The core functions do not use range queries, and Cassandra works well for them.

Is it possible to implement this rank assignment by combining Java and CQL, or are the limitations serious enough that I need to use a different database engine?



1 answer


In my experience, Cassandra is not suited to such tasks. You can definitely do this, but the solution will be neither simple nor efficient. Iterating over all the rows of a table to update the ranks is not a problem; iterating over the rows in rank order is. You could keep two tables:

players (id, rank) and rank_to_id (rank, id_list). Then you would request the second page using:

select * from rank_to_id where rank > 100 limit 100;



It will be the responsibility of your rank-assignment code to update both tables properly whenever a rank changes. Basically, you will be implementing by hand the simple database index that PostgreSQL gives you out of the box.
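The ranking step itself does not depend on the order in which rows are read, so it can be sketched separately from the database access. The following is a minimal sketch in Python, assuming all (user, score) pairs fit in memory. The function name `assign_ranks`, the sample data, and the tie-breaking rule (lower user id first) are illustrative assumptions, not part of the question.

```python
from typing import Dict, Iterable, Tuple


def assign_ranks(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Map each user id to a 1-based rank, highest score first."""
    # Sort by score descending. Ties are broken by user id, which is an
    # assumption: the question does not say how equal scores should rank.
    ordered = sorted(rows, key=lambda row: (-row[1], row[0]))
    return {user: rank for rank, (user, _score) in enumerate(ordered, start=1)}


# Hypothetical sample standing in for an unordered full-table scan
# that yields (user, score) pairs.
scan = [(10, 50), (11, 90), (12, 70), (13, 90)]
ranks = assign_ranks(scan)
```

Each resulting (user, rank) pair would then be written back with a statement such as `UPDATE players SET rank = ? WHERE user = ?`, one at a time or in batches. Because scores may change during the scan, the ranks are only approximately consistent, which the question says is acceptable.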

I would also recommend taking a look at Redis. Its Sorted Set data type implements almost exactly what you want: http://redis.io/commands#sorted_set . However, whether it fits depends on how much data you have, because Redis is an in-memory database.
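As a rough illustration of the sorted-set idea, the sketch below uses two Redis commands through a client object: ZADD to store a score and ZREVRANK to read a member's position when ordered from highest to lowest score. It assumes a client with the `zadd(name, mapping)` and `zrevrank(name, value)` methods offered by redis-py 3.0 or later. The key name `players:scores` and the helper names are hypothetical.

```python
def record_score(client, user_id, score, key="players:scores"):
    """Insert or update a player's score in the sorted set (ZADD)."""
    client.zadd(key, {str(user_id): score})


def rank_of(client, user_id, key="players:scores"):
    """Return the 1-based rank, highest score first, or None if unknown."""
    # ZREVRANK is 0-based and returns None for a missing member.
    zero_based = client.zrevrank(key, str(user_id))
    return None if zero_based is None else zero_based + 1
```

With this approach no background job is needed: Redis keeps the set ordered on every write, and the rank is computed on demand when it is read.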

PostgreSQL could be a good solution as well. Why not use it?
