Cassandra RandomPartitioner and "full table scan"

I can't seem to find any information on how to traverse all the rows of a column family when RandomPartitioner is used to distribute the keys.

The usual full-scan approaches I see suggested on the mailing list are to "use MapReduce" (which would be an option, but not yet) or to issue a range slice query that fetches rows in batches, advancing the lower bound of the range to the last key seen after each batch. That seems like an odd approach when you can't rely on any ordering of the keys themselves (RandomPartitioner returns rows in token order, i.e. by the MD5 hash of the key), so I was wondering what the accepted practice is in this situation.

To be clear, this kind of full column-family traversal is not a regular occurrence and is not part of our normal data-access patterns. It doesn't need to be particularly fast (although of course that would be nice!). We just need to do it occasionally to check for garbage and the like, and we don't expect the returned rows to be a consistent snapshot or anything of that sort.

1 answer


Using Hadoop MapReduce would be the correct way to do this, but I understand that this is not a viable option for you at the moment. Thus, you have several possibilities:



  • If your keys follow some logical order and can be computed or known in advance, you can issue multiget requests for batches of keys.

  • You can write your own range-scanning client, similar to the way Cassandra's ColumnFamilyInputFormat works.

  • You can page through the rows with range slice queries in Hector, or with an equivalent construct in another client library (see the sketch below).
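
For the Hector option, the usual pattern is a paging loop over range slices: under RandomPartitioner the rows come back in token (hashed-key) order, which looks meaningless but is stable, so you can feed the last key of each page back in as the start key of the next page. Below is a minimal sketch, not tested against your schema; the cluster address, keyspace and column family names, the page size, and the process() helper are all placeholders for your own setup:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.OrderedRows;
    import me.prettyprint.hector.api.beans.Row;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.RangeSlicesQuery;

    public class ScanAllRows {

        public static void main(String[] args) {
            // Placeholder cluster address, keyspace and column family names.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("MyKeyspace", cluster);
            StringSerializer ss = StringSerializer.get();

            final int pageSize = 100;
            String startKey = "";       // empty key = start of the token range
            boolean skipFirst = false;  // start key is inclusive, so skip it on later pages

            while (true) {
                RangeSlicesQuery<String, String, String> query =
                        HFactory.createRangeSlicesQuery(keyspace, ss, ss, ss);
                query.setColumnFamily("MyColumnFamily");
                query.setKeys(startKey, "");        // from startKey to the end of the ring
                query.setRange("", "", false, 10);  // first few columns of each row
                query.setRowCount(pageSize);

                OrderedRows<String, String, String> rows = query.execute().get();

                for (Row<String, String, String> row : rows) {
                    if (skipFirst && row.getKey().equals(startKey)) {
                        continue;   // already handled as the last row of the previous page
                    }
                    process(row);   // whatever "checking for garbage" means in your case
                }

                if (rows.getCount() < pageSize) {
                    break;          // short page = we have walked the whole ring
                }
                startKey = rows.peekLast().getKey();
                skipFirst = true;
            }
        }

        // Hypothetical per-row handler; replace with your own logic.
        private static void process(Row<String, String, String> row) {
            System.out.println(row.getKey());
        }
    }

The start key of a range slice is inclusive, so every page after the first begins with the previous page's last row; the skipFirst flag drops that duplicate. If you only need the keys (for example, to look for garbage), RangeSlicesQuery also has a setReturnKeysOnly() option so you don't pull any column data at all.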
