Sequence generator / autoincrement using Cassandra 3.0

I've read a lot of Cassandra's documentation and looked at counter columns and so on. As far as I can tell, Cassandra does not come with a standard, built-in way to generate incremental sequences on the fly.

Everything I have found uses the IF clause to do a compare-and-set.

With it, you can check whether a row exists and create it only if it does not. Since this is done with a quorum algorithm agreed on by the cluster, it should be simple to use and safe, but it comes with high latency.

To work around this latency, you can reserve a thousand IDs at once by increasing the nextSequenceId value by one thousand instead of one. You then pay the latency only once per thousand IDs (or, if the next block is reserved ahead of time, almost never).
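The block reservation idea can be sketched roughly as follows. This is a minimal illustration, not a tested Cassandra client: `InMemoryCasStore` is a hypothetical in-memory stand-in for a row updated with a conditional (IF) statement, and the class names, the table name `sequences`, and the column name `next_id` are all assumptions made for the example.

```python
import threading


class InMemoryCasStore:
    """Hypothetical stand-in for a Cassandra row updated with an IF clause.

    Against a real cluster, compare_and_set would correspond to something like
      UPDATE sequences SET next_id = ? WHERE name = ? IF next_id = ?
    (table and column names are assumptions for this sketch).
    """

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_set(self, expected, new):
        # Succeeds only if nobody changed the value since it was read.
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class BlockAllocator:
    """Hands out IDs locally and touches the shared store once per block."""

    def __init__(self, store, block_size=1000):
        self._store = store
        self._block_size = block_size
        self._next = 0
        self._limit = 0  # exclusive end of the locally reserved block

    def _reserve_block(self):
        # Retry the compare-and-set until this client wins a block.
        while True:
            start = self._store.read()
            if self._store.compare_and_set(start, start + self._block_size):
                self._next, self._limit = start, start + self._block_size
                return

    def next_id(self):
        if self._next >= self._limit:
            self._reserve_block()
        value = self._next
        self._next += 1
        return value


store = InMemoryCasStore()
client_a = BlockAllocator(store)
client_b = BlockAllocator(store)
first_a = client_a.next_id()   # reserves the block starting at 0
first_b = client_b.next_id()   # reserves the block starting at 1000
second_a = client_a.next_id()  # served locally, no store access
```

Each `BlockAllocator` instance is not thread-safe on its own, and IDs left in a block are lost if a client restarts, so the resulting sequence is unique but has gaps and is not globally ordered.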

I understand that this will create a hot spot, since every client contends for the same row.

One way to avoid this contention is to use several sequence generators, each starting at a different offset (modulo the number of generators), and to limit the chance of collision by picking one of them at random for each request.
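A rough sketch of the modulo idea, under the assumption that generator k of N only ever produces values congruent to k modulo N. The class names are hypothetical, and the per-generator counter is kept in memory here; in a real setup each counter would presumably live in its own row so that the compare-and-set traffic is spread over several partitions.

```python
import random


class StridedGenerator:
    """Produces offset, offset + stride, offset + 2 * stride, ..."""

    def __init__(self, offset, stride):
        self.offset = offset
        self.stride = stride
        self._count = 0  # in a real setup this would be a per-generator row

    def next_id(self):
        value = self.offset + self._count * self.stride
        self._count += 1
        return value


class ShardedSequence:
    """Spreads requests over several generators chosen at random."""

    def __init__(self, shards, rng=None):
        self._generators = [StridedGenerator(k, shards) for k in range(shards)]
        self._rng = rng or random.Random()

    def next_id(self):
        # Different generators can never collide: their values differ modulo
        # the number of shards.
        return self._rng.choice(self._generators).next_id()


sequence = ShardedSequence(shards=4, rng=random.Random(0))
sample = [sequence.next_id() for _ in range(1000)]
```

The IDs are unique but not issued in increasing order, and the range has holes whenever the generators are used unevenly.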

So this would be my naive implementation.

With Cassandra 3.0 hitting the street, I'm curious about three things:

  • Does Cassandra 3.0 offer a smarter way to implement sequences?
  • Does Cassandra suggest anything to ease the pain of implementing this? Everything I have read comes down to compare-and-set. Is there something smarter?
  • Is there any library that already gives me some sort of sequence numbers?


1 answer


Jonathan opened a Jira ticket on this topic: https://issues.apache.org/jira/browse/CASSANDRA-9200

3.0 is not yet available, but it looks like the committers are finalizing features for 3.0, and 9200 seems to be targeted at 3.1 (which really means "sometime after 3.0": maybe 3.1, maybe 3.2, maybe 4.0).

To answer your questions:



1) No, there is no built-in way to generate a sequence in Cassandra at this time.

2) No, you will have to do a read before write, or reserve sections of the sequence on a node if you can tolerate sequences that do not increase monotonically.

3) Twitter published Snowflake at some point ( https://github.com/twitter/snowflake ), but it is now retired. As a general rule, I prefer to use type 1 UUIDs, which combine a time component with random components. Even UUIDs are not guaranteed to be unique, but they are generally "good enough" for our workloads. Simpleflake ( http://engineering.custommade.com/simpleflake-distributed-id-generation-for-the-lazy/ ) discusses the tradeoffs at the linked page and also offers its own generator.
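As a small illustration of the type 1 UUID option, the sketch below uses Python's standard `uuid` module; the helper name `uuid1_datetime` is made up for this example. Cassandra's `timeuuid` column type stores UUIDs of this kind.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Type 1 UUID timestamps count 100-nanosecond intervals since 1582-10-15.
UUID_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u):
    """Return the UTC time embedded in a version 1 UUID."""
    return UUID_EPOCH + timedelta(microseconds=u.time // 10)


new_id = uuid.uuid1()  # timestamp + clock sequence + node identifier
print(new_id, uuid1_datetime(new_id))
```

Such IDs sort roughly by creation time and need no coordination between nodes, at the price of not being a dense 1, 2, 3, ... sequence.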
