Inconsistent read results from Cassandra after batch writes
I am using Apache Spark with the Spark Cassandra Connector to write millions of rows to a Cassandra cluster. The replication factor is 3, and I set the write consistency level to ALL via spark-submit (YARN client mode) with the following parameters:
spark-submit ...
--conf spark.cassandra.output.consistency.level=ALL \
--conf spark.cassandra.output.concurrent.writes=1 \
--conf spark.cassandra.output.batch.size.bytes=20000 \
...
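The write job itself is essentially just a saveToCassandra call on an RDD; a rough sketch is below (the keyspace, table, and column names are placeholders, not my real schema):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object WriteJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("write-job"))

    // In reality this RDD holds millions of rows; a dummy range keeps the sketch self-contained.
    val rows = sc.parallelize(1 to 1000000).map(i => (i, s"value-$i"))

    // Consistency level, concurrent writes and batch size come from the
    // spark.cassandra.output.* settings passed to spark-submit above.
    rows.saveToCassandra("my_keyspace", "my_table", SomeColumns("id", "value"))
  }
}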
Then I wrote another Spark job to count the rows I had written, configured as follows:
spark-submit ...
--conf spark.cassandra.input.consistency.level=ONE \
--conf spark.cassandra.input.split.size=50000 \
...
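The counting job is just as simple, roughly like this (again with placeholder keyspace/table names):

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CountJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("count-job"))

    // Read consistency level and split size come from the
    // spark.cassandra.input.* settings passed to spark-submit above.
    val count = sc.cassandraTable("my_keyspace", "my_table").count()
    println(s"row count = $count")
  }
}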
According to the documentation, if write consistency + read consistency > replication factor, reads should be strongly consistent. That holds here: ALL (3 replicas) + ONE (1 replica) = 4 > 3.
But I am getting the following results:
- The read job gives me a different count every time I run it
- If I increase the consistency level of the read job, I get the expected count
What am I missing? Is there some hidden default behaviour (for example, silently downgrading the consistency level if a write runs into problems), am I hitting a bug in this version of Cassandra (2.1.2), or is there a problem with the batch updates that the spark-cassandra-connector uses to save data to Cassandra (I simply call the saveToCassandra method)?
What is going wrong?