How do I write a streaming dataset to Cassandra?
I have a Python stream-sourced DataFrame df that holds all the data I want to write to a Cassandra table with spark-cassandra-connector. I've tried two approaches:
# Attempt 1: batch write API
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table="myTable", keyspace="myKeySpace") \
    .save()

# Attempt 2: streaming write API
query = df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .outputMode('append') \
    .options(table="myTable", keyspace="myKeySpace") \
    .start()
query.awaitTermination()
However, I keep getting these errors, respectively:
pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;
and
java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.
Is there any way I can send my streaming DataFrame to the Cassandra table?
There is currently no streaming Sink for Cassandra in the Spark Cassandra connector. You will need to implement your own Sink or wait for it to appear.
If you were using Scala or Java, you could use the foreach operator with a ForeachWriter, as described in Using Foreach.
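For illustration, here is a minimal Python sketch of the same open/process/close pattern that ForeachWriter follows. It assumes Spark 2.4 or later, where PySpark's writeStream.foreach accepts an object with these methods; that is not available in the versions this answer was written against. The class name CassandraRowWriter and the connect callable are hypothetical names introduced for this example, and the %s placeholders assume a session whose execute(query, params) behaves like the DataStax cassandra-driver one.

```python
class CassandraRowWriter:
    """Row-by-row writer following the open/process/close contract.

    `connect` is a hypothetical zero-argument callable that returns a
    session-like object exposing execute(query, params).
    """

    def __init__(self, connect, keyspace, table):
        self._connect = connect
        self.keyspace = keyspace
        self.table = table
        self._session = None

    def open(self, partition_id, epoch_id):
        # Connect here rather than in __init__: the writer object is
        # serialized and shipped to executors, and live connections
        # generally cannot be serialized.
        self._session = self._connect()
        return True  # True means "process the rows of this partition"

    def process(self, row):
        # Accept a pyspark.sql.Row (which has asDict) or a plain mapping.
        data = row.asDict() if hasattr(row, "asDict") else dict(row)
        columns = ", ".join(data)
        placeholders = ", ".join(["%s"] * len(data))
        cql = (
            f"INSERT INTO {self.keyspace}.{self.table} "
            f"({columns}) VALUES ({placeholders})"
        )
        self._session.execute(cql, tuple(data.values()))

    def close(self, error):
        # Release the session if it offers shutdown(); a real setup would
        # also need to release the underlying cluster connection.
        if self._session is not None and hasattr(self._session, "shutdown"):
            self._session.shutdown()
        self._session = None


# Hypothetical wiring (assumes cassandra-driver is installed on executors):
#   from cassandra.cluster import Cluster
#   writer = CassandraRowWriter(
#       lambda: Cluster(["127.0.0.1"]).connect(), "myKeySpace", "myTable")
#   query = df.writeStream.foreach(writer).outputMode("append").start()
#   query.awaitTermination()
```

This issues one INSERT per row, which is simple but not tuned for throughput; treat it as a starting point rather than a replacement for a proper Sink.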