How do I write a streaming dataset to Cassandra?

I have a stream-sourced PySpark DataFrame df that holds all the data I want to put into a Cassandra table using spark-cassandra-connector. I've tried doing it in two ways:

# Attempt 1: the batch write API
df.write \
    .format("org.apache.spark.sql.cassandra") \
    .mode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .save() 

# Attempt 2: the streaming write API
query = df.writeStream \
    .format("org.apache.spark.sql.cassandra") \
    .outputMode('append') \
    .options(table="myTable",keyspace="myKeySpace") \
    .start()

query.awaitTermination()

      

However, I keep getting these errors, respectively:

pyspark.sql.utils.AnalysisException: "'write' can not be called on streaming Dataset/DataFrame;

      

and

java.lang.UnsupportedOperationException: Data source org.apache.spark.sql.cassandra does not support streamed writing.

      

Is there any way I can send my streamed DataFrame to the Cassandra table?


1 answer


There is currently no streaming Sink for Cassandra in the Spark Cassandra connector. You will need to implement your own Sink or wait for one to be added.



If you were using Scala or Java, you could use the foreach operator with a ForeachWriter, as described in Using Foreach.
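For Python, one possible workaround is foreachBatch, which PySpark exposes on writeStream from Spark 2.4 onward. This is a different technique from the ForeachWriter approach above, and it only applies if your Spark version is 2.4 or later. Inside the callback each micro-batch arrives as an ordinary DataFrame, so the batch writer from the question is allowed there. The sketch below reuses the table and keyspace names from the question; the two function names are made up for illustration, and it assumes the connector is already on the classpath.

```python
# Sketch only: assumes Spark 2.4+ (for foreachBatch) and that the
# spark-cassandra-connector is available to the Spark session.

def write_batch_to_cassandra(batch_df, batch_id):
    """Write one micro-batch to Cassandra with the ordinary batch writer.

    Hypothetical helper name; foreachBatch calls it with the micro-batch
    DataFrame and that batch's id.
    """
    # batch_df is a regular (non-streaming) DataFrame here, so .write works.
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .mode("append")
        .options(table="myTable", keyspace="myKeySpace")
        .save())


def start_cassandra_stream(streaming_df):
    """Start a streaming query that passes each micro-batch to the helper above."""
    return (streaming_df.writeStream
        .outputMode("append")
        .foreachBatch(write_batch_to_cassandra)
        .start())
```

With the streaming DataFrame from the question, usage would look like query = start_cassandra_stream(df) followed by query.awaitTermination(). Note that a micro-batch may be re-run after a failure, so rows can be written more than once; because Cassandra writes are upserts by primary key, that is often acceptable.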
