How to insert (not save or update) an RDD in Cassandra?
I am working with Apache Spark and Cassandra, and I want to store an RDD in Cassandra with spark-cassandra-connector.
Here's the code:
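import com.datastax.spark.connector._ // brings saveToCassandra into scope on RDDs
import org.apache.spark.rdd.RDD

def saveToCassandra(step: RDD[(String, String, Date, Int, Int)]) = {
  step.saveToCassandra("keyspace", "table")
}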
This works fine, but it overwrites data that is already present in the database. I do not want to overwrite any existing data. How can I do this?
2 answers
What I am doing:
rdd.foreachPartition(x =>
  connector.withSessionDo(session => {
    someUpdater.UpdateEntries(x, session)
    // or: x.foreach(y => someUpdater.UpdateEntry(y, session))
  }))
connector above is CassandraConnector(sparkConf).
It's not as convenient as a simple saveToCassandra, but it does give you fine-grained control.
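For the insert-only case in the question, the per-partition session approach could be filled in roughly as below. This is only a sketch under assumptions: the keyspace, table, and column names (my_keyspace, my_table, id, name, created, a, b) are made up, the object and method names are hypothetical, and Date is assumed to be a type the driver version in use can bind to the target column. It relies on CQL's INSERT ... IF NOT EXISTS, which leaves a row alone when one with the same primary key already exists.

```scala
import java.util.Date

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.rdd.RDD

object InsertOnly {
  // Hypothetical schema; replace keyspace, table, and columns with the real ones.
  private val insertCql =
    "INSERT INTO my_keyspace.my_table (id, name, created, a, b) " +
      "VALUES (?, ?, ?, ?, ?) IF NOT EXISTS"

  def insertIfAbsent(step: RDD[(String, String, Date, Int, Int)]): Unit = {
    // The connector is serializable, so the closure below can capture it.
    val connector = CassandraConnector(step.sparkContext.getConf)
    val cql = insertCql

    step.foreachPartition { rows =>
      connector.withSessionDo { session =>
        // Prepare once per partition, then bind and execute per row.
        val stmt = session.prepare(cql)
        rows.foreach { case (id, name, created, a, b) =>
          // Ints are boxed because bind takes Java objects.
          session.execute(stmt.bind(id, name, created, Int.box(a), Int.box(b)))
        }
      }
    }
  }
}
```

Keep in mind that IF NOT EXISTS is a lightweight transaction, so each insert costs noticeably more than a plain write. That is the price of not overwriting existing rows.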