How to insert (not save or update) an RDD in Cassandra?

I am working with Apache Spark and Cassandra, and I want to store an RDD in Cassandra with the spark-cassandra-connector.

Here's the code:

def saveToCassandra(step: RDD[(String, String, Date, Int, Int)]) = {
  step.saveToCassandra("keyspace", "table")
}

      

This works fine, but it overwrites data that is already present in the database. I do not want to overwrite any existing data. How can I do this?

+3




2 answers


What I am doing:

// someUpdater is your own helper that issues the CQL statements you need
rdd.foreachPartition(x => connector.withSessionDo(session => {
  someUpdater.UpdateEntries(x, session)
  // or
  x.foreach(y => someUpdater.UpdateEntry(y, session))
}))

      



Here, connector is CassandraConnector(sparkConf).

It's not as simple as saveToCassandra, but it does give fine-grained control.
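If the only goal is to avoid overwriting existing rows, a lighter option may be to keep saveToCassandra and pass it a WriteConf. The sketch below assumes a connector version whose WriteConf has an ifNotExists flag (check the documentation for your version); the keyspace and table names are the placeholders from the question, and saveNewRows is a name made up for illustration.

```scala
import java.util.Date

import com.datastax.spark.connector._
import com.datastax.spark.connector.writer.WriteConf
import org.apache.spark.rdd.RDD

object SaveIfAbsent {
  // Assumption: WriteConf exposes ifNotExists in the connector version in use.
  // With the flag set, each write is issued as INSERT ... IF NOT EXISTS,
  // so rows whose primary key already exists are left untouched.
  def saveNewRows(step: RDD[(String, String, Date, Int, Int)]): Unit =
    step.saveToCassandra("keyspace", "table", writeConf = WriteConf(ifNotExists = true))
}
```

IF NOT EXISTS is a lightweight transaction in Cassandra, so expect these writes to be noticeably slower than plain inserts.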

+4




I think it's better to call withSessionDo outside of the foreach loop. The call has overhead that doesn't need to be repeated for every element.
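A minimal sketch of that structure, assuming the tuple type from the question: the session is obtained once per partition and the statement is prepared once, then reused for every row. The column names, the InsertIfAbsent object and its methods are hypothetical, and the bind call assumes a DataStax Java driver 3.x session, where a CQL timestamp maps to java.util.Date.

```scala
import java.util.Date

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.rdd.RDD

object InsertIfAbsent {
  // Builds a statement such as:
  //   INSERT INTO ks.tbl (a, b) VALUES (?, ?) IF NOT EXISTS
  def insertIfAbsentCql(keyspace: String, table: String, columns: Seq[String]): String = {
    val names = columns.mkString(", ")
    val marks = columns.map(_ => "?").mkString(", ")
    s"INSERT INTO $keyspace.$table ($names) VALUES ($marks) IF NOT EXISTS"
  }

  // Hypothetical column names; replace them with the real schema.
  val columns: Seq[String] = Seq("id", "name", "created", "a", "b")

  def insertNew(
      rdd: RDD[(String, String, Date, Int, Int)],
      connector: CassandraConnector,
      keyspace: String,
      table: String): Unit =
    rdd.foreachPartition { rows =>
      // One withSessionDo call per partition, not one per row.
      connector.withSessionDo { session =>
        // Prepare once per partition and reuse the statement for each row.
        val stmt = session.prepare(insertIfAbsentCql(keyspace, table, columns))
        rows.foreach { case (id, name, created, a, b) =>
          // Scala Ints are boxed because bind takes Java objects.
          // With driver 4.x, pass created.toInstant instead of the Date.
          session.execute(stmt.bind(id, name, created, Int.box(a), Int.box(b)))
        }
      }
    }
}
```

The connector value can be created on the driver with CassandraConnector(sparkConf) and used inside the closure, since it is serializable.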



0

