Accessing Cassandra Nodes in Spark
2 answers
The documentation is simple enough:
new SparkConf(true)
.set("spark.cassandra.connection.host", "192.168.123.10")
And just below:
"Multiple hosts can be passed in using a comma-separated list ("127.0.0.1,127.0.0.2"). These are just the initial contact points; all nodes in the local DC will be used when connecting."
In other words, you only need to give the connector a single Cassandra contact point; from there it discovers the rest of the nodes in the local datacenter on its own. A comma-separated list is useful as a fallback, in case the first contact point happens to be unreachable when the connection is made.
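A minimal sketch of the configuration described above, assuming the DataStax spark-cassandra-connector is on the classpath (the host addresses and application name are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Only initial contact points are listed here; the connector
// discovers the remaining nodes in the local DC by itself.
val conf = new SparkConf(true)
  .setAppName("cassandra-example") // hypothetical app name
  .set("spark.cassandra.connection.host", "192.168.123.10,192.168.123.11")

val sc = new SparkContext(conf)
```

Listing two hosts simply gives the driver a second place to try if the first is down at startup; it does not mean two separate clusters.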
You can try this if you are using Scala; I couldn't find an equivalent for Python.
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector

val connectorToClusterOne = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "127.0.0.1"))
val connectorToClusterTwo = CassandraConnector(sc.getConf.set("spark.cassandra.connection.host", "127.0.0.2"))

// Read from cluster one. Scope each implicit in its own block,
// otherwise the two implicit vals clash and the code won't compile.
val rddFromClusterOne = {
  implicit val c = connectorToClusterOne
  sc.cassandraTable("ks", "tab")
}

// Write what was read to the same keyspace/table on cluster two.
{
  implicit val c = connectorToClusterTwo
  rddFromClusterOne.saveToCassandra("ks", "tab")
}
Good luck!