Spark Solr Bulk Load from HDFS

There used to be a way to bulk load Solr from HDFS using MR jobs. An old video by Doug Reading talks about this at the 11:41 mark: https://www.youtube.com/watch?v=5444z-L2V2A

I also took a screenshot of the slide below:

[screenshot of the slide from the video]

So you used to read data from HDFS and write out multiple Solr shards, one for each mapper. How do I do something like this with Spark? I found the Spark-Solr project from LucidWorks, which has a SolrRDD, but it seems to write documents to Solr over the network using SolrJ (see the sketch below). I would like to just write the index shards to HDFS from a Spark RDD, similar to what the MR job was doing. How can I do this in Spark?
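For context, here is roughly what the SolrJ-based spark-solr route looks like as far as I can tell (a minimal sketch based on the project's README; the ZooKeeper address, collection name, input path, and field names are placeholders, and I'm assuming the `SolrSupport.indexDocs` helper has this shape):

```scala
import com.lucidworks.spark.util.SolrSupport
import org.apache.solr.common.SolrInputDocument
import org.apache.spark.{SparkConf, SparkContext}

object SolrJBulkIndex {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("solrj-bulk-index"))

    // Turn raw HDFS records into SolrInputDocuments (placeholder schema).
    val docs = sc.textFile("hdfs:///data/input")
      .map { line =>
        val doc = new SolrInputDocument()
        doc.setField("id", java.util.UUID.randomUUID().toString)
        doc.setField("text_txt", line)
        doc
      }

    // Assumed helper from spark-solr: sends batches of documents to a
    // running SolrCloud cluster via SolrJ -- live indexing over the
    // network, not offline shard building like the MR job did.
    SolrSupport.indexDocs("zkhost:2181", "collection1", 100, docs)

    sc.stop()
  }
}
```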
