Spark Solr Bulk Load from HDFS

There used to be a way to bulk load Solr from HDFS using MR jobs. An old video by Doug Reading talks about this at the 11:41 mark: https://www.youtube.com/watch?v=5444z-L2V2A

I also took a screenshot of the slide below:

[screenshot of the slide from the video]

So you used to read data from HDFS and write out multiple Solr shards, one for each mapper. How do I do something like this with Spark? I found the Spark-Solr project from LucidWorks, which has a SolrRDD, but it seems to write documents to Solr over the network using SolrJ (see the sketch below). I would like to just write the index shards to HDFS from a Spark RDD, similar to what the MR job was doing. How can I do this in Spark?
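For context, here is roughly what the SolrJ-based spark-solr route looks like as far as I can tell (a minimal sketch based on the project's README; the ZooKeeper address, collection name, input path, and field names are placeholders, and I'm assuming the `SolrSupport.indexDocs` helper has this shape):

```scala
import com.lucidworks.spark.util.SolrSupport
import org.apache.solr.common.SolrInputDocument
import org.apache.spark.{SparkConf, SparkContext}

object SolrJBulkIndex {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("solrj-bulk-index"))

    // Turn raw HDFS records into SolrInputDocuments (placeholder schema).
    val docs = sc.textFile("hdfs:///data/input")
      .map { line =>
        val doc = new SolrInputDocument()
        doc.setField("id", java.util.UUID.randomUUID().toString)
        doc.setField("text_txt", line)
        doc
      }

    // Assumed helper from spark-solr: sends batches of documents to a
    // running SolrCloud cluster via SolrJ -- live indexing over the
    // network, not offline shard building like the MR job did.
    SolrSupport.indexDocs("zkhost:2181", "collection1", 100, docs)

    sc.stop()
  }
}
```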
