Spark Solr Bulk Load from HDFS
There used to be a way to build Solr index shards from HDFS using MR jobs. An old video by Doug Reading discusses this at the 11:41 mark: https://www.youtube.com/watch?v=5444z-L2V2A
I also took a screenshot of the slide below:
So you used to read data from HDFS and write out multiple Solr shards, one per reducer. How do I do something like this with Spark? I found the Spark-Solr project from Lucidworks, which has a SolrRDD, but it seems to write documents to a live Solr instance using SolrJ. I would like to write the shards directly to HDFS from a Spark RDD, similar to what the MR job was doing. How can I do this in Spark?
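For reference, here is roughly what the SolrJ-based path looks like, i.e. the live-indexing approach the spark-solr project appears to take rather than the offline shard-writing I'm after. This is only a sketch under my own assumptions: `indexToSolr`, the `zkHost`/`collection` parameters, and the `Map[String, Any]` record shape are names I made up for illustration; the SolrJ calls are from the SolrJ 7+ client API.

```scala
import org.apache.solr.client.solrj.impl.CloudSolrClient
import org.apache.solr.common.SolrInputDocument
import org.apache.spark.rdd.RDD
import scala.collection.JavaConverters._

// Hypothetical sketch: push an RDD of field maps into a running SolrCloud
// collection via SolrJ, opening one client per partition so that no
// non-serializable client object is captured in the Spark closure.
def indexToSolr(rdd: RDD[Map[String, Any]], zkHost: String, collection: String): Unit = {
  rdd.foreachPartition { records =>
    val client = new CloudSolrClient.Builder(
      java.util.Collections.singletonList(zkHost),
      java.util.Optional.empty[String]()   // no ZooKeeper chroot
    ).build()
    client.setDefaultCollection(collection)

    // Convert each record to a SolrInputDocument and send the batch.
    val batch = records.map { fields =>
      val doc = new SolrInputDocument()
      fields.foreach { case (k, v) => doc.addField(k, v) }
      doc
    }.toList
    if (batch.nonEmpty) {
      client.add(batch.asJava)
      client.commit()
    }
    client.close()
  }
}
```

Note this still goes through a live Solr cluster over the network; it does not produce shard directories on HDFS the way the MR job did.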