Doesn't Spark support ArrayList when writing to Elasticsearch?

Question

I have the following structure:

mylist = [{"key1": "val1"}, {"key2": "val2"}]
myrdd = value_counts.map(lambda item: ('key', {
    'field': mylist
}))

I am getting this error:

15/02/10 15:54:08 INFO scheduler.TaskSetManager: Lost task 1.0 in stage 2.0 (TID 6) on executor ip-10-80-15-145.ec2.internal: org.apache.spark.SparkException (java.util.ArrayList data cannot be used) [duplicate 1]

when saving the RDD like this:

myrdd.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "mboyd/mboydtype"
    }
)

This is what I would like the document to look like once it is written to ES:

{
    "field": [{"key1": "val1"}, {"key2": "val2"}]
}

hadoop elasticsearch apache-spark

Rolando Jul 14 15 at 15:15

2 answers

GBleaney · Answer 1 · 2015-11-05T16:35:08+0000

A bit late to the game, but this is the solution we arrived at after running into the same problem yesterday. Add 'es.input.json': 'true' to your conf, then run json.dumps() on your data.

Modifying your example, it looks like this:

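import json

rdd = sc.parallelize([{"key1": ["val1", "val2"]}])
# Serialize each document to a JSON string so no Java ArrayList is involved.
json_rdd = rdd.map(json.dumps)
# saveAsNewAPIHadoopFile expects (key, value) pairs; as in the question,
# the key is just a placeholder.
json_rdd.map(lambda doc: ('key', doc)).saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.resource": "mboyd/mboydtype",
        # Tell elasticsearch-hadoop that the values are already JSON.
        "es.input.json": "true"
    }
)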
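For the exact structure in the question, the serialization step can be checked locally without a cluster. This is only a minimal sketch: the helper name to_es_json_pair is made up for illustration, and the placeholder key 'key' is borrowed from the question's own code.

```python
import json


def to_es_json_pair(doc, key="key"):
    """Return a (key, json_string) pair for one document.

    Hypothetical helper: it serializes the document with json.dumps so the
    value is a plain string rather than a dict containing a list.
    """
    return (key, json.dumps(doc))


# The document from the question: a list of dicts under "field".
doc = {"field": [{"key1": "val1"}, {"key2": "val2"}]}
pair = to_es_json_pair(doc)
print(pair[1])
```

With an RDD of such dicts, something like some_rdd.map(to_es_json_pair) would produce the pairs to save, together with "es.input.json": "true" in the conf (some_rdd is a stand-in name here).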

Karudoso · Answer 2 · 2016-05-23T14:00:30+0000

I had the same problem, and the solution was to convert all the lists to tuples. Converting to JSON achieves the same thing.
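A rough sketch of that conversion, assuming the documents are plain nested dicts and lists. The function name lists_to_tuples is hypothetical, and whether tuples nested this deeply are accepted may depend on your elasticsearch-hadoop version, so treat it as a starting point only.

```python
def lists_to_tuples(value):
    """Recursively replace every list with a tuple, descending into dicts."""
    if isinstance(value, (list, tuple)):
        return tuple(lists_to_tuples(item) for item in value)
    if isinstance(value, dict):
        return {key: lists_to_tuples(item) for key, item in value.items()}
    # Strings, numbers, None, etc. are returned unchanged.
    return value


# The document from the question, with its list turned into a tuple.
doc = {"field": [{"key1": "val1"}, {"key2": "val2"}]}
converted = lists_to_tuples(doc)
```

The converted dict would then be used as the value of the ('key', value) pair from the question in place of the original one.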
