How to convert types when reading data from Elasticsearch into Spark using elasticsearch-spark

When I try to read data from Elasticsearch using the esRDD("index") function in elasticsearch-spark, the result has type org.apache.spark.rdd.RDD[(String, scala.collection.Map[String, AnyRef])], and when I inspect the values they are all of type AnyRef.
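
For reference, here is a minimal sketch of what I am running (the index name and connection settings are placeholders for my actual setup):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

// Placeholder connection settings; adjust es.nodes/es.port as needed.
val conf = new SparkConf()
  .setAppName("es-read")
  .set("es.nodes", "localhost")
  .set("es.port", "9200")
val sc = new SparkContext(conf)

// Each document comes back as (id, Map[String, AnyRef]);
// the field values carry no useful type information.
val rdd = sc.esRDD("index")
rdd.take(5).foreach { case (id, doc) => println(s"$id -> $doc") }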

However, the Elasticsearch documentation says:

elasticsearch-hadoop automatically converts built-in Spark types to Elasticsearch types (and vice versa)

My dependencies:

scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"  
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"  
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"  
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.4.0"


Did I miss something? And how can I convert types in a convenient way?





1 answer


OK, I found a solution. If you use esRDD, all type information is lost. It is better to use:

val df = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "")
  .load("index")

You can tweak the es.read.field.as.array.include option to list the fields that should be read as arrays; if none of your fields are arrays, the option can be omitted entirely.
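
For example, a minimal sketch assuming a hypothetical array field called tags (swap in your own field names):

import org.apache.spark.sql.SparkSession

// Connection settings (es.nodes, es.port) omitted for brevity.
val sparkSession = SparkSession.builder().appName("es-df").getOrCreate()

// "tags" is a hypothetical field stored as an array in Elasticsearch.
val df = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "tags")
  .load("index")

// The schema now shows proper sql.DataTypes instead of AnyRef.
df.printSchema()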



The returned data is a DataFrame, and the data types are preserved in its schema (converted to sql.DataTypes), as long as the conversion is supported by elasticsearch-spark.

And now you can do whatever you want.
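
For instance, with the schema in place you can work with typed columns directly (the column names here are hypothetical):

import org.apache.spark.sql.functions.col

// "age" and "name" are hypothetical fields; with real sql.DataTypes in
// the schema they can be filtered and selected without casting AnyRef.
df.filter(col("age") > 30)
  .select("name", "age")
  .show()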



