How to convert types when reading data from Elasticsearch with elasticsearch-spark in Spark
When I try to read data from Elasticsearch using the esRDD("index") function in elasticsearch-spark, I get results of type org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])]. When I check the values, they are all of type AnyRef. However, the ES site says:
elasticsearch-hadoop automatically converts built-in Spark types to Elasticsearch types (and vice versa)
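For reference, here is a minimal sketch of the read path I am describing (the index name "index" and the local node address are placeholders for my actual setup):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds esRDD to SparkContext via an implicit conversion

val conf = new SparkConf()
  .setAppName("es-read")
  .setMaster("local[*]")
  .set("es.nodes", "localhost:9200")
val sc = new SparkContext(conf)

// Each document comes back as (documentId, Map[String, AnyRef]);
// the field values are all AnyRef, with no concrete type information.
val rdd = sc.esRDD("index")
rdd.take(5).foreach { case (id, doc) =>
  doc.foreach { case (field, value) =>
    println(s"$id.$field -> $value (${value.getClass.getName})")
  }
}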
My dependencies:
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.1.0"
libraryDependencies += "org.apache.spark" %% "spark-mllib" % "2.1.0"
libraryDependencies += "org.elasticsearch" % "elasticsearch-spark-20_2.11" % "5.4.0"
Did I miss something? And how can I convert types in a convenient way?
OK, I found a solution. If you use esRDD, all type information is lost. It is better to use:
val df = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "")
  .load("index")
You can tune the es.read.field.as.array.include value in option as needed; if none of your fields need to be read as arrays, the option can be left out.
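For example, if the index had a field that should be mapped to an array (the field name tags below is hypothetical), the option would list it:

val df = sparkSession.read
  .format("org.elasticsearch.spark.sql")
  // map the (assumed) "tags" field to an ArrayType instead of a scalar type
  .option("es.read.field.as.array.include", "tags")
  .load("index")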
The returned data is a DataFrame, and the data types are preserved (converted to sql.DataTypes) in the schema, as long as the conversion is supported by elasticsearch-spark.
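For instance, a quick way to confirm the recovered types (continuing from the df loaded above):

// The schema now carries concrete sql.DataTypes instead of AnyRef:
df.printSchema()

// Or inspect each field's type programmatically:
df.schema.fields.foreach(f => println(s"${f.name}: ${f.dataType}"))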
And now you can do whatever you want.