How can I read a JSON file with a specific format using Spark and Scala?

I am trying to read a JSON file similar to:



I tried the command:

    val df = spark.read.json("namefile")


But it doesn't work: my columns are not recognized ...



2 answers

If you want to use read.json, you need one JSON document per line. If your file instead contains a single JSON array of documents, or documents spread over several lines, it will not be parsed as expected. Using your example, the input file should be formatted like this:

{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}, {"MLrate":"30","Nrout":"5","up":null,"Crate":"2"} ,{"MLrate":"34","Nrout":"0","up":null,"Crate":"4"} ,{"MLrate":"33","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"8","up":null,"Crate":"2"} ]}
{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"0"} ,{"MLrate":"35","Nrout":"1","up":null,"Crate":"5"} ,{"MLrate":"30","Nrout":"6","up":null,"Crate":"2"} ,{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"} ,{"MLrate":"38","Nrout":"8","up":null,"Crate":"1"} ]}


If you call read.json on the file above and print the schema, you will see that it is parsed as expected:

 |-- COL: long (nullable = true)
 |-- DATA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Crate: string (nullable = true)
 |    |    |-- MLrate: string (nullable = true)
 |    |    |-- Nrout: string (nullable = true)
 |    |    |-- up: string (nullable = true)
 |-- IFAM: string (nullable = true)
 |-- KTM: long (nullable = true)
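
As a minimal sketch of the steps above, assuming Spark 2.x or later running locally (the `spark` session setup, the temp file, and the shortened sample records are illustrative and not taken from the question), the following writes two one-line documents, reads them with `read.json`, and flattens the `DATA` array with `explode`:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("json-lines-sketch").getOrCreate()

// Hypothetical sample: one complete JSON document per line.
val lines = Seq(
  """{"IFAM":"EQR","KTM":1430006400000,"COL":21,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"},{"MLrate":"31","Nrout":"0","up":null,"Crate":"2"}]}""",
  """{"IFAM":"EQR","KTM":1430006400000,"COL":22,"DATA":[{"MLrate":"30","Nrout":"0","up":null,"Crate":"2"}]}"""
)
val path = Files.createTempFile("sample", ".json")
Files.write(path, lines.mkString("\n").getBytes("UTF-8"))

// Each line becomes one row; the schema is inferred from the data.
val df = spark.read.json(path.toString)
df.printSchema()

// One output row per element of DATA, with the struct fields promoted to columns.
val flat = df
  .select(col("IFAM"), col("KTM"), col("COL"), explode(col("DATA")).as("d"))
  .select("IFAM", "KTM", "COL", "d.MLrate", "d.Nrout", "d.Crate")
flat.show()
```

Exploding the array is optional; it is shown here because the nested `DATA` elements are usually what needs to end up in ordinary columns.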




If you don't want to reformat your JSON file (one document per line), you can define the schema explicitly with StructType and MapType from Spark SQL:

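import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Convenience function for turning JSON strings into DataFrames.
// Assumes a SparkSession named `spark` is in scope, as in spark-shell on Spark 2.2+.
def jsonToDataFrame(json: String, schema: StructType = null): DataFrame = {
  import spark.implicits._
  val reader = spark.read
  // Apply the schema only if one was supplied; otherwise Spark infers it
  val configured = if (schema != null) reader.schema(schema) else reader
  configured.json(Seq(json).toDS())
}

// Using a struct
val schema = new StructType().add("a", new StructType().add("b", IntegerType))

// Call the function, passing the sample JSON data and the schema as parameters
val json_df = jsonToDataFrame("""
{
  "a": {
    "b": 1
  }
}
""", schema)

// Now you can access the nested JSON fields
val b_value = json_df.select("a.b")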
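
If the input file really is a single JSON array spread over several lines, Spark 2.2 and later can read it without reformatting by setting the `multiLine` option. The sketch below combines that option with an explicit schema for the question's layout. The session setup, the temp file, and the two-record sample are assumptions for illustration, and every field other than `KTM` and `COL` is declared as a string to match the inferred schema printed in the first answer:

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("multiline-sketch").getOrCreate()

// Schema mirroring the fields in the question's records.
val element = new StructType()
  .add("MLrate", StringType)
  .add("Nrout", StringType)
  .add("up", StringType)
  .add("Crate", StringType)
val schema = new StructType()
  .add("IFAM", StringType)
  .add("KTM", LongType)
  .add("COL", LongType)
  .add("DATA", ArrayType(element))

// Hypothetical sample: a pretty-printed JSON array, not one document per line.
val json =
  """[
    |  {"IFAM": "EQR", "KTM": 1430006400000, "COL": 21,
    |   "DATA": [{"MLrate": "30", "Nrout": "0", "up": null, "Crate": "2"}]},
    |  {"IFAM": "EQR", "KTM": 1430006400000, "COL": 22,
    |   "DATA": [{"MLrate": "35", "Nrout": "1", "up": null, "Crate": "5"}]}
    |]""".stripMargin
val path = Files.createTempFile("array", ".json")
Files.write(path, json.getBytes("UTF-8"))

// multiLine makes Spark parse each file as a whole instead of line by line;
// a top-level array of objects then yields one row per object.
val df = spark.read
  .schema(schema)
  .option("multiLine", true)
  .json(path.toString)

df.select("COL", "DATA.MLrate").show()
```

Note that with `multiLine` each file is parsed as a unit, so a very large file cannot be split across tasks the way line-delimited JSON can.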


See this reference documentation for more examples and details.


