Fixed bug when accessing empty or null arrays
I have a JSON file with this type of schema:
{
"name" : "john doe",
"phone-numbers" : {
"home": ["1111", "222"],
"country" : "England"
}
}
The array of home phone numbers can sometimes be empty.
My spark app gets a list of these JSONS and does the following:
val dataframe = spark.read.json(filePaths: _*)
val result = dataframe.select($"name",
explode(dataframe.col("phone-numbers.home")))
When the "home" array is empty, I get the following error when I try to blow it up:
org.apache.spark.sql.AnalysisException: Unable to resolve '
phone-numbers
[' home ']' due to datatype mismatch: argument 2 requires an integral type, but 'home' is of string type. ;;
Is there an elegant way to prevent a spark from exploding this field if empty or empty?
source to share
The problem is not empty arrays ( "home" : []
), but arrays that are null ( "home" : null
) which don't work withexplode
So, first, filter the null values ββfirst:
val result = df
.filter($"phone-numbers.home".isNotNull)
.select($"name", explode($"phone-numbers.home"))
or replace null values ββwith an empty array (which I would prefer in your situation):
val nullToEmptyArr = udf(
(arr:Array[Long]) => if(arr==null) Array.empty[Long] else arr
)
val result = df
.withColumn("phone-numbers.home",nullToEmptyArr($"phone-numbers.home")) // clean existing column
.select($"name", explode($"phone-numbers.home"))
source to share
There is a class in sparks called DataFrameNaFunctions
, this class is specialized for dealing with missing data in DataFrame
s.
this class contains three main methods : drop
, replace
andfill
to use these methods, the only thing you need to do is call the method df.na
that returns DataFrameNaFunctions
for yours df
, then apply one of the three methods that yours returns df
with the specified operation.
to solve your problem, you can use something like this:
val dataframe = spark.read.json(filePaths: _*)
val result = dataframe.na.drop().select("name",
explode(dataframe.col("phone-numbers.home")))
Hope for this help, best wishes
source to share