How do I create a DataFrame with null values using toDF?
NULL isn't defined anywhere in the API, but null is, so you can define the DataFrame as
val df2 = Seq((1, null), (2, "b")).toDF("number","letter")
and the output should look like this:
+------+------+
|number|letter|
+------+------+
|1 |null |
|2 |b |
+------+------+
The trick is to use two or more values for the nullable column so that Spark SQL can determine the type it should use.
This, on the other hand, won't work:
val df = Seq((1, null)).toDF("number","letter")
Spark doesn't know what the type of letter is in this case.
In addition to Ramesh's answer, it is worth noting that since toDF
uses reflection to infer the schema, it is important that the provided sequence has the right element type. If Scala's type inference isn't enough, you need to specify the type explicitly.
For example, if you want the second column to be a nullable integer, then neither of the following works:
Seq((1, null))
is inferred as Seq[(Int, Null)]
Seq((1, null), (2, 2))
is inferred as Seq[(Int, Any)]
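For illustration, this inference behavior can be checked with plain Scala, without Spark at all, since toDF only ever sees the element type the compiler infers. The value names below are just for the sketch:

```scala
// Plain-Scala sketch of the inference behavior that toDF depends on.
val a = Seq((1, null))         // inferred as Seq[(Int, Null)]
val b = Seq((1, null), (2, 2)) // the lub of Null and Int is Any, so Seq[(Int, Any)]

// These annotated assignments compile only because the inferred types match:
val aTyped: Seq[(Int, Null)] = a
val bTyped: Seq[(Int, Any)]  = b
```

Neither Null nor Any is a type Spark SQL can map to a column type, which is why both calls fail.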
In this case, you need to explicitly specify the type for the second column. There are at least two ways to do this. You can explicitly specify the generic type for the sequence
Seq[(Int, Integer)]((1, null)).toDF
or create a case class for the row:
case class MyRow(x: Int, y: Integer)
Seq(MyRow(1, null)).toDF
Note that I used Integer
instead of Int
because the latter, being a primitive type, cannot hold null values.
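Another option, assuming Spark's standard encoders (which support Option), is to use Option[Int] for the nullable column; None is stored as null in the DataFrame. A minimal sketch, with the Spark-specific part shown in comments since it needs a SparkSession:

```scala
// Sketch: Option[Int] gives Spark a concrete element type for a nullable column.
// The column names and the toDF call below assume a SparkSession with
//   import spark.implicits._
// in scope.
val data = Seq((1, None: Option[Int]), (2, Some(2)))
// data is inferred as Seq[(Int, Option[Int])], so
//   data.toDF("number", "value")
// would yield a nullable integer column, with None stored as null.
```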