How do I create a DataFrame with zeros using toDF?

How do you create a dataframe containing zeros from a sequence using .toDF?

This works:

val df = Seq((1,"a"),(2,"b")).toDF("number","letter")

but I would like to do something like:

val df = Seq((1, NULL),(2,"b")).toDF("number","letter")


2 answers


NULL is not defined anywhere in the API, but null is, so you can write:

val df2 = Seq((1, null), (2, "b")).toDF("number","letter")

The output should look like this:

+------+------+
|number|letter|
+------+------+
|1     |null  |
|2     |b     |
+------+------+



The trick is to provide two or more values for a nullable column so that Spark SQL can determine which type to use for it.

That is why this won't work:

val df = Seq((1, null)).toDF("number","letter")

Spark cannot tell what the type of letter is in this case.
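A common way around this limitation (a sketch, assuming a SparkSession with spark.implicits._ in scope) is to wrap the nullable column in Option, which lets Spark infer the element type even when every row is null:

```scala
// Option[String] tells Spark the column is a nullable String;
// None becomes null in the resulting DataFrame.
val df = Seq((1, None: Option[String]), (2, Some("b"))).toDF("number", "letter")
```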



In addition to Ramesh's answer, it is worth noting that since toDF uses reflection to infer the schema, it is important that the provided sequence has the right type. When Scala's type inference is not enough, you need to specify the type explicitly.

For example, if you want the second column to be a nullable integer, then neither of the following works:

Seq((1, null)) is inferred as Seq[(Int, Null)]

Seq((1, null), (2, 2)) is inferred as Seq[(Int, Any)]

In this case, you need to specify the type of the second column explicitly. There are at least two ways to do this. You can specify the generic type of the sequence:



Seq[(Int, Integer)]((1, null)).toDF

or create a case class for the row:

case class MyRow(x: Int, y: Integer)
Seq(MyRow(1, null)).toDF

Note that I used Integer instead of Int, because the latter is a primitive type and cannot hold null values.
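To see the difference (a sketch, assuming a SparkSession with spark.implicits._ in scope), printSchema should report the Int column as non-nullable and the Integer column as nullable:

```scala
case class MyRow(x: Int, y: Integer)

val df = Seq(MyRow(1, null)).toDF
// x (a Scala Int) is encoded as a non-nullable integer column,
// while y (a java.lang.Integer) is encoded as nullable.
df.printSchema()
```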







