How do I create a DataFrame with zeros using toDF?

How do you create a dataframe containing zeros from a sequence using .toDF?

This works:

val df = Seq((1,"a"),(2,"b")).toDF("number","letter")

but I would like to do something like:

val df = Seq((1, NULL),(2,"b")).toDF("number","letter")


2 answers


NULL is not defined anywhere in the API, but null is, so you can write:

val df2 = Seq((1, null), (2, "b")).toDF("number","letter")

The output should look like this:

+------+------+
|number|letter|
+------+------+
|1     |null  |
|2     |b     |
+------+------+



The trick is to provide two or more values for a nullable column so that Spark SQL can determine which type to use for it.

That is why this won't work:

val df = Seq((1, null)).toDF("number","letter")

Spark cannot tell what the type of letter is in this case.
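A common way around this limitation (a sketch, assuming a SparkSession with spark.implicits._ in scope) is to wrap the nullable column in Option, which lets Spark infer the element type even when every row is null:

```scala
// Option[String] tells Spark the column is a nullable String;
// None becomes null in the resulting DataFrame.
val df = Seq((1, None: Option[String]), (2, Some("b"))).toDF("number", "letter")
```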



In addition to Ramesh's answer, it is worth noting that since toDF uses reflection to infer the schema, it is important that the provided sequence has the right type. When Scala's type inference is not enough, you need to specify the type explicitly.

For example, if you want the second column to be a nullable integer, then neither of the following works:

Seq((1, null)) is inferred as Seq[(Int, Null)]

Seq((1, null), (2, 2)) is inferred as Seq[(Int, Any)]

In this case, you need to specify the type of the second column explicitly. There are at least two ways to do this. You can specify the generic type of the sequence:



Seq[(Int, Integer)]((1, null)).toDF

or create a case class for the row:

case class MyRow(x: Int, y: Integer)
Seq(MyRow(1, null)).toDF

Note that I used Integer instead of Int, because the latter is a primitive type and cannot hold null values.
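To see the difference (a sketch, assuming a SparkSession with spark.implicits._ in scope), printSchema should report the Int column as non-nullable and the Integer column as nullable:

```scala
case class MyRow(x: Int, y: Integer)

val df = Seq(MyRow(1, null)).toDF
// x (a Scala Int) is encoded as a non-nullable integer column,
// while y (a java.lang.Integer) is encoded as nullable.
df.printSchema()
```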







