How do I get the value and type of each column of every row in a data frame?
How can I convert a data frame to a tuple that includes the data type for each column?
I have several dataframes with different sizes and types. I need to be able to determine the type and value of each column and row of a given dataframe so that I can perform some actions that are type dependent.
For example, let's say I have a dataframe that looks like this:
+-------+-------+
|  foo  |  bar  |
+-------+-------+
| 12345 | fnord |
|    42 | baz   |
+-------+-------+
I need to get
Seq(
(("12345", "Integer"), ("fnord", "String")),
(("42", "Integer"), ("baz", "String"))
)
or something as simple to iterate over and work with programmatically.
Thanks in advance, and sorry if this is a very newbie question.
If I understand your question correctly, the following should work.
val df = Seq(
(12345, "fnord"),
(42, "baz"))
.toDF("foo", "bar")
This creates the data frame that you already have:
+-----+-----+
|  foo|  bar|
+-----+-----+
|12345|fnord|
|   42|  baz|
+-----+-----+
The next step is to extract the dataType of each field from the DataFrame's schema and collect the types into a sequence.
val fieldTypesList = df.schema.map(struct => struct.dataType)
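If the column names are needed as well, each field of the schema carries its name alongside its type. The following is a minimal sketch, assuming it is run as a standalone Scala script with Spark on the classpath; the local session setup, the app name, and the name namedTypes are illustrative and not part of the original answer. In spark-shell, the session lines can be skipped because spark is already defined there.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session purely for illustration.
val spark = SparkSession.builder().master("local[*]").appName("schema-types").getOrCreate()
import spark.implicits._

val df = Seq((12345, "fnord"), (42, "baz")).toDF("foo", "bar")

// Each StructField in the schema exposes the column name and its DataType.
val namedTypes = df.schema.fields.map(f => (f.name, f.dataType)).toList

namedTypes.foreach(println)
```

For the example frame this should print one (name, type) pair per column, with IntegerType for foo and StringType for bar.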
The next step is to convert the DataFrame's rows into an RDD of lists and pair each value with the matching dataType from the list created above.
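// Read the values from the Row directly; splitting row.toString on "," breaks for values that contain commas.
val dfList = df.rdd.map(row => row.toSeq.map(v => s"$v").toList)
// Pair values with types by position; indexOf would return the wrong type when a row has duplicate values.
val tuples = dfList.map(list => list.zip(fieldTypesList))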
Now if we print it
tuples.foreach(println)
This will give
List((12345,IntegerType), (fnord,StringType))
List((42,IntegerType), (baz,StringType))
You can then iterate over this result and work with it programmatically.
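Since the goal is to perform type-dependent actions, it may be simpler to keep the original values instead of turning them into strings, and to pattern match on each column's DataType. The sketch below assumes the same example data and a standalone Scala script with Spark on the classpath; describe is a hypothetical helper, and the actions it performs are placeholders. It also assumes there are no null values in the frame.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

// Assumption: a local session purely for illustration; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("typed-values").getOrCreate()
import spark.implicits._

val df = Seq((12345, "fnord"), (42, "baz")).toDF("foo", "bar")
val fieldTypes: Seq[DataType] = df.schema.map(_.dataType)

// Hypothetical helper: pick an action based on the column's DataType.
// Assumes non-null values; add a null check for real data.
def describe(value: Any, dataType: DataType): String = dataType match {
  case IntegerType => s"integer column, value + 1 = ${value.asInstanceOf[Int] + 1}"
  case StringType  => s"string column, upper-cased = ${value.toString.toUpperCase}"
  case other       => s"unhandled type $other, value = $value"
}

// collect() brings every row to the driver, so this suits small frames only.
val described: Seq[Seq[String]] = df.collect().toSeq.map { row =>
  row.toSeq.zip(fieldTypes).map { case (v, t) => describe(v, t) }
}

described.foreach(println)
```

Zipping row.toSeq with the type list pairs values and types by position, so rows with repeated values or values containing commas are handled the same way as any other row.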