How do I get the value and type of each column of every row in a data frame?

How can I convert a data frame to a tuple that includes the data type for each column?

I have several dataframes with different sizes and types. I need to be able to determine the type and value of each column and row of a given dataframe so that I can perform some actions that are type dependent.

For example, let's say I have a dataframe that looks like this:

+-------+-------+
|  foo  |  bar  |
+-------+-------+
| 12345 | fnord |
|    42 |   baz |
+-------+-------+

      

I need to get

Seq(
  (("12345", "Integer"), ("fnord", "String")),
  (("42", "Integer"), ("baz", "String"))
)

      

or something as simple to iterate over and work with programmatically.

Thanks in advance, and sorry if this is a very newbie question.

1 answer


If I understand your question correctly, the following should work.

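  // toDF on a local Seq needs the session implicits; this assumes a SparkSession
  // named spark (spark-shell imports these for you)
  import spark.implicits._

  val df = Seq(
    (12345, "fnord"),
    (42, "baz"))
    .toDF("foo", "bar")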

      

This creates the data frame that you already have.

+-----+-----+
|  foo|  bar|
+-----+-----+
|12345|fnord|
|   42|  baz|
+-----+-----+

      

The next step is to extract the dataType of each field from the schema of the DataFrame and collect the results into a sequence, in column order.

val fieldTypesList = df.schema.map(struct => struct.dataType)
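If the column names are needed alongside the types, the same schema fields carry both. The following is a minimal sketch, not part of the original answer: the object name SchemaSketch is hypothetical, and the schema is built by hand to mirror the example data frame so that the snippet does not need a running session. With a real data frame you would use df.schema instead.

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object SchemaSketch {
  // Assumed schema mirroring the example data frame (foo: integer, bar: string).
  // With a real DataFrame, replace this with df.schema.
  val schema: StructType = StructType(Seq(
    StructField("foo", IntegerType, nullable = false),
    StructField("bar", StringType, nullable = true)))

  // Pair every column name with the short name of its data type, in column order.
  val namedTypes: Seq[(String, String)] =
    schema.fields.map(f => (f.name, f.dataType.simpleString)).toSeq
}
```

Here simpleString returns the short type name ("int", "string"), which can be easier to compare against than the DataType objects themselves.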

      

The next step is to convert the DataFrame rows to an RDD of lists, and pair each value with the corresponding dataType from the list created above.



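  // Read the values straight from the Row instead of parsing row.toString,
  // which breaks when a value itself contains a comma or a bracket.
  val dfList = df.rdd.map(row => row.toSeq.map(v => if (v == null) "null" else v.toString).toList)
  // Pair values and types by position; indexOf would pick the wrong type for duplicate values.
  val tuples = dfList.map(list => list.zip(fieldTypesList))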

      

Now if we print it

tuples.foreach(println)

      

This will give

List((12345,IntegerType), (fnord,StringType))
List((42,IntegerType), (baz,StringType))
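Since the goal in the question is to run type-dependent actions, one option is to pattern match on the DataType paired with each value. The following is only a sketch under assumptions: the object TypedCellsSketch and its describe helper are hypothetical names, the session runs in local mode, and the rows are collected to the driver, which is only reasonable for small data frames.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DataType, IntegerType, StringType}

object TypedCellsSketch {
  // Hypothetical helper: chooses an action based on the column's Spark data type.
  def describe(value: Any, dataType: DataType): String = dataType match {
    case IntegerType => s"integer: $value"
    case StringType  => s"string: $value"
    case other       => s"unhandled ${other.simpleString}: $value"
  }

  // Builds the example data frame and describes every cell, row by row.
  def run(): Seq[Seq[String]] = {
    val spark = SparkSession.builder()
      .master("local[1]")        // assumption: local mode, for illustration only
      .appName("typed-cells")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((12345, "fnord"), (42, "baz")).toDF("foo", "bar")
    val types = df.schema.fields.map(_.dataType).toSeq

    // collect() brings all rows to the driver; values and types are paired by position.
    val result = df.collect().toSeq.map { row =>
      row.toSeq.zip(types).map { case (v, t) => describe(v, t) }
    }
    spark.stop()
    result
  }
}
```

Matching on the DataType keeps the original values intact, so there is no need to convert them to strings first and parse them back later.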

      

You can then iterate over this result and apply whatever type-dependent logic you need.
