Adding Constant Structure Column to Spark DataFrame
I want to load a struct from a database collection and attach it as a constant column to every row of a target DataFrame.
I can load the column I want as a single-row DataFrame and then crossJoin it onto each row of the target:
    val parentCollectionDF = /* ... load a single row from the database */
    val constantCol = broadcast(parentCollectionDF.select("my_column"))
    val result = childCollectionDF.crossJoin(constantCol)
It works, but it seems wasteful: the data is constant for every row of the child collection, but crossJoin copies it to every row.
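For context, here is a self-contained sketch of that crossJoin approach; the field names and sample data are illustrative stand-ins, not from the actual collections:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, struct}

object CrossJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()
    import spark.implicits._

    // Stand-in for the single-row parent collection (hypothetical fields).
    val parentCollectionDF = Seq(("v1", 42)).toDF("field1", "field2")
      .select(struct($"field1", $"field2") as "my_column")

    val childCollectionDF = Seq(1, 2, 3).toDF("id")

    // broadcast() hints that the single-row side is small, so the
    // crossJoin is executed as a broadcast join rather than a shuffle.
    val result = childCollectionDF.crossJoin(broadcast(parentCollectionDF.select("my_column")))
    result.show()
    spark.stop()
  }
}
```

The broadcast hint keeps the join cheap at the plan level, but the struct value is still materialized on every output row.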
If I could hard-code the values, I could use something like:

    childCollection.withColumn("my_column",
      struct(lit(val1) as "field1", lit(val2) as "field2" /* etc. */))
But I don't know them ahead of time; I need to load a structure from a parent collection.
What I'm looking for is something like:
    childCollection.withColumn("my_column",
      lit(parentCollectionDF.select("my_column").take(1)(0).getStruct(0)))
... but I can see from the code for literals that lit() accepts only base types as an argument; passing a GenericRowWithSchema or a case class is not supported.
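One workaround (a sketch of my own, with no crossJoin) is to collect the single parent row on the driver and rebuild it field by field as struct(lit(...), ...), so that withColumn receives a Column assembled from base-type literals. This assumes my_column is a struct whose fields are all base types:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{lit, struct}

// Expand the struct into its fields and collect the single row.
val row: Row = parentCollectionDF.select("my_column.*").head

// Rebuild each field as a base-type literal, preserving field names.
val fields = row.schema.fields.zipWithIndex.map { case (f, i) =>
  lit(row.get(i)) as f.name
}

// Attach the reconstructed struct as a constant column.
val result = childCollectionDF.withColumn("my_column", struct(fields: _*))
```

Because the values become plan-level literals, this avoids the join entirely, at the cost of one driver-side head() call. Nested structs would need a recursive version of the same idea.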
Is there a less clumsy way to do this? (Spark 2.1.1, Scala)
[edit: Not the same as this question, which explains how to add a struct of literal (hardcoded) constants. My structure must be loaded dynamically.]