How do I bind two columns of data in SparkR?

How can I bind two DataFrame columns together in SparkR, using Spark 1.4?

TIA, Arun



1 answer


There is no direct way to do this. The same question has been asked about Spark (1.3) in Scala. The only workaround is to have some kind of row number in both DataFrames, because then you can join on that row number. Why? Because you can only join tables, or add columns that are computed from columns already present in the same DataFrame.

data1 <- createDataFrame(sqlContext, data.frame(a=c(1,2,3)))
data2 <- createDataFrame(sqlContext, data.frame(b=c(2,3,4)))


Then

withColumn(data1,"b",data1$a + 1)

is allowed, but

withColumn(data1,"b",data2$b)

is not. Once Spark has split your DataFrame into partitions to store it, it does not know how to line the two DataFrames up row by row (it has no notion of row order) unless you give it row numbers.
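A minimal sketch of the row-number workaround, assuming the data starts out as local R data.frames so the row numbers can be added before the data reaches Spark. The column names row_id1 and row_id2 are made up for illustration, and the setup calls are the Spark 1.4 ones:

```r
library(SparkR)

# Spark 1.4-style setup; skip these two lines inside the sparkR shell,
# which already provides sc and sqlContext.
sc <- sparkR.init()
sqlContext <- sparkRSQLInit(sc)

# Add an explicit row number to each local data.frame before handing it to Spark.
# row_id1 / row_id2 are illustrative names, not part of any API.
local1 <- data.frame(a = c(1, 2, 3))
local2 <- data.frame(b = c(2, 3, 4))
local1$row_id1 <- seq_len(nrow(local1))
local2$row_id2 <- seq_len(nrow(local2))

data1 <- createDataFrame(sqlContext, local1)
data2 <- createDataFrame(sqlContext, local2)

# Join on the row number, then keep only the original columns.
joined <- join(data1, data2, data1$row_id1 == data2$row_id2, "inner")
result <- select(joined, "a", "b")

head(result)
```

The join does not promise any particular row order in the result. If the data already lives in Spark rather than in a local data.frame, the row numbers have to come from somewhere else, which this sketch does not cover.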
