How do I bind two columns of data in sparkR?
1 answer
There is no direct way to do this. A related question covers the same problem for Spark (1.3) in Scala. The only workaround is to have some kind of row number in both DataFrames, because then you can join on that row number. Why? Because you can only join tables, or add columns that are derived from columns already present in the same DataFrame. For example, given
data1 <- createDataFrame(sqlContext, data.frame(a=c(1,2,3)))
data2 <- createDataFrame(sqlContext, data.frame(b=c(2,3,4)))
then
withColumn(data1,"b",data1$a + 1)
is allowed, but
withColumn(data1,"b",data2$b)
is not. Once Spark has split your DataFrame into blocks to store it, it no longer knows how to line those blocks up side by side, because it has no notion of row order. It can only do so if you supply row numbers yourself.
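A minimal sketch of the row-number workaround described above, assuming a SparkR 1.4 to 1.6 style setup (`sparkR.init`, `sparkRSQL.init`) and assuming the data starts out as local R data.frames, so the row number can be attached before the data is distributed. The column names `id` and `id2` and the variables `local1`, `local2`, `joined` and `result` are made up for illustration and are not part of the original answer.

```r
library(SparkR)

# Assumption: SparkR 1.4-1.6 style initialisation, matching the
# sqlContext-based calls used in the answer.
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

# Attach an explicit row number while the data is still local,
# i.e. while the row order is still well defined.
local1 <- data.frame(id = 1:3, a = c(1, 2, 3))
local2 <- data.frame(id = 1:3, b = c(2, 3, 4))

data1 <- createDataFrame(sqlContext, local1)
data2 <- createDataFrame(sqlContext, local2)

# Rename the key on one side so the joined result has no ambiguous column.
data2 <- withColumnRenamed(data2, "id", "id2")

# Join on the row number, then keep only the columns of interest.
joined <- join(data1, data2, data1$id == data2$id2, "inner")
result <- select(joined, "id", "a", "b")

# The join does not guarantee row order, so sort explicitly before inspecting.
head(arrange(result, result$id))
```

This only helps when the row number can be created before the data reaches Spark. If both DataFrames already live in Spark, a row index has to be generated there, which this sketch does not cover.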