Loading CSV files into SparkR

In R, I created two datasets, which I saved as CSV files with

write.csv(liste, file="/home/.../liste.csv", row.names=FALSE)
write.csv(data, file="/home/.../data.csv", row.names=FALSE)

Now I want to open these CSV files in SparkR. So I type

liste <- read.df(sqlContext, "/home/.../liste.csv", "com.databricks.spark.csv", header="true", delimiter="\t")
data <- read.df(sqlContext, "/home/.../data.csv", "com.databricks.spark.csv", header="true", delimiter="\t")

It turns out that one dataset, "liste", loads into SparkR successfully, but for some strange reason "data" cannot be loaded.

"liste" is just a vector of numbers in R, whereas "data" is a data.frame that I had loaded into R and then removed some parts of. SparkR gives me this error message:

Error: returnStatus == 0 is not TRUE



1 answer


"liste" is a local R object, so write.csv can write it out as expected. "data", however, is a SparkR DataFrame, which write.csv cannot serialize: it writes only the pointer to the DataFrame, not the DataFrame itself. That's why the resulting file is only 33 kB.
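A minimal sketch of two workarounds, assuming a Spark 1.x SparkR session with the spark-csv package on the classpath (the "..." in the paths is kept from the question, and "data_out" is just a hypothetical output directory name):

# Option 1: collect the distributed DataFrame into a local R data.frame,
# so write.csv has actual data to serialize rather than a pointer
local_data <- collect(data)
write.csv(local_data, file="/home/.../data.csv", row.names=FALSE)

# Option 2: let Spark itself write the data through the same csv source
# used for reading; note this creates a directory of part files,
# not a single csv file
write.df(data, path="/home/.../data_out", source="com.databricks.spark.csv", header="true", mode="overwrite")

Either way, read.df should then find real CSV content. Keep in mind that write.csv produces comma-separated files, so the delimiter="\t" option in the question's read.df calls would need to be dropped or changed to ",".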


