What is the options parameter of the spark_write_csv sparklyr function?

Question

I was looking for a way to make spark_write_csv upload only a single file to S3, because I want to store the regression result on S3. I was wondering whether options has some parameter that controls the number of partitions. I couldn't find it anywhere in the documentation. Or is there another efficient way to upload the result table to S3?

Any help is appreciated!

r amazon-s3 dplyr apache-spark sparklyr

chandni ramdasan May 19 '17 at 11:09

1 answer

user6910411 · Answer 1 · 2017-05-19T13:07:03+0000

The options argument is equivalent to the options call on the DataFrameWriter (you can check the DataFrameWriter.csv documentation for a complete list of options specific to the CSV source), and it cannot be used to control the number of output partitions.
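As a rough illustration of what options can carry, the sketch below passes one CSV writer option (compression) through spark_write_csv. It assumes a local Spark installation; the connection, the mtcars sample data, the table name and the output directory are placeholders chosen for the example and do not come from the question.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Placeholder data standing in for the regression result.
df <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Hypothetical output directory; an S3 URI would go here instead.
out_dir <- file.path(tempdir(), "csv_with_options")

# Entries in `options` are forwarded to the CSV writer.
# They change how each part file is written, not how many part files there are.
spark_write_csv(df, out_dir, mode = "overwrite",
                options = list(compression = "gzip"))

# Spark writes a directory of part files rather than a single CSV file.
part_files <- list.files(out_dir, pattern = "^part-")

spark_disconnect(sc)
```

The number of part files in the output directory follows the partitioning of the data, which is why a separate coalesce step is needed to get a single file.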

While not generally recommended, you can use the Spark API to coalesce the data and convert it back to a sparklyr tbl:

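library(sparklyr)
library(dplyr)

# Coalesce to a single partition through the underlying Spark DataFrame,
# then register the result as a temporary view.
df %>% 
  spark_dataframe() %>% 
  invoke("coalesce", 1L) %>% 
  invoke("createOrReplaceTempView", "_coalesced")

# Read the view back as a sparklyr tbl and write it out.
tbl(sc, "_coalesced") %>% spark_write_csv(...)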

or, in recent versions, with sparklyr::sdf_coalesce, which takes the target number of partitions:

df %>% sparklyr::sdf_coalesce(partitions = 1)
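To check that coalescing has the intended effect, a minimal end-to-end sketch might look like the following. It assumes a local Spark installation; the mtcars data, the table name and the temporary output directory are placeholders, and the initial repartitioning into four partitions is only there to make the effect visible.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available (e.g. via spark_install()).
sc <- spark_connect(master = "local")

# Placeholder data, deliberately spread over several partitions.
df <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE) %>%
  sdf_repartition(partitions = 4)

# Collapse to a single partition before writing.
single <- df %>% sdf_coalesce(partitions = 1)
n_parts <- sdf_num_partitions(single)

# Hypothetical output directory; an S3 URI would go here instead.
out_dir <- file.path(tempdir(), "single_csv")
spark_write_csv(single, out_dir, mode = "overwrite")

# The output is still a directory, but it should hold only one part file.
part_files <- list.files(out_dir, pattern = "^part-.*\\.csv$")

spark_disconnect(sc)
```

Coalescing to one partition pushes all rows through a single task, so this is only reasonable when the result is small, as a regression summary usually is.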
