What is the options parameter of the spark_write_csv sparklyr function?

I was looking for a way to make spark_write_csv upload only a single file to S3, because I want to save the regression result on S3. I was wondering whether options has some parameter that determines the number of partitions. I couldn't find it anywhere in the documentation. Or is there another efficient way to upload the resulting table to S3?

Any help is appreciated!



1 answer


The options argument is equivalent to the options call on the DataFrameWriter (you can check the DataFrameWriter.csv documentation for a complete list of options specific to the CSV source), and it cannot be used to control the number of output partitions.
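To illustrate what options does and does not do, here is a minimal sketch. It assumes a local Spark connection and uses the built-in mtcars data as a stand-in for the regression result; the table name, the output path, and the choice of the compression option are illustrative only.

```r
library(sparklyr)
library(dplyr)

# Local connection, used here purely for illustration.
sc <- spark_connect(master = "local")
df <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE)

# Hypothetical output location; replace with your own path.
out_path <- file.path(tempdir(), "mtcars_csv_gzip")

# Entries in `options` are forwarded to the CSV writer as key/value pairs.
# "compression" is one of the CSV-specific writer options.
csv_options <- list(compression = "gzip")

spark_write_csv(df, path = out_path, options = csv_options, mode = "overwrite")

# The options change how each part file is written, not how many there are:
# the number of part files still follows the partitioning of `df`.
part_files <- list.files(out_path, pattern = "^part-")
print(part_files)

spark_disconnect(sc)
```

The number of files in the output directory is governed by the partitioning of the data frame itself, which is why it has to be changed before the write.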

While not generally recommended (a single partition means the whole result is written by one task), you can use the Spark API to coalesce the data and convert it back to a sparklyr tbl:

df %>% 
  spark_dataframe() %>% 
  invoke("coalesce", 1L) %>% 
  invoke("createOrReplaceTempView", "_coalesced")

tbl(sc, "_coalesced") %>% spark_write_csv(...)

      



or, in recent versions, use sparklyr::sdf_coalesce, passing the target number of partitions:

df %>% sparklyr::sdf_coalesce(1L)
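The coalesce approach can be tried end to end with a sketch like the following. It assumes a local Spark connection and uses mtcars in place of the regression result; the table name and output path are hypothetical. For S3 the path would instead be an s3a:// style URI, which additionally assumes the cluster has the Hadoop S3 connector and credentials configured.

```r
library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")

# Split the demo table into several partitions to mimic a distributed result.
df <- copy_to(sc, mtcars, "mtcars_tbl", overwrite = TRUE) %>%
  sdf_repartition(partitions = 4)

# Reduce to a single partition before writing.
single <- sdf_coalesce(df, partitions = 1)
n_partitions <- sdf_num_partitions(single)
print(n_partitions)

# Hypothetical local output directory; Spark still writes a directory,
# but with one partition it should contain a single part-*.csv file.
out_path <- file.path(tempdir(), "regression_result")
spark_write_csv(single, path = out_path, mode = "overwrite")

part_files <- list.files(out_path, pattern = "^part-.*\\.csv$")
print(part_files)

spark_disconnect(sc)
```

Note that even with one partition the output is a directory holding the part file plus marker files such as _SUCCESS, not a bare CSV file at the given path.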

      
