What does the `options` argument of sparklyr's `spark_write_csv` function do?
I was looking for a way to make `spark_write_csv` write only one file to S3, because I want to store the regression result on S3. I was wondering if `options` has some parameter that determines the number of partitions. I couldn't find it anywhere in the documentation. Or is there another efficient way to write the result table to S3?
Any help is appreciated!
The `options` argument is equivalent to calling `options` on a `DataFrameWriter` (see the `DataFrameWriter.csv` documentation for a complete list of options specific to the CSV source), and it cannot be used to control the number of output partitions.
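A minimal sketch of what `options` is for, assuming a local Spark installation (e.g. via `sparklyr::spark_install()`) and using `mtcars` as stand-in data. `compression` is a standard option of Spark's CSV source; it changes how each file is encoded, but the number of part files still follows the number of partitions.

```r
library(sparklyr)
library(dplyr)

# Assumes a local Spark installation is available
sc <- spark_connect(master = "local")

# mtcars is stand-in data, not the asker's regression result
cars <- copy_to(sc, mtcars, "cars", overwrite = TRUE)

out_dir <- file.path(tempdir(), "cars_csv_gz")

# Entries in `options` are passed to the CSV writer as key/value options;
# here the output is gzip-compressed, but the file count is unaffected
spark_write_csv(
  cars, out_dir,
  header = TRUE, mode = "overwrite",
  options = list(compression = "gzip")
)

# One part file per partition, each ending in .csv.gz
part_files <- list.files(out_dir, pattern = "^part-")
print(part_files)

spark_disconnect(sc)
```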
While it is not generally recommended (all the data ends up in a single partition, so it is written by a single task), you can use the Spark API to coalesce the data and convert it back to a sparklyr `tbl`:
```r
# Coalesce the underlying Spark DataFrame to one partition and register it as a temp view
df %>%
  spark_dataframe() %>%
  invoke("coalesce", 1L) %>%
  invoke("createOrReplaceTempView", "_coalesced")

tbl(sc, "_coalesced") %>% spark_write_csv(...)
```
or, in recent versions, use `sparklyr::sdf_coalesce`, which takes the target number of partitions:

```r
df %>% sparklyr::sdf_coalesce(1L) %>% spark_write_csv(...)
```
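An end-to-end sketch of the `sdf_coalesce` approach, assuming a local Spark installation and using `mtcars` as a hypothetical stand-in for the regression result. It writes to a local temporary directory so the result can be inspected; for S3 you would substitute an `s3a://<bucket>/<prefix>` path, assuming `hadoop-aws` and credentials are already configured. Note that Spark still produces a directory, but it should contain only one `part-*.csv` file.

```r
library(sparklyr)
library(dplyr)

# Assumes a local Spark installation is available
sc <- spark_connect(master = "local")

# Hypothetical stand-in for the regression result table
results <- copy_to(sc, mtcars, "results", overwrite = TRUE)

# Local output path; replace with an s3a:// path for S3 (assumption: S3 access configured)
out_dir <- file.path(tempdir(), "results_csv")

results %>%
  sdf_coalesce(1L) %>%  # one partition, so one part file
  spark_write_csv(out_dir, header = TRUE, mode = "overwrite")

# The output directory should hold a single part-*.csv file (plus _SUCCESS)
part_files <- list.files(out_dir, pattern = "^part-.*\\.csv$")
print(part_files)

spark_disconnect(sc)
```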