What is the options parameter of the spark_write_csv dplyr function?
I was looking for a way to make spark_write_csv upload only a single file to S3, because I want to store the regression result on S3. I was wondering whether options has some parameter that determines the number of partitions. I couldn't find it anywhere in the documentation. Or is there another efficient way to upload the result table to S3?
Any help is appreciated!
The options argument is equivalent to the options call on DataFrameWriter (you can check the DataFrameWriter.csv documentation for a complete list of options specific to the CSV source), and it cannot be used to control the number of output partitions.
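As an illustration of what options does control, here is a minimal sketch that passes a CSV writer option through spark_write_csv. It assumes a local Spark installation reachable by sparklyr; the table name, the output directory, and the choice of the compression option are only examples, and an s3a:// URI would take the place of the local path.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")
df <- copy_to(sc, mtcars, "mtcars_options", overwrite = TRUE)

# Hypothetical output location; an s3a:// URI would go here instead.
out_dir <- file.path(tempdir(), "mtcars_gzip")

# Entries of `options` are forwarded to the CSV writer as key/value pairs.
# They affect how the files are written, not how many files are produced.
spark_write_csv(
  df,
  path = out_dir,
  mode = "overwrite",
  options = list(compression = "gzip")
)

# Spark writes a directory of part files rather than one named file.
written <- list.files(out_dir)

spark_disconnect(sc)
```

The number of part files in the output directory still follows the number of partitions of df, whatever is passed in options.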
While it is not generally recommended (writing from a single partition removes parallelism and can be slow for large data), you can use the Spark API to coalesce the data and convert it back to a sparklyr tbl:
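# Coalesce to one partition on the underlying Spark DataFrame
# and register the result as a temporary view
df %>%
  spark_dataframe() %>%
  invoke("coalesce", 1L) %>%
  invoke("createOrReplaceTempView", "_coalesced")

# Read the view back as a sparklyr tbl and write it out
tbl(sc, "_coalesced") %>% spark_write_csv(...)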
or, in recent versions, use sparklyr::sdf_coalesce, which takes the target number of partitions:
df %>% sparklyr::sdf_coalesce(partitions = 1L)
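Put together, the sdf_coalesce route might look like the sketch below. It assumes a local Spark installation; the table name and output directory are hypothetical, and for S3 the path would be an s3a:// URI instead of a local one.

```r
library(sparklyr)
library(dplyr)

# Assumption: a local Spark installation is available to sparklyr.
sc <- spark_connect(master = "local")
df <- copy_to(sc, mtcars, "mtcars_single", overwrite = TRUE)

# Hypothetical output location; replace with an s3a:// URI for S3.
out_dir <- file.path(tempdir(), "mtcars_single_csv")

# Reduce to one partition, then write. One partition yields one part file.
df %>%
  sdf_coalesce(partitions = 1L) %>%
  spark_write_csv(path = out_dir, mode = "overwrite")

# Collect the data files Spark produced (marker files such as _SUCCESS excluded).
part_files <- list.files(out_dir, pattern = "^part-.*\\.csv$")

spark_disconnect(sc)
```

Note that Spark still creates a directory at the given path; the single CSV is the part file inside it, so a rename or copy step is needed if a specific file name is required.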