How to save CSV with all fields quoted?
The code below does not add double quotes, which are standard. I also tried setting # and a single quote as the quote character using the quote option, with no success. I also tried quoteMode with the values ALL and NON_NUMERIC, but the output still did not change.
s2d.coalesce(64).write
.format("com.databricks.spark.csv")
.option("header", "false")
.save(fname)
Are there any other options I can try? I am using spark-csv 2.11 on Spark 2.1.
Result:
d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697
The result I'm looking for:
"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697
tl;dr Enable the quoteAll option.
scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")
The above gives the following output:
$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"
This assumes that the quote character is " (see CSVOptions).
This, however, will not give you "Double quotes around all non-numeric characters". Unfortunately.
You can see all the options in CSVOptions , which serves as a source of options for reading and writing CSV.
P.S. com.databricks.spark.csv is currently just an alias for the csv format. You can use either, but csv is preferred.
P.S. Use option("header", false) (false as a Boolean, not a String), which makes your code a bit more type safe.
In Spark 2.1, where the old CSV library has been built in, I don't see any option for what you want in the csv method of DataFrameWriter.
So I think you have to map over your data "manually" to determine which of the Row components are not numbers and quote them accordingly. You could use a straightforward isNumeric helper function like this:
def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)
As you map over your Dataset, quote the values for which isNumeric is false.
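A minimal sketch of that manual quoting, in plain Scala without Spark, might look like the following. The names ManualQuoting, quoteIfNeeded and toCsvLine are hypothetical helpers made up for illustration; they are not part of Spark or spark-csv. Doubling embedded quotes is an assumed escaping convention, not something the question specifies.

```scala
// Hypothetical helpers for illustration; not part of Spark or spark-csv.
object ManualQuoting {
  // Same idea as the isNumeric helper: non-empty and digits only.
  def isNumeric(s: String): Boolean = s.nonEmpty && s.forall(_.isDigit)

  // Wrap non-numeric values in double quotes.
  // Assumption: embedded double quotes are escaped by doubling them.
  def quoteIfNeeded(s: String): String =
    if (isNumeric(s)) s else "\"" + s.replace("\"", "\"\"") + "\""

  // Build one CSV line from already-stringified field values.
  def toCsvLine(fields: Seq[String]): String =
    fields.map(quoteIfNeeded).mkString(",")

  def main(args: Array[String]): Unit = {
    val fields = Seq("d4c354ef", "2017-03-14 16:31:33", "2017-03-14 16:31:46",
      "104617772177", "340618697")
    println(toCsvLine(fields))
  }
}
```

With the sample row from the question, this should produce the desired line, with the three string fields quoted and the two numeric fields left bare. On a real Dataset you would presumably turn each Row into such a line and write the result as text instead of CSV. Null values, negative numbers and decimals would need extra handling that this sketch does not cover.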