How to save CSV with all fields?

The code below does not add double quotes, which are standard. I also tried setting the quote option to # and to a single quote, with no success. I also used quoteMode with the values ALL and NON_NUMERIC, but the output still did not change.

s2d.coalesce(64).write
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .save(fname)

Are there any other options I can try? I am using spark-csv 2.11 on Spark 2.1.

Result:

d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697

The result I'm looking for:

"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697  

      

+3




2 answers


tl;dr Enable the quoteAll option.

scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")

The above gives the following output:

$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"

That assumes that the quote character is " (see CSVOptions).



This, however, will not give you double quotes around only the non-numeric fields, unfortunately.

You can see all of the options in CSVOptions, which serves as the source of options for reading and writing CSV.
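As a quick illustration, here is a minimal sketch combining a few of the write-side options listed there. The DataFrame df and the output path are placeholders, not from the question; header, quote, escape and quoteAll are the documented option names.

// Minimal sketch: a few of the write-side options from CSVOptions.
// df and the output path are placeholders, not from the question.
df.write
  .option("header", false)   // no header row (Boolean, not String)
  .option("quote", "\"")     // quote character; " is already the default
  .option("escape", "\\")    // escape character used inside quoted fields
  .option("quoteAll", true)  // wrap every field in quotes
  .csv("/tmp/quoted-output")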

p.s. com.databricks.spark.csv is currently just an alias for the csv format. You can use either, but csv is preferred.

ps Use option("header", false)

( false

as boolean not String), which makes your code a bit more type safe.

+2




In Spark 2.1, where the old CSV library has been built in, I don't see any option in the csv method of DataFrameWriter that does what you want.

So I think you need to map over your data "manually" to determine which components of each Row are not numeric, and quote them accordingly. You can use a simple helper function isNumeric like this:



def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)

As you map over your Dataset, quote the values for which isNumeric is false, as in the sketch below.
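A minimal sketch of that manual mapping, assuming the row values can be rendered as strings. Here df, the reuse of fname, and the choice of the text writer are illustrative, not part of the original answer, and embedded quotes are not escaped.

import spark.implicits._   // String encoder for the map below (spark-shell / SparkSession scope)

def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)

// df stands in for your own DataFrame (e.g. the asker's s2d).
val quoted = df.map { row =>
  (0 until row.length).map { i =>
    val value = if (row.isNullAt(i)) "" else row.get(i).toString
    if (isNumeric(value)) value else "\"" + value + "\""   // quote only non-numeric fields
  }.mkString(",")
}

// Write the pre-formatted lines with the text writer so Spark adds no quoting of its own.
quoted.coalesce(64).write.text(fname)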

+1








