How to save CSV with all fields?

The code below does not add double quotes, which are standard. I also tried setting the quote option to # and to a single quote, with no success. I also used quoteMode with the values ALL and NON_NUMERIC, but the output still did not change.

s2d.coalesce(64).write
  .format("com.databricks.spark.csv")
  .option("header", "false")
  .save(fname)

Are there any other options I can try? I am using spark-csv 2.11 on Spark 2.1.

Result:

d4c354ef,2017-03-14 16:31:33,2017-03-14 16:31:46,104617772177,340618697

The result I'm looking for:

"d4c354ef","2017-03-14 16:31:33","2017-03-14 16:31:46",104617772177,340618697  

      

+3




2 answers


tl;dr Enable the quoteAll option.

scala> Seq(("hello", 5)).toDF.write.option("quoteAll", true).csv("hello5.csv")

The above gives the following output:

$ cat hello5.csv/part-00000-a0ecb4c2-76a9-4e08-9c54-6a7922376fe6-c000.csv
"hello","5"

That assumes that the quote character is " (see CSVOptions).



This, however, will not give you double quotes around only the non-numeric fields, unfortunately.

You can see all of the options in CSVOptions, which serves as the source of options for reading and writing CSV.
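As a quick illustration, here is a minimal sketch combining a few of the write-side options listed there. The DataFrame df and the output path are placeholders, not from the question; header, quote, escape and quoteAll are the documented option names.

// Minimal sketch: a few of the write-side options from CSVOptions.
// df and the output path are placeholders, not from the question.
df.write
  .option("header", false)   // no header row (Boolean, not String)
  .option("quote", "\"")     // quote character; " is already the default
  .option("escape", "\\")    // escape character used inside quoted fields
  .option("quoteAll", true)  // wrap every field in quotes
  .csv("/tmp/quoted-output")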

p.s. com.databricks.spark.csv is currently just an alias for the csv format. You can use either, but csv is preferred.

ps Use option("header", false)

( false

as boolean not String), which makes your code a bit more type safe.

+2




In Spark 2.1, where the old CSV library has been built in, I don't see any option in the csv method of DataFrameWriter that does what you want.

So I think you need to map over your data "manually" to determine which components of each Row are not numeric, and quote them accordingly. You can use a simple helper function isNumeric like this:



def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)

As you map over your Dataset, quote the values for which isNumeric is false, as in the sketch below.
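A minimal sketch of that manual mapping, assuming the row values can be rendered as strings. Here df, the reuse of fname, and the choice of the text writer are illustrative, not part of the original answer, and embedded quotes are not escaped.

import spark.implicits._   // String encoder for the map below (spark-shell / SparkSession scope)

def isNumeric(s: String) = s.nonEmpty && s.forall(Character.isDigit)

// df stands in for your own DataFrame (e.g. the asker's s2d).
val quoted = df.map { row =>
  (0 until row.length).map { i =>
    val value = if (row.isNullAt(i)) "" else row.get(i).toString
    if (isNumeric(value)) value else "\"" + value + "\""   // quote only non-numeric fields
  }.mkString(",")
}

// Write the pre-formatted lines with the text writer so Spark adds no quoting of its own.
quoted.coalesce(64).write.text(fname)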

+1








