Parboiled2 for parsing CSV string

Question

I am trying to parse a single line of delimited strings into a sequence of those strings. A field may contain any character. If a field contains the delimiter, it must be wrapped in double quotes, and a double quote inside such a field must be escaped with a backslash.

I used this as a starting point: https://github.com/sirthias/parboiled2/blob/695ee6603359cfcb97734edf6dd1d27383c48727/examples/src/main/scala/org/parboiled2/examples/CsvParser.scala

My grammar looks like this:

class CsvParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
  def line: Rule1[Seq[String]] = rule {record ~ EOI}
  def record = rule(oneOrMore(field).separatedBy(delimiter))

  def QUOTE = "\""
  def ESCAPED_QUOTE = "\\\""
  def DELIMITER_QUOTE = delimiter+"\""
  def WS = " \t".replace(delimiter, "")

  def field = rule{whiteSpace ~ ((QUOTE ~ escapedField ~ QUOTE) | unquotedField) ~ whiteSpace}
  def escapedField = rule { capture(zeroOrMore(noneOf(QUOTE) | ESCAPED_QUOTE)) ~> (_.replace(ESCAPED_QUOTE, QUOTE))  } 
  def unquotedField = rule { capture(zeroOrMore(noneOf(DELIMITER_QUOTE))) }
  def whiteSpace = rule(zeroOrMore(anyOf(WS)))
}

When I call it with "quote\"key",1,2

I get: Invalid input 'k', expected whiteSpace, ',' or 'EOI' (line 1, column 9)

What am I doing wrong? How do I debug this? (And as a bonus question: how would I extend the grammar to allow a multi-character delimiter such as ##?)

Thanks!

scala parsing csv peg parboiled2

DreamFlasher, 16 Jul 2015 at 15:26

1 answer

DreamFlasher · Accepted Answer · 2015-07-17T15:55:44+0000

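Parboiled2 seems to try the alternatives of a choice in order and does not backtrack once one of them has matched.

In this particular case

def escapedField = rule { capture(zeroOrMore(noneOf(QUOTE) | ESCAPED_QUOTE)) ~> (_.replace(ESCAPED_QUOTE, QUOTE))  }

noneOf(QUOTE) consumes the \ of \" and then returns, instead of backtracking and trying to match the full \" with ESCAPED_QUOTE. The repetition then stops at the " that follows, which is taken as the closing quote, so the parser reports an error at the k.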
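The effect of the ordering can be reproduced without the library. The following is only a toy model in plain Scala, not Parboiled2 code: the names OrderedChoiceDemo, Alt, notQuote, escapedQuote and zeroOrMore are made up for this illustration, and the loop merely imitates "take the first alternative that matches, then repeat".

```scala
object OrderedChoiceDemo {
  // An alternative reports how many characters it consumes at position i,
  // or None if it does not match there. (Hypothetical helper type.)
  type Alt = (String, Int) => Option[Int]

  // Imitates noneOf("\""): any single character except a double quote.
  val notQuote: Alt = (s, i) => if (i < s.length && s(i) != '"') Some(1) else None

  // Imitates the two-character string \" (backslash followed by a quote).
  val escapedQuote: Alt = (s, i) => if (s.startsWith("\\\"", i)) Some(2) else None

  // Imitates zeroOrMore(a | b | ...): at each position the first matching
  // alternative wins; the loop ends when no alternative matches.
  // Returns the position at which the repetition stopped.
  def zeroOrMore(alts: Seq[Alt])(s: String, start: Int): Int = {
    var i = start
    var progressed = true
    while (progressed) {
      alts.iterator.map(_(s, i)).collectFirst { case Some(n) => n } match {
        case Some(n) => i += n
        case None    => progressed = false
      }
    }
    i
  }

  // The input after the opening quote: quote\"key",1,2
  val body: String = "quote\\\"key\",1,2"

  def main(args: Array[String]): Unit = {
    val original = zeroOrMore(Seq(notQuote, escapedQuote))(body, 0)
    val reordered = zeroOrMore(Seq(escapedQuote, notQuote))(body, 0)
    println(body.substring(0, original))  // field content with noneOf first
    println(body.substring(0, reordered)) // field content with the escape first
  }
}
```

With notQuote first, the backslash is consumed on its own and the repetition ends at the escaped quote. With escapedQuote first, the pair \" is consumed together and the repetition runs to the real closing quote.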

The error was solved by trying ESCAPED_QUOTE before noneOf(QUOTE) inside the repetition:

def escapedField = rule { capture(zeroOrMore(ESCAPED_QUOTE | noneOf(QUOTE))) ~> (_.replace(ESCAPED_QUOTE, QUOTE))  }
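For reference, here is a minimal sketch of the whole grammar with the escaped quote tried first inside the repetition, plus a small runner that prints a formatted error. It assumes parboiled2 is on the classpath. The names CsvLineParser, CsvLineDemo and parse are made up for this example, and the ErrorFormatter(showTraces = true) argument assumes a parboiled2 version that provides ErrorFormatter (2.1 or later, as far as I know). The rule traces in the formatted message may help with the "how do I debug this" part of the question.

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

// Same grammar as in the question; only escapedField differs.
class CsvLineParser(val input: ParserInput, val delimiter: String = ",") extends Parser {
  def QUOTE = "\""
  def ESCAPED_QUOTE = "\\\""
  def DELIMITER_QUOTE = delimiter + "\""
  def WS = " \t".replace(delimiter, "")

  def line: Rule1[Seq[String]] = rule { record ~ EOI }
  def record = rule { oneOrMore(field).separatedBy(delimiter) }

  def field = rule { whiteSpace ~ ((QUOTE ~ escapedField ~ QUOTE) | unquotedField) ~ whiteSpace }

  // The escaped quote is tried before noneOf(QUOTE), so the backslash is
  // never consumed on its own when a quote follows it.
  def escapedField = rule {
    capture(zeroOrMore(str(ESCAPED_QUOTE) | noneOf(QUOTE))) ~>
      ((s: String) => s.replace(ESCAPED_QUOTE, QUOTE))
  }

  def unquotedField = rule { capture(zeroOrMore(noneOf(DELIMITER_QUOTE))) }
  def whiteSpace = rule { zeroOrMore(anyOf(WS)) }
}

object CsvLineDemo {
  // Hypothetical helper: Right(fields) on success, Left(message) on failure.
  def parse(text: String): Either[String, Seq[String]] = {
    val parser = new CsvLineParser(text)
    parser.line.run() match {
      case Success(fields) => Right(fields)
      case Failure(e: ParseError) =>
        // showTraces = true adds the rule traces to the error message.
        Left(parser.formatError(e, new ErrorFormatter(showTraces = true)))
      case Failure(other) => Left(other.toString)
    }
  }

  def main(args: Array[String]): Unit =
    println(parse("\"quote\\\"key\",1,2"))
}
```

The sketch does not address the bonus question about multi-character delimiters. noneOf works on single characters, so unquotedField would need a different formulation for that case.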
