How to match exactly "n" characters with FastParse

The FastParse parser-combinator Scala library gives you a .rep(n) "repeat" method, which creates a new parser that tries to apply the given parser n or more times. What is the canonical way to do this if I want exactly n matches?

In my case, I want to parse a 40-character Git commit ID: if the input is longer than 40 characters, it is not a commit ID and shouldn't match.

The closest example I've found in the docs so far:

val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )

      

... which matches 4 characters by simple repetition (too verbose for a 40-character commit ID).

These are parser combinators, not regular expressions, where the answer would be something like \p{XDigit}{40}.
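For comparison, the regular-expression behaviour I'm after can be checked with the standard library alone. This is just a sketch to pin down what "exactly 40" means; the names RegexDemo and isCommitId are made up for illustration.

```scala
object RegexDemo {
  // String.matches only succeeds if the WHOLE input matches the pattern,
  // so 39 or 41 hex digits are both rejected.
  def isCommitId(s: String): Boolean = s.matches("\\p{XDigit}{40}")

  def main(args: Array[String]): Unit = {
    val id = "0123456789abcdef0123456789ABCDEF01234567" // 40 hex digits
    println(isCommitId(id))       // true
    println(isCommitId(id + "0")) // false (41 characters)
    println(isCommitId(id.init))  // false (39 characters)
  }
}
```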



3 answers


Since the issue was closed by this commit, rep supports a max keyword argument. It also now supports an exactly keyword argument.



hexDigit.rep(exactly = 40)
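Applied to the commit ID from the question, this might look roughly like the sketch below. It assumes a FastParse release that includes the exactly argument and exposes the API through import fastparse.all._ (the 0.4.x line, as far as I know; older releases used import fastparse._, and 2.x has a different API). The names CommitIdDemo and commitId are only illustrative. Note that rep(exactly = 40) on its own would happily consume the first 40 digits of a longer string, so End is added to reject trailing input.

```scala
import fastparse.all._ // assumption: FastParse 0.4.x-style import

object CommitIdDemo {
  val hexDigit = P( CharIn('0' to '9', 'a' to 'f', 'A' to 'F') )

  // Exactly 40 hex digits, captured as a String with `.!`;
  // End makes the parser fail if any input remains after them.
  val commitId = P( hexDigit.rep(exactly = 40).! ~ End )

  def main(args: Array[String]): Unit = {
    val id = "a" * 40
    println(commitId.parse(id))          // expected to succeed
    println(commitId.parse(id + "f"))    // expected to fail: 41 characters
    println(commitId.parse(id.take(39))) // expected to fail: 39 characters
  }
}
```

If the ID is embedded in a larger input rather than being the whole string, a negative lookahead such as !hexDigit in place of End should serve the same purpose.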

      



Well, even if this functionality isn't available right now, you can write a function that applies ~ a certain number of times:

def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] =
  Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()
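To see what the Iterator.iterate(...).drop(times - 1).next() idiom builds, here is a library-free sketch in which plain strings stand in for parsers and "(a ~ b)" stands in for the ~ combinator. The names IterateDemo and chainExactly are hypothetical and not part of FastParse.

```scala
object IterateDemo {
  // Stand-in for repExactly: strings play the role of parsers.
  // iterate yields unit, (unit ~ unit), ((unit ~ unit) ~ unit), ...
  // so dropping times - 1 elements leaves the chain with `times` units.
  def chainExactly(unit: String)(times: Int): String =
    Iterator.iterate(unit)(acc => s"($acc ~ $unit)").drop(times - 1).next()

  def main(args: Array[String]): Unit = {
    println(chainExactly("hex")(1)) // hex
    println(chainExactly("hex")(3)) // ((hex ~ hex) ~ hex)
    // drop ignores non-positive counts, so times = 0 still gives one unit
    println(chainExactly("hex")(0)) // hex
  }
}
```

The last line shows a caveat: with times <= 0 the result is still a single repetition, so a guard such as require(times >= 1) is probably worth adding to repExactly.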

      

Here's a little test:

object Main extends App {

  import fastparse._

  def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] =
    Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )
  def fiveHexDigits = repExactly(hexDigit)(5) ~ End

  println(fiveHexDigits.parse("123a"))
  println(fiveHexDigits.parse("123ab"))
  println(fiveHexDigits.parse("123abc"))

}

      



And the output:

Failure(hexDigit:4 / CharIn("0123456789abcdefABCDEF"):4 ..."", false)
Success((), 5)
Failure(End:5 ..."c", false)

      

And here's a generic way to implement this functionality as a * operator on Parser (the original implementation of rep does something pretty convoluted, so my implementation might not account for some cases. Also, I haven't tested how it works with arguments that contain cuts):

object Main extends App {

  import fastparse._

  implicit class ParserExtension[T](parser: Parser[T]) {
    def *[R] (times: Int)(implicit ev: Implicits.Repeater[T, R]): Parser[R] = {
      assert(times >= 1)

      Iterator.iterate(parser map { t =>
        val acc = ev.initial
        ev.accumulate(t, acc)
        acc
      }){ prev: Parser[ev.Acc] =>
        (prev ~ parser) map {
          case (acc, t) =>
            ev.accumulate(t, acc)
            acc
        }
      }.drop(times - 1).next() map (acc => ev.result(acc))
    }
  }

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )

  val fiveDigitsSeq = (hexDigit.! * 5) ~ End

  println(fiveDigitsSeq.parse("123a"))   // Failure ...
  println(fiveDigitsSeq.parse("123ab"))  // Success(ArrayBuffer(1, 2, 3, a, b), 5)
  println(fiveDigitsSeq.parse("123abc")) // Failure ...
  println()

  val fiveDigitsStr = (hexDigit * 5).! ~ End

  println(fiveDigitsStr.parse("123a"))   // Failure ...
  println(fiveDigitsStr.parse("123ab"))  // Success(123ab, 5)
  println(fiveDigitsStr.parse("123abc")) // Failure ...
}

      



Ah, it looks like it is not currently available, but is a known "missing feature" for FastParse:

https://github.com/lihaoyi/fastparse/issues/27
