How to match exactly "n" given characters with FastParse
The FastParse parser-combinator Scala library gives you a .rep(n) "Repeat" method, which creates a new parser that tries to apply the given parser n or more times. What is the canonical way to do this if I want to match exactly n times?
In my case, I want to parse a 40-character Git commit ID. If the input is longer than 40 characters, it is not a commit ID and shouldn't match.
The closest example I've found in the docs so far:
val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )
... which matches 4 characters with simple repetition (too verbose for a 40-character commit ID).
These are parser combinators, not regular expressions, where the answer would be something like \p{XDigit}{40}.
Well, even if this functionality is not available right now, you can write a function that applies ~ a given number of times:
def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] =
  Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()
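To see what the iterate/drop/next chain builds, here is a small library-free sketch. It swaps the parser for a plain string label, and seq is a hypothetical stand-in for FastParse's ~ operator, so the nesting can be printed:

```scala
object IterateDemo {
  // Hypothetical stand-in for FastParse's `~`: joins two labels so the
  // nesting produced by repeated sequencing is visible as text.
  def seq(left: String, right: String): String = s"($left ~ $right)"

  // Same shape as repExactly above: element 0 of the iterator is `p`,
  // element 1 is `p ~ p`, and so on; drop(times - 1).next() picks the
  // element that sequences `p` exactly `times` times.
  def repExactly(p: String)(times: Int): String =
    Iterator.iterate(p)(seq(_, p)).drop(times - 1).next()

  def main(args: Array[String]): Unit = {
    println(repExactly("p")(1)) // p
    println(repExactly("p")(3)) // ((p ~ p) ~ p)
  }
}
```

Note that times is expected to be at least 1: with 0, drop(-1) drops nothing and the result is still a single p.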
Here's a little test:
object Main extends App {
  import fastparse._

  def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] =
    Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )
  def fiveHexDigits = repExactly(hexDigit)(5) ~ End

  println(fiveHexDigits.parse("123a"))
  println(fiveHexDigits.parse("123ab"))
  println(fiveHexDigits.parse("123abc"))
}
And the output:
Failure(hexDigit:4 / CharIn("0123456789abcdefABCDEF"):4 ..."", false)
Success((), 5)
Failure(End:5 ..."c", false)
And here's a more general way to implement this functionality as a * operator on Parser (the original implementation of rep does something rather convoluted, so my implementation might not account for some cases. Also, I haven't tested how it works with arguments that contain cuts):
object Main extends App {
  import fastparse._

  implicit class ParserExtension[T](parser: Parser[T]) {
    def *[R] (times: Int)(implicit ev: Implicits.Repeater[T, R]): Parser[R] = {
      assert(times >= 1)

      Iterator.iterate(parser map { t =>
        val acc = ev.initial
        ev.accumulate(t, acc)
        acc
      }){ prev: Parser[ev.Acc] =>
        (prev ~ parser) map {
          case (acc, t) =>
            ev.accumulate(t, acc)
            acc
        }
      }.drop(times - 1).next() map (acc => ev.result(acc))
    }
  }

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )

  val fiveDigitsSeq = (hexDigit.! * 5) ~ End

  println(fiveDigitsSeq.parse("123a"))   // Failure ...
  println(fiveDigitsSeq.parse("123ab"))  // Success(ArrayBuffer(1, 2, 3, a, b), 5)
  println(fiveDigitsSeq.parse("123abc")) // Failure ...

  println()

  val fiveDigitsStr = (hexDigit * 5).! ~ End

  println(fiveDigitsStr.parse("123a"))   // Failure ...
  println(fiveDigitsStr.parse("123ab"))  // Success(123ab, 5)
  println(fiveDigitsStr.parse("123abc")) // Failure ...
}
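For comparison, later FastParse releases extended rep with max and exactly arguments, so a hand-written helper may not be needed there. The following is only a minimal sketch, assuming the FastParse 0.4.x API (where the import is fastparse.all._ rather than the fastparse._ used above); the names commitId and isCommitId are illustrative and not part of the library:

```scala
import fastparse.all._

object CommitIdDemo {
  val hexDigit = P( CharIn('0' to '9', 'a' to 'f', 'A' to 'F') )

  // Assumption: a FastParse version whose rep accepts `exactly`.
  // `commitId` is an illustrative name; End rejects any trailing input,
  // so a 41-character string fails instead of matching its first 40.
  val commitId = P( hexDigit.rep(exactly = 40).! ~ End )

  // Illustrative helper: true only if the whole input is 40 hex digits.
  def isCommitId(s: String): Boolean = commitId.parse(s) match {
    case Parsed.Success(_, _) => true
    case _                    => false
  }

  def main(args: Array[String]): Unit = {
    println(isCommitId("a" * 40)) // true
    println(isCommitId("a" * 39)) // false
    println(isCommitId("a" * 41)) // false
  }
}
```

If the installed version predates these arguments, the repExactly helper or the * operator above remains the fallback.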