Where can I read detailed documentation on defining a grammar for ParseKit?
I'm just getting to know ParseKit, reading Basic Grammar Syntax, but this is just a very simple introduction. I get out of my depth quickly when I want to establish my own grammar. Where do I go from here?
For example, I want to parse a log file in a very common format. Breaking it down into header, body and footer, this would be my BNF for the first line of the header:
    <header-line-1> ::= <log-format> <log-id> "," <category> <EOL>
    <log-format>    ::= "Type A Logfile" | "Logfile II" | "Some Other Format"
    <log-id>        ::= "#" <long-int>
    <category>      ::= <some unknown string>
How do I express this in a form that ParseKit understands? This is as far as I have got:
    @start = header-line-1;
    header-line-1 = log-format log-id "," category EOL;
    log-format = 'Type A Logfile';
    log-id = '#' ; // and then how to specify a long-int?!?
    category = char+;
    char = 'A' | 'a' | 'B' | 'b' | 'C'; //..etc... Surely not?!?
I suspect there should at least be a way to define a range of characters?
Of course, the book cited by the author of ParseKit will probably help me, but it would be nice if someone could help me get going with my own little example before I dive into the topic. I'm just researching an idea, as a proof of concept.
Unfortunately, there is no further (good) documentation on the ParseKit grammar syntax. Currently the best resources are:
- Steven Metsker's book Building Parsers with Java. Good news: it teaches you the design and internals of ParseKit. Bad news: ParseKit's "Grammar Syntax" feature is an optional layer on top of ParseKit that I created and added myself. It is therefore not described in Metsker's book, since his Java library does not have this feature.
- The tests target in the ParseKit Xcode project. It contains many real-world example grammars, and you can learn a lot from them.
As for your specific example, here's how I would define it in the ParseKit syntax.
    @symbolState = '\n'; // Tokenizer Directive
                         // tells tokenizer to treat newline chars as
                         // individual Symbol tokens rather than whitespace

    @start = headerLine*;
    headerLine = logFormat logId comma category eol;

    logFormat = ('Type' 'A' 'Logfile') | ('Logfile' 'II') | ('Some' 'Other' 'Format');
    logId = hash Number;
    category = Any+;

    comma = ',';
    hash = '#';
    eol = '\n';
It's important to remember that parsing in ParseKit is a two-phase process:
- Tokenizing (done by the Tokenizer and modified by the Tokenizer directives in your grammar)
- Parsing (performed by the parser created from the declarations in your grammar)
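So the parser created from your grammar works on tokens that the Tokenizer has already produced. It does not operate on individual characters, nor on long strings that span several tokens. That is why logFormat above matches 'Type' 'A' 'Logfile' as three separate tokens rather than as the single string 'Type A Logfile', and why logId uses the built-in Number token instead of spelling out digits.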
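If the two-phase idea is hard to picture, here is a rough sketch of it in plain Python. To be clear, this is not ParseKit and does not use its API; the regular expression, the function names (tokenize, parse_header_line) and the sample log line are all made up for illustration. It only shows the shape of the process: first split the text into word, number and symbol tokens (keeping the newline as its own token), then match the grammar against that token list.

```python
import re

# Phase 1: tokenizing.
# Words, runs of digits, and single symbol characters become tokens.
# The newline is kept as its own token instead of being skipped as
# whitespace, which is the effect the tokenizer directive asks for.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\n|[^\w\s]")

def tokenize(text):
    return TOKEN_RE.findall(text)

# logFormat alternatives, written as token sequences, not as strings.
LOG_FORMATS = [
    ["Type", "A", "Logfile"],
    ["Logfile", "II"],
    ["Some", "Other", "Format"],
]

# Phase 2: parsing.
# Matches: headerLine = logFormat logId comma category eol
def parse_header_line(tokens):
    fmt = next((f for f in LOG_FORMATS if tokens[:len(f)] == f), None)
    if fmt is None:
        raise ValueError("unknown log format")
    rest = tokens[len(fmt):]
    # logId = hash Number, followed by comma
    if len(rest) < 3 or rest[0] != "#" or not rest[1].isdigit() or rest[2] != ",":
        raise ValueError("expected '#', a number, then ','")
    # eol must close the line
    if rest[-1] != "\n":
        raise ValueError("expected end of line")
    # category = Any+ (one or more tokens of any kind)
    category = rest[3:-1]
    if not category:
        raise ValueError("expected a category")
    return {"format": " ".join(fmt), "id": int(rest[1]), "category": category}

if __name__ == "__main__":
    line = "Type A Logfile #12345, network errors\n"
    tokens = tokenize(line)
    print(tokens)
    print(parse_header_line(tokens))
```

Note that the parsing function never looks at individual characters. By the time it runs, "Type A Logfile" is already three tokens and "12345" is already one, which is the same situation the parser generated from a ParseKit grammar is in.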