Discard and ignore unstructured text with Perl Marpa?
I am using Marpa::R2::Scanless::G to parse a legacy text file format. The format has a well-structured section at the top, followed by a poorly structured mess of text and uuencoded data. That trailing material can be ignored completely, but I cannot figure out how to tell the Marpa SLIF interface: you are done; don't worry about the leftover text.
In very simplified terms, the file might look like this:
("field_a_val" 1,
"field_b_vals" (1,2,3),
"field_c_pairs" ((a 1)(b 2)(c 3))
)now_stuff_i_dont_care_about a;oiwermnv;alwfja;sldfa
asdf343avadfg;okm;om;oia3
e{<|1ydblV, HYED c"L. 78b."8
U=nK Wpw: Qh(e x!,~dU...
I have all the data I need parsed from the top, but when the parser hits the junk at the bottom and I don't try to match it, I get: Error in SLIF parsing: Parse exhausted but lexemes remain.
I can't figure out how to write a rule that consumes potentially megabytes of junk and simply keeps going to the end of the file, regardless of what text it finds. I have had no luck with my attempts to use :discard or "pause => after", although I am probably using them incorrectly.
For context, I don't have a clear understanding of parsing and lexing. I just kept hammering at the grammar until it worked.
The simplest thing would be to introduce a token that matches everything else that you are not interested in:
    lexeme default = latm => 1  # this prevents the rest from matching the whole document

    Header
      ::= ActualHeader (AllTheRest)  action => ::first
    ActualHeader
      ::= ...  # your code here
    ...
    AllTheRest
      ::= action => ::undef  # rest is optional
    AllTheRest
      ::= THE_REST  action => ::undef  # matches anything
    THE_REST ~ [\s\S]+
We cannot use a :discard rule for THE_REST because that would allow the rest to occur anywhere, whereas we only want to accept it at the end. The character class [\s\S] matches all characters.
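For illustration, here is a minimal runnable sketch of the approach above. The header grammar (Document, Fields, Field, STRING, NUMBER, COMMA) and the sample input are hypothetical stand-ins for the real format; only the AllTheRest / THE_REST rules and the latm setting are taken from the answer. It assumes Marpa::R2 is installed.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical, heavily simplified header: a parenthesised, comma-separated
# list of "name" number pairs. Only AllTheRest / THE_REST is the technique.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
lexeme default = latm => 1

Document     ::= ActualHeader (AllTheRest)
ActualHeader ::= ('(') Fields (')')
Fields       ::= Field+ separator => COMMA action => ::array
Field        ::= STRING NUMBER action => ::array

# trailing junk is optional ...
AllTheRest   ::= action => ::undef
# ... or it is swallowed as one big token whose value is dropped
AllTheRest   ::= THE_REST action => ::undef

COMMA       ~ ','
STRING      ~ '"' string_body '"'
string_body ~ [^"]*
NUMBER      ~ [\d]+
THE_REST    ~ [\s\S]+

:discard   ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

# Made-up input: a small header followed by junk that must be ignored.
my $input = <<'END_OF_INPUT';
("field_a" 1,
"field_b" 2
)now_stuff_i_dont_care_about a;oiwermnv;alwfja
e{<|1ydblV, HYED c"L. 78b."8
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new( { source  => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$recce->read( \$input );

my $value_ref = $recce->value;
die "No parse\n" if not defined $value_ref;

# The parse value is the Fields array: one [ name, number ] pair per field.
my $fields = ${$value_ref};
for my $field ( @{$fields} ) {
    my ( $name, $number ) = @{$field};
    print "$name => $number\n";
}
```

Because of latm => 1, THE_REST is only considered at positions where the grammar can accept it, which in this sketch is after the closing parenthesis. There it matches everything up to the end of the input as a single token, and its value is thrown away.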
There was once a discussion of a similar topic on the marpa-parser mailing list, but the code examples seem to be gone from there, so I would suggest the working example from my answer to another SO question.
I'm not sure this is the correct way to do this kind of thing in Marpa, and I have not tested it on inputs of several megabytes.
Hope it helps.