Discard and skip over unstructured text with Perl Marpa?

I'm using Marpa::R2::Scanless::G to parse a legacy text file format. The format has a well-structured section up top, followed by a poorly structured mess of text and uuencoded stuff. The latter can be ignored entirely, but I can't figure out how to tell the Marpa SLIF interface: You're done; don't bother with the remaining text.

In very simplified terms a file might look like this:

("field_a_val"  1,
 "field_b_vals" (1,2,3),
 "field_c_pairs" ((a 1)(b 2)(c 3))
)now_stuff_i_dont_care_about a;oiwermnv;alwfja;sldfa
asdf343avadfg;okm;om;oia3
e{<|1ydblV, HYED c"L. 78b."8
U=nK Wpw: Qh(e x!,~dU...

I have all the data I need parsed out of the top section, but when the parser hits the junk at the bottom and I don't try to match it, I get: Error in SLIF parse: Parse exhausted, but lexemes remain.

I cannot figure out how to craft a term that says to slurp up potentially megabytes of crap, just keep going to the end of the file regardless of the encountered text. No luck with my attempts to use :discard or 'pause => after', though I'm likely misusing them.

For context I don't have a solid understanding of parsing and lexing. I banged on the grammar until it worked.

There are 2 best solutions below

2
On BEST ANSWER

The simplest thing to do would be to introduce a lexeme that matches all the rest you're not interested in:

lexeme default = latm => 1  # this prevents the rest from matching the whole document

Header
  ::= ActualHeader (AllTheRest) action => ::first
ActualHeader
  ::= ... # your code here
...

AllTheRest
  ::=           action => ::undef  # rest is optional
AllTheRest
  ::= THE_REST  action => ::undef  # matches anything
THE_REST ~ [\s\S]+

We cannot use a :discard rule for THE_REST because that would allow the rest to occur anywhere, but we only want to allow it at the end. The character class [\s\S] matches all characters.
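
To make this concrete, here is a minimal sketch of the whole thing applied to the simplified sample from the question. Only the AllTheRest / THE_REST part and the latm setting come from the grammar above. The header rules (File, Header, Fields, Field, Value, NumberList, Pairs, Pair) and the lexemes STRING, NUMBER, NAME and COMMA are assumptions made up to fit the sample, so your real header grammar goes in their place. I have not tried it on multi-megabyte input.

```perl
use strict;
use warnings;
use Marpa::R2;

# The header rules are hypothetical and only fit the simplified sample.
# The AllTheRest / THE_REST rules and the latm setting follow the answer.
my $dsl = <<'END_OF_DSL';
lexeme default = latm => 1

File       ::= Header (AllTheRest)          action => ::first
Header     ::= ('(') Fields (')')           action => ::first
Fields     ::= Field+ separator => COMMA    action => ::array
Field      ::= STRING Value                 action => ::array
Value      ::= NUMBER                       action => ::first
             | ('(') NumberList (')')       action => ::first
             | ('(') Pairs (')')            action => ::first
NumberList ::= NUMBER+ separator => COMMA   action => ::array
Pairs      ::= Pair+                        action => ::array
Pair       ::= ('(') NAME NUMBER (')')      action => ::array

# The trailing junk is optional, and only allowed after the header.
AllTheRest ::=                              action => ::undef
AllTheRest ::= THE_REST                     action => ::undef

STRING      ~ ["] string_body ["]
string_body ~ [^"]*
NUMBER      ~ [\d]+
NAME        ~ [a-zA-Z_]+
COMMA       ~ ','
THE_REST    ~ [\s\S]+

:discard    ~ whitespace
whitespace  ~ [\s]+
END_OF_DSL

# The sample file from the question: structured header, then junk.
my $input = <<'END_OF_INPUT';
("field_a_val"  1,
 "field_b_vals" (1,2,3),
 "field_c_pairs" ((a 1)(b 2)(c 3))
)now_stuff_i_dont_care_about a;oiwermnv;alwfja;sldfa
asdf343avadfg;okm;om;oia3
e{<|1ydblV, HYED c"L. 78b."8
U=nK Wpw: Qh(e x!,~dU...
END_OF_INPUT

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $recce   = Marpa::R2::Scanless::R->new( { grammar => $grammar } );

# THE_REST swallows everything after the closing paren as a single lexeme,
# so the read reaches the end of the input instead of exhausting early.
$recce->read( \$input );

my $value_ref = $recce->value();
die "No parse found\n" if not defined $value_ref;

# $header is an array of [ name, value ] pairs; names keep their quotes.
my $header = ${$value_ref};
printf "%d header fields parsed\n", scalar @{$header};
```

Because of latm, THE_REST is only tried at positions where the grammar can accept it, which is right after the header's closing parenthesis. Everywhere else the ordinary header lexemes are matched as usual.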

0
On

There was once a discussion of a similar topic on the marpa-parser mailing list, but the code examples have somehow disappeared from there, so I'd suggest the working example from my answer to another SO question.

I'm not sure this is the proper way to do such things in Marpa, though, and I haven't tested it on multi-megabyte strings.

Hope this helps.