How do I make the Marpa sequence rules greedy?
I am working on a grammar Marpa::R2
that groups elements in text. Each group can only contain elements of a specific type, but is not explicitly limited. This causes problems because x...x
(wherein .
represents an element which can be part of a group) may be grouped as a x(...)x
, x(..)(.)x
, x(.)(..)x
, x(.)(.)(.)x
. In other words, grammar is very ambiguous.
How can I remove this ambiguity if I only need parsing x(...)x
, i.e. if I want to make the quantifier +
behave only "greedy" (as it does in Perl regular expressions)?
In the grammar below, I tried adding rank
adverbs to the sequence rules to prioritize Group
over Sequence
, but that doesn't work.
Below is a test case that accomplishes this behavior.
use strict;
use warnings;
use Marpa::R2;
use Test::More;
my $grammar_source = <<'END_GRAMMAR';
inaccessible is fatal by default
:discard ~ space
:start ::= Sequence
Sequence
::= SequenceItem+ action => ::array
SequenceItem
::= WORD action => ::first
| Group action => ::first
Group
::= GroupItem+ action => [name, values]
GroupItem
::= ('[') Sequence (']') action => ::first
WORD ~ [a-z]+
space ~ [\s]+
END_GRAMMAR
my $input = "foo [a] [b] bar";
diag "perl $^V";
diag "Marpa::R2 " . Marpa::R2->VERSION;
my $grammar = Marpa::R2::Scanless::G->new({ source => \$grammar_source });
my $recce = Marpa::R2::Scanless::R->new({ grammar => $grammar });
$recce->read(\$input);
my $parse_count = 0;
while (my $value = $recce->value) {
is_deeply $$value, ['foo', [Group => ['a'], ['b']], 'bar'], 'expected structure'
or diag explain $$value;
$parse_count++;
}
is $parse_count, 1, 'expected number of parses';
done_testing;
Test case output (FAIL):
# perl v5.18.2 # Marpa::R2 2.09 ok 1 - expected structure not ok 2 - expected structure # Failed test 'expected structure' # at - line 38. # Structures begin differing at: # $got->[1][2] = Does not exist # $expected->[1][2] = ARRAY(0x981bd68) # [ # 'foo', # [ # 'Group', # [ # 'a' # ] # ], # [ # ${\$VAR1->[1][0]}, # [ # 'b' # ] # ], # 'bar' # ] not ok 3 - expected number of parses # Failed test 'expected number of parses' # at - line 41. # got: '2' # expected: '1' 1..3 # Looks like you failed 2 tests of 3.
source to share
The sequence rules are for non-integer cases. Sequence rules can always be rewritten as BNF rules when it gets tricky, which is what I suggest here. Below is your test work:
use strict;
use warnings;
use Marpa::R2;
use Test::More;
my $grammar_source = <<'END_GRAMMAR';
inaccessible is fatal by default
:discard ~ space
# Three cases
# 1.) Just one group.
# 2.) Group follows by alternating words and groups.
# 3.) Alternating words and groups, starting with words
Sequence ::= Group action => ::first
Sequence ::= Group Subsequence action => [values]
Sequence ::= Subsequence action => ::first
Subsequence ::= Words action => ::first
# "action => [values]" makes the test work unchanged.
# The action for the next rule probably should be
# action => [name, values] in order to handle the general case.
Subsequence ::= Subsequence Group Words action => [values]
Words ::= WORD+ action => ::first
Group
::= GroupItem+ action => [name, values]
GroupItem
::= ('[') Sequence (']') action => [value]
WORD ~ [a-z]+
space ~ [\s]+
END_GRAMMAR
my $input = "foo [a] [b] bar";
diag "perl $^V";
diag "Marpa::R2 " . Marpa::R2->VERSION;
my $grammar = Marpa::R2::Scanless::G->new( { source => \$grammar_source } );
my $recce = Marpa::R2::Scanless::R->new( { grammar => $grammar } );
$recce->read( \$input );
my $parse_count = 0;
while ( my $value = $recce->value ) {
is_deeply $$value, [ 'foo', [ Group => ['a'], ['b'] ], 'bar' ],
'expected structure'
or diag explain $$value;
$parse_count++;
} ## end while ( my $value = $recce->value )
is $parse_count, 1, 'expected number of parses';
done_testing;
source to share
Unequal grammar:
Sequence : WORD+ SequenceAfterWords
| Group SequenceAfterGroup
SequenceAfterWords : Group SequenceAfterGroup
|
SequenceAfterGroup : WORD+ SequenceAfterWords
|
Jeffrey Kegler says that leading recursively is handled more efficiently in Marpa. The same approach used above can be brought back to the foreground to do this.
Sequence : SequenceBeforeWords WORD+
| SequenceBeforeGroup Group
SequenceBeforeWords : SequenceBeforeGroup Group
|
SequenceBeforeGroup : SequenceBeforeWords WORD+
|
In both cases
Group : GroupItem+
GroupItem : '[' Sequence ']'
source to share