Matching single-character brackets

Given the grammar rule (BNF, |

means or):

x := a | x x | x + x | x + "x" | "x" + x | "x" + "x"

      

from

  • +

    left associative ( a+a+a

    means (a+a)+a

    ),
  • left-associative concatenation ( aaa

    mean (aa)a

    , not a(aa)

    ),
  • and +

    lazily eating operands ( aa+aa

    means a(a+a)a

    ).

Problem: Is this grammar ambiguous? Ie is it possible to parse a string in two different ways?

<strong> Examples:

Allowed: a

, a+a

, a+"a"

, "a+a"+"a+a"

(read (a+a)+(a+a)

), ""a"+"a""+"a"

(read ((a)+(a))+(a)

) a+a+a

, a+"a"+a

.

Prohibited: "a+a"

, +"a"

, a++a

, "a"

, a+"a

, ""a+a"+a"

.

Appendix: I hate to avoid {

it }

in LaTeX too, so I wanted to create LaTeX dialogs in which only one character needs to be escaped, so replace both {

with }

one character "

for example, and write something like ""1+2"/3"^"a+b"

instead {\frac{1+2}{3}}^{a+b}

.

+3


source to share


1 answer


Here is a quick and dirty script using Marpa :: R2 , Marpa's Perl interface , a generic BNF for parsing input with the grammar you provided and a modified version of it that supports lazy power and left linking, but doesn't disallow "a"

: code , output .

The grammar is not ambiguous for the inputs you provided as parse () , otherwise it will throw an exception.

Hope it helps.

PS Using the general Marpa syntax decoupling to provide a better syntax interface for TeX (among others) has been discussed in the Marpa community .

update : re-request comment

This grammar (in Marpa SLIF DSL , || means lower priority)



x ::= a
   ||    x     '+'     x
   |     x     '+' '"' x '"'
   | '"' x '"' '+'     x
   | '"' x '"' '+' '"' x '"'
   ||    x             x

      

parses the input in question uniquely, except "a+a"+"a+a"

for which an alternative might be needed "x"

(which will make the grammar ambiguous as rici helps in the comments below, more on this in the next paragraph): code , output .

In general, with double quotes serving as parens, '+' as, well, plus, it's tempting to add a sign for op with a lower precedence than "+", like ".". for concatenation and make it a classic term / factor grammar, which can be expressed as follows in a Marpa SLIF DSL:

x ::= a
  || '"' x '"' assoc => group
  || x '+' x
  || x '.' x

      

Update 1 :

# input: "a+a"+"a+a"
Setting trace_terminals option
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c1 e1: '"'; value="""
Lexer "L0" accepted lexeme L1c2 e2: a; value="a"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c3 e3: '+'; value="+"
Lexer "L0" accepted lexeme L1c4 e4: a; value="a"
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c5 e5: '"'; value="""
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c6 e6: '+'; value="+"
Lexer "L0" accepted lexeme L1c7 e7: '"'; value="""
Lexer "L0" accepted lexeme L1c8 e8: a; value="a"
Error in SLIF parse: No lexeme found at line 1, column 9
* String before error: "a+a"+"a
* The error was at line 1, column 9, and at character 0x002b '+', ...
* here: +a"
Marpa::R2 exception at C:\cygwin\home\Ruslan\Marpa-R2-work\q27655176.t line 63.

Progress report is:
F3 @7-8 L1c7-8 x -> a .
R7:6 @0-8 L1c1-8 x -> '"' x '"' '+' '"' x . '"'
# ast dump:
undef

      

+2


source







All Articles