Wrong line number in parse Exception

Question

I have a simple language defined in pyparsing. Parsing works well, but the error messages show the wrong line number. Here is the main part of the code:

communications = Group( Suppress(CaselessLiteral("communications")) + op + ZeroOrMore(communicationList) + cl + semicolon)

language = Suppress(CaselessLiteral("language")) + (CaselessLiteral("cpp")|CaselessLiteral("python")) + semicolon

componentContents = communications.setResultsName('communications') & language.setResultsName('language') & gui.setResultsName('gui') & options.setResultsName('options')

component = Suppress(CaselessLiteral("component")) + identifier.setResultsName("name") + op + componentContents.setResultsName("properties") + cl + semicolon

CDSL = idslImports.setResultsName("imports") + component.setResultsName("component")

This reports the correct line number only up to component, but for any error inside component (i.e. in componentContents) it just gives the line number where the component starts. For example, this is a sample of the text being parsed:

import "/robocomp/interfaces/IDSLs/Test.idsl";

Component publish
{
    Communications
    {
        requires test;
        implements test;
    };
    language python;
};

Here, if I omit the semicolon after python or after test, it reports (line:4, col:1), i.e. at the {.

python

nithin, June 10 '17 at 13:57

1 answer

PaulMcG · Accepted Answer · 2017-06-10T17:09:45+0000

This is how pyparsing behaves by design, not a bug, and it needs some extra help to report the error where you expect it.

When pyparsing cannot match somewhere inside a complex expression, it unwinds its parsing stack back to the last fully completed alternative expression. You know that once "component" has been matched, any failure after it must be an error in the component definition, but pyparsing doesn't. So when a failure occurs after the opening keyword, pyparsing backs up and reports that the expression starting with that keyword could not be matched.

In a command-style grammar like this one, the keywords are usually unambiguous. For example, after matching "component", anything that is not an identifier followed by a brace-enclosed list of contents would be an error. You can tell pyparsing not to back up past "component" by replacing the "+" operator after it with the "-" operator.
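To make the difference between "+" and "-" concrete, here is a minimal sketch on a toy grammar. The names BLOCK, build_grammar, error_position and bad_text are made up for this illustration and are not part of your grammar; the sample text is missing the ";" after "requires b".

```python
from pyparsing import (CaselessKeyword, Group, ParseBaseException, StringEnd,
                       Suppress, Word, ZeroOrMore, alphas)

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
BLOCK, REQUIRES = map(CaselessKeyword, ["block", "requires"])  # hypothetical keywords
name = Word(alphas)


def build_grammar(strict):
    """Build the toy grammar using '-' (strict) or '+' after each keyword."""
    if strict:
        item = Group(REQUIRES - name + SEMI)
        block = Group(BLOCK - LBRACE + ZeroOrMore(item) + RBRACE)
    else:
        item = Group(REQUIRES + name + SEMI)
        block = Group(BLOCK + LBRACE + ZeroOrMore(item) + RBRACE)
    return ZeroOrMore(block) + StringEnd()


def error_position(grammar, text):
    """Return (line, column) of the reported parse error, or None on success."""
    try:
        grammar.parseString(text)
    except ParseBaseException as pe:
        # ParseBaseException also covers the ParseSyntaxException raised by '-'
        return pe.lineno, pe.col
    return None


# The ';' after 'requires b' is missing.
bad_text = "block {\n  requires a;\n  requires b\n}\n"

print(error_position(build_grammar(strict=False), bad_text))  # (1, 1): start of the block
print(error_position(build_grammar(strict=True), bad_text))   # (4, 1): the '}' where ';' was expected
```

With "+", each failure is swallowed by the enclosing ZeroOrMore, so the error surfaces at the start of the block. With "-", the failure is raised immediately at the token where the ";" should have been.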

Looking at your grammar, I took a step back and wrote a short BNF for it (always good practice):

communications    ::= 'communications' '{' communicationList* '}' ';'
language          ::= 'language' ('cpp' | 'python') ';'
componentContents ::= communications | language | gui | options
component         ::= 'component' identifier '{' componentContents+ '}' ';'
CDSL              ::= idslImports component

When there are keywords in a grammar, I always recommend using Keyword or CaselessKeyword, not Literal or CaselessLiteral. The Literal classes do not enforce word boundaries, so if I were to use Literal("no") as part of a grammar, it could match the leading "no" of "not" or "none" or "nothing", etc.
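A small sketch of that word-boundary difference, using only Literal and Keyword from pyparsing. The helper first_match is a made-up name for this illustration.

```python
from pyparsing import Keyword, Literal, ParseException


def first_match(expr, text):
    """Return the tokens matched at the start of text, or None if there is no match."""
    try:
        return expr.parseString(text).asList()
    except ParseException:
        return None


print(first_match(Literal("no"), "nothing"))    # ['no']  (matches inside a longer word)
print(first_match(Keyword("no"), "nothing"))    # None    (word boundary enforced)
print(first_match(Keyword("no"), "no thanks"))  # ['no']
```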

This is how I would implement this BNF. (I am using the shorthand form of setResultsName, which I find keeps the grammar clearer.):

from pyparsing import (CaselessKeyword, Group, Suppress, ZeroOrMore,
                       pyparsing_common)

LBRACE,RBRACE,SEMI = map(Suppress, "{};")
identifier = pyparsing_common.identifier

# keywords - extend as needed
(IMPORT, COMMUNICATIONS, LANGUAGE, COMPONENT, CPP, 
 PYTHON, REQUIRES, IMPLEMENTS) = map(CaselessKeyword, """
    IMPORT COMMUNICATIONS LANGUAGE COMPONENT CPP PYTHON 
    REQUIRES IMPLEMENTS""".split())

# keyword-leading expressions, use '-' operator to prevent backtracking once significant keyword is parsed
communicationItem = Group((REQUIRES | IMPLEMENTS) - identifier + SEMI)
communications = Group( COMMUNICATIONS.suppress() - LBRACE + ZeroOrMore(communicationItem) + RBRACE + SEMI)
language = Group(LANGUAGE.suppress() - (CPP | PYTHON) + SEMI)

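# gui, options and idslImports are not defined in this snippet; they are assumed to come from the rest of your grammar
componentContents = communications('communications') & language('language') & gui('gui') & options('options')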
component = Group(COMPONENT - identifier("name") + Group(LBRACE + componentContents + RBRACE)("properties") + SEMI)

CDSL = idslImports("imports") + component("component")
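As a side note on the results-name shorthand used in the grammar above: calling an expression with a string, as in expr("name"), is equivalent to expr.setResultsName("name"). A minimal sketch, where the two-token grammar and the input "widget 42" are invented for this illustration:

```python
from pyparsing import Word, alphas, nums

# Long form: explicit setResultsName calls
long_form = Word(alphas).setResultsName("name") + Word(nums).setResultsName("count")

# Short form: call the expression with the results name
short_form = Word(alphas)("name") + Word(nums)("count")

print(long_form.parseString("widget 42").asDict())
print(short_form.parseString("widget 42").asDict())
```

Both lines should print the same dictionary of named results.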

Parsing your sample with:

sample = """\
Component publish
{
    Communications
    {
        requires test;
        implements test;
    };
    language python;
};
"""

component.runTests([sample])

gives:

[['COMPONENT', 'publish', [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]]]
[0]:
  ['COMPONENT', 'publish', [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]]
  - name: 'publish'
  - properties: [[['REQUIRES', 'test'], ['IMPLEMENTS', 'test']], ['PYTHON']]
    - communications: [['REQUIRES', 'test'], ['IMPLEMENTS', 'test']]
      [0]:
        ['REQUIRES', 'test']
      [1]:
        ['IMPLEMENTS', 'test']
    - language: ['PYTHON']

(By the way, I like your use of the "&" operator to match the various contents in any order with the pyparsing class Each. I think this makes for a friendlier and more robust parser. It turns out that Each has a slight conflict with the "-" operator; I'll have to fix that in the next version.)
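For readers unfamiliar with Each, here is a minimal sketch of "&" matching its parts in any order. The gui rule below is a hypothetical stand-in (the real gui syntax is not shown in the question), the language rule is simplified to accept any word, and parse_contents is a made-up helper name.

```python
from pyparsing import CaselessKeyword, Group, Suppress, Word, alphas

SEMI = Suppress(";")
LANGUAGE, GUI = map(CaselessKeyword, ["language", "gui"])

# Simplified stand-ins for the real rules (assumptions for this sketch)
language = Group(LANGUAGE.suppress() + Word(alphas) + SEMI)
gui = Group(GUI.suppress() + Word(alphas) + SEMI)

# '&' builds an Each: both parts are required, in either order
contents = language("language") & gui("gui")


def parse_contents(text):
    """Return the (language, gui) token lists regardless of input order."""
    result = contents.parseString(text)
    return result["language"].asList(), result["gui"].asList()


print(parse_contents("language python; gui qt;"))
print(parse_contents("gui qt; language python;"))
```

Both calls should return the same pair, since the results are looked up by name rather than by position.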
