Parsing

Question

I need to parse a file with information separated by curly braces, for example:

Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}

Here is what I tried in Python

from io import open
from pyparsing import *
import pprint

def parse(s):
    return nestedExpr('{','}').parseString(s).asList()

def test(strng):
    print strng
    try:
        cfgFile = file(strng)
        cfgData = "".join( cfgFile.readlines() )
        list = parse( cfgData )
        pp = pprint.PrettyPrinter(2)
        pp.pprint(list)

    except ParseException, err:
        print err.line
        print " "*(err.column-1) + "^"
        print err

    cfgFile.close()
    print
    return list

if __name__ == '__main__':
    test('testfile')

But this fails with an error:

testfile
Continent
^
Expected "{" (at char 0), (line:1, col:1)

Traceback (most recent call last):
  File "xxx.py", line 55, in <module>
    test('testfile')
  File "xxx.py", line 40, in test
    return list
UnboundLocalError: local variable 'list' referenced before assignment

What do I need to do to make this work? Is another parser better suited than pyparsing?

python parsing pyparsing

Damian 06 June '13 at 9:11

2 answers

Nested expressions are very common, and they usually require a recursive parser definition or recursive code if you are not using a parsing library. This code can be tricky for beginners and error-prone even for experts, which is why I added the nestedExpr helper to pyparsing.

The problem you are having is that your input string contains more than just a nested expression in curly braces. When I first try out a parser, I keep the testing as simple as possible - for example, I inline the sample instead of reading it from a file.

test = """\
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  "light Gray"
}
Dog
{
Name    Smut
Colour  Black
}}}"""

from pyparsing import *

expr = nestedExpr('{','}')

print expr.parseString(test).asList()

And I get the same parsing error as you:

Traceback (most recent call last):
  File "nb.py", line 25, in <module>
    print expr.parseString(test).asList()
  File "c:\python26\lib\site-packages\pyparsing-1.5.7-py2.6.egg\pyparsing.py", line 1006, in parseString
    raise exc
pyparsing.ParseException: Expected "{" (at char 0), (line:1, col:1)

So, looking at the error message (and even at your own debugging output), pyparsing is stumbling on the leading word "Continent". That word is not the start of a nested expression in curly braces, and pyparsing (as we see in the exception message) was looking for the opening '{'.

The solution is to slightly modify your parser to handle the intro "Continent" label by changing the expression:

expr = Word(alphas) + nestedExpr('{','}')

Now, printing the results as a list (using pprint as the OP did, good job) looks like this:

['Continent',
 ['Name',
  'Europe',
  'Country',
  ['Name',
   'UK',
   'Dog',
   ['Name', 'Fiffi', 'Colour', '"light Gray"'],
   'Dog',
   ['Name', 'Smut', 'Colour', 'Black']]]]

which should match your brace nesting.
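A side note on the second traceback in the question: the UnboundLocalError is separate from the parse failure. The name `list` is only assigned inside the try block, so when parsing raises, the final `return list` has nothing to return (and the name also shadows the built-in `list`). Here is a minimal sketch of one way to restructure that. The `parse` function below is a hypothetical stand-in, not pyparsing; it only mimics the original behaviour of rejecting input that does not start with '{'.

```python
def parse(text):
    # Hypothetical stand-in for the real parser (NOT pyparsing): it only
    # rejects input that does not begin with '{', like the failing case.
    if not text.lstrip().startswith('{'):
        raise ValueError('Expected "{"')
    return text.split()


def load(text):
    result = None  # bound before the try, so the return below is always valid
    try:
        result = parse(text)
    except ValueError as err:
        # Report the problem; result keeps its default value.
        print(err)
    return result


print(load("Continent { Name Europe }"))  # parse fails, load returns None
print(load("{ Name Europe }"))            # parse succeeds, tokens are returned
```

Returning None on failure is only one option; letting the exception propagate to the caller would work just as well.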

PaulMcG 06 June '13 at 18:17

Sylvain Leroux · Accepted Answer · 2013-06-06T10:21:51+0000

Recursion is the key here. Try something along these lines:

def parse(it):
    result = []
    while True:
        try:
            tk = next(it)
        except StopIteration:
            break

        if tk == '}':
            break
        val = next(it)
        if val == '{':
            result.append((tk,parse(it)))
        else:
            result.append((tk, val))

    return result

Use case:

import pprint       

data = """
Continent
{
Name    Europe
Country
{
Name    UK
Dog
{
Name    Fiffi
Colour  Gray
}
Dog
{
Name    Smut
Colour  Black
}
}
}
"""

r = parse(iter(data.split()))
pprint.pprint(r)

... which produces (Python 2.6):

[('Continent',
  [('Name', 'Europe'),
   ('Country',
    [('Name', 'UK'),
     ('Dog', [('Name', 'Fiffi'), ('Colour', 'Gray')]),
     ('Dog', [('Name', 'Smut'), ('Colour', 'Black')])])])]

Please consider this only as a starting point and feel free to improve the code as needed (depending on your data, a dictionary might be a better choice). Also, the example code does not handle malformed data (in particular, an extra or missing }). I urge you to aim for full test coverage ;)
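As an illustration of the malformed-data point, here is a hedged sketch of a stricter variant of the recursive function. The name `parse_strict` and the error messages are my own choices, not part of the original answer. It still works on whitespace-split tokens, so quoted values containing spaces are not supported.

```python
def parse_strict(tokens):
    """Parse whitespace-split tokens into nested (key, value) tuples.

    Raises ValueError on an unmatched '{' or '}' or on a key with no value.
    """
    it = iter(tokens)

    def block(depth):
        result = []
        for key in it:
            if key == '}':
                if depth == 0:
                    raise ValueError("unexpected '}' at top level")
                return result  # end of the current nested block
            if key == '{':
                raise ValueError("'{' found where a key was expected")
            try:
                val = next(it)
            except StopIteration:
                raise ValueError("key %r has no value" % key) from None
            if val == '}':
                raise ValueError("key %r has no value" % key)
            if val == '{':
                # Recurse: the shared iterator continues inside the block.
                result.append((key, block(depth + 1)))
            else:
                result.append((key, val))
        # Input is exhausted: fine at top level, an error inside a block.
        if depth > 0:
            raise ValueError("missing '}' before end of input")
        return result

    return block(0)


sample = "Continent { Name Europe Dog { Name Fiffi } }"
print(parse_strict(sample.split()))
```

Well-formed input gives the same shape of result as the original function, while inputs such as `"A { B c"` (missing brace) or `"A b }"` (extra brace) raise ValueError instead of being silently accepted.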

EDIT: Having discovered pyparsing, I tried the following, which seems to work (much) better and could be (more) easily adapted for special needs:

import pprint
from pyparsing import Word, Literal, Forward, Group, ZeroOrMore, alphas

def syntax():
    lbr = Literal( '{' ).suppress()
    rbr = Literal( '}' ).suppress()
    key = Word( alphas )
    atom = Word ( alphas )
    expr = Forward()
    pair = atom | (lbr + ZeroOrMore( expr ) + rbr)
    expr << Group ( key + pair )

    return expr

expr = syntax()
result = expr.parseString(data).asList()
pprint.pprint(result)

Producing:

[['Continent',
  ['Name', 'Europe'],
  ['Country',
   ['Name', 'UK'],
   ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
   ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]
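If a dictionary is more convenient, the nested lists above can be folded into one with plain Python. This is only a sketch under my own assumptions: the helper name `group_to_pair` is hypothetical, each group is assumed to be either [key, 'value'] or [key, child, child, ...], and repeated keys (such as the two Dog entries) are collected into a list.

```python
def group_to_pair(group):
    """Turn one parsed group into a (key, value) pair.

    A leaf such as ['Name', 'Europe'] becomes ('Name', 'Europe'); a group
    with child groups becomes (key, dict_of_children).
    """
    key, rest = group[0], group[1:]
    if len(rest) == 1 and isinstance(rest[0], str):
        return key, rest[0]
    children = {}
    for child in rest:
        child_key, child_val = group_to_pair(child)
        # Collect every value per key so repeated keys are not overwritten.
        children.setdefault(child_key, []).append(child_val)
    # Keys seen once are unwrapped; keys seen several times stay as lists.
    return key, {k: v[0] if len(v) == 1 else v for k, v in children.items()}


# The nested list shown above, copied in as a literal.
result = [['Continent',
           ['Name', 'Europe'],
           ['Country',
            ['Name', 'UK'],
            ['Dog', ['Name', 'Fiffi'], ['Colour', 'Gray']],
            ['Dog', ['Name', 'Smut'], ['Colour', 'Black']]]]]

as_dict = dict(group_to_pair(g) for g in result)
print(as_dict['Continent']['Country']['Dog'][1]['Name'])  # Smut
```

One trade-off of this design: a key that happens to occur only once yields a plain value rather than a one-element list, so code consuming the dictionary has to cope with both shapes. Always storing lists would be more uniform, at the cost of more indexing.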
