Pyparsing nestedExpr and nested parentheses

Question

I am working on a very simple "query syntax" that can be used by people with reasonable technical skills (that is, not coders per se, but people who know the subject matter).

A typical example of what they will enter on a form is:

address like street
AND
vote =  True
AND
(
  (
    age>=25
    AND
    gender = M
  )
  OR
  (
    age between [20,30]
    AND
    gender = F
  )
  OR
  (
    age >= 70
    AND
    eyes != blue
  )
)

With:

no quotes required
potentially infinite nesting of parentheses
simple AND | OR linking

I am using pyparsing (well, it doesn't matter) and have gotten partway there:

from pyparsing import *

OPERATORS = [
    '<',
    '<=',
    '>',
    '>=',
    '=',
    '!=',
    'like',
    'regexp',
    'between'
]

unicode_printables = u''.join(unichr(c) for c in xrange(65536)
                              if not unichr(c).isspace())

# user_input is the text sent by the client form
user_input = ' '.join(user_input.split())
user_input = '(' + user_input + ')'

AND = Keyword("AND").setName('AND')
OR = Keyword("OR").setName('OR')

FIELD = Word(alphanums).setName('FIELD')
OPERATOR = oneOf(OPERATORS).setName('OPERATOR')
VALUE = Word(unicode_printables).setName('VALUE')
CRITERION = FIELD + OPERATOR + VALUE

QUERY = Forward()
NESTED_PARENTHESES = nestedExpr('(', ')')
QUERY << ( CRITERION | AND | OR | NESTED_PARENTHESES )

RESULT = QUERY.parseString(user_input)
RESULT.pprint()

Output:

[['address',
  'like',
  'street',
  'AND',
  'vote',
  '=',
  'True',
  'AND',
  [['age>=25', 'AND', 'gender', '=', 'M'],
   'OR',
   ['age', 'between', '[20,30]', 'AND', 'gender', '=', 'F'],
   'OR',
   ['age', '>=', '70', 'AND', 'eyes', '!=', 'blue']]]]

I'm only partially happy - the main reason is that the desired end result would look like this:

[
  {
    "field" : "address",
    "operator" : "like",
    "value" : "street",
  },
  'AND',
  {
    "field" : "vote",
    "operator" : "=",
    "value" : True,
  },
  'AND',
  [
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 25,
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "M",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : "between",
        "value" : [20,30],
      },
      'AND',
      {
        "field" : "gender",
        "operator" : "=",
        "value" : "F",
      }
    ],
    'OR',
    [
      {
        "field" : "age",
        "operator" : ">=",
        "value" : 70,
      },
      'AND',
      {
        "field" : "eyes",
        "operator" : "!=",
        "value" : "blue",
      }
    ],
  ]
]

Many thanks!

EDIT

After Paul's answer, the code looks like this. Obviously it works much more nicely :-)

unicode_printables = u''.join(unichr(c) for c in xrange(65536)
                              if not unichr(c).isspace())

user_input = ' '.join(user_input.split())

AND = oneOf(['AND', '&'])
OR = oneOf(['OR', '|'])
FIELD = Word(alphanums)
OPERATOR = oneOf(OPERATORS)
VALUE = Word(unicode_printables)
COMPARISON = FIELD + OPERATOR + VALUE

QUERY = infixNotation(
    COMPARISON,
    [
        (AND, 2, opAssoc.LEFT,),
        (OR, 2, opAssoc.LEFT,),
    ]
)

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

COMPARISON.addParseAction(ComparisonExpr)

RESULT = QUERY.parseString(user_input).asList()
print type(RESULT)
from pprint import pprint
pprint(RESULT)

Output:

[
  [
    <[snip]ComparisonExpr instance at 0x043D0918>,
    'AND',
    <[snip]ComparisonExpr instance at 0x043D0F08>,
    'AND',
    [
      [
        <[snip]ComparisonExpr instance at 0x043D3878>,
        'AND',
        <[snip]ComparisonExpr instance at 0x043D3170>
      ],
      'OR',
      [
        [
          <[snip]ComparisonExpr instance at 0x043D3030>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D3620>
        ],
        'AND',
        [
          <[snip]ComparisonExpr instance at 0x043D3210>,
          'AND',
          <[snip]ComparisonExpr instance at 0x043D34E0>
        ]
      ]
    ]
  ]
]

Is there a way to return the RESULT with dictionaries and not ComparisonExpr instances?

EDIT2

I came up with a naive and very specific solution, but it works for me so far:

[snip]
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())

    def asDict(self):
        return {
            "field": self.tokens.asList()[0],
            "operator": self.tokens.asList()[1],
            "value": self.tokens.asList()[2]
        }

[snip]
RESULT = QUERY.parseString(user_input).asList()[0]
def convert(list):
    final = []
    for item in list:
        if item.__class__.__name__ == 'ComparisonExpr':
            final.append(item.asDict())
        elif item in ['AND', 'OR']:
            final.append(item)
        elif item.__class__.__name__ == 'list':
            final.append(convert(item))
        else:
            print 'ooops forgotten something maybe?'

    return final

FINAL = convert(RESULT)
pprint(FINAL)

This outputs:

[{'field': 'address', 'operator': 'LIKE', 'value': 'street'},
   'AND',
   {'field': 'vote', 'operator': '=', 'value': 'true'},
   'AND',
   [[{'field': 'age', 'operator': '>=', 'value': '25'},
     'AND',
     {'field': 'gender', 'operator': '=', 'value': 'M'}],
    'OR',
    [[{'field': 'age', 'operator': 'BETWEEN', 'value': '[20,30]'},
      'AND',
      {'field': 'gender', 'operator': '=', 'value': 'F'}],
     'AND',
     [{'field': 'age', 'operator': '>=', 'value': '70'},
      'AND',
      {'field': 'eyes', 'operator': '!=', 'value': 'blue'}]]]]

Thanks again to Paul for pointing me in the right direction!

The only thing left for me is to turn 'true' into True and '[20,30]' into [20, 30].

python nested pyparsing

Hal May 25 '17 at 12:46

1 answer

PaulMcG · Accepted Answer · 2017-05-26T02:55:37+0000

nestedExpr is a convenience expression in pyparsing that makes it easy to define text with matching opening and closing characters. When you want to parse the nested contents themselves, nestedExpr is usually not structured enough.
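To illustrate the point, here is a minimal sketch of what nestedExpr returns for a small query. The sample string is made up for this example. The nesting is captured, but everything inside the parentheses is only split on whitespace, so nothing marks which token is a field, an operator, or a value.

```python
import pyparsing as pp

# nestedExpr with "(" and ")" as the opening and closing delimiters.
nested = pp.nestedExpr("(", ")")

# Hypothetical sample input, not taken from the question.
sample = "(age >= 25 AND (gender = M OR gender = F))"
parsed = nested.parseString(sample).asList()

# The parentheses become nested lists; the contents stay plain strings.
print(parsed)
# [['age', '>=', '25', 'AND', ['gender', '=', 'M', 'OR', 'gender', '=', 'F']]]
```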

The query syntax you are trying to parse is better served using the pyparsing method infixNotation. You can see some examples on the pyparsing wiki examples page - SimpleBool is very similar to what you are parsing.

"Infix notation" is a general parsing term for expressions where the operator sits between its operands (as opposed to "postfix notation", where the operator follows the operands, as in "2 3 +" instead of "2 + 3", or "prefix notation", which looks like "+ 2 3"). Operators can have an order of precedence in evaluation that overrides left-to-right order: for example, in "2 + 3 * 4", precedence dictates that the multiplication is evaluated before the addition. Infix notation also supports parentheses or other grouping characters to override this precedence, as in "(2 + 3) * 4", which forces the addition to be done first.

The pyparsing method infixNotation takes a base operand expression and then a list of operator definition tuples in order of precedence. For example, 4-function integer arithmetic would look like this:

parser = infixNotation(integer,
             [
             (oneOf('* /'), 2, opAssoc.LEFT),
             (oneOf('+ -'), 2, opAssoc.LEFT),
             ])

This means that we will parse integer operands, with '*' and '/' as binary left-associative operators and '+' and '-' as binary left-associative operators, in that order of precedence. Support for parentheses to override that order is built into infixNotation.
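The snippet above leaves integer undefined, so here is a runnable sketch of the same parser. The definition of integer (a Word of digits converted to int by a parse action) is an assumption made for this example; the operator table is the one from the answer.

```python
import pyparsing as pp

# Assumed operand definition: an unsigned integer converted to int.
integer = pp.Word(pp.nums).setParseAction(lambda t: int(t[0]))

parser = pp.infixNotation(
    integer,
    [
        (pp.oneOf("* /"), 2, pp.opAssoc.LEFT),  # listed first, binds tighter
        (pp.oneOf("+ -"), 2, pp.opAssoc.LEFT),
    ],
)

by_precedence = parser.parseString("2 + 3 * 4").asList()
by_parens = parser.parseString("(2 + 3) * 4").asList()

print(by_precedence)  # [[2, '+', [3, '*', 4]]]
print(by_parens)      # [[[2, '+', 3], '*', 4]]
```

The nested lists show the grouping: multiplication is grouped first by precedence, unless the parentheses force the addition to be grouped first.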

Query strings are often some combination of NOT, AND, and OR, usually evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions such as "address = street" or "age between [20,30]". So if you define an expression for a comparison, of the form fieldname operator value, you can use infixNotation to group the ANDs and ORs correctly:

import pyparsing as pp
query_expr = pp.infixNotation(comparison_expr,
                [
                    (NOT, 1, pp.opAssoc.RIGHT,),
                    (AND, 2, pp.opAssoc.LEFT,),
                    (OR, 2, pp.opAssoc.LEFT,),
                ])
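The snippet above relies on comparison_expr, NOT, AND and OR being defined elsewhere. The following sketch fills them in so that it runs end to end. The definitions of field, operator and value are assumptions made for this example (in particular, value is restricted to alphanumerics plus "[],." so that a closing parenthesis is not swallowed), and the comparison is wrapped in a Group only to make the output easier to read.

```python
import pyparsing as pp

# Assumed definitions; the answer does not spell these out.
NOT, AND, OR = map(pp.Keyword, ["NOT", "AND", "OR"])
field = pp.Word(pp.alphas, pp.alphanums + "_")
operator = pp.oneOf("< <= > >= = != like regexp between")
value = pp.Word(pp.alphanums + "[],.")  # simplification: no quoted strings
comparison_expr = pp.Group(field + operator + value)

query_expr = pp.infixNotation(
    comparison_expr,
    [
        (NOT, 1, pp.opAssoc.RIGHT),
        (AND, 2, pp.opAssoc.LEFT),
        (OR, 2, pp.opAssoc.LEFT),
    ],
)

sample = (
    "address like street AND vote = True AND "
    "((age >= 25 AND gender = M) OR (age between [20,30] AND gender = F))"
)
parsed = query_expr.parseString(sample).asList()

# Top level: three operands joined by AND; the third is the OR group.
print(parsed[0][0])     # ['address', 'like', 'street']
print(parsed[0][4][1])  # OR
```

Note that no wrapping parentheses need to be added around the user input, unlike in the nestedExpr version.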

Finally, I suggest you define a class that takes the comparison tokens as its init args; you can then attach behavior to that class for evaluating comparisons and outputting debug strings, like so:

class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
                            *self.tokens.asList())

# attach the class to the comparison expression
comparison_expr.addParseAction(ComparisonExpr)

Then you can get output like:

query_expr.parseString(sample).pprint()

[[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
  'AND',
  Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
  'AND',
  [[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
    'AND',
    Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
   'OR',
   [Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
    'AND',
    Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]

The SimpleBool.py example contains more details on how to create this class and related classes for the NOT, AND, and OR operators.

EDIT:

"Is there a way to return the RESULT with dictionaries and not ComparisonExpr instances?" The __repr__ method on your ComparisonExpr class is getting called instead of __str__. The simplest solution is to add to your class:

__repr__ = __str__

Or just rename __str__ to __repr__.
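The effect can be seen without pyparsing at all. This is a small sketch using two hypothetical stand-in classes (WithoutRepr and WithRepr are names invented here, and the format string is shortened): pprint shows list items via repr(), so only the class that defines __repr__ prints readably.

```python
from pprint import pformat


class WithoutRepr:
    """Stand-in for ComparisonExpr as posted: only __str__ is defined."""

    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:({!r})".format(self.tokens)


class WithRepr(WithoutRepr):
    """Same class with the suggested one-line fix applied."""

    __repr__ = WithoutRepr.__str__


items = ["address", "like", "street"]

# Items inside a list are rendered with repr(), not str().
before = pformat([WithoutRepr(items)])
after = pformat([WithRepr(items)])

print(before)  # something like [<__main__.WithoutRepr object at 0x...>]
print(after)   # [Comparison:(['address', 'like', 'street'])]
```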

"The only thing left for me is to turn 'true' into True and '[20,30]' into [20, 30]"

Try:

CK = CaselessKeyword  # 'cause I'm lazy
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK,RBRACK = map(Suppress, "[]")
# parse numbers using pyparsing_common.number, which includes the str->int conversion parse action
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)

Then add them to your VALUE expression:

VALUE = bool_literal | num_list | Word(unicode_printables)
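As a quick check of these pieces in isolation, here is a sketch that parses a few values on their own. It uses the pp. prefix instead of a star import, and it swaps Word(unicode_printables) for a plain Word(alphanums); both are simplifications made for this example.

```python
import pyparsing as pp

CK = pp.CaselessKeyword
# CaselessKeyword returns the keyword as defined ('true'/'false'),
# so the comparison below yields a real Python bool.
bool_literal = (CK("true") | CK("false")).setParseAction(lambda t: t[0] == "true")

LBRACK, RBRACK = map(pp.Suppress, "[]")
num_list = pp.Group(LBRACK + pp.delimitedList(pp.pyparsing_common.number) + RBRACK)

# Assumption: Word(alphanums) stands in for Word(unicode_printables).
VALUE = bool_literal | num_list | pp.Word(pp.alphanums)

as_bool = VALUE.parseString("True").asList()
as_list = VALUE.parseString("[20,30]").asList()
as_word = VALUE.parseString("street").asList()

print(as_bool)  # [True]
print(as_list)  # [[20, 30]]
print(as_word)  # ['street']
```

The order of the alternatives matters: the more specific bool_literal and num_list must come before the catch-all Word, otherwise the Word would match first and the conversions would never run.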

Finally:

from pprint import pprint
pprint(RESULT)

I got so tired of importing pprint all the time to do just that, I added it to the API for ParseResults. Try:

RESULT.pprint()  # no import required on your part

or

print(RESULT.dump()) # will also show indented list of named fields

EDIT2

Lastly, results names are worth learning. If you make this change to COMPARISON, everything still works the same as it did for you:

COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')

But now you can write:

def asDict(self):
    return self.tokens.asDict()

And you can access the parsed values by name instead of by index position (using either result['field'] notation or result.field notation).
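A minimal sketch of results names in action follows. FIELD, OPERATOR and COMPARISON mirror the names used in the question; VALUE is simplified to Word(alphanums) for this example, and the input string is made up.

```python
import pyparsing as pp

FIELD = pp.Word(pp.alphanums)
OPERATOR = pp.oneOf("< <= > >= = != like regexp between")
VALUE = pp.Word(pp.alphanums)  # assumption: simplified stand-in

# Calling an expression with a string attaches a results name to it.
COMPARISON = FIELD("field") + OPERATOR("operator") + VALUE("value")

result = COMPARISON.parseString("address like street")

print(result.asDict())
# {'field': 'address', 'operator': 'like', 'value': 'street'}
print(result["field"])  # address
print(result.operator)  # like
```

With names attached, the asDict method on ComparisonExpr no longer needs to know the position of each token.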
