Pyparsing nestedExpr and nested parentheses
I am working on a very simple "query syntax" meant for people with reasonable technical skills (that is, not coders per se, but comfortable with the subject matter).
A typical example of what they will enter on a form is:
address like street
AND
vote = True
AND
(
(
age>=25
AND
gender = M
)
OR
(
age between [20,30]
AND
gender = F
)
OR
(
age >= 70
AND
eyes != blue
)
)
So:
- no quotes required
- potentially infinite nesting of parentheses
- simple AND | OR linking
I am using pyparsing (well, trying to, anyway) and have got this far:
from pyparsing import *
OPERATORS = [
'<',
'<=',
'>',
'>=',
'=',
'!=',
'like',
'regexp',
'between'
]
unicode_printables = u''.join(unichr(c) for c in xrange(65536)
if not unichr(c).isspace())
# user_input is the text sent by the client form
user_input = ' '.join(user_input.split())
user_input = '(' + user_input + ')'
AND = Keyword("AND").setName('AND')
OR = Keyword("OR").setName('OR')
FIELD = Word(alphanums).setName('FIELD')
OPERATOR = oneOf(OPERATORS).setName('OPERATOR')
VALUE = Word(unicode_printables).setName('VALUE')
CRITERION = FIELD + OPERATOR + VALUE
QUERY = Forward()
NESTED_PARENTHESES = nestedExpr('(', ')')
QUERY << ( CRITERION | AND | OR | NESTED_PARENTHESES )
RESULT = QUERY.parseString(user_input)
RESULT.pprint()
Output:
[['address',
'like',
'street',
'AND',
'vote',
'=',
'True',
'AND',
[['age>=25', 'AND', 'gender', '=', 'M'],
'OR',
['age', 'between', '[20,30]', 'AND', 'gender', '=', 'F'],
'OR',
['age', '>=', '70', 'AND', 'eyes', '!=', 'blue']]]]
I'm only partially happy - the main reason is that the desired end result would look like this:
[
{
"field" : "address",
"operator" : "like",
"value" : "street",
},
'AND',
{
"field" : "vote",
"operator" : "=",
"value" : True,
},
'AND',
[
[
{
"field" : "age",
"operator" : ">=",
"value" : 25,
},
'AND',
{
"field" : "gender",
"operator" : "=",
"value" : "M",
}
],
'OR',
[
{
"field" : "age",
"operator" : "between",
"value" : [20,30],
},
'AND',
{
"field" : "gender",
"operator" : "=",
"value" : "F",
}
],
'OR',
[
{
"field" : "age",
"operator" : ">=",
"value" : 70,
},
'AND',
{
"field" : "eyes",
"operator" : "!=",
"value" : "blue",
}
],
]
]
Many thanks!
EDIT
After Paul's answer, the code now looks like this. Obviously this works much better :-)
unicode_printables = u''.join(unichr(c) for c in xrange(65536)
if not unichr(c).isspace())
user_input = ' '.join(user_input.split())
AND = oneOf(['AND', '&'])
OR = oneOf(['OR', '|'])
FIELD = Word(alphanums)
OPERATOR = oneOf(OPERATORS)
VALUE = Word(unicode_printables)
COMPARISON = FIELD + OPERATOR + VALUE
QUERY = infixNotation(
COMPARISON,
[
(AND, 2, opAssoc.LEFT,),
(OR, 2, opAssoc.LEFT,),
]
)
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens
    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())
COMPARISON.addParseAction(ComparisonExpr)
RESULT = QUERY.parseString(user_input).asList()
print type(RESULT)
from pprint import pprint
pprint(RESULT)
Output:
[
[
<[snip]ComparisonExpr instance at 0x043D0918>,
'AND',
<[snip]ComparisonExpr instance at 0x043D0F08>,
'AND',
[
[
<[snip]ComparisonExpr instance at 0x043D3878>,
'AND',
<[snip]ComparisonExpr instance at 0x043D3170>
],
'OR',
[
[
<[snip]ComparisonExpr instance at 0x043D3030>,
'AND',
<[snip]ComparisonExpr instance at 0x043D3620>
],
'AND',
[
<[snip]ComparisonExpr instance at 0x043D3210>,
'AND',
<[snip]ComparisonExpr instance at 0x043D34E0>
]
]
]
]
]
Is there a way to return the RESULT with dictionaries instead of ComparisonExpr instances?
EDIT2
I came up with a naive and very specific solution, but it works for me so far:
[snip]
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens
    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(*self.tokens.asList())
    def asDict(self):
        # the comparison tokens are always (field, operator, value)
        field, operator, value = self.tokens.asList()
        return {
            "field": field,
            "operator": operator,
            "value": value
        }
[snip]
RESULT = QUERY.parseString(user_input).asList()[0]
def convert(items):
    # recursively replace ComparisonExpr objects with dicts,
    # keeping the AND/OR strings and the list nesting
    final = []
    for item in items:
        if isinstance(item, ComparisonExpr):
            final.append(item.asDict())
        elif item in ['AND', 'OR']:
            final.append(item)
        elif isinstance(item, list):
            final.append(convert(item))
        else:
            print('ooops forgotten something maybe?')
    return final
FINAL = convert(RESULT)
pprint(FINAL)
Which outputs:
[{'field': 'address', 'operator': 'LIKE', 'value': 'street'},
'AND',
{'field': 'vote', 'operator': '=', 'value': 'true'},
'AND',
[[{'field': 'age', 'operator': '>=', 'value': '25'},
'AND',
{'field': 'gender', 'operator': '=', 'value': 'M'}],
'OR',
[[{'field': 'age', 'operator': 'BETWEEN', 'value': '[20,30]'},
'AND',
{'field': 'gender', 'operator': '=', 'value': 'F'}],
'AND',
[{'field': 'age', 'operator': '>=', 'value': '70'},
'AND',
{'field': 'eyes', 'operator': '!=', 'value': 'blue'}]]]]
Thanks again to Paul for pointing me in the right direction!
The only thing left for me is to turn 'true' into True and '[20,30]' into [20, 30].
nestedExpr is a convenience expression in pyparsing that makes it easy to define text with matching opening and closing characters. When you want to parse the nested contents, though, nestedExpr usually does not give you enough structure.
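To see why, here is a rough pure-Python approximation of the kind of result nestedExpr gives. It is a hypothetical helper (`nested_split`), not pyparsing's implementation: it assumes whitespace-separated tokens and ignores nestedExpr's quoted-string handling. Parentheses become list nesting, but everything inside is just a flat run of tokens.

```python
def nested_split(text, opener='(', closer=')'):
    """Nest whitespace-separated tokens by parentheses, with no notion of
    fields, operators or precedence (a rough stand-in for nestedExpr)."""
    # Pad the grouping characters with spaces so str.split() isolates them.
    padded = text.replace(opener, ' %s ' % opener).replace(closer, ' %s ' % closer)
    stack = [[]]  # stack[-1] is the group currently being filled
    for tok in padded.split():
        if tok == opener:
            stack.append([])
        elif tok == closer:
            if len(stack) == 1:
                raise ValueError('unbalanced %r' % closer)
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError('unbalanced %r' % opener)
    return stack[0]


print(nested_split('(age>=25 AND gender = M) OR (eyes != blue)'))
# [['age>=25', 'AND', 'gender', '=', 'M'], 'OR', ['eyes', '!=', 'blue']]
```

Note how "age>=25" stays a single token and field, operator and value are never grouped together, just as in the question's first output.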
The query syntax you are trying to parse is better served by pyparsing's infixNotation method. You can find several examples on the pyparsing wiki's Examples page; SimpleBool is very similar to what you are parsing.
"Infix notation" is a general term for expressions in which the operator sits between its operands (compared with "postfix notation", where the operator follows its operands, as in "2 3 +" instead of "2 + 3", or "prefix notation", where it precedes them, as in "+ 2 3"). Operators can have an order of precedence in evaluation that overrides left-to-right order; for example, in "2 + 3 * 4", precedence dictates that the multiplication is evaluated before the addition. Infix notation also supports parentheses or other grouping characters to override that precedence, as in "(2 + 3) * 4", which forces the addition to be done first.
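To make precedence and parentheses concrete, here is a small pure-Python sketch (it does not use pyparsing; `to_postfix` and `eval_postfix` are hypothetical helper names) that rewrites infix arithmetic into postfix with the shunting-yard algorithm and then evaluates the postfix form with a stack. It assumes integer operands, tokens that can be split on whitespace once parentheses are padded, and only binary left-associative operators; "/" is treated as floor division.

```python
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}


def to_postfix(expr):
    """Convert e.g. '(2 + 3) * 4' to ['2', '3', '+', '4', '*']."""
    # Pad parentheses with spaces so str.split() separates them.
    tokens = expr.replace('(', ' ( ').replace(')', ' ) ').split()
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Left associativity: pop operators of higher OR equal precedence.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == '(':
            stack.append(tok)
        elif tok == ')':
            # Pop back to the matching '(' and discard it.
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()
        else:
            output.append(tok)  # an operand
    while stack:
        output.append(stack.pop())
    return output


def eval_postfix(tokens):
    """Evaluate a postfix token list using a value stack."""
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a // b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack[0]


print(' '.join(to_postfix('2 + 3 * 4')))        # 2 3 4 * +
print(' '.join(to_postfix('(2 + 3) * 4')))      # 2 3 + 4 *
print(eval_postfix(to_postfix('2 + 3 * 4')))    # 14
print(eval_postfix(to_postfix('(2 + 3) * 4')))  # 20
```

The postfix form needs no parentheses at all: precedence and grouping are fully encoded in the token order.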
The pyparsing infixNotation method takes a base operand expression, followed by a list of operator-definition tuples in order of precedence. For example, 4-function integer arithmetic would look like this:
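from pyparsing import Word, nums, oneOf, infixNotation, opAssoc
integer = Word(nums).setParseAction(lambda t: int(t[0]))
parser = infixNotation(integer,
    [
    (oneOf('* /'), 2, opAssoc.LEFT),
    (oneOf('+ -'), 2, opAssoc.LEFT),
    ])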
This means that integer operands are combined by the binary, left-associative operators '*' and '/' first, then by the binary operators '+' and '-', in that order of precedence. Support for parentheses to override this order is built into infixNotation.
Query strings are often some combination of NOT, AND, and OR, and are typically evaluated in that order of precedence. In your case, the operands for these operators are comparison expressions such as "address = street" or "age between [20,30]". So if you define an expression for a comparison of the form fieldname operator value, you can use infixNotation to do the correct grouping of ANDs and ORs:
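import pyparsing as pp
# Assumed operand definitions for illustration; adapt them to your own grammar.
NOT, AND, OR = map(pp.CaselessKeyword, "NOT AND OR".split())
field = pp.Word(pp.alphanums)
operator = pp.oneOf("< <= > >= = != like regexp between")
value = pp.Word(pp.printables, excludeChars="()")
comparison_expr = field + operator + value
query_expr = pp.infixNotation(comparison_expr,
    [
    (NOT, 1, pp.opAssoc.RIGHT,),
    (AND, 2, pp.opAssoc.LEFT,),
    (OR, 2, pp.opAssoc.LEFT,),
    ])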
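If you want to see the kind of grouping infixNotation does for you, the following pure-Python sketch performs the same precedence grouping (NOT, then AND, then OR) by recursive descent. It is not pyparsing, and `group_query` is a hypothetical name. It assumes the query is already split into tokens, with each comparison as a single token (a string, a dict, or a ComparisonExpr-like object) and parentheses as separate '(' and ')' tokens.

```python
def group_query(tokens):
    """Group a flat token list by precedence NOT > AND > OR."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError('unexpected end of query')
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # Binary, left-associative: collect a flat [a, 'OR', b, 'OR', c] run.
        items = [parse_and()]
        while peek() == 'OR':
            items += [take(), parse_and()]
        return items[0] if len(items) == 1 else items

    def parse_and():
        items = [parse_not()]
        while peek() == 'AND':
            items += [take(), parse_not()]
        return items[0] if len(items) == 1 else items

    def parse_not():
        if peek() == 'NOT':
            # Unary prefix operator; recursion makes 'NOT NOT x' nest rightwards.
            return [take(), parse_not()]
        if peek() == '(':
            take()
            inner = parse_or()  # parentheses restart at the lowest precedence
            if take() != ')':
                raise ValueError("expected ')'")
            return inner
        tok = take()
        if tok in ('AND', 'OR', ')'):
            raise ValueError('expected an operand, got %r' % (tok,))
        return tok  # one comparison

    result = parse_or()
    if pos != len(tokens):
        raise ValueError('unexpected token %r' % (tokens[pos],))
    return result


query = ['a', 'AND', 'b', 'AND',
         '(', '(', 'c', 'AND', 'd', ')', 'OR', '(', 'e', 'AND', 'f', ')', ')']
print(group_query(query))
# ['a', 'AND', 'b', 'AND', [['c', 'AND', 'd'], 'OR', ['e', 'AND', 'f']]]
```

Each run of operators at the same precedence level becomes one flat list, and parentheses only affect grouping without adding a nesting level of their own, which mirrors the shape infixNotation returns (apart from the outer list that parseString wraps around its result).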
Finally, I suggest you define a class that takes the comparison tokens as its __init__ argument; you can then attach behavior to that class to evaluate comparisons and output debug strings, like so:
class ComparisonExpr:
    def __init__(self, tokens):
        self.tokens = tokens
    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
            *self.tokens.asList())
# attach the class to the comparison expression
comparison_expr.addParseAction(ComparisonExpr)
Then you get output like this:
query_expr.parseString(sample).pprint()
[[Comparison:({'field': 'address', 'operator': 'like', 'value': 'street'}),
'AND',
Comparison:({'field': 'vote', 'operator': '=', 'value': True}),
'AND',
[[Comparison:({'field': 'age', 'operator': '>=', 'value': 25}),
'AND',
Comparison:({'field': 'gender', 'operator': '=', 'value': 'M'})],
'OR',
[Comparison:({'field': 'age', 'operator': 'between', 'value': [20, 30]}),
'AND',
Comparison:({'field': 'gender', 'operator': '=', 'value': 'F'})],
'OR',
[Comparison:({'field': 'age', 'operator': '>=', 'value': 70}),
'AND',
Comparison:({'field': 'eyes', 'operator': '!=', 'value': 'blue'})]]]]
The SimpleBool.py example contains more details on how to create this class and related classes for the NOT, AND, and OR operators.
EDIT:
"Is there a way to return the RESULT with dictionaries and not ComparisonExpr instances?" When pprint formats your list, it calls the __repr__ method of your ComparisonExpr class, not __str__. The simplest solution is to add this to your class:
__repr__ = __str__
Or just rename __str__ to __repr__.
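Here is a minimal, self-contained illustration of the difference. It is plain Python: the tokens are a list rather than a pyparsing ParseResults, so this class is only a stand-in for yours.

```python
from pprint import pprint


class ComparisonExpr:
    """Stand-in for the class above; takes a plain list instead of ParseResults."""

    def __init__(self, tokens):
        self.tokens = tokens

    def __str__(self):
        return "Comparison:('field': {!r}, 'operator': {!r}, 'value': {!r})".format(
            *self.tokens)

    # list.__repr__ and pprint format each element with repr(), not str(),
    # so reuse __str__ for repr as well.
    __repr__ = __str__


result = [ComparisonExpr(['age', '>=', '25']), 'AND',
          ComparisonExpr(['gender', '=', 'M'])]
pprint(result)
```

Without the `__repr__ = __str__` line, each element would print as the default `<__main__.ComparisonExpr object at 0x...>` form.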
"The only thing left for me is to turn 'true' into True and '[20,30]' into [20, 30]"
Try:
from pyparsing import (CaselessKeyword, Suppress, Group, delimitedList,
                       pyparsing_common)
CK = CaselessKeyword # 'cause I'm lazy
bool_literal = (CK('true') | CK('false')).setParseAction(lambda t: t[0] == 'true')
LBRACK,RBRACK = map(Suppress, "[]")
# parse numbers using pyparsing_common.number, which includes the str->int/float conversion parse action
num_list = Group(LBRACK + delimitedList(pyparsing_common.number) + RBRACK)
Then add them to your VALUE expression:
VALUE = bool_literal | num_list | Word(unicode_printables)
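If you would rather leave the grammar alone and fix up the values after parsing, a plain-Python alternative is to coerce each raw VALUE string afterwards. This is a sketch, not the pyparsing approach above: `coerce_value` is a hypothetical helper that only recognizes true/false (case-insensitive), plain integers and decimals, and bracketed lists of such numbers; anything else is left as a string.

```python
import re

# A bracketed, comma-separated list of plain integers/decimals, e.g. "[20,30]".
_NUM_LIST = re.compile(r'^\[\s*-?\d+(\.\d+)?(\s*,\s*-?\d+(\.\d+)?)*\s*\]$')


def to_number(text):
    """'25' -> 25, '1.5' -> 1.5; raises ValueError for anything else."""
    return float(text) if '.' in text else int(text)


def coerce_value(raw):
    lowered = raw.lower()
    if lowered in ('true', 'false'):
        return lowered == 'true'
    if _NUM_LIST.match(raw):
        # Strip the brackets, then convert each comma-separated item.
        return [to_number(part.strip()) for part in raw[1:-1].split(',')]
    try:
        return to_number(raw)
    except ValueError:
        return raw  # leave anything else (e.g. 'blue', 'M') as a string


print(coerce_value('true'), coerce_value('[20,30]'), coerce_value('blue'))
```

Doing it in the grammar, as shown above, has the advantage that a malformed list is reported as a parse error instead of silently staying a string.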
Finally, instead of:
from pprint import pprint
pprint(RESULT)
I got so tired of importing pprint all the time just to do this that I added it to the ParseResults API. Try:
RESULT.pprint() # no import required on your part
or
print(RESULT.dump()) # will also show indented list of named fields
EDIT2
Lastly, results names are well worth learning about. If you make this change to COMPARISON, everything still works the same as it did for you:
COMPARISON = FIELD('field') + OPERATOR('operator') + VALUE('value')
But now you can write:
def asDict(self):
    return self.tokens.asDict()
You can also access the parsed values by name instead of by index position (using result['field'] or the attribute notation result.field).
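With results names in place, the convert() function from the question can shrink to a few lines. The sketch below is plain Python so it runs without pyparsing: a dict stands in for the named tokens that ParseResults.asDict() would return, and `to_plain` is a hypothetical helper name.

```python
class ComparisonExpr:
    """Stand-in whose tokens are already keyed by results name."""

    def __init__(self, named_tokens):
        self.named = dict(named_tokens)

    def asDict(self):
        return dict(self.named)


def to_plain(node):
    """Recursively replace ComparisonExpr objects with dicts, keeping
    'AND'/'OR' strings and the list nesting unchanged."""
    if isinstance(node, ComparisonExpr):
        return node.asDict()
    if isinstance(node, list):
        return [to_plain(item) for item in node]
    return node


parsed = [
    ComparisonExpr({'field': 'address', 'operator': 'like', 'value': 'street'}),
    'AND',
    [ComparisonExpr({'field': 'age', 'operator': '>=', 'value': 25}),
     'AND',
     ComparisonExpr({'field': 'gender', 'operator': '=', 'value': 'M'})],
]
print(to_plain(parsed))
```

Because lookup is by name, adding or reordering tokens in the COMPARISON expression does not break the conversion.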