SQLAlchemy: filtering a joined query built from parsed input
I am stuck with this: I adopted Bauble (a program on GitHub), and part of it lets the user specify a query against a SQL database. The query language is really three different languages, one of which (the one that resembles a SQL query) I am rewriting.
The original author chose pyparsing, and I have no reason to question that choice other than that I don't know pyparsing and have always enjoyed lex and yacc. Still, I figured I would stay with pyparsing, so I am studying it.
I (re)wrote a parser that recognizes the given query, and most of the grammar categories are translated into classes. I suppose the parsing part is in reasonable shape; the point where I get stuck is where the objects I created with pyparsing should use SQLAlchemy to query the database, specifically when I filter on attributes from joined tables.
The relevant part of the grammar, in pyparsing format:
    query_expression = Forward()
    identifier = Group(delimitedList(Word(alphas, alphanums+'_'),
                                     '.')).setParseAction(IdentifierToken)
    ident_expression = (
        Group(identifier + binop + value).setParseAction(IdentExpressionToken)
        | (
            Literal('(') + query_expression + Literal(')')
        ).setParseAction(ParenthesisedQuery))
    query_expression << infixNotation(
        ident_expression,
        [ (NOT_, 1, opAssoc.RIGHT, SearchNotAction),
          (AND_, 2, opAssoc.LEFT, SearchAndAction),
          (OR_, 2, opAssoc.LEFT, SearchOrAction) ] )
And the corresponding classes (the evaluate method of the last two is what I don't yet know how to write):
    class BinaryLogical(object):
        ## abstract base class. `name` is defined in derived classes
        def __init__(self, t):
            self.op = t[0][1]
            self.operands = t[0][0::2]  # every second object is an operand

        def __repr__(self):
            return "(%s %s %s)" % (self.operands[0], self.name, self.operands[1])


    class SearchAndAction(BinaryLogical):
        name = 'AND'

        def evaluate(self, domain, session):
            # intersect_all takes the other queries as separate positional
            # arguments, so the list is unpacked with *
            return self.operands[0].evaluate(domain, session).intersect_all(
                *[i.evaluate(domain, session) for i in self.operands[1:]])


    class SearchOrAction(BinaryLogical):
        name = 'OR'

        def evaluate(self, domain, session):
            # union_all takes the other queries as separate positional arguments
            return self.operands[0].evaluate(domain, session).union_all(
                *[i.evaluate(domain, session) for i in self.operands[1:]])


    class SearchNotAction(object):
        name = 'NOT'

        def __init__(self, t):
            self.op, self.operand = t[0]

        def evaluate(self, domain, session):
            return session.query(domain).except_(self.operand.evaluate(domain, session))

        def __repr__(self):
            return "%s %s" % (self.name, str(self.operand))


    class ParenthesisedQuery(object):
        def __init__(self, t):
            self.query = t[1]

        def __repr__(self):
            return "(%s)" % self.query.__repr__()

        def evaluate(self, domain, session):
            return self.query.evaluate(domain, session)


    class IdentifierToken(object):
        def __init__(self, t):
            self.value = t[0]

        def __repr__(self):
            return '.'.join(self.value)

        def evaluate(self, domain, session):
            q = session.query(domain)
            if len(self.value) > 1:
                q = q.join(self.value[:-1], aliased=True)
            return q.subquery().c[self.value[-1]]


    class IdentExpressionToken(object):
        def __init__(self, t):
            self.op = t[0][1]
            self.operation = {'>': lambda x,y: x>y,
                              '<': lambda x,y: x<y,
                              '>=': lambda x,y: x>=y,
                              '<=': lambda x,y: x<=y,
                              '=': lambda x,y: x==y,
                              '!=': lambda x,y: x!=y,
                              }[self.op]
            self.operands = t[0][0::2]  # every second object is an operand

        def __repr__(self):
            return "(%s %s %s)" % ( self.operands[0], self.op, self.operands[1])

        def evaluate(self, domain, session):
            return session.query(domain).filter(self.operation(self.operands[0].evaluate(domain, session),
                                                               self.operands[1].express()))
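
As an aside on the operator table in IdentExpressionToken: a minimal sketch, using only the standard library, of the same lookup built from the operator module instead of hand-written lambdas. The names OPERATIONS and compare are illustrative and are not part of Bauble.

```python
import operator

# Map the comparison symbols accepted by the grammar to functions.
# Applied to SQLAlchemy columns, these same functions build SQL
# expressions, because columns overload the comparison operators.
OPERATIONS = {
    '>': operator.gt,
    '<': operator.lt,
    '>=': operator.ge,
    '<=': operator.le,
    '=': operator.eq,
    '!=': operator.ne,
}


def compare(op, left, right):
    """Apply the comparison named by op; unknown symbols raise KeyError."""
    return OPERATIONS[op](left, right)
```

Keeping the table at module level means it is built once rather than on every parse action call.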
The complete and most up-to-date code for the above snippets is here.
Some example queries:
    results = mapper_search.search("plant where accession.species.id=44")
    results = mapper_search.search("species where genus.genus='Ixora'")
    results = mapper_search.search("species where genus.genus=Maxillaria and not genus.family=Orchidaceae")
I think I have found a temporarily acceptable answer, but it relies on internal information (an underscore-prefixed field) from SQLAlchemy.
The core of the problem was that, since I am working with parsed user input, I start from something like a class name and a chain of relationship names to navigate. For example, in plant where accession.species.id=44, the class name is Plant, and I filter on the id of the associated Species object.
The example above might make you think it's pretty simple, just a capitalization issue. But we still need to know in which module Plant, Accession and Species are to be found.
Another example: family where genera.id!=0. In general, the name of the relationship does not have to equal the name of the referenced class.
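
One way to avoid relying on capitalization, or on knowing the module, is to keep an explicit table from domain words to classes. A minimal sketch, assuming a hypothetical register_domain decorator and plain placeholder classes (none of these names come from Bauble or SQLAlchemy):

```python
# Hypothetical registry: maps the domain word used in a search string
# to the class it stands for. The classes below are empty placeholders.
DOMAINS = {}


def register_domain(*names):
    """Class decorator recording a class under one or more domain words."""
    def decorator(cls):
        for name in names:
            DOMAINS[name.lower()] = cls
        return cls
    return decorator


@register_domain('plant', 'plants')
class Plant(object):
    pass


@register_domain('species', 'sp')
class Species(object):
    pass


def lookup_domain(name):
    """Return the class registered for a domain word, or raise ValueError."""
    try:
        return DOMAINS[name.lower()]
    except KeyError:
        raise ValueError("unknown search domain: %r" % name)
```

This only covers the first word of the query. Relationship names such as genera still have to be resolved through the mapper, because they need not match any class name.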
The grammar was OK and I didn't need to change it any further. The difficulty was (and partially still is) in the interaction with SQLAlchemy, so I had to fix the evaluate methods in the IdentifierToken and IdentExpressionToken classes.
My solution includes this code:
    class IdentifierToken(object):
        ....

        def evaluate(self, env):
            """return pair (query, attribute)

            the value associated to the identifier is an altered query where the
            joinpoint is the one relative to the attribute, and the attribute
            itself.
            """
            query = env.session.query(env.domain)
            if len(self.value) == 1:
                # identifier is an attribute of the table being queried
                attr = getattr(env.domain, self.value[0])
            elif len(self.value) > 1:
                # identifier is an attribute of a joined table
                query = query.join(*self.value[:-1], aliased=True)
                attr = getattr(query._joinpoint['_joinpoint_entity'], self.value[-1])
            return query, attr


    class IdentExpressionToken(object):
        ...

        def evaluate(self, env):
            q, a = self.operands[0].evaluate(env)
            clause = lambda x: self.operation(a, x)
            return q.filter(clause(self.operands[1].express()))
A few points:
- It was not clear to me that a query method such as join does not alter the query it is called on; I had to use the return value.
- I am aliasing the joined query so that it is easy to get the "destination" class of the join operation.
- On the aliased query, I am using a field, _joinpoint, that looks like non-public information.
- query._joinpoint['_joinpoint_entity'] is a reference to the class from which I need to get the field named in the parsed query. The _joinpoint dictionary looks different for non-aliased queries.
- The still-open part of the question is whether there is an "official" SQLAlchemy way to extract this information.
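
For what it is worth, one candidate for a public route is SQLAlchemy's inspection API combined with explicit aliases. The sketch below is an illustration under stated assumptions rather than a drop-in replacement: the Family and Genus models and the resolve helper are made up for the example, and it assumes a SQLAlchemy release that provides inspect(), aliased() and of_type(). It walks the dotted path, asks the mapper for each relationship's target class, and joins to a fresh alias, so the last alias plays the role of query._joinpoint['_joinpoint_entity'].

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, inspect
from sqlalchemy.orm import Session, aliased, relationship
try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4 and later
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # older releases

Base = declarative_base()


class Family(Base):  # made-up model, for illustration only
    __tablename__ = 'family'
    id = Column(Integer, primary_key=True)
    family = Column(String)
    genera = relationship('Genus', back_populates='family')


class Genus(Base):  # made-up model, for illustration only
    __tablename__ = 'genus'
    id = Column(Integer, primary_key=True)
    genus = Column(String)
    family_id = Column(Integer, ForeignKey('family.id'))
    family = relationship('Family', back_populates='genera')


def resolve(session, domain, path):
    """Return (query, attribute) for a dotted path such as ['family', 'family']."""
    query = session.query(domain)
    current = domain
    for name in path[:-1]:
        # public inspection API: relationship name -> target class
        target = inspect(current).mapper.relationships[name].mapper.class_
        alias = aliased(target)
        # join() returns a new query, so the result must be kept
        query = query.join(getattr(current, name).of_type(alias))
        current = alias
    return query, getattr(current, path[-1])


engine = create_engine('sqlite://')  # in-memory database
Base.metadata.create_all(engine)
session = Session(engine)
session.add_all([
    Genus(genus='Ixora', family=Family(family='Rubiaceae')),
    Genus(genus='Maxillaria', family=Family(family='Orchidaceae')),
])
session.commit()

# genus where family.family='Rubiaceae'
query, attr = resolve(session, Genus, ['family', 'family'])
names = [g.genus for g in query.filter(attr == 'Rubiaceae')]

# family where genera.id!=0
query, attr = resolve(session, Family, ['genera', 'id'])
family_names = sorted(f.family for f in query.filter(attr != 0))
```

Because every hop gets its own alias, the same table can appear more than once in a path without the join conditions colliding.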
It looks like the previous developer went to a lot of trouble to create these classes - this is actually a "best practice" when using pyparsing. The intent is that these classes, created as a result of the parsing process, generally implement their own behavior using the parsed elements. In this case, the items are also available by name (another pyparsing "best practice"). Once these classes have been created during the parsing process, pyparsing is largely out of the picture - any additional processing is purely a function of these classes.
I think the intent was probably, as you surmise, that there be a method on these classes, for example results.statement.invoke(). Take a look at the methods on these classes and see what they provide for you, especially the top-level StatementAction class. If there is no such method, then writing one is probably the next step, in which you apply the parsed values in a way that makes sense for your SQLAlchemy database.
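
To make the idea concrete, here is a toy sketch, independent of both pyparsing and SQLAlchemy, in which each node class carries its own evaluate method. The class names and the list-of-dicts "database" are invented for illustration; in Bauble the evaluate methods would build SQLAlchemy queries instead.

```python
class Comparison(object):
    """Leaf node: compares one field of a record against a constant."""
    def __init__(self, field, op, value):
        self.field, self.op, self.value = field, op, value

    def evaluate(self, record):
        if self.op == '=':
            return record[self.field] == self.value
        if self.op == '!=':
            return record[self.field] != self.value
        raise ValueError("unsupported operator: %r" % self.op)


class And(object):
    """True when every operand is true for the record."""
    def __init__(self, *operands):
        self.operands = operands

    def evaluate(self, record):
        return all(o.evaluate(record) for o in self.operands)


class Not(object):
    """Negates its single operand."""
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, record):
        return not self.operand.evaluate(record)


# Tree that parse actions could have produced for:
#   genus=Ixora and not family=Orchidaceae
tree = And(Comparison('genus', '=', 'Ixora'),
           Not(Comparison('family', '=', 'Orchidaceae')))

records = [
    {'genus': 'Maxillaria', 'family': 'Orchidaceae'},
    {'genus': 'Ixora', 'family': 'Rubiaceae'},
]
# The caller only touches the root node; each class handles its own part.
matches = [r for r in records if tree.evaluate(r)]
```

The caller never inspects the tree's internals, which is the point of attaching behavior to the parse-result classes.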