SQLAlchemy: filter a joined query built on parsed input

Question

I'm stuck on this: I adopted bauble (a program on GitHub), and part of it lets the user specify a query against a SQL database. The query language is really three different languages, one of which (the SQL-like query) I am rewriting.

The original author chose pyparsing, and I have no reason to question that choice other than that I don't know pyparsing and have always enjoyed lex and yacc. Still, I decided to stay with pyparsing, so I am studying it.

I (re)wrote a parser that recognizes the given query, and most of the grammar categories are translated into classes. I believe the parsing part is fine; where I get stuck is the point at which the objects created by pyparsing should use SQLAlchemy to query the database, specifically when I filter on attributes from joined tables.

the relevant part of the grammar, in pyparsing format:

query_expression = Forward()
identifier = Group(delimitedList(Word(alphas, alphanums+'_'),
                                 '.')).setParseAction(IdentifierToken)
ident_expression = (
    Group(identifier + binop + value).setParseAction(IdentExpressionToken)
    | (
        Literal('(') + query_expression + Literal(')')
    ).setParseAction(ParenthesisedQuery))
query_expression << infixNotation(
    ident_expression,
    [ (NOT_, 1, opAssoc.RIGHT, SearchNotAction),
      (AND_, 2, opAssoc.LEFT,  SearchAndAction),
      (OR_,  2, opAssoc.LEFT,  SearchOrAction) ] )

and the corresponding classes (the evaluate method of the last two is something I don't know how to write yet):

class BinaryLogical(object):
    ## abstract base class. `name` is defined in derived classes
    def __init__(self, t):
        self.op = t[0][1]
        self.operands = t[0][0::2]  # every second object is an operand

    def __repr__(self):
        return "(%s %s %s)" % (self.operands[0], self.name, self.operands[1])


class SearchAndAction(BinaryLogical):
    name = 'AND'

    def evaluate(self, domain, session):
        return self.operands[0].evaluate(domain, session).intersect_all(
            map(lambda i: i.evaluate(domain, session), self.operands[1:]))


class SearchOrAction(BinaryLogical):
    name = 'OR'

    def evaluate(self, domain, session):
        return self.operands[0].evaluate(domain, session).union_all(
            map(lambda i: i.evaluate(domain, session), self.operands[1:]))


class SearchNotAction(object):
    name = 'NOT'

    def __init__(self, t):
        self.op, self.operand = t[0]

    def evaluate(self, domain, session):
        return session.query(domain).except_(self.operand.evaluate(domain, session))

    def __repr__(self):
        return "%s %s" % (self.name, str(self.operand))



class ParenthesisedQuery(object):
    def __init__(self, t):
        self.query = t[1]

    def __repr__(self):
        return "(%s)" % self.query.__repr__()

    def evaluate(self, domain, session):
        return self.query.evaluate(domain, session)


class IdentifierToken(object):
    def __init__(self, t):
        self.value = t[0]

    def __repr__(self):
        return '.'.join(self.value)

    def evaluate(self, domain, session):
        q = session.query(domain)
        if len(self.value) > 1:
            q = q.join(self.value[:-1], aliased=True)
        return q.subquery().c[self.value[-1]]


class IdentExpressionToken(object):
    def __init__(self, t):
        self.op = t[0][1]
        self.operation = {'>': lambda x,y: x>y,
                          '<': lambda x,y: x<y,
                          '>=': lambda x,y: x>=y,
                          '<=': lambda x,y: x<=y,
                          '=': lambda x,y: x==y,
                          '!=': lambda x,y: x!=y,
                      }[self.op]
        self.operands = t[0][0::2]  # every second object is an operand

    def __repr__(self):
        return "(%s %s %s)" % ( self.operands[0], self.op, self.operands[1])

    def evaluate(self, domain, session):
        return session.query(domain).filter(self.operation(self.operands[0].evaluate(domain, session),
                                                           self.operands[1].express()))
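
A note on the token layout that the constructors above rely on: as far as I understand pyparsing's infixNotation, a left-associative binary operator hands its parse action one nested group of alternating operands and operators, which is why t[0][0::2] picks out the operands. The sketch below only imitates that with plain lists (the token values are made-up stand-ins, not real pyparsing output). It also rewrites the comparison table from IdentExpressionToken with the standard operator module, which is an alternative spelling I am assuming is equivalent, not the author's code.

```python
import operator

# Stand-in for the tokens a binary infixNotation level passes to its parse
# action: one group holding operand, operator, operand, operator, operand...
# (hypothetical values; real tokens would be parsed objects, not strings)
tokens = [["a=1", "and", "b=2", "and", "c=3"]]

operands = tokens[0][0::2]   # every second item, starting at index 0
operators = tokens[0][1::2]  # the items in between

print(operands)   # ['a=1', 'b=2', 'c=3']
print(operators)  # ['and', 'and']

# Same mapping as the lambdas in IdentExpressionToken, using operator functions.
OPERATIONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
}

# With plain integers these return booleans; applied to a SQLAlchemy column
# they build a SQL expression instead, because columns overload comparisons.
print(OPERATIONS["="](44, 44))   # True
print(OPERATIONS["!="](44, 44))  # False
```

Note that a chain such as "a and b and c" yields three operands in one group, while the __repr__ of BinaryLogical above prints only the first two.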

the complete and most up-to-date code for the above snippets is here .

several possible queries:

results = mapper_search.search("plant where accession.species.id=44")
results = mapper_search.search("species where genus.genus='Ixora'")
results = mapper_search.search("species where genus.genus=Maxillaria and not genus.family=Orchidaceae")

python sqlalchemy pyparsing

mariotomo Dec 17. 14 at 18:40

2 answers

It looks like the previous developer went to a lot of trouble to create these classes. This is actually a "best practice" when using pyparsing. The intent is that these classes, as the output of the parsing process, generally support their own behavior using the parsed elements. In this case, the elements are also accessible by name (another pyparsing "best practice"). Once these classes have been constructed during parsing, pyparsing is pretty much out of the picture: any further processing is purely a function of these classes.

I think the goal was probably, as you surmise, to have a method on these classes, something like results.statement.invoke(). Take a look at the methods on these classes and see what they provide for you, especially the top-level StatementAction class. If there is no such method, then writing one is probably your next step: applying the parsed values in a way that makes sense for your SQLAlchemy database.
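
To make the pattern described above concrete, here is a minimal, database-free sketch of parse-result classes that carry their own behavior. All class names and the sample rows are hypothetical; Python sets stand in for the intersect, union and except operations that the question's classes perform on queries.

```python
class Compare:
    """Leaf node: field = value."""
    def __init__(self, field, value):
        self.field, self.value = field, value

    def evaluate(self, rows):
        # ids of the rows whose field equals the value
        return {i for i, row in rows.items() if row.get(self.field) == self.value}


class And:
    def __init__(self, *operands):
        self.operands = operands

    def evaluate(self, rows):
        result = self.operands[0].evaluate(rows)
        for operand in self.operands[1:]:
            result = result & operand.evaluate(rows)  # set intersection
        return result


class Or:
    def __init__(self, *operands):
        self.operands = operands

    def evaluate(self, rows):
        result = set()
        for operand in self.operands:
            result = result | operand.evaluate(rows)  # set union
        return result


class Not:
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, rows):
        # everything in the domain except what the operand matches
        return set(rows) - self.operand.evaluate(rows)


# Made-up sample data, keyed by row id.
rows = {
    1: {"genus": "Maxillaria", "family": "Orchidaceae"},
    2: {"genus": "Ixora", "family": "Rubiaceae"},
    3: {"genus": "Maxillaria", "family": "Unknown"},
}

# Hand-built tree for: genus=Maxillaria and not family=Orchidaceae
tree = And(Compare("genus", "Maxillaria"), Not(Compare("family", "Orchidaceae")))
print(tree.evaluate(rows))  # {3}
```

Once the parser has built such a tree, the caller only needs the top-level evaluate call; the parser itself is no longer involved.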

PaulMcG Dec 17. 14 at 23:38

mariotomo · Accepted Answer · 2015-01-01T12:48:53+0000

I guess I have found a temporarily acceptable answer, but it relies on internal information (an underscore-prefixed field) from SQLAlchemy.

The core of the problem was that, since I was working with parsed user input, I started with something like the name of a class and the names of the relationships to navigate. For example, in plant where accession.species.id=44, the class name is Plant, and I am filtering on the id field of a joined Species object.

The example above might make you think it's pretty simple, just a capitalization issue. But we still need to know in which module Plant, Accession and Species are to be found.

Another example: family where genera.id!=0. In general, the name of the relation does not have to be equal to the name of the referenced class.
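
A minimal sketch of that name-resolution problem, assuming SQLAlchemy 1.4 or later and two hypothetical miniature models (the real bauble classes are not reproduced here). The helper names domain_class and target_class are made up for illustration. Looking up the domain class among the subclasses of the declarative base, and asking the mapper for the target of a relationship, are two approaches that appear to work with public introspection only.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, inspect
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


# Hypothetical models: the relation "genera" points at the class Genus.
class Family(Base):
    __tablename__ = "family"
    id = Column(Integer, primary_key=True)
    family = Column(String)
    genera = relationship("Genus", back_populates="family")


class Genus(Base):
    __tablename__ = "genus"
    id = Column(Integer, primary_key=True)
    genus = Column(String)
    family_id = Column(Integer, ForeignKey("family.id"))
    family = relationship("Family", back_populates="genera")


def domain_class(name):
    """Find a mapped class by case-insensitive name among Base's direct subclasses."""
    for cls in Base.__subclasses__():
        if cls.__name__.lower() == name.lower():
            return cls
    raise KeyError(name)


def target_class(cls, relation_name):
    """Return the class on the far side of the named relationship."""
    return inspect(cls).relationships[relation_name].mapper.class_


family_cls = domain_class("family")
print(family_cls.__name__)                          # Family
print(target_class(family_cls, "genera").__name__)  # Genus
```

The lookup by subclass only sees classes that have already been imported, so the module question does not disappear entirely; it only moves to import time.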

The grammar was ok and I didn't need to change it any further. The point was (and partially still is) in the interaction with SQLAlchemy, so I had to fix the evaluate methods in the IdentifierToken and IdentExpressionToken classes.

my solution includes this code:

class IdentifierToken(object):
    ...
    def evaluate(self, env):
        """return pair (query, attribute)

        the value associated to the identifier is an altered query where the
        joinpoint is the one relative to the attribute, and the attribute
        itself.
        """

        query = env.session.query(env.domain)
        if len(self.value) == 1:
            # identifier is an attribute of the table being queried
            attr = getattr(env.domain, self.value[0])
        elif len(self.value) > 1:
            # identifier is an attribute of a joined table
            query = query.join(*self.value[:-1], aliased=True)
            attr = getattr(query._joinpoint['_joinpoint_entity'], self.value[-1])
        return query, attr

class IdentExpressionToken(object):
    ...
    def evaluate(self, env):
        q, a = self.operands[0].evaluate(env)
        clause = lambda x: self.operation(a, x)
        return q.filter(clause(self.operands[1].express()))

a few remarks:

it was not clear to me that query methods do not alter the query they are invoked on, so I have to use the returned value.
I am aliasing the joined query so that it is easy to get the "destination" class of the join operation.
on the aliased query, I am using the _joinpoint field, which looks like non-exposed (internal) information.
query._joinpoint['_joinpoint_entity'] is a reference to the class from which I need to get the field named in the parsed query. The _joinpoint dictionary looks different on non-aliased queries.
the still open part of the question is whether there is an "official" SQLAlchemy way to retrieve this information.
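
One possible route that avoids the private _joinpoint field, offered as a sketch rather than a confirmed "official" answer: walk the relationship names through the mapper introspection API and join explicitly to an alias at each step. It assumes SQLAlchemy 1.4 or later; the three models, the sample rows and the function name evaluate_identifier are hypothetical and only mimic the plant where accession.species.id=44 example.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, inspect
from sqlalchemy.orm import Session, aliased, declarative_base, relationship

Base = declarative_base()


class Species(Base):
    __tablename__ = "species"
    id = Column(Integer, primary_key=True)


class Accession(Base):
    __tablename__ = "accession"
    id = Column(Integer, primary_key=True)
    species_id = Column(Integer, ForeignKey("species.id"))
    species = relationship("Species")


class Plant(Base):
    __tablename__ = "plant"
    id = Column(Integer, primary_key=True)
    accession_id = Column(Integer, ForeignKey("accession.id"))
    accession = relationship("Accession")


def evaluate_identifier(session, domain, path):
    """Return (query, attribute) for a dotted path such as accession.species.id."""
    query = session.query(domain)
    current = domain
    for name in path[:-1]:
        # the mapper knows which class sits at the far end of the relationship
        target_cls = inspect(current).mapper.relationships[name].mapper.class_
        target = aliased(target_cls)
        # join() returns a new query, so the returned value must be kept
        query = query.join(getattr(current, name).of_type(target))
        current = target
    return query, getattr(current, path[-1])


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add_all([
        Plant(id=1, accession=Accession(id=1, species=Species(id=44))),
        Plant(id=2, accession=Accession(id=2, species=Species(id=7))),
    ])
    session.commit()
    query, attr = evaluate_identifier(session, Plant, ["accession", "species", "id"])
    found_ids = [plant.id for plant in query.filter(attr == 44)]
    query2, attr2 = evaluate_identifier(session, Plant, ["id"])
    direct_ids = [plant.id for plant in query2.filter(attr2 > 1)]

print(found_ids)   # [1]
print(direct_ids)  # [2]
```

Because each step joins to a fresh alias, the loop keeps an explicit reference to the current entity, so there is no need to ask the query where its join point is.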
