Shuntin Yard Validate Expression

We use the Shunting-Yard algorithm to evaluate expressions. We can check the expression simply by applying the algorithm. It fails if missing operands, matching parentheses and other things are missing. However, the Shunting-Yard algorithm has a stronger supported syntax than just a human-readable infix. For example,

1 + 2
+ 1 2
1 2 +

      

are acceptable ways of providing "1 + 2" as input to the Shuntin Yard algorithm. '+ 1 2' and '1 2 +' are not valid, but the standard Shunting-Yard algorithm can handle them. The algorithm doesn't really care about order, it applies the operators in order of precedence, capturing the "closest" operands.

We would like to restrict our input to valid human readable infilling. I'm looking for a way to change the Shunting-Yard algorithm to fail with invalid infixing, or provide infix validation before using Shunting-Yard.

Does anyone know of any posted methods for this? We have to support both the main operator and custom operators, parentheses and functions (with multiple arguments). I haven't seen anything that works with more than basic operators on the internet.

thank

+4


source to share


2 answers


The solution to my problem was to improve the algorithm published on Wikipedia with the state machine recommended by Rici . I am posting pseudocode here because others might find it useful.

Support two states, ExpectOperand and ExpectOperator.

Set State to ExpectOperand
While there are tokens to read:
    If token is a constant (number)
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is a variable.
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is an argument separator (a comma).
        Error if state is not ExpectOperator.
        Until the top of the operator stack is a left parenthesis  (don't pop the left parenthesis).
            Push the top token of the stack to the output queue.
            If no left parenthesis is encountered then error.  Either the separator was misplaced or the parentheses were mismatched.
        Set state to ExpectOperand.
    If token is a unary operator.
        Error if the state is not ExpectOperand.
        Push the token to the operator stack.
        Set the state to ExpectOperand.
    If the token is a binary operator.
        Error if the state is not ExpectOperator.
        While there is an operator token at the top of the operator stack and either the current token is left-associative and of lower then or equal precedence to the operator on the stack, or the current token is right associative and of lower precedence than the operator on the stack.
            Pop the operator from the operator stack and push it onto the output queue.
        Push the current operator onto the operator stack.
        Set the state to ExpectOperand. 
    If the token is a Function.
        Error if the state is not ExpectOperand.  
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a open parentheses.
        Error if the state is not ExpectOperand.
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a close parentheses.
         Error if the state is not ExpectOperator.
         Until the token at the top of the operator stack is a left parenthesis.
             Pop the token off of the operator stack and push it onto the output queue.
         Pop the left parenthesis off of the operator stack and discard.
         If the token at the top of the operator stack is a function then pop it and push it onto the output queue.
         Set the state to ExpectOperator.
At this point you have processed all the input tokens.
While there are tokens on the operator stack.
    Pop the next token from the operator stack and push it onto the output queue.
    If a parenthesis is encountered then error.  There are mismatched parenthesis.

      



You can easily distinguish between unary and binary operators (I'm talking about negative prefix and subtraction operator) by looking at the previous token. If there is no preceding token, the previous token is an open parenthesis, or the previous token is an operator, then you are faced with the unary prefix operator, otherwise you are faced with a binary operator.

+3


source


Good discussion on Shunting Yard algorithms http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm The algorithm presented there uses the key idea of ​​the operator stack, but has some grammar to know what to expect next. It has two main functions E()

, one that expects an expression, and P()

one that expects either a prefix operator, variable, number, parentheses and functions. The prefix operators are always more stringent than binary operators, so you want to process them first.

If we say that P denotes some prefix sequence and B is a binary operator, then any expression will have the form

P B P B P

      

i.e. you are either expecting a prefix sequence or a binary operator. Formally grammar

E -> P (B P)*

      



and P will be

P -> -P | variable | constant | etc.

      

This means psudocode as

E() {
    P()
    while next token is a binary op:
         read next op
         push onto stack and do the shunting yard logic
         P()
    if any tokens remain report error
    pop remaining operators off the stack
}

P() {
    if next token is constant or variable:
         add to output
    else if next token is unary minus: 
         push uminus onto operator stack
         P()
}

      

You can expand this to handle other unary operators, functions, brackets, suffix operators.

+2


source







All Articles