Parsing into std::vector<string> with Spirit Qi, getting segfaults or assertion failures

I am using Spirit Qi to parse mathematical expressions into an expression tree. Along the way I need to keep track of things like the types of the symbols encountered, which must be declared in the text I am parsing. Specifically, I am parsing Bertini input files, a simple example of which is here, a complex example here, and, for completeness, a small one is shown below:

%input: our first input file
  variable_group x,y;
  function f,g;

  f = x^2 - 1;
  g = y^2 - 4;
 END;

      

Ideally, the grammar I have been working on will

  • find the declaration statements, then parse the following comma-separated list of symbols of the declared type and store the resulting vector of symbols in the class object being parsed into; e.g. variable_group x, y;

  • find a previously declared symbol followed by an equals sign, which is the definition of that symbol as an evaluable mathematical object; e.g. f = x^2 - 1; This part I mostly have under control.

  • find a previously declared symbol followed by =, and parse it as a subfunction. I think I can handle this too.

The problem I'm trying to solve seems like it should be trivial, yet after hours of searching I still haven't got it. I've read dozens of Boost Spirit mailing list posts, SO posts, and the manual and headers for Spirit itself, but I still don't quite understand some critical points about parsing with Spirit Qi.

Here's the problematic basic grammar definition, which goes in system_parsing.hpp:

#define BOOST_SPIRIT_USE_PHOENIX_V3 1


#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>




namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;


template<typename Iterator>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type>
{


    SystemParser() : SystemParser::base_type(variable_group_)
    {
        namespace phx = boost::phoenix;
        using qi::_1;
        using qi::_val;
        using qi::eps;
        using qi::lit;

        qi::symbols<char,int> encountered_variables;

        qi::symbols<char,int> declarative_symbols;
        declarative_symbols.add("variable_group",0);



        // wraps the vector between its appropriate declaration and line termination.
        BOOST_SPIRIT_DEBUG_NODE(variable_group_);
        debug(variable_group_);
        variable_group_.name("variable_group_");
        variable_group_ %= lit("variable_group") >> genericvargp_ >> lit(';');


        // creates a vector of strings
        BOOST_SPIRIT_DEBUG_NODE(genericvargp_);
        debug(genericvargp_);
        genericvargp_.name("genericvargp_");
        genericvargp_ %= new_variable_ % ',';




        // will in the future make a shared pointer to an object using the string
        BOOST_SPIRIT_DEBUG_NODE(new_variable_);
        debug(new_variable_);
        new_variable_.name("new_variable_");
        new_variable_ %= unencountered_symbol_;


        // this rule gets a string.
        BOOST_SPIRIT_DEBUG_NODE(unencountered_symbol_);
        debug(unencountered_symbol_);
        unencountered_symbol_.name("unencountered_symbol");
        unencountered_symbol_ %= valid_variable_name_ - ( encountered_variables | declarative_symbols);


        // get a string which fits the naming rules.
        BOOST_SPIRIT_DEBUG_NODE(valid_variable_name_);
        valid_variable_name_.name("valid_variable_name_");
        valid_variable_name_ %= +qi::alpha >> *(qi::alnum | qi::char_('_') | qi::char_('[') | qi::char_(']') );



    }


    // rule declarations.  these are member variables for the parser.
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > variable_group_;
    qi::rule<Iterator, std::vector<std::string>(), ascii::space_type > genericvargp_;
    qi::rule<Iterator, std::string(), ascii::space_type>  new_variable_;
    qi::rule<Iterator, std::string(), ascii::space_type > unencountered_symbol_;// , ascii::space_type


    // the rule which determines valid variable names
    qi::rule<Iterator, std::string()> valid_variable_name_;
};

      

and some code that uses it:

#include "system_parsing.hpp"



int main(int argc, char** argv)
{


    std::vector<std::string> V;
    std::string str = "variable_group x, y, z;";


    std::string::const_iterator iter = str.begin();
    std::string::const_iterator end = str.end();


    SystemParser<std::string::const_iterator> S;


    bool s = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);

    std::cout << "the unparsed string:\n" << std::string(iter,end);


    return 0;
}

      

It compiles fine under Clang 4.9.x on OS X. When I run it, I get:

Assertion failed: (px != 0), function operator->, file /usr/local/include/boost/smart_ptr/shared_ptr.hpp, line 648.

Alternatively, if I use the expectation operator > rather than >> in the definition of the rule variable_group_, I end up with our dear old friend Segmentation fault: 11.

In my learning process, I came across excellent posts such as how to tell which attribute type Spirit is trying to generate, attribute propagation, how to interact with symbols, an example of infinite left recursion that led to a segfault, information on parsing into classes rather than structs (which links to using customization points, yet the links contain no examples), the Nabialek trick, which associates keywords with actions, and, perhaps most relevant to what I'm trying to do, dynamic difference parsing. I certainly need the latter: the symbol set grows as parsing proceeds, and I must forbid reusing a symbol as another type later. The set of already-encountered symbols starts empty and grows; that is, the parsing rules are dynamic.

So, here is where I am. My immediate problem is the assert / segfault generated by this particular example. However, there are some things I don't quite understand, and I need advice that I haven't been able to gather from any of the sources I have consulted. Hopefully these requests make this SO question distinct from those previously asked:

  • When is it appropriate to use lexeme? I just don't know when to use it and when not to.
  • What are some guidelines for using > instead of >>?
  • I've seen many Fusion adapt examples where there is a struct to parse into and a set of rules for doing so. My input files will probably contain multiple occurrences of function declarations, variable declarations, etc., which all need to go to the same place, so I need to be able to add to the fields of the terminal class object I'm parsing into, in any order, multiple times. I think I would like to use getters / setters on the class object, so that parsing is not the only way to construct the object. Is this a problem?

Any kind advice for this newbie is appreciated.



1 answer


Your rules refer to the symbols variables (encountered_variables and declarative_symbols). But those are locals, so they no longer exist once the constructor returns. That invokes Undefined Behavior; anything can happen.

Make the symbol tables members of the class.

Also simplifying the dance around:

  • skippers (see Boost spirit skipper issues). That link also answers your question "When is it appropriate to use lexeme[]?". In your sample, for instance, you were missing the lexeme[] around encountered_variables|declarative_symbols.
  • debug macros
  • operator%=, and some generally unused things
  • Assuming you don't need the mapped type of symbols<> (since the int wasn't used), the initialization is simplified too

Demo



Live On Coliru

#define BOOST_SPIRIT_USE_PHOENIX_V3 1
#define BOOST_SPIRIT_DEBUG 1

#include <boost/spirit/include/qi_core.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi    = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

template <typename Iterator, typename Skipper = ascii::space_type>
struct SystemParser : qi::grammar<Iterator, std::vector<std::string>(), Skipper> {

    SystemParser() : SystemParser::base_type(variable_group_) 
    {
        declarative_symbols += "variable_group";

        variable_group_        = "variable_group" >> genericvargp_ >> ';';
        genericvargp_          = new_variable_ % ',';
        valid_variable_name_   = qi::alpha >> *(qi::alnum | qi::char_("_[]"));
        unencountered_symbol_  = valid_variable_name_ - (encountered_variables|declarative_symbols);
        new_variable_          = unencountered_symbol_;

        BOOST_SPIRIT_DEBUG_NODES((variable_group_) (valid_variable_name_) (unencountered_symbol_) (new_variable_) (genericvargp_))
    }
  private:

    qi::symbols<char, qi::unused_type> encountered_variables, declarative_symbols;

    // rule declarations.  these are member variables for the parser.
    qi::rule<Iterator, std::vector<std::string>(), Skipper> variable_group_;
    qi::rule<Iterator, std::vector<std::string>(), Skipper> genericvargp_;
    qi::rule<Iterator, std::string()> new_variable_;
    qi::rule<Iterator, std::string()> unencountered_symbol_; // , Skipper

    // the rule which determines valid variable names
    qi::rule<Iterator, std::string()> valid_variable_name_;
};

//#include "system_parsing.hpp"

int main() {

    using It = std::string::const_iterator;
    std::string const str = "variable_group x, y, z;";

    SystemParser<It> S;

    It iter = str.begin(), end = str.end();
    std::vector<std::string> V;
    bool ok = phrase_parse(iter, end, S, boost::spirit::ascii::space, V);

    if (ok)
    {
        std::cout << "Parse succeeded: " << V.size() << "\n";
        for (auto& s : V)
            std::cout << " - '" << s << "'\n";
    }
    else
        std::cout << "Parse failed\n";

    if (iter!=end)
        std::cout << "Remaining unparsed: '" << std::string(iter, end) << "'\n";
}

      

Printing

Parse succeeded: 3
 - 'x'
 - 'y'
 - 'z'

      
