Inconsistent behavior of Boost Spirit grammar

I have a little grammar that I want to use for a work project. Minimal executable example:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#pragma GCC diagnostic pop // pops

#include <iostream>
#include <string>
#include <vector>

int main()

{
    typedef  unsigned long long ull;

    std::string curline = "1;2;;3,4;5";
    std::cout << "parsing:  " << curline << "\n";

    namespace qi = boost::spirit::qi;
    auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto match_type_res = ids % ';' ;
    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), match_type_res, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

      

On my personal machine, this produces the correct output:

parsing:  1;2;;3,4;5
got:      1,;2,;;3,4,;5,;

But at work it prints:

parsing:  1;2;;3,4;5
got:      1,;2,;;3,

In other words, it cannot parse a vector of long integers as soon as there is more than one element in it.

I have now determined that the production system is using Boost 1.56, while my private computer is using 1.57. Is this the reason?

I know there are some real experts in this area here, so I was hoping someone might know where this problem comes from, or could at least narrow down the things I need to check.

If the Boost version is the issue, I can probably convince the company to upgrade, but a workaround would be welcome anyway.

1 answer


You are invoking Undefined Behavior in your code.

Specifically, where you use auto to store a parser expression. The expression template contains references to temporaries, which become dangling at the end of the containing full-expression¹.

UB means anything can happen. Both compilers are right! Most importantly, you will probably see different behavior depending on the compiler flags used.
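To make the mechanism concrete without involving Spirit itself, here is a toy sketch. All names in it (Lit, SeqRef, SeqVal, deep_copy, matches, demo) are hypothetical and only loosely mimic how an expression template refers to its operands; it is not how Proto or Qi are actually implemented. It shows a node that merely references its operands, and a copying step (the idea behind qi::copy) that has to run before the temporaries are destroyed.

```cpp
#include <string>

// All names below are hypothetical; this is NOT Spirit or Proto code.

// Toy terminal: matches one literal character.
struct Lit { char c; };

// Toy expression node that holds its operands BY REFERENCE, loosely like an
// un-copied expression template. Keeping one of these in a variable past the
// full-expression that created it would leave both references dangling.
struct SeqRef {
    Lit const& left;
    Lit const& right;
};

// The same node holding its operands BY VALUE: safe to keep around.
struct SeqVal {
    Lit left;
    Lit right;
};

// Building an expression only records references to the operands.
inline SeqRef operator>>(Lit const& l, Lit const& r) { return SeqRef{l, r}; }

// Stand-in for the idea behind qi::copy: copy every operand into the result
// while the temporaries are still alive.
inline SeqVal deep_copy(SeqRef const& e) { return SeqVal{e.left, e.right}; }

// "Parse": succeeds if the input is exactly the two literals in sequence.
inline bool matches(SeqVal const& e, std::string const& input) {
    return input.size() == 2 && input[0] == e.left.c && input[1] == e.right.c;
}

inline bool demo() {
    // auto bad = Lit{'a'} >> Lit{'b'};  // would dangle after this statement
    SeqVal good = deep_copy(Lit{'a'} >> Lit{'b'}); // copied before the ';'
    return matches(good, "ab");
}
```

The commented-out line is the toy equivalent of the auto declarations in the question: the variable outlives the two Lit temporaries it refers to, so any later use reads destroyed objects.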

Fix it by either:

  • using qi::copy (or boost::proto::deep_copy before v.1.55, IIRC)
  • using BOOST_SPIRIT_AUTO instead of BOOST_AUTO (mostly useful if you also support C++03)
  • using qi::rule<> and qi::grammar<> (the non-terminals) to type-erase the expressions. This has a performance impact too, but also gives more features, such as

    • recursive rules
    • locals and inherited attributes
    • declared skippers (handy because rules can then be implicitly lexeme[])
    • better code organization.

Note also that Spirit X3 promises to drop the restrictions on use with auto. It's basically a whole lot more lightweight due to its use of C++14 features. Keep in mind that it's not stable yet.

  • Demonstrating that GCC with -O2 shows undefined results: Live On Coliru

  • Fixed version:



Live On Coliru

//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops

#include <iostream>
#include <string>
#include <vector>

int main() {
    typedef  unsigned long long ull;

    std::string const curline = "1;2;;3,4;5";
    std::cout << "parsing: '" << curline << "'\n";

    namespace qi = boost::spirit::qi;

#if 0 // THIS IS UNDEFINED BEHAVIOUR:
    auto ids     = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto grammar = ids % ';';
#else // THIS IS CORRECT:
    auto ids     = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
    auto grammar = qi::copy(ids % ';');
#endif

    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), grammar, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

      

Prints (also with GCC -O2!):

parsing: '1;2;;3,4;5'
got:      1,;2,;;3,4,;5,;

      


¹ (which is basically "at the next semicolon" here, but in standardese)
