X3 parser segfaults with debug output (BOOST_SPIRIT_X3_DEBUG)

Question


Update

This question addresses two issues (as shown in the accepted answer), both of which are present in the Boost Spirit X3 version that ships with Boost 1.64. Both are now fixed (or at least detected at compile time) in the develop branch at the time of this writing (2017-04-30).

I have updated the MCVE project to use the develop branch instead of the latest Boost release, in the hope that this helps others facing similar problems.

Original question

I am trying to find out how to break Spirit X3 parsers into separate reusable grammars, as recommended by the sample code (rexpr_full and calc in particular) and the presentations at CppCon 2015 and BoostCon.

I have a symbol table (essentially mapping different names to an enum class of types that I support) that I would like to use in multiple parsers. The only example of symbol tables I could find is the Roman numerals example, which defines the table in the same source file as the parser.

When I move the symbol table into its own .cpp/.h files, in the style of the more structured examples, my parser segfaults whenever I try to parse a string that is not in the symbol table. If the symbol table is defined in the same compilation unit as the parsers that use it, it instead throws an expectation failure exception (which is what I expected).

With the definition of BOOST_SPIRIT_X3_DEBUG, I get the following output:

<FruitType>
  <try>GrannySmith: Mammals</try>
  <Identifier>
    <try>GrannySmith: Mammals</try>
    <success>: Mammals</success>
    <attributes>[[
Process finished with exit code 11

I made a small project that shows what I am trying to achieve and is available here: https://github.com/sigbjornlo/spirit_fruit_mcve

My questions:

Why does moving the symbol parser to a separate compilation unit cause a segmentation fault in this case?
What is the recommended way to make the symbol table reusable across multiple parsers? (In the MCVE I only use the fruit parser in one other parser, but in my full project I want to use it in several.)

Below is the code for the MCVE project:

main.cpp

#include <iostream>

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include "common.h"
#include "fruit.h"

namespace ast {
    struct FruitType {
        std::string identifier;
        FRUIT fruit;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::FruitType, identifier, fruit);

namespace parser {
    // Identifier
    using identifier_type = x3::rule<class identifier, std::string>;
    const auto identifier = identifier_type {"Identifier"};
    const auto identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];
    BOOST_SPIRIT_DEFINE(identifier);

    // FruitType
    struct fruit_type_class;
    const auto fruit_type = x3::rule<fruit_type_class, ast::FruitType> {"FruitType"};

    // Using the sequence operator creates a grammar which fails gracefully given invalid input.
    // const auto fruit_type_def = identifier >> ':' >> make_fruit_grammar();

    // Using the expectation operator causes EXC_BAD_ACCESS exception with invalid input.
    // Instead, I would have expected an expectation failure exception.
    // Indeed, an expectation failure exception is thrown when the fruit grammar is defined here in this compilation unit instead of in fruit.cpp.
    const auto fruit_type_def = identifier > ':' > make_fruit_grammar();

    BOOST_SPIRIT_DEFINE(fruit_type);
}

int main() {
    std::string input = "GrannySmith: Mammals";
    parser::iterator_type it = input.begin(), end = input.end();

    const auto& grammar = parser::fruit_type;
    auto result = ast::FruitType {};

    bool successful_parse = boost::spirit::x3::phrase_parse(it, end, grammar, boost::spirit::x3::ascii::space, result);
    if (successful_parse && it == end) {
        std::cout << "Parsing succeeded!\n";
        std::cout << result.identifier << " is a kind of " << to_string(result.fruit) << "!\n";
    } else {
        std::cout << "Parsing failed!\n";
    }

    return 0;
}

std::string to_string(FRUIT fruit) {
    switch (fruit) {
        case FRUIT::APPLES:
            return "apple";
        case FRUIT::ORANGES:
            return "orange";
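    }
    return "unknown"; // not reached for valid FRUIT values; avoids falling off the end of a non-void function
}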

common.h

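#ifndef SPIRIT_FRUIT_COMMON_H
#define SPIRIT_FRUIT_COMMON_H

#include <string>
#include <boost/spirit/home/x3.hpp> // needed for the x3 alias and phrase_parse_context below

namespace x3 = boost::spirit::x3;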

enum class FRUIT {
    APPLES,
    ORANGES
};

std::string to_string(FRUIT fruit);

namespace parser {
    using iterator_type = std::string::const_iterator;
    using context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;
}

#endif //SPIRIT_FRUIT_COMMON_H

fruit.h

#ifndef SPIRIT_FRUIT_FRUIT_H
#define SPIRIT_FRUIT_FRUIT_H

#include <boost/spirit/home/x3.hpp>

#include "common.h"

namespace parser {
    struct fruit_class;
    using fruit_grammar = x3::rule<fruit_class, FRUIT>;

    BOOST_SPIRIT_DECLARE(fruit_grammar)

    fruit_grammar make_fruit_grammar();
}


#endif //SPIRIT_FRUIT_FRUIT_H

fruit.cpp

#include "fruit.h"

namespace parser {
    struct fruit_symbol_table : x3::symbols<FRUIT> {
        fruit_symbol_table() {
            add
                    ("Apples", FRUIT::APPLES)
                    ("Oranges", FRUIT::ORANGES);
        }
    };

    struct fruit_class;
    const auto fruit = x3::rule<fruit_class, FRUIT> {"Fruit"};
    const auto fruit_def = fruit_symbol_table {};
    BOOST_SPIRIT_DEFINE(fruit);

    BOOST_SPIRIT_INSTANTIATE(fruit_grammar, iterator_type, context_type);

    fruit_grammar make_fruit_grammar() {
        return fruit;
    }
}



sigbjornlo, Apr 28 '17 at 11:31


1 answer

sehe · Accepted Answer · 2017-04-28T20:21:02+0000

Very nice work on the repro. It reminded me of my PR https://github.com/boostorg/spirit/pull/229 (see the analysis in "strange semantic behavior of boost spirit x3 after splitting").

The problem is the static initialization order fiasco: the names of the debug rules are copied before those rules have been initialized.
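The fiasco arises because the relative order of dynamic initialization of namespace-scope objects in different translation units is unspecified. For an object shared by several parsers, such as the symbol table, a common general-purpose mitigation is the construct-on-first-use idiom: expose the object through a function that returns a function-local static. The sketch below only illustrates that idiom under stated assumptions: it uses a plain std::map as a hypothetical stand-in for x3::symbols<FRUIT> so it compiles without Boost, and fruit_symbols and lookup_fruit are illustrative names, not Spirit APIs. It is not the fix from the pull request, and on its own it does not address the dangling-reference issue described in the update.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the question's FRUIT enum.
enum class FRUIT { APPLES, ORANGES };

// Hypothetical stand-in for x3::symbols<FRUIT>; a plain map keeps the sketch Boost-free.
using fruit_table = std::map<std::string, FRUIT>;

// Construct-on-first-use: the table is a function-local static, so it is
// initialized the first time fruit_symbols() is called, independent of the
// order in which other translation units run their static initializers.
// Since C++11 this initialization is also thread-safe.
const fruit_table& fruit_symbols() {
    static const fruit_table table{
        {"Apples", FRUIT::APPLES},
        {"Oranges", FRUIT::ORANGES},
    };
    return table;
}

// Hypothetical helper: returns true and sets `out` if `name` is in the table;
// leaves `out` untouched and returns false otherwise.
bool lookup_fruit(const std::string& name, FRUIT& out) {
    const fruit_table& table = fruit_symbols();
    auto it = table.find(name);
    if (it == table.end()) {
        return false;
    }
    out = it->second;
    return true;
}
```

Every caller, in whatever translation unit, reaches the same fully constructed table through fruit_symbols(), so no caller can observe it before it is initialized.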

Indeed, disabling the debug output fixes the crash, and the parser correctly throws an expectation failure.

The same thing happens with the develop branch¹, so either there is another, similar issue or I missed a spot. In the meantime, you can turn off the debug output. I will post an update if I find the spot.

UPDATE:

I didn't miss a spot. There is a separate issue in call_rule_definition, where it parameterizes the helper class context_debug<> with the actual attribute type instead of the transformed one:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                typedef typename make_attribute::type dbg_attribute_type;
                context_debug<Iterator, dbg_attribute_type>
                dbg(rule_name, first, last, dbg_attribute_type(attr_), ok_parse);
#endif

The comment seems to suggest this is intended behavior: it tries to print the attribute before transformation. However, it breaks down whenever the synthesized attribute type does not match the actual attribute type. In that case, context_debug ends up holding a reference to the temporary converted attribute (dbg_attribute_type(attr_)), which no longer exists by the time it is printed, resulting in undefined behavior.

This is in fact undefined behavior in the working cases as well. I can only assume that when the definition is in the same translation unit, things just happen to line up, so it appears to work as intended.
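The dangling reference can be modeled in miniature with plain C++. This is a hedged sketch: debug_printer is a hypothetical, heavily simplified model of context_debug<> (the real class also takes the rule name, the iterators and the success flag, as in the snippet above) that keeps a reference to the attribute and prints it from its destructor, i.e. after parsing. The safe function follows the pattern of the proposed fix by referring to the attribute object that actually exists; the unsafe variant is left as a comment because running it would be undefined behavior.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical, simplified model of a debug tracer: it stores a reference to
// the attribute and prints it when destroyed, after the "parse" has finished.
template <typename Attribute>
class debug_printer {
public:
    debug_printer(std::ostream& os, const Attribute& attr)
        : os_(os), attr_(attr) {}
    ~debug_printer() { os_ << "<attributes>" << attr_ << "</attributes>"; }

private:
    std::ostream& os_;
    const Attribute& attr_;  // valid only while the referred-to object lives
};

// Safe pattern (mirrors the proposed fix): the printer is parameterized on the
// type of the attribute that actually exists and refers to it directly.
void parse_with_debug(std::ostream& os, int& attr) {
    debug_printer<int> dbg(os, attr);
    attr = 42;  // the "parse" fills in the attribute in place
}               // dbg's destructor prints attr, which is still alive

// Unsafe pattern (mirrors the original code), shown only as a comment:
//   debug_printer<long> dbg(os, long(attr));
// long(attr) is a temporary destroyed at the end of that statement, so
// dbg.attr_ dangles and the destructor reads a dead object: undefined behavior.
```

Binding a constructor parameter to a temporary does not extend the temporary's lifetime beyond the full-expression, which is why the converted copy is gone by the time the destructor runs.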

As far as I can tell, this would be a clean fix that removes the unnecessary conversion and the temporary that comes with it:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                context_debug<Iterator, transform_attr>
                dbg(rule_name, first, last, attr_, ok_parse);
#endif

I created a pull request for this: https://github.com/boostorg/spirit/pull/232

¹ the develop branch has not been merged into the 1.64 release
