C++ extract data from string

Question


What's an elegant way to extract data from a string (perhaps using the Boost library)?

Content-Type: text/plain
Content-Length: 15
Content-Date: 2/5/2013
Content-Request: Save

hello world

Let's say I have the text above and want to extract all of the fields, including the greeting text. Can anyone point me in the right direction?

+3

c++ boost

marcwho 05 Feb 13 at 19:40


8 answers

Try

http://pocoproject.org/ (comes with HTTPServer and client implementations)

http://cpp-netlib.github.com/ (comes with request/response processing)

Boost Spirit demo: http://liveworkspace.org/code/3K5TzT

You will need to provide a simple grammar (or a more complex one if you want to catch all the intricacies of HTTP):

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <iostream>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::string> Headers;
typedef std::pair<std::string, std::string> Header;
struct Request { Headers headers; std::vector<char> content; };

BOOST_FUSION_ADAPT_STRUCT(Request, (Headers, headers)(std::vector<char>, content))

namespace qi    = boost::spirit::qi;
namespace karma = boost::spirit::karma;

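template <typename It, typename Skipper = qi::blank_type>
struct parser : qi::grammar<It, Request(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        // A header is a key (anything up to ':' or end of line), ": ", then the
        // rest of the line. lexeme[] turns off the blank skipper so that
        // spaces inside the value are kept.
        header = +~char_(":\n") > ": " > lexeme[*(char_ - eol)];
        // Headers separated by newlines, an empty line, then the raw content
        // (also inside lexeme[], otherwise "hello world" would lose its space).
        start = header % eol >> eol >> eol >> lexeme[*char_];
    }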

  private:
    qi::rule<It, Header(),  Skipper> header;
    qi::rule<It, Request(), Skipper> start;
};

bool doParse(const std::string& input)
{
    auto f(begin(input)), l(end(input));

    parser<decltype(f), qi::blank_type> p;
    Request data;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,qi::blank,data);
        if (ok)   
        {
            std::cout << "parse success\n";
            std::cout << "data: " << karma::format_delimited(karma::auto_, ' ', data) << "\n";
        }
        else      std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

        if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
        return ok;
    } catch(const qi::expectation_failure<decltype(f)>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    const std::string input = 
        "Content-Type: text/plain\n"
        "Content-Length: 15\n"
        "Content-Date: 2/5/2013\n"
        "Content-Request: Save\n"
        "\n"
        "hello world";

    bool ok = doParse(input);

    return ok? 0 : 255;
}

+4

sehe 05 Feb 13 at 19:47


There are several solutions. If the format is that simple, you can just read the text line by line. If a line starts with a key, you can simply split it at the colon to get the value; if it does not, the value is the line itself. This can be done with the standard library very easily and quite elegantly.
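A minimal sketch of the line-by-line approach, using only the standard library: `std::getline` reads each line, a line containing ':' is split at the first colon into key and value, and the first blank line (or the first line without a key) starts the body. The `Message` struct and `parseLineByLine` function are hypothetical names chosen for illustration, and the sketch assumes '\n' line endings.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type: header fields plus everything after them.
struct Message {
    std::map<std::string, std::string> headers;
    std::string body;
};

// Reads the text line by line. A line containing ':' is split at the first
// colon into key and value; the first blank line, or the first line without
// a key, switches to body mode. Assumes '\n' line endings.
Message parseLineByLine(const std::string& text)
{
    Message msg;
    std::istringstream in(text);
    std::string line;
    bool inHeaders = true;
    bool firstBodyLine = true;

    while (std::getline(in, line)) {
        if (inHeaders) {
            if (line.empty()) {              // blank line ends the headers
                inHeaders = false;
                continue;
            }
            const std::string::size_type colon = line.find(':');
            if (colon != std::string::npos) {
                std::string value = line.substr(colon + 1);
                if (!value.empty() && value[0] == ' ')   // drop the space after ':'
                    value.erase(0, 1);
                msg.headers[line.substr(0, colon)] = value;
                continue;
            }
            inHeaders = false;               // no key: the line itself is body text
        }
        if (!firstBodyLine)
            msg.body += '\n';                // re-insert the separator getline removed
        msg.body += line;
        firstBodyLine = false;
    }
    return msg;
}
```

Because `std::getline` splits on '\n' only, CRLF input would leave a trailing '\r' on every value; strip it if the data may come from the network.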

If the grammar is more complex, and since you've added boost to the tags, you might consider using Boost Spirit to parse it and extract its meaning.

+2

Baptiste Wicht 05 Feb 13 at 19:46


The simplest solution, I believe, is to use regular expressions. C++11 has them in the standard library (std::regex), and Boost provides them as well (Boost.Regex).
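As a sketch of the regex route with C++11 `<regex>`: `std::regex_search` scans the whole message for one named header, and the captured value can then be converted, e.g. with `std::stoi` for Content-Length. `findHeader` is a hypothetical helper, and it assumes the header name contains no regex metacharacters (letters and '-' are fine).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the value of header `name` in `message`,
// or an empty string if the header is absent.
// Assumes `name` contains no regex metacharacters ("Content-Length" is fine:
// '-' is literal outside a bracket expression).
std::string findHeader(const std::string& message, const std::string& name)
{
    // Default ECMAScript grammar. Group 1: start of text or a newline, so the
    // name must begin a line. Group 2: the value after ':' and optional blanks,
    // up to (not including) the end of the line.
    const std::regex re("(^|\n)" + name + ":[ \t]*([^\r\n]*)");
    std::smatch m;
    if (std::regex_search(message, m, re))
        return m[2].str();
    return std::string();
}
```

For the message in the question, `std::stoi(findHeader(msg, "Content-Length"))` yields 15. Anchoring the name to the start of a line keeps a search for "Type" from matching inside "Content-Type".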

+2

Artem Sobolev 05 Feb 13 at 19:47


You can use string::find with a space to find where the values start, and then copy from that position until you find '\n'.
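A sketch of the `string::find` approach: locate "key: " at the start of a line (the trailing space is part of the search string), then copy from just after it up to the next '\n'. `valueAfterKey` is a hypothetical helper name; it returns an empty string when the key is missing.

```cpp
#include <string>

// Hypothetical helper: returns the text after "<key>: " up to the next '\n'
// (or the end of the string), or an empty string if the key is absent.
std::string valueAfterKey(const std::string& text, const std::string& key)
{
    const std::string prefix = key + ": ";
    std::string::size_type pos = 0;
    // Only accept the key at the start of a line, so that "Type: " is not
    // matched inside "Content-Type: ".
    while ((pos = text.find(prefix, pos)) != std::string::npos) {
        if (pos == 0 || text[pos - 1] == '\n') {
            const std::string::size_type start = pos + prefix.size();
            const std::string::size_type end = text.find('\n', start);
            // npos means the value runs to the end of the text
            return text.substr(start, end == std::string::npos ? std::string::npos
                                                               : end - start);
        }
        pos += 1;   // keep searching after this false hit
    }
    return std::string();
}
```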

+1

rubbyrubber 05 Feb 13 at 19:47


If you want to write the parsing code yourself, start by looking at the HTTP spec (RFC 2616, section 4.1), which gives you the grammar:

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

So the first thing I would do is use split() on CRLF to break the message into its component lines. Then you can iterate over the resulting vector. Until you reach an empty element (the bare CRLF), you are parsing headers, so split each one at the first ":" to get the key and value.

Once you hit the empty element, everything after it is the message body.

Warning: Having done this myself in the past, I can tell you that not all webservers are consistent about line endings (in places you may find only CR or only LF), and not all browsers / other layers of abstraction agree on what they pass to you. So you may find extra CRLFs where you would not expect them, or missing CRLFs where you do expect them. Good luck.
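The split-and-iterate procedure can be sketched as follows, with one concession to the warning about line endings: `splitLines` (a hypothetical helper) treats "\r\n", a bare "\n" and a bare "\r" each as one line break. Each header is split at its first ':', so values that themselves contain colons (such as times) stay intact. This is an illustration under those assumptions, not a conforming HTTP parser: it ignores the start-line and folded header lines, and it returns the body as a list of lines rather than the raw bytes.

```cpp
#include <string>
#include <utility>
#include <vector>

// Splits text into lines, accepting "\r\n", a bare "\n" or a bare "\r" as a
// line terminator, since real-world messages may mix them.
std::vector<std::string> splitLines(const std::string& text)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            lines.push_back(current);
            current.clear();
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
                ++i;                    // "\r\n" counts as a single terminator
        } else {
            current += c;
        }
    }
    lines.push_back(current);           // last line (empty if text ended with a terminator)
    return lines;
}

// Hypothetical result type: headers in arrival order, then the body lines.
struct SplitMessage {
    std::vector<std::pair<std::string, std::string>> headers;
    std::vector<std::string> bodyLines;
};

SplitMessage splitMessage(const std::string& text)
{
    SplitMessage out;
    const std::vector<std::string> lines = splitLines(text);
    std::vector<std::string>::size_type i = 0;
    // Header lines run until the first empty element.
    for (; i < lines.size() && !lines[i].empty(); ++i) {
        const std::string::size_type colon = lines[i].find(':');   // FIRST ':'
        if (colon == std::string::npos)
            continue;                                              // malformed: skip
        std::string value = lines[i].substr(colon + 1);
        const std::string::size_type first = value.find_first_not_of(" \t");
        value = (first == std::string::npos) ? std::string() : value.substr(first);
        out.headers.emplace_back(lines[i].substr(0, colon), value);
    }
    if (i < lines.size())
        ++i;                            // skip the empty separator line
    out.bodyLines.assign(lines.begin() + i, lines.end());
    return out;
}
```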

+1

i_am_jorf 05 Feb 13 at 19:54


If you are ready to unroll your loop manually, you can also use std::istringstream and the normal overloads of the extraction operator (with appropriate manipulators, such as get_time() for working with dates) to extract your data in a simple way.
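A rough sketch of the `std::istringstream` idea for the exact message in the question. It assumes the headers arrive in that fixed order, that each value is a single whitespace-free token, and that the date is month/day/year; for brevity it reads the date with plain `>>` extractions and a separator `char` instead of `get_time()`. `Fields` and `extractFields` are illustrative names.

```cpp
#include <sstream>
#include <string>

// Typed fields pulled from the example message (names are illustrative).
struct Fields {
    std::string contentType;
    int contentLength = 0;
    int month = 0, day = 0, year = 0;
    std::string request;
    std::string body;
};

// Assumes the exact header order and layout of the question's example;
// returns false if any header key or extraction does not match.
bool extractFields(const std::string& text, Fields& f)
{
    std::istringstream in(text);
    std::string key;
    char slash1 = 0, slash2 = 0;

    // operator>> skips leading whitespace (including newlines) and stops at
    // the next whitespace, so each ">> key" reads a token like "Content-Type:".
    in >> key >> f.contentType;
    if (key != "Content-Type:") return false;
    in >> key >> f.contentLength;                       // int overload parses "15"
    if (key != "Content-Length:") return false;
    in >> key >> f.month >> slash1 >> f.day >> slash2 >> f.year;   // "2/5/2013"
    if (key != "Content-Date:" || slash1 != '/' || slash2 != '/') return false;
    in >> key >> f.request;
    if (key != "Content-Request:" || !in) return false;

    in >> std::ws;                    // skip the newlines before the body
    std::getline(in, f.body, '\0');   // read everything that is left
    return true;
}
```

Because `operator>>` stops at whitespace, a value such as "text/plain; charset=utf-8" would be cut at the first space, and `std::ws` also drops any leading blanks of the body; the approach fits fixed, simple layouts best.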

Another possibility is to use std::regex to match all patterns such as <string>:<string>, and iterate over all matches (the egrep grammar seems promising if you have multiple lines to process).
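A sketch of iterating over all matches with `std::sregex_iterator`. It uses the default ECMAScript grammar rather than egrep, and it restricts the search to the text before the first blank line ("\n\n", so '\n' line endings are assumed) so that a colon inside the body is not mistaken for a header. `allHeaders` is a hypothetical name.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: collects every "key: value" line of the header block.
std::map<std::string, std::string> allHeaders(const std::string& text)
{
    // Only look at the part before the first blank line; if find() returns
    // npos, substr() takes the whole text.
    const std::string head = text.substr(0, text.find("\n\n"));

    // Group 1: key (no ':' or newline). Group 2: value after ':' and optional
    // blanks, up to the end of the line (a trailing '\r' is excluded).
    static const std::regex headerRe("([^:\n]+):[ \t]*([^\r\n]*)");

    std::map<std::string, std::string> headers;
    for (std::sregex_iterator it(head.begin(), head.end(), headerRe), end;
         it != end; ++it)
        headers[(*it)[1].str()] = (*it)[2].str();
    return headers;
}
```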

Or, if you want to take the more sophisticated route and your string has a specific syntax, you can use Boost.Spirit to easily define the grammar and generate the parser.

0

Andy Prowl 05 Feb 13 at 19:47


If you have access to C++11, you can use std::regex (http://en.cppreference.com/w/cpp/regex).

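#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string input = "Content-Type: text/plain";
    std::regex contentTypeRegex("Content-Type: (.+)");   // capture everything after the key

    std::smatch match;

    if (std::regex_match(input, match, contentTypeRegex)) {
        std::ssub_match contentTypeMatch = match[1];     // first capture group
        std::string contentType = contentTypeMatch.str();
        std::cout << contentType << "\n";                // prints text/plain
    } else {
        std::cout << "not found\n";
    }
}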

Compiling/working version here: http://ideone.com/QTJrue

This regex is a very simplified case, but the same principle applies to multiple fields.

0

Robert Prior 05 Feb 13 at 20:02


Markus Schumann · Accepted Answer · 2013-02-05T19:49:37+0000

Here's a pretty compact one, written in C: https://github.com/openwebos/nodejs/blob/master/deps/http_parser/http_parser.c
