I need to parse a string according to a mask (something like "%yr-%mh-%dy") so that I get int values.

For example, I have to find a date in the format given by the mask (the order of the %-tags may differ) in the string "The date is 2009-August-25."

How can I get the program to interpret the tags, and what construct is best for storing them along with information on how to handle the corresponding fragments of the date string?



2 answers


First take a look at the boost::date_time library. It has an I/O system that may already do what you want, although as far as I can see it has no search facility.

For a custom date search you can use boost::xpressive. It contains everything you need. Let's walk through my hastily written example. First you have to parse your own pattern, which is easy with Xpressive. Start with the headers you need:

#include <string>
#include <iostream>
#include <map>
#include <stdexcept> // std::runtime_error, used below
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/regex_actions.hpp>

// makes the example shorter but less clear
using namespace boost::xpressive;

      

Then define the map of your custom tags:

std::map<std::string, int > number_map;
number_map["%yr"] = 0;
number_map["%mh"] = 1;
number_map["%dy"] = 2;
number_map["%%"] = 3;  // escape a %

      

The next step is to create a regex that parses our pattern: when it finds a tag, it stores the tag's value from the map in the tag_id variable; otherwise it stores -1:

int tag_id;
sregex rx=((a1=number_map)|(s1=+~as_xpr('%')))[ref(tag_id)=(a1|-1)];
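To make the intent of that regex easier to follow, here is a minimal sketch of the same tokenizing step written with the standard library only. It is not how Xpressive does it internally; `Token` and `tokenize` are hypothetical names introduced for illustration, and unlike the regex above this sketch keeps an unknown `%` as literal text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One piece of the pattern: a known tag (id >= 0) or literal text (id == -1).
struct Token {
    int id;
    std::string text;
};

// Hypothetical helper: split a pattern such as "%yr-%mh-%dy" into tokens,
// looking tags up in the same kind of map the answer uses.
std::vector<Token> tokenize(const std::string& pattern,
                            const std::map<std::string, int>& tags) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < pattern.size()) {
        bool matched = false;
        if (pattern[i] == '%') {
            // Try every known tag at the current position.
            for (const auto& entry : tags) {
                if (pattern.compare(i, entry.first.size(), entry.first) == 0) {
                    tokens.push_back({entry.second, entry.first});
                    i += entry.first.size();
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) {
            // Literal text: the current character plus everything up to the next '%'.
            std::size_t next = pattern.find('%', i + 1);
            if (next == std::string::npos) next = pattern.size();
            assert(next > i);  // the loop always makes progress
            tokens.push_back({-1, pattern.substr(i, next - i)});
            i = next;
        }
    }
    return tokens;
}
```

Called as `tokenize("%yr-%mh-%dy", number_map)`, this yields the sequence tag, literal "-", tag, literal "-", tag, which is the sequence the token iterator below walks through.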

      

For more information and a description, see here and here. Now let's parse a pattern:

  std::string pattern("%yr-%mh-%dy"); // this will be parsed

  sregex_token_iterator begin( pattern.begin(), pattern.end(), rx ), end;
  if(begin == end) throw std::runtime_error("The pattern is empty!");

      

sregex_token_iterator iterates over our tokens and sets the tag_id variable each time it advances. All we have to do is build a regular expression from these tokens. We construct it from pieces of static regex stored in an array and indexed by tag_id:



sregex regex_group[] = {
    range('1','9') >> repeat<3,3>( _d ), // four-digit year
    as_xpr( "January" ) | "February" | "August", // not all months listed, for brevity
    repeat<2,2>( range('0','9') )[    // two-digit day
    check(as<int>(_) >= 1 && as<int>(_) <= 31) ], // only between 1 and 31
    as_xpr( '%' )  // match escaped %
};

      

Finally, let's start building our custom regular expression. The first token supplies the first part. If it is a tag (tag_id is not negative), we select the regex from the array; otherwise the token is presumably a delimiter, so we create a regex that matches it literally:

sregex custom_regex = (tag_id>=0) ? regex_group[tag_id] : as_xpr(begin->str());

      

Next, we iterate over the remaining tokens and append the regex for each one:

while(++begin != end)
{
    if(tag_id>=0)
    {
        sregex nextregex = custom_regex >> regex_group[tag_id];
        custom_regex = nextregex;
    }
    else
    {
        sregex nextregex = custom_regex >> as_xpr(begin->str());
        custom_regex = nextregex;
    }
}

      

Now that our regex is ready, let's find some dates :-]

std::string input = "The date is 2009-August-25.";

smatch mydate;
if( regex_search( input, mydate, custom_regex ) )
    std::cout << "Found " << mydate.str() << "." << std::endl;

      

The xpressive library is very powerful and fast. It's also a beautiful use of patterns.

If you like this example, let me know in the comments ;-)



I would convert the tagged string to a regex with capture groups for the three fields and search for it. The complexity of the regex will depend on what you want %yr, %mh and %dy to accept. You can also use a less strict expression and then check the captured values for validity. Depending on the context, this may lead to better error messages ("Invalid month: Augsut" instead of "date not found") or to false positives.
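A minimal sketch of this idea using std::regex, assuming the three tags from the question plus "%%" as an escaped percent sign. `ParseResult` and `find_date` are hypothetical names, and the field patterns (four digits, letters only, one or two digits) are assumptions chosen to be lenient, so validation happens after the match.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct ParseResult {
    bool found = false;  // the regex matched somewhere in the input
    std::string error;   // non-empty if a captured field failed validation
    int year = 0, month = 0, day = 0;
};

// Hypothetical helper: build a regex with one capture group per tag,
// search the input, then validate the captured fields.
ParseResult find_date(const std::string& pattern, const std::string& input) {
    static const char* const kMonths[12] = {
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December"};
    static const std::string kSpecial = ".^$|()[]{}*+?\\";

    std::string rx;
    std::vector<char> fields;  // 'y', 'm' or 'd' for each group, in pattern order
    std::size_t i = 0;
    while (i < pattern.size()) {
        if (pattern.compare(i, 3, "%yr") == 0) {
            rx += "(\\d{4})"; fields.push_back('y'); i += 3;
        } else if (pattern.compare(i, 3, "%mh") == 0) {
            rx += "([A-Za-z]+)"; fields.push_back('m'); i += 3;  // lenient on purpose
        } else if (pattern.compare(i, 3, "%dy") == 0) {
            rx += "(\\d{1,2})"; fields.push_back('d'); i += 3;
        } else if (pattern.compare(i, 2, "%%") == 0) {
            rx += "%"; i += 2;
        } else {
            // Literal character: escape it if it is special in a regex.
            if (kSpecial.find(pattern[i]) != std::string::npos) rx += '\\';
            rx += pattern[i++];
        }
    }

    ParseResult result;
    const std::regex re(rx);
    std::smatch m;
    if (!std::regex_search(input, m, re)) return result;  // "date not found"
    result.found = true;
    assert(m.size() == fields.size() + 1);  // whole match plus one group per tag

    for (std::size_t g = 0; g < fields.size(); ++g) {
        const std::string text = m[g + 1].str();
        if (fields[g] == 'y') {
            result.year = std::stoi(text);
        } else if (fields[g] == 'd') {
            result.day = std::stoi(text);
            if (result.day < 1 || result.day > 31) result.error = "Invalid day: " + text;
        } else {
            result.month = 0;
            for (int k = 0; k < 12; ++k)
                if (text == kMonths[k]) result.month = k + 1;
            if (result.month == 0) result.error = "Invalid month: " + text;
        }
    }
    return result;
}
```

For example, `find_date("%yr-%mh-%dy", "The date is 2009-August-25.")` fills in year, month and day as ints, while a misspelled month still matches the lenient regex and is reported through `error` instead of being silently missed.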


