I need to parse a string according to a mask (something like "%yr-%mh-%dy") so that I get int values.
For example, I have to find the date in the format specified above (the order of the % tags may differ) in the string "The date is 2009-August-25."
How can I get the program to interpret the tags, and what construct is best for storing them together with information on how to handle each fragment of the date string?
Take a look at the boost::date_time library first. It has an I/O system that may already do what you want, but as far as I can see it has no search facility.
For a custom date search you need boost::xpressive. It contains everything you need. Let's walk through my hastily written example. First you have to parse your own pattern, which is easy with Xpressive. Start with the headers you need:
#include <string>
#include <iostream>
#include <map>
#include <stdexcept> // std::runtime_error, used below
#include <boost/xpressive/xpressive_static.hpp>
#include <boost/xpressive/regex_actions.hpp>
// makes the example shorter, but less explicit
using namespace boost::xpressive;
Next, define the map of your custom tags:
std::map<std::string, int > number_map;
number_map["%yr"] = 0;
number_map["%mh"] = 1;
number_map["%dy"] = 2;
number_map["%%"] = 3; // escape a %
The next step is to create a regex that parses our pattern. When it finds a tag, it stores that tag's value from the map in the tag_id variable; otherwise it stores -1:
int tag_id;
sregex rx=((a1=number_map)|(s1=+~as_xpr('%')))[ref(tag_id)=(a1|-1)];
For more information, see the Xpressive documentation. Now let's parse a pattern:
std::string pattern("%yr-%mh-%dy"); // this will be parsed
sregex_token_iterator begin( pattern.begin(), pattern.end(), rx ), end;
if(begin == end) throw std::runtime_error("The pattern is empty!");
sregex_token_iterator iterates over our tokens and sets the tag_id variable each time. All that is left is to build a regular expression from these tokens. We construct it from static regex pieces defined in the following array, indexed by tag_id:
sregex regex_group[] = {
    range('1','9') >> repeat<3,3>( _d ),              // 0 (%yr): four-digit year
    as_xpr( "January" ) | "February" | "August",      // 1 (%mh): month name (list left incomplete)
    repeat<2,2>( range('0','9') )[                    // 2 (%dy): two-digit day...
        check(as<int>(_) >= 1 && as<int>(_) <= 31) ], // ...accepted only between 1 and 31
    as_xpr( '%' )                                     // 3 (%%): escaped %
};
Finally, let's build our custom regular expression. The first token builds the first part. If the token is a tag (tag_id is non-negative), we pick the regex from the array; otherwise the token is probably a delimiter, and we create a regex that matches it literally:
sregex custom_regex = (tag_id>=0) ? regex_group[tag_id] : as_xpr(begin->str());
Next, we iterate over the remaining tokens and append the regex for each one:
while(++begin != end)
{
    if(tag_id>=0)
    {
        sregex nextregex = custom_regex >> regex_group[tag_id];
        custom_regex = nextregex;
    }
    else
    {
        sregex nextregex = custom_regex >> as_xpr(begin->str());
        custom_regex = nextregex;
    }
}
Now our regex is ready, so let's find a date :-)
std::string input = "The date is 2009-August-25.";
smatch mydate;
if( regex_search( input, mydate, custom_regex ) )
std::cout << "Found " << mydate.str() << "." << std::endl;
The xpressive library is very powerful and fast. It's also a beautiful use of patterns.
If you like this example, let me know in the comments or with points ;-)
I would convert the tagged string into a regex with capture groups for the three fields and search for that. The complexity of the regex will depend on what you want %yr to be. You can also use a less strict expression and then check for valid values. Depending on the context, this may lead to better error messages ("Invalid month: Augsut" instead of "date not found") or to false positives.
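As a rough illustration of this approach, here is a minimal sketch that uses std::regex from the standard library instead of Xpressive. The names Date, mask_to_regex, month_number and find_date are made up for this example, and the field patterns (four-digit year, English month name, one- or two-digit day) are assumptions, not something the question specifies. The month is matched loosely and validated afterwards, so a typo is reported as an invalid month rather than as a missing date.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type: the three fields as ints.
struct Date { int year; int month; int day; };

// Turn a mask such as "%yr-%mh-%dy" into an ECMAScript regex with one capture
// group per tag. `order` records which tag each group belongs to, so the tags
// may appear in any order. "%%" stands for a literal '%'.
std::string mask_to_regex(const std::string& mask, std::vector<std::string>& order)
{
    static const std::string special = "\\^$.|?*+()[]{}";
    std::string out;
    for (std::size_t i = 0; i < mask.size(); ) {
        const std::string tag = mask.substr(i, 3);
        if (tag == "%yr")      { out += "(\\d{4})";    order.push_back(tag); i += 3; }
        else if (tag == "%mh") { out += "([A-Za-z]+)"; order.push_back(tag); i += 3; } // loose on purpose
        else if (tag == "%dy") { out += "(\\d{1,2})";  order.push_back(tag); i += 3; }
        else if (mask.compare(i, 2, "%%") == 0) { out += '%'; i += 2; }
        else {
            // Any other character is a delimiter: match it literally.
            if (special.find(mask[i]) != std::string::npos) out += '\\';
            out += mask[i];
            ++i;
        }
    }
    return out;
}

// Map a full English month name to 1..12 (assumed input convention).
int month_number(const std::string& name)
{
    static const char* const names[] = {
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December" };
    for (int i = 0; i < 12; ++i)
        if (name == names[i]) return i + 1;
    throw std::invalid_argument("Invalid month: " + name);
}

// Search `input` for a date shaped like `mask` and return its fields.
// Fields whose tag is absent from the mask stay 0.
Date find_date(const std::string& mask, const std::string& input)
{
    std::vector<std::string> order;
    const std::regex rx(mask_to_regex(mask, order));
    std::smatch m;
    if (!std::regex_search(input, m, rx))
        throw std::runtime_error("date not found");

    Date d = {0, 0, 0};
    for (std::size_t g = 0; g < order.size(); ++g) {
        const std::string text = m[g + 1].str();   // group 0 is the whole match
        if (order[g] == "%yr") {
            d.year = std::stoi(text);
        } else if (order[g] == "%mh") {
            d.month = month_number(text);
        } else {
            d.day = std::stoi(text);
            if (d.day < 1 || d.day > 31)
                throw std::invalid_argument("Invalid day: " + text);
        }
    }
    return d;
}
```

Calling find_date("%yr-%mh-%dy", "The date is 2009-August-25.") should yield year 2009, month 8 and day 25. The same function accepts a reordered mask such as "%dy/%mh/%yr", because the group-to-tag mapping is kept in `order` rather than assumed.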