Reverse offset tokenizer

Question


I have a string to tokenize. It has the form HHmmssff, where H, m, s and f are digits.

It should be split into four two-digit numbers, but I also need to accept short forms, for example sff, which should be interpreted as 00000sff. I wanted to use boost::tokenizer's offset_separator, but it seems to work only with positive offsets, and I would like it to work backwards.

OK, one idea is to pad the string with zeros on the left, but maybe the community will come up with something uber-smart. ;)
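A minimal sketch of the zero-padding idea, in standard C++ with no Boost. It assumes the compact, digits-only form of one to eight characters; TimeCode and parse_padded are hypothetical names chosen for illustration.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical result type; field names follow the HHmmssff notation.
struct TimeCode {
    int HH = 0, mm = 0, ss = 0, ff = 0;
};

// Left-pads a compact code ("sff", "mssff", ...) with '0' to eight
// characters, then reads the four two-digit fields at fixed offsets.
TimeCode parse_padded(const std::string& s) {
    if (s.empty() || s.size() > 8)
        throw std::invalid_argument("expected 1 to 8 digits");
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            throw std::invalid_argument("expected digits only");

    const std::string full = std::string(8 - s.size(), '0') + s;  // "123" -> "00000123"
    TimeCode t;
    t.HH = std::stoi(full.substr(0, 2));
    t.mm = std::stoi(full.substr(2, 2));
    t.ss = std::stoi(full.substr(4, 2));
    t.ff = std::stoi(full.substr(6, 2));
    return t;
}
```

Rejecting an empty string is a choice of this sketch; returning all zeros would be an equally reasonable reading of the requirements.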

Edit: Additional requirements just came into play.

The main reason I need a smarter solution is to handle all the cases, such as f, ssff, mssff, etc., but also to accept a more complete time notation, for example HH:mm:ss:ff, together with its short forms, for example s:ff, or even s: (which should be interpreted as s:00).

In the case where the string ends with :, I can obviously pad it with two zeros as well, then strip out all the delimiters, leaving only the digits, and parse the resulting string with Spirit.
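The normalization step described here might look like the following sketch. to_compact is a hypothetical helper; it assumes every field except the first is written with two digits (so "1:2:03" would be misread), and it only produces the digit string, which still has to be split into fields afterwards.

```cpp
#include <string>

// Turns the colon notation into the compact digit form: a trailing ':'
// gets "00" appended (so "5:" means "5:00"), then every ':' is dropped.
// Assumes all fields except the first have two digits.
std::string to_compact(std::string s) {
    if (!s.empty() && s.back() == ':')
        s += "00";
    std::string digits;
    digits.reserve(s.size());
    for (char c : s)
        if (c != ':')
            digits += c;
    return digits;
}
```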

But it seems it would be a little simpler if there were a way to make the offset tokenizer work from the end of the string (offsets -2, -4, -6, -8) and lexical_cast the numbers to ints.
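Reading fields at negative offsets does not strictly need a tokenizer; a plain loop that steps back two characters at a time does the same job. This is a sketch under stated assumptions: the input is already digits only, split_from_end is a hypothetical name, and std::stoi stands in for boost::lexical_cast<int>.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns {HH, mm, ss, ff}. Fields are read from the end of the string at
// offsets -2, -4, -6 and -8; a leading field with only one digit left is
// read as that single digit, and fields before the start of the string stay 0.
std::array<int, 4> split_from_end(const std::string& digits) {
    if (digits.empty() || digits.size() > 8)
        throw std::invalid_argument("expected 1 to 8 digits");
    for (char c : digits)
        if (!std::isdigit(static_cast<unsigned char>(c)))
            throw std::invalid_argument("expected digits only");

    std::array<int, 4> fields{};         // zero-initialised
    std::size_t end = digits.size();     // one past the current field
    for (int i = 3; i >= 0 && end > 0; --i) {
        const std::size_t begin = end >= 2 ? end - 2 : 0;
        fields[i] = std::stoi(digits.substr(begin, end - begin));
        end = begin;
    }
    return fields;
}
```

Because the loop walks backwards, no padded copy of the input is made; the only copies are the short substrings handed to std::stoi.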


c++ boost tokenize

macbirdie, Nov 13 at 13:06


3 answers

xtofl · Answer 1 · 2008-11-13T13:56:25+0000

I continue to preach BNF notation. If you can write a grammar that defines your problem, you can easily convert it to a Boost.Spirit parser that does it for you.

TimeString := LongNotation | ShortNotation

LongNotation := Hours Minutes Seconds Fraction

Hours := digit digit
Minutes := digit digit
Seconds := digit digit
Fraction := digit digit

ShortNotation := ShortSeconds Fraction
ShortSeconds := digit

Edit: additional requirement

VerboseNotation := [ [ [ Hours ':' ] Minutes ':' ] Seconds ':' ] Fraction
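To make the grammar concrete without pulling in Boost.Spirit, here is a hand-written parser for the VerboseNotation rule, offered as a sketch rather than a Spirit translation. parse_verbose is a hypothetical name, and it departs from the grammar in one deliberate way: the first field may be a single digit, so that forms like s:ff are accepted.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns {HH, mm, ss, ff}; leading fields that are absent stay 0.
std::array<int, 4> parse_verbose(const std::string& s) {
    std::vector<int> fields;
    std::size_t pos = 0;
    for (;;) {
        // field := digit digit   (the first field may also be a single digit)
        const std::size_t start = pos;
        while (pos < s.size() && pos - start < 2 &&
               std::isdigit(static_cast<unsigned char>(s[pos])))
            ++pos;
        const std::size_t len = pos - start;
        if (len == 0 || (len == 1 && !fields.empty()))
            throw std::invalid_argument("malformed field in \"" + s + "\"");
        fields.push_back(std::stoi(s.substr(start, len)));

        if (pos == s.size())
            break;  // end of input: the last field read is the Fraction
        if (s[pos] != ':')
            throw std::invalid_argument("expected ':' in \"" + s + "\"");
        if (fields.size() == 4)
            throw std::invalid_argument("more than four fields in \"" + s + "\"");
        ++pos;  // consume ':' and go on to the next field
    }

    // The nested optional prefixes mean fields are assigned from the right:
    // the last one is ff, the one before it ss, and so on.
    std::array<int, 4> out{};
    const std::size_t offset = 4 - fields.size();
    for (std::size_t i = 0; i < fields.size(); ++i)
        out[offset + i] = fields[i];
    return out;
}
```

Like the grammar, this rejects a trailing colon such as "12:", since Fraction is required; the s: shorthand from the question would need an extra rule.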

Johannes Schaub - litb · Answer 2 · 2008-11-13T13:56:56+0000

Regular expressions come to mind. Something like "^0*?(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)(\\d?\\d?)$" with boost::regex. The capture groups will give you the numbers. It shouldn't be hard to extend the pattern to accept the colons between the numbers in your other format (see Sep61.myopenid.com's answer). boost::regex is one of the fastest regular expression libraries.
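A sketch of the regex approach, using std::regex (whose interface follows boost::regex) so it needs no Boost. One caveat with the pattern as quoted: each (\\d?\\d?) group is greedy, so digits are handed out from the left, and "123" yields "12" and "3" in the first two groups instead of ss = 1, ff = 23. The sketch therefore makes the first three groups lazy, which assigns digits from the right. parse_with_regex is a hypothetical name, and only the compact digit form is handled.

```cpp
#include <array>
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Returns {HH, mm, ss, ff} for a compact code of 1 to 8 digits.
std::array<int, 4> parse_with_regex(const std::string& s) {
    // The first three groups are lazy ({0,2}?): the ECMAScript engine tries
    // the shortest option for the earlier groups first, so a full match puts
    // the right-most digits in the later groups and the last group holds ff.
    static const std::regex pattern(R"((\d{0,2}?)(\d{0,2}?)(\d{0,2}?)(\d{1,2}))");
    std::smatch m;
    if (!std::regex_match(s, m, pattern))
        throw std::invalid_argument("expected 1 to 8 digits");

    std::array<int, 4> out{};
    for (std::size_t i = 0; i < 4; ++i) {
        const std::string group = m[i + 1].str();  // may be empty
        out[i] = group.empty() ? 0 : std::stoi(group);
    }
    return out;
}
```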

Steve Jessop · Answer 3 · 2008-11-13T14:20:26+0000

In response to the comment "Not meaning to be a performance freak, but this solution involves some string copying (the input is a const std::string&)":

If you really care about performance so much that you can't use a big old library like regex, won't risk a BNF parser, don't want to assume that std::string::substr will avoid a copy with allocation (and therefore can't use the STL string functions), and can't even copy the string's characters into a buffer and left-pad it with '0' characters:

#include <stdexcept>
#include <string>
using namespace std;

void parse(const string &s) {
    string::const_iterator current = s.begin();
    int HH = 0;
    int mm = 0;
    int ss = 0;
    int ff = 0;
    // Each case consumes one character and falls through on purpose, so an
    // input of length n enters at "case n" and the leading fields stay 0.
    switch(s.size()) {
        case 8:
            HH = (*(current++) - '0') * 10;
        case 7:
            HH += (*(current++) - '0');
        case 6:
            mm = (*(current++) - '0') * 10;
        // ... you get the idea.
        case 1:
            ff += (*current - '0');
        case 0: break;
        default: throw logic_error("invalid date");
        // except that this code goes so badly wrong if the input isn't
        // valid that there's not much point objecting to the length...
    }
}
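For reference, here is the same switch with the elided cases written out following the pattern above, as a sketch rather than the answer's exact code. Fields and parse_switch are hypothetical names; unlike the original, the values are returned instead of discarded, every character is checked to be a digit before the switch, and [[fallthrough]] (C++17) marks the intended fall-through.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result type for this sketch.
struct Fields {
    int HH = 0, mm = 0, ss = 0, ff = 0;
};

Fields parse_switch(const std::string& s) {
    if (s.size() > 8)
        throw std::invalid_argument("too many characters");
    for (char c : s)
        if (c < '0' || c > '9')  // digits are contiguous in every C++ charset
            throw std::invalid_argument("non-digit character");

    std::string::const_iterator current = s.begin();
    Fields f;
    // An n-character input enters at "case n"; fields it does not reach keep 0.
    switch (s.size()) {
        case 8: f.HH = (*current++ - '0') * 10; [[fallthrough]];
        case 7: f.HH += *current++ - '0';       [[fallthrough]];
        case 6: f.mm = (*current++ - '0') * 10; [[fallthrough]];
        case 5: f.mm += *current++ - '0';       [[fallthrough]];
        case 4: f.ss = (*current++ - '0') * 10; [[fallthrough]];
        case 3: f.ss += *current++ - '0';       [[fallthrough]];
        case 2: f.ff = (*current++ - '0') * 10; [[fallthrough]];
        case 1: f.ff += *current - '0';         [[fallthrough]];
        case 0: break;
    }
    return f;
}
```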

Basically, though, 0-initializing these int variables is almost as much work as copying the string into a char buffer with padding, so I wouldn't expect a significant performance difference. I therefore don't recommend this solution in real life, only as an exercise in premature optimization.

