How to read a file in C++ up to a character, skip some characters, and then continue reading

How can I read a file in C++ until a specific character is reached, then search for the next specific character and continue reading from there?

In my program, I use some HTML syntax and generate an .htm file. So in my C++ code I have added tags, but when I read the .htm file back, I want the result not to include the tags.

My plan is to read the file until a '<' is encountered, then skip ahead until a '>' is encountered, and continue reading from there.

Please help and guide me with this. I am not very experienced with file input/output in C++. Thanks! :)



4 answers


First of all, you need to know that getting this right is harder than you might think.

To answer the question as asked: you can use istream::get to read one character at a time until you get a '<', and then use istream::ignore to skip characters up to and including the next '>' in the stream.
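A minimal sketch of that approach, assuming the input is any std::istream (a std::ifstream opened on your .htm file behaves the same way). The function name strip_tags_get_ignore is hypothetical; it keeps every character outside '<' ... '>' and discards the rest, and it assumes no '>' ever appears inside a tag:

```cpp
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying this on a string
#include <string>

// Hypothetical helper: copy everything outside '<' ... '>' from `in`.
// Assumes '>' never occurs inside a tag (e.g. in a quoted attribute value).
std::string strip_tags_get_ignore(std::istream& in) {
    std::string text;
    char ch;
    while (in.get(ch)) {  // read one character at a time
        if (ch == '<') {
            // Discard characters up to and including the next '>'.
            in.ignore(std::numeric_limits<std::streamsize>::max(), '>');
        } else {
            text += ch;   // character outside a tag: keep it
        }
    }
    return text;
}
```

Passing std::numeric_limits<std::streamsize>::max() as the count tells ignore there is no length limit, so it stops only at '>' or at end of file. To try it without a file, wrap a string in a std::istringstream and pass that instead.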



Coming back to the first point, however, this will not work correctly in general. In particular, a tag can perfectly well contain a quoted string (such as an attribute value), and that string can in turn contain a '>' that does not close the tag. So, to have any hope of processing HTML properly, you need to recognize the quoted strings inside tags and skip over their contents, rather than treating any '>' they contain as the end of the tag.
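To illustrate the point about strings inside tags, here is a sketch (the function name strip_tags_quote_aware is made up for this example) that remembers whether it is inside a single- or double-quoted value and treats '>' as the end of a tag only when it is not. It still assumes there are no comments, CDATA sections, or <script> bodies:

```cpp
#include <string>

// Hypothetical helper: remove tags, treating a '>' inside a quoted
// attribute value ("..." or '...') as part of the tag, not its end.
std::string strip_tags_quote_aware(const std::string& html) {
    std::string text;
    bool in_tag = false;
    char quote = '\0';  // quote character we are currently inside, or '\0'
    for (char ch : html) {
        if (!in_tag) {
            if (ch == '<') in_tag = true;   // start of a tag
            else text += ch;                // ordinary text: keep it
        } else if (quote != '\0') {
            if (ch == quote) quote = '\0';  // closing quote of the value
        } else if (ch == '"' || ch == '\'') {
            quote = ch;                     // opening quote inside the tag
        } else if (ch == '>') {
            in_tag = false;                 // the real end of the tag
        }
    }
    return text;
}
```

On input like <a title="x > y">link</a>, a naive '<' ... '>' scanner would leak the tail of the attribute value into the text, while this version returns just "link".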



In general, to read a file up to a certain character you use std::getline and pass your terminator as the third argument (the delimiter). So to read up to the '<' character you can do

std::getline( infile, str, '<' );

You can do the same with the '>' character. Note that getline removes the delimiter from the stream but does not store it in str.
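Putting the two getline calls together gives a simple loop. This is a sketch (strip_tags_getline is a hypothetical name) that keeps the text read up to each '<' and throws away whatever is read up to the following '>'. Like the character-by-character version, it assumes '>' never appears inside a tag:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this on a string
#include <string>

// Hypothetical helper: alternate getline with '<' and '>' as delimiters.
std::string strip_tags_getline(std::istream& in) {
    std::string text, chunk;
    // Read up to (and consume) the next '<'; the last chunk may end at EOF.
    while (std::getline(in, chunk, '<')) {
        text += chunk;                 // text before the tag: keep it
        std::getline(in, chunk, '>');  // tag body: read and discard
    }
    return text;
}
```

The loop ends when getline cannot extract anything more, so trailing text after the last tag is still kept.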



In your case, since you are parsing HTML, there are probably existing parsers for this. XHTML is XML-compliant, but plain HTML is not (for example, it does not always require you to close your tags), so an XML parser won't necessarily work.

The method described above assumes that the opening and closing tag characters never appear inside comments or quoted text, and nothing guarantees that; to handle those cases you will need a complete state machine.
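As a rough idea of what such a state machine might look like, here is a sketch with four states: text, tag, quoted value inside a tag, and comment. The names State and strip_tags_state_machine are illustrative only, and it deliberately ignores <script>/<style> contents, CDATA sections, and character entities such as &amp;:

```cpp
#include <cstddef>
#include <string>

// Illustrative states: ordinary text, inside a tag, inside a quoted
// attribute value, inside an <!-- ... --> comment.
enum class State { Text, Tag, Quoted, Comment };

std::string strip_tags_state_machine(const std::string& html) {
    std::string text;
    State state = State::Text;
    char quote = '\0';  // which quote opened the current attribute value
    for (std::size_t i = 0; i < html.size(); ++i) {
        char ch = html[i];
        switch (state) {
        case State::Text:
            if (html.compare(i, 4, "<!--") == 0) {
                state = State::Comment;
                i += 3;                // skip the rest of "<!--"
            } else if (ch == '<') {
                state = State::Tag;
            } else {
                text += ch;            // keep text outside tags and comments
            }
            break;
        case State::Tag:
            if (ch == '"' || ch == '\'') {
                quote = ch;
                state = State::Quoted;
            } else if (ch == '>') {
                state = State::Text;   // end of the tag
            }
            break;
        case State::Quoted:
            if (ch == quote) state = State::Tag;  // back inside the tag
            break;
        case State::Comment:
            if (html.compare(i, 3, "-->") == 0) {
                state = State::Text;
                i += 2;                // skip the rest of "-->"
            }
            break;
        }
    }
    return text;
}
```

Making the states explicit keeps each rule local to one case, so adding, say, a state for <script> bodies means adding one enum value and one case rather than nesting more flags.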



The following reads from standard input; modify or replace the getchar() calls to read from elsewhere.

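#include <stdio.h>

int main(void) {
    int c = getchar();  /* int, not char, so EOF can be detected */
    while ( c != EOF ) {
        while ( c != '<' && c != EOF ) {
            /* Outside a tag: do something with c, e.g. print it. */
            putchar(c);
            c = getchar();
        }
        while ( c != '>' && c != EOF ) {
            /* Inside a tag: skip the character. */
            c = getchar();
        }
        if ( c == '>' ) {
            /* Consume the '>' so it is not treated as text. */
            c = getchar();
        }
    }
    return 0;
}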
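To read from a file instead of standard input, one option is to replace getchar() with std::fgetc on a FILE*. The sketch below (the name strip_tags_file is hypothetical) is a variant of the loop above that collects the text outside tags into a string and consumes the closing '>' so it is not treated as text:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: same tag-skipping loop, reading from any FILE*.
std::string strip_tags_file(std::FILE* fp) {
    std::string text;
    int c = std::fgetc(fp);  // int, not char, so EOF can be distinguished
    while (c != EOF) {
        while (c != '<' && c != EOF) {  // outside a tag: keep the character
            text += static_cast<char>(c);
            c = std::fgetc(fp);
        }
        while (c != '>' && c != EOF) {  // inside a tag: skip it
            c = std::fgetc(fp);
        }
        if (c == '>') c = std::fgetc(fp);  // consume the '>' itself
    }
    return text;
}
```

You might call it with a file opened via std::fopen("page.htm", "r"), where page.htm is only a placeholder name; check the result of fopen for nullptr and fclose the file afterwards.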

      



Here are some guidelines.

  • You can read the file line by line with std::getline from a std::ifstream and store each line in a std::string.

  • You can use std::string::find() to locate the '<' and '>' characters.

  • You can use std::string::substr() to extract substrings.

  • You can collect the lines as needed in a std::vector.

You won't get the full implementation here, but it should be enough to get you started.
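As a starting point, the guidelines could be combined roughly like this. It is a sketch rather than a full implementation: strip_tags_by_line is a hypothetical name, and it assumes every tag opens and closes on the same line, since each line is processed independently:

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this on a string
#include <string>
#include <vector>

// Hypothetical helper: read line by line, use find() to locate '<' and '>',
// substr() to keep the text between tags, and collect results in a vector.
std::vector<std::string> strip_tags_by_line(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        std::string clean;
        std::string::size_type pos = 0;
        while (pos < line.size()) {
            std::string::size_type open = line.find('<', pos);
            if (open == std::string::npos) {        // no more tags on this line
                clean += line.substr(pos);
                break;
            }
            clean += line.substr(pos, open - pos);  // text before the tag
            std::string::size_type close = line.find('>', open);
            if (close == std::string::npos) break;  // unterminated tag: drop the rest
            pos = close + 1;                        // continue after '>'
        }
        lines.push_back(clean);
    }
    return lines;
}
```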
