How to find all occurrences of a substring in C

Question


I am trying to write a program in C that displays specific pieces of text from an HTML document. To do this, I need to find each instance of the substring "name": in the document; however, the C function strstr only finds the first instance of the substring. I cannot find a function that finds anything beyond the first instance, and I considered removing each substring after finding it so that strstr would return the next one. I can't get either of these approaches to work.

By the way, I know that the while loop limits this to six iterations; I was just testing to see whether I could get this function to work in the first place.

while(entry_count < 6)
{   
    printf("test");
    if((ptr = strstr(buffer, "\"name\":")) != NULL)
    {   
        ptr += 8;
        int i = 0;
        while(*ptr != '\"')
        {   
            company_name[i] = *ptr;
            ptr++;
            i++;
        }   
        company_name[i] = '\n';
        int j;
        for(j = 0; company_name[j] != '\n'; j++)
            printf("%c", company_name[j]);
        printf("\n");
        strtok(buffer, "\"name\":");
        entry_count++;
    }   
}



Luca del signore, Apr 25 '15 at 14:43


1 answer

Ilmari Karonen · Accepted Answer · 2015-04-25T14:46:31+0000

Just pass the returned pointer plus one back to strstr() to find the next match:

char *ptr = strstr(buffer, target);
while (ptr) {
    /* ... do something with ptr ... */
    ptr = strstr(ptr+1, target);
}

P.S. While you can of course do this, I would suggest that you consider more appropriate tools for the job:

C is a very low-level language, and writing parsing code in it is time-consuming (especially if you insist on coding everything from scratch instead of using existing parsing libraries or parser generators) and error-prone (some of those errors, such as buffer overflows, can create security holes). There are many higher-level scripting languages (such as Perl, Ruby, Python, or even JavaScript) that are much better suited to such tasks.
When parsing HTML, you really should use a proper HTML parser (preferably combined with a good DOM builder and query tool). This lets you find the data you want based on the structure of the document, rather than by searching for substrings in the HTML source code. A real HTML parser will also transparently handle issues such as character set conversion and decoding of character entities. (Yes, there are HTML parsers for C, such as Gumbo and Hubbub, so you can and should use one even if you insist on sticking to C.)
