Split char string with multi character delimiter in C

I want to split a char *string based on a multi-character delimiter. I know that strtok() is used to split a string, but it only works with single-character delimiters.

I want to split a char * string based on a substring like "abc" or any other substring. How can this be achieved?

+3




3 answers


Finding the point at which the desired sequence occurs is pretty simple: strstr supports this:

char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");

      

So, at this point, pos points to the first occurrence of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string and 2) internally stores a pointer to the current location in the string.
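To make both points concrete, and to show why strtok cannot be handed "abc" as a delimiter directly, here is a minimal sketch. The helper name show_strtok_tokens is made up for illustration; only strtok itself is a standard call. strtok treats its second argument as a set of single characters, so "abc" means "split on a, b, or c".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every token strtok yields for the given
   delimiter set and returns how many there were. strtok writes '\0'
   bytes into s as it goes, so s must be writable. */
size_t show_strtok_tokens(char *s, const char *delims) {
    size_t count = 0;
    for (char *tok = strtok(s, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        printf("[%s]\n", tok);
        count++;
    }
    return count;
}
```

Called on a writable buffer holding "a big cat" with "abc", this yields three tokens (" ", "ig " and "t") instead of treating abc as one delimiter, and afterwards the buffer reads as just "a " because the first delimiter character has been overwritten with '\0'.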

If we don't mind doing roughly the same thing as strtok, we could do something like this:

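#include <stdio.h>
#include <string.h>

/* strtok-style tokenizer for a multi-character delimiter; modifies input */
char *multi_tok(char *input, const char *delimiter) {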
    static char *string;
    if (input != NULL)
        string = input;

    if (string == NULL)
        return string;

    char *end = strstr(string, delimiter);
    if (end == NULL) {
        char *temp = string;
        string = NULL;
        return temp;
    }

    char *temp = string;

    *end = '\0';
    string = end + strlen(delimiter);
    return temp;
}

      

It works. For example:



int main() {
    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
}

      

outputs something like the expected result:

this is
 a big
 input string
 to split up
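One behavior worth knowing about: unlike strtok, this function does not collapse adjacent delimiters, so two delimiters in a row (or a delimiter at the very start or end) produce an empty token. The sketch below repeats multi_tok from above so it compiles on its own; count_tokens is a hypothetical helper added only for illustration, and a non-empty delimiter is assumed.

```c
#include <stdio.h>
#include <string.h>

/* Copy of the static-state multi_tok from above, repeated so the sketch
   compiles on its own. Assumes a non-empty delimiter. */
char *multi_tok(char *input, const char *delimiter) {
    static char *string;
    if (input != NULL)
        string = input;
    if (string == NULL)
        return NULL;

    char *temp = string;
    char *end = strstr(string, delimiter);
    if (end == NULL) {
        string = NULL;              /* last token: nothing left afterwards */
        return temp;
    }
    *end = '\0';                    /* terminate the token in place */
    string = end + strlen(delimiter);
    return temp;
}

/* Hypothetical helper: prints each token in brackets, stores the number of
   empty tokens in *empty, and returns the total token count. */
size_t count_tokens(char *input, const char *delimiter, size_t *empty) {
    size_t total = 0;
    *empty = 0;
    for (char *tok = multi_tok(input, delimiter); tok != NULL;
         tok = multi_tok(NULL, delimiter)) {
        printf("[%s]\n", tok);
        if (*tok == '\0')
            (*empty)++;
        total++;
    }
    return total;
}
```

For a buffer holding "abcabcx" this reports three tokens, two of them empty. An empty delimiter is a separate problem: strstr then matches at the current position and the tokenizer never advances, so callers should rule that case out beforehand.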

      

However, it's clunky, hard to make thread-safe (you'd have to make its internal variable string thread-local), and generally just a crappy design. Using (for example) an interface similar to strtok_r, we can fix at least the thread-safety issue:

typedef char *multi_tok_t;

char *multi_tok(char *input, multi_tok_t *string, char *delimiter) {
    if (input != NULL)
        *string = input;

    if (*string == NULL)
        return *string;

    char *end = strstr(*string, delimiter);
    if (end == NULL) {
        char *temp = *string;
        *string = NULL;
        return temp;
    }

    char *temp = *string;

    *end = '\0';
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init() { return NULL; }

int main() {
    multi_tok_t s=init();

    char input [] = "this is abc a big abc input string abc to split up";

    char *token = multi_tok(input, &s, "abc");

    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
}
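Moving the state out of the function also helps outside of threads: two tokenizations can be interleaved. Here is a rough sketch that splits records on ";;" and each record on "::". The function multi_tok repeats the logic above so the block compiles alone; print_fields and the two delimiters are assumptions chosen purely for illustration.

```c
#include <stdio.h>
#include <string.h>

typedef char *multi_tok_t;

/* Same logic as the reentrant multi_tok above, repeated so this compiles
   on its own. Assumes a non-empty delimiter. */
char *multi_tok(char *input, multi_tok_t *state, const char *delimiter) {
    if (input != NULL)
        *state = input;
    if (*state == NULL)
        return NULL;

    char *start = *state;
    char *end = strstr(start, delimiter);
    if (end == NULL) {
        *state = NULL;              /* last token: nothing left afterwards */
        return start;
    }
    *end = '\0';                    /* terminate the token in place */
    *state = end + strlen(delimiter);
    return start;
}

/* Hypothetical helper: splits records on ";;" and each record on "::",
   prints every field, and returns how many fields were seen. */
size_t print_fields(char *input) {
    multi_tok_t outer, inner;       /* one independent state per loop */
    size_t fields = 0;

    for (char *rec = multi_tok(input, &outer, ";;"); rec != NULL;
         rec = multi_tok(NULL, &outer, ";;")) {
        for (char *f = multi_tok(rec, &inner, "::"); f != NULL;
             f = multi_tok(NULL, &inner, "::")) {
            printf("%s\n", f);
            fields++;
        }
    }
    return fields;
}
```

With a buffer holding "a::b;;c::d::e" this visits all five fields. The static version could not do this, since the inner loop would overwrite the single shared position and the outer loop would stop after the first record.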

      

I guess I'll leave it at that for now. To get a really clean interface, we'd really want to reinvent something like coroutines, and that's probably a bit much to post here.

+3




You can easily write your own parser using strstr() to achieve the same. The basic algorithm might look like this:



  • use strstr() to find the first occurrence of the entire delimiter string
  • mark the index
  • copy from the beginning to the marked index; this will be your expected token
  • to parse the input for subsequent tokens, advance the starting pointer by token length + delimiter length
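
The steps above could be sketched roughly as follows. This is only one possible reading of the algorithm: the name next_token_copy, the cursor parameter, and the choice to return a malloc'd copy that the caller frees are all assumptions, not part of the original description. Because each token is copied, the input string itself is never modified.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the next token and
   advances *cursor past the delimiter. Returns NULL once the input is
   exhausted or if allocation fails. The caller frees each token. */
char *next_token_copy(const char **cursor, const char *delim) {
    if (*cursor == NULL)
        return NULL;

    const char *start = *cursor;

    /* steps 1 and 2: find the delimiter and mark where it starts
       (an empty delimiter is treated as "no delimiter found") */
    const char *hit = (*delim != '\0') ? strstr(start, delim) : NULL;
    size_t len = hit ? (size_t)(hit - start) : strlen(start);

    /* step 3: copy everything before the mark into a new token */
    char *token = malloc(len + 1);
    if (token == NULL)
        return NULL;
    memcpy(token, start, len);
    token[len] = '\0';

    /* step 4: move the start past token + delimiter, or finish */
    *cursor = hit ? hit + strlen(delim) : NULL;
    return token;
}
```

A caller would initialize a const char * cursor to the start of the input, then call next_token_copy(&cursor, "abc") in a loop until it returns NULL, freeing each token after use.
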
+1




EDIT: I considered the suggestions from Alan and Surav and wrote some basic code for it.

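#include <stdio.h>
#include <string.h>

int main (void)
{
  char str[] = "This is abc test abc string";

  char *in = str;
  const char *delim = "abc";
  char *token;

  do {
    /* find the next occurrence of the whole delimiter */
    token = strstr(in, delim);

    /* cut the string there so that in ends at the delimiter */
    if (token)
      *token = '\0';

    printf("%s\n", in);

    /* advance past the delimiter only if one was found;
       doing arithmetic on a NULL token would be undefined behavior */
    if (token)
      in = token + strlen(delim);

  } while (token != NULL);

  return 0;
}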

      

+1

