Split char string with multi character delimiter in C
Finding the point at which the desired sequence occurs is pretty simple; strstr does exactly that:
char str[] = "this is abc a big abc input string abc to split up";
char *pos = strstr(str, "abc");
So, at this point, pos points to the first occurrence of abc in the larger string. Here's where things get a little ugly. strtok has a nasty design where it 1) modifies the original string and 2) stores a pointer to the current location inside the string.
If we don't mind doing roughly the same thing, we could do something like this:
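#include <stdio.h>
#include <string.h>

char *multi_tok(char *input, const char *delimiter) {
    static char *string;        /* position in the current input, kept between calls */
    if (input != NULL)
        string = input;         /* a non-NULL input starts a new tokenization */
    if (string == NULL)
        return string;          /* input exhausted */
    char *end = strstr(string, delimiter);
    if (end == NULL) {          /* no delimiter left: return the remainder */
        char *temp = string;
        string = NULL;
        return temp;
    }
    char *temp = string;
    *end = '\0';                /* cut the token off at the delimiter */
    string = end + strlen(delimiter);   /* resume just past the delimiter */
    return temp;
}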
It works. For example:
int main() {
    char input[] = "this is abc a big abc input string abc to split up";
    char *token = multi_tok(input, "abc");
    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, "abc");
    }
}
prints roughly the expected result (each token keeps the spaces that surrounded the delimiter):
this is
a big
input string
to split up
However, it's clunky, hard to make thread-safe (you'd have to make the internal variable string thread-local), and generally just a crappy design. Using (for example) an interface similar to strtok_r, we can at least fix the thread-safety issue:
#include <stdio.h>
#include <string.h>

typedef char *multi_tok_t;   /* caller-owned tokenizer state */

char *multi_tok(char *input, multi_tok_t *string, const char *delimiter) {
    if (input != NULL)
        *string = input;
    if (*string == NULL)
        return *string;
    char *end = strstr(*string, delimiter);
    if (end == NULL) {
        char *temp = *string;
        *string = NULL;
        return temp;
    }
    char *temp = *string;
    *end = '\0';
    *string = end + strlen(delimiter);
    return temp;
}

multi_tok_t init() { return NULL; }

int main() {
    multi_tok_t s = init();
    char input[] = "this is abc a big abc input string abc to split up";
    char *token = multi_tok(input, &s, "abc");
    while (token != NULL) {
        printf("%s\n", token);
        token = multi_tok(NULL, &s, "abc");
    }
}
I guess I'll leave it at that for now. To get a really clean interface, we'd really want to reinvent something like coroutines, and that's probably a bit much to post here.
You can easily write your own parser using strstr() to achieve the same. The basic algorithm might look like this:
- use strstr() to find the first occurrence of the entire delimiter string, and mark that index
- copy from the beginning up to the marked index; that is your token
- to parse the rest of the input, advance the starting pointer by the token length plus the delimiter length
EDIT: Took the suggestions from Alan and Surav into account and wrote some basic code for it.
#include <stdio.h>
#include <string.h>

int main(void)
{
    char str[] = "This is abc test abc string";
    char *in = str;
    const char *delim = "abc";
    char *token;
    do {
        token = strstr(in, delim);
        if (token)
            *token = '\0';               /* end the current piece at the delimiter */
        printf("%s\n", in);
        if (token)
            in = token + strlen(delim);  /* only advance when a delimiter was found */
    } while (token != NULL);
    return 0;
}