The strtok function and multithreading
I have read many articles about strtok(char *s1, const char *s2) and its implementation. However, I still can't figure out what makes it dangerous to use in a multithreaded program. Can someone please give me an example of a multithreaded program and explain the problem there? Please note that I'm looking for an example that shows exactly where the problem occurs.
P.S.: strtok(char *s1, const char *s2) is part of the C standard library.
Here's a concrete example:
First, suppose your program is multithreaded and the following code is executed in one thread of execution:
char str1[] = "split.me.up";
// call this line A
char *word1 = strtok(str1, "."); // returns "split", sets str1[5] = '\0'
// ...
// call this line B
char *word2 = strtok(NULL, "."); // we hope to get back "me"
And in another thread, the following code is executed:
char str2[] = "multi;token;string";
// call this line C
char *token1 = strtok(str2, ";"); // returns "multi", sets str2[5] = '\0'
// ...
// call this line D
char *token2 = strtok(NULL, ";"); // we hope to get back "token"
The point is, we don't really know what will end up in word2 and token2:
If the lines are executed in the order (A), (B), (C), (D), then we get what we want.
But if they execute in, say, the order (A), (C), (B), (D), then line (B) will look for the delimiter "." in "token;string"! This is because passing NULL as the first argument in line (B) tells strtok to continue searching in the last non-NULL string it was given, and since line (C) has already run, that string is str2.
Line (B) will then return "token;string" and, at the same time, set the next search position to the terminating NUL at the end of str2. Line (D) will then behave as if it were searching an empty string, because it starts at str2's terminating NUL, and will return NULL.
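A minimal sketch that replays the (A), (C), (B), (D) ordering deterministically in a single thread, so the effect of strtok's hidden state can be reproduced on demand. The function name replay_acbd is hypothetical. With real threads, unsynchronized calls to strtok would be a data race, so the outcome is not even guaranteed to match this one; the sketch only illustrates what that particular ordering does logically.

```c
#include <string.h>

/* Hypothetical helper: runs lines (A), (C), (B), (D) in that order, in one
   thread, to show how the calls on str1 and str2 share strtok's state. */
void replay_acbd(char *str1, char *str2,
                 char **word1, char **word2, char **token1, char **token2)
{
    *word1  = strtok(str1, ".");  /* (A): returns "split", remembers str1 */
    *token1 = strtok(str2, ";");  /* (C): returns "multi", now remembers str2 */
    *word2  = strtok(NULL, ".");  /* (B): continues in str2, finds no '.',
                                     returns "token;string" */
    *token2 = strtok(NULL, ";");  /* (D): starts at str2's terminating NUL,
                                     returns NULL */
}
```

Called with buffers holding "split.me.up" and "multi;token;string", it leaves word1 as "split" and token1 as "multi", but word2 as "token;string" and token2 as NULL.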
Even if you place lines (A) and (B) right next to each other, and likewise lines (C) and (D), there is no guarantee that (B) will execute immediately after (A), before (C) or (D) get a chance to run.
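If you create a mutex or some other guard around strtok, and only call strtok from a thread that holds the lock, then strtok is safe to use. Note that the lock has to be held for the whole sequence of calls on one string, from the first strtok(str1, ".") through the last strtok(NULL, "."), not just around each individual call; otherwise another thread can still slip in between (A) and (B). However, it's probably better to just use the thread-safe strtok_r, as others have said.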
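A minimal sketch of the mutex approach, assuming POSIX threads. The names strtok_lock and count_tokens_locked are hypothetical. The important detail is that the lock is taken before the first strtok(s, ...) call and released only after the last strtok(NULL, ...) call, so no other thread can move strtok's saved position in between.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical global lock that every strtok user in the program must take. */
static pthread_mutex_t strtok_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts the tokens in s. The lock covers the WHOLE tokenization sequence,
   not each individual strtok call. */
int count_tokens_locked(char *s, const char *delims)
{
    int count = 0;

    pthread_mutex_lock(&strtok_lock);
    for (char *tok = strtok(s, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;
    pthread_mutex_unlock(&strtok_lock);

    return count;
}
```

This only helps if every strtok call in the program, including any inside code you don't control, goes through the same lock, which is one more reason to prefer strtok_r.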
Edit: There is another issue that nobody has mentioned: strtok modifies, and potentially reads, global (or static) state, and it does so whether or not you need it to. So even if you don't rely on repeated calls to strtok to get consecutive tokens from the same string, it may still be unsafe to call it from multiple threads without synchronization.
In the first call to strtok, you pass the string and the delimiters. In subsequent calls, the first argument is NULL and you only pass the delimiters; strtok remembers the string you passed in the first call.
In a multithreaded environment this is dangerous, because several threads may call strtok with different strings. strtok only remembers the most recent one, so the other threads get the wrong result.
The explanation is simple: when a function is not thread-safe, the state it works on does not belong to your thread alone; another thread can change it too! It's like a cake being shared by 5 friends at the same time: the results are unpredictable, both in who ate the cake and in who changed it.
Each call to strtok() returns a pointer to a null-terminated token inside the string you passed in, and strtok keeps a static pointer to the position where parsing should resume. Every subsequent call with NULL reads and updates that saved position, regardless of which thread or function made the call, which is why it is not thread-safe.
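To show what this hidden state looks like, here is a simplified strtok-like function. It is a hypothetical my_strtok written for illustration, not the source of any particular C library, but it keeps its position in a single static pointer in the same way. Because next is shared by every caller, two threads (or two nested loops) using it on different strings overwrite each other's position.

```c
#include <string.h>

/* Hypothetical simplified strtok: one static pointer holds the resume
   position for every caller in the process. */
char *my_strtok(char *s, const char *delims)
{
    static char *next;              /* shared, hidden state */

    if (s == NULL)
        s = next;                   /* continue where the last call stopped */
    if (s == NULL)
        return NULL;                /* no string was ever supplied */

    s += strspn(s, delims);         /* skip leading delimiters */
    if (*s == '\0') {
        next = s;                   /* nothing left: stay at the end */
        return NULL;
    }

    char *end = s + strcspn(s, delims);  /* find the end of this token */
    if (*end == '\0') {
        next = end;                 /* last token: next call returns NULL */
    } else {
        *end = '\0';                /* terminate the token in place */
        next = end + 1;             /* resume after the delimiter */
    }
    return s;
}
```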
By contrast, strtok_r() takes an additional third argument, saveptr, which the caller supplies and which is used to store this position between calls. So the state is no longer hidden inside the library; it is under the developer's control.
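As a sketch of the difference, here is the (A), (C), (B), (D) ordering from the first example rewritten with strtok_r, so that each tokenization sequence has its own save pointer. The function name replay_acbd_r is hypothetical, and strtok_r is specified by POSIX rather than by ISO C, so this assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict ISO C modes */
#include <string.h>

/* Hypothetical helper: same interleaving as before, but each sequence
   keeps its state in its own caller-owned save pointer. */
void replay_acbd_r(char *str1, char *str2,
                   char **word1, char **word2, char **token1, char **token2)
{
    char *save1 = NULL;  /* state for tokenizing str1 */
    char *save2 = NULL;  /* state for tokenizing str2 */

    *word1  = strtok_r(str1, ".", &save1);  /* (A): "split" */
    *token1 = strtok_r(str2, ";", &save2);  /* (C): "multi" */
    *word2  = strtok_r(NULL, ".", &save1);  /* (B): resumes in str1: "me" */
    *token2 = strtok_r(NULL, ";", &save2);  /* (D): resumes in str2: "token" */
}
```

With the same inputs, word2 is now "me" and token2 is "token", because the two sequences no longer share any state.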
Example (from Kay and Steven Robbins' book, UNIX Systems Programming):
Incorrect use of strtok to determine the average number of words per line.
#include <string.h>

#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

static int wordcount(char *s) {
    int count = 1;

    if (strtok(s, WORD_DELIMITERS) == NULL)
        return 0;
    while (strtok(NULL, WORD_DELIMITERS) != NULL)
        count++;
    return count;
}

double wordaverage(char *s) { /* return average number of words per line in s */
    int linecount = 1;
    char *nextline;
    int words;

    nextline = strtok(s, LINE_DELIMITERS);
    if (nextline == NULL)
        return 0.0;
    words = wordcount(nextline);
    while ((nextline = strtok(NULL, LINE_DELIMITERS)) != NULL) {
        words += wordcount(nextline);
        linecount++;
    }
    return (double)words/linecount;
}
The function wordaverage determines the average number of words per line, using strtok to find the next line. It then calls wordcount to count the number of words in that line. Unfortunately, wordcount also uses strtok, this time to parse the words on the line. Each of these functions would be correct on its own if the other didn't call strtok. wordaverage works correctly for the first line, but when it calls strtok to find the second line, the internal state stored by strtok has already been overwritten by wordcount, which left it pointing at the end of the first line. As a result, strtok(NULL, LINE_DELIMITERS) returns NULL and the remaining lines are never counted.
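One way to fix this example is to give each loop its own state with strtok_r. This is a sketch, not the book's solution: the names wordcount_r and wordaverage_r are hypothetical, and strtok_r again assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 200809L  /* expose strtok_r under strict ISO C modes */
#include <string.h>

#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

/* Counts the words in s; the word scan keeps its state in a local variable. */
static int wordcount_r(char *s)
{
    char *save = NULL;
    int count = 0;

    for (char *w = strtok_r(s, WORD_DELIMITERS, &save); w != NULL;
         w = strtok_r(NULL, WORD_DELIMITERS, &save))
        count++;
    return count;
}

/* Average number of words per line in s. The line scan has its own save
   pointer, so calling wordcount_r cannot disturb it. */
double wordaverage_r(char *s)
{
    char *save = NULL;
    int linecount = 0;
    int words = 0;

    for (char *line = strtok_r(s, LINE_DELIMITERS, &save); line != NULL;
         line = strtok_r(NULL, LINE_DELIMITERS, &save)) {
        words += wordcount_r(line);
        linecount++;
    }
    return linecount == 0 ? 0.0 : (double)words / linecount;
}
```

For "one two\nthree four five\n" this returns 2.5, whereas the strtok version stops after the first line and returns 2.0.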