The strtok function and multithreading

Question


I have read many articles about strtok(char *s1, const char *s2) and its implementation. However, I still can't figure out what makes it dangerous to use in a multithreaded program. Can someone please give me an example of a multithreaded program and explain the problem there? Please note that I'm looking for an example that shows me where the problem occurs.

PS: strtok(char *s1, const char *s2) is part of the C standard library.


Tags: c, multithreading

Nazgol Dec 29 '13 at 12:36 am


3 answers

In the first call to strtok, you pass the string and the delimiters. In subsequent calls, the first argument is NULL and you pass only the delimiters; strtok remembers the string you passed in the first call.

In a multithreaded environment this is dangerous, because several threads can call strtok with different strings. strtok only remembers the most recent one, so the other threads get wrong results.


cup Dec 29 '13 at 12:40 am


The explanation is simple: when a function is called not thread safe, it literally means that not only your thread but also another thread can change its state! It's like a cake being shared with 5 friends at the same time: the results are unpredictable, and you can't tell who ate the cake or who changed it.

Each call to strtok() returns a pointer to a null-terminated token and uses a static (hidden) pointer to remember where parsing stopped. Any subsequent call with NULL continues from that saved position and updates it, no matter who made the call, which is why strtok is not thread safe.

On the other hand, strtok_r() takes an additional third argument, saveptr, which the caller must supply and which stores this position between calls. The state is therefore no longer hidden inside the library but managed by the developer.

Example (from Steven Robbins's book, UNIX Systems Programming):

Incorrect use of strtok to determine the average number of words per line.

#include <string.h>
#define LINE_DELIMITERS "\n"
#define WORD_DELIMITERS " "

static int wordcount(char *s) {
   int count = 1;

   if (strtok(s, WORD_DELIMITERS) == NULL)
      return 0;
   while (strtok(NULL, WORD_DELIMITERS) != NULL)
      count++;
   return count;
}

double wordaverage(char *s) {      /* return average number of words per line in s */
   int linecount = 1;
   char *nextline;
   int words;

   nextline = strtok(s, LINE_DELIMITERS);
   if (nextline == NULL)
      return 0.0;
   words = wordcount(nextline);
   while ((nextline = strtok(NULL, LINE_DELIMITERS)) != NULL) {
      words += wordcount(nextline);
      linecount++;
   }
   return (double)words/linecount;
}

The function wordaverage determines the average number of words per line, using strtok to find the next line. It then calls wordcount to count the number of words in that line. Unfortunately, wordcount also uses strtok, this time to parse the words on the line. Each of these functions would be correct on its own if the other didn't call strtok. The function wordaverage works correctly for the first line, but when wordaverage calls strtok to parse the second line, the internal state stored by strtok has already been overwritten by wordcount. No threads are involved here; the same hidden state that breaks this nested use is what breaks concurrent use from several threads.


Maheswaran ravisankar Dec 29 '13 at 3:48


Andrey Mishchenko · Accepted Answer · 2013-12-29T08:53:56+0000

Here's a concrete example:

First, suppose your program is multithreaded and the following code is executed in one thread of execution:

char str1[] = "split.me.up";

// call this line A
char *word1 = strtok(str1, "."); // returns "split", sets str1[5] = '\0'

// ... 

// call this line B
char *word2 = strtok(NULL, "."); // we hope to get back "me"

And in another thread, the following code is executed:

char str2[] = "multi;token;string";

// call this line C
char *token1 = strtok(str2, ";"); // returns "multi", sets str2[5] = '\0'

// ...

// call this line D
char *token2 = strtok(NULL, ";"); // we hope to get back "token"

The point is, we don't really know what will be in word2 and token2:

If the commands are executed in the order (A), (B), (C), (D), then we get what we want.

But if, say, the commands are executed in the order (A), (C), (B), (D), then command (B) will look for a "." in "token;string"! This is because the NULL first argument to command (B) tells strtok to continue searching in the last non-NULL string that was passed to it, and since command (C) has already run, strtok will use str2.

Command (B) will then return "token;string", while setting the starting point of the next search to the terminating NUL at the end of str2. Command (D) will then think it is searching an empty string, because it will start at the terminating NUL of str2, and will return NULL.

Even if you place commands (A) and (B) right next to each other, and commands (C) and (D) right next to each other, there is no guarantee that (B) will be executed immediately after (A), before (C) or (D), and so on.

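If you create some kind of mutex or other guard to protect the use of strtok, and only call strtok from a thread that has acquired the lock on that mutex, then strtok is safe to use. The lock has to be held for the whole sequence of calls on one string (from the first call with the string through the last call with NULL), not just around each individual call, or the interleaving described above can still happen. However, it's probably better to just use the thread-safe strtok_r, as others have said.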

Edit: There is one more issue that nobody has mentioned: strtok modifies, and potentially relies on, global (or static) variables, and does so in a way that is probably not thread safe. So even if you don't rely on repeated calls to strtok to get consecutive "tokens" from the same string, it may not be safe to use in a multithreaded environment without a guard.
