String tokenizer without using strtok()

I am writing a text tokenizer without using strtok(). This is mostly for my own improvement and for a better understanding of pointers. I think I almost have it, but I get the following errors:

myToc.c:25 warning: assignment makes integer from pointer without a cast
myToc.c:35 (same as above)
myToc.c:44 error: invalid type argument of 'unary *' (have 'int')

      

What I am doing is looping through the string passed to the function, finding each delimiter and replacing it with '\0'. The array "ptr" is supposed to hold pointers to the separated substrings. This is what I have so far.

#include <string.h>

void myToc(char * str){
   int spcCount = 0;
   int ptrIndex = 0;

   int n = strlen(str);

   for(int i = 0; i < n; i++){
      if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
         spcCount++;
      }
   }

   //Pointer array; +1 for \0 character, +1 for one word more than number of spaces
   int *ptr = (int *) calloc(spcCount+2, sizeof(char));
   ptr[spcCount+1] = '\0';
   //Used to differentiate separating spaces from unnecessary ones
   char temp;

   for(int j = 0; j < n; j++){
      if(j == 0){
         /*Line 25*/ ptr[ptrIndex] = &str[j];
         temp = str[j];
         ptrIndex++;
      }
      else{
         if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
         }
         else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
            /*Line 35*/ ptr[ptrIndex] = &str[j];
            temp = str[j];
            ptrIndex++;
         }
      }
   }

   int k = 0;
   while(ptr[k] != '\0'){
      /*Line 44*/ printf("%s \n", *ptr[k]);
      k++;
   }
}

      

I can see where the errors are occurring, but I'm not sure how to fix them. What should I do? Am I allocating memory correctly or is it just a problem with the way I specify addresses?



3 answers


Your pointer array has the wrong type. It sounds like you want:

char **ptr =  calloc(spcCount+2, sizeof(char*));

      

Also, if I am reading your code correctly, there is no need for a null byte, since this array is not a string.

Also, you need to fix:



while(ptr[k] != '\0'){
  /*Line 44*/ printf("%s \n", *ptr[k]);
  k++;
}

      

No dereference is required, and if you remove the null terminator from ptr, this should work:

for ( k = 0; k < ptrIndex; k++ ){
  /*Line 44*/ printf("%s \n", ptr[k]);
}
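
To see why no dereference is needed, here is a minimal sketch. The helper name printTokens is made up for illustration and is not part of the original code. Each element of a char ** array is already a char *, which is what %s expects, whereas *ptr[k] would be a single char.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints the first `count` strings stored in `ptr`
   and returns how many were printed. */
int printTokens(char **ptr, int count){
    assert(ptr != NULL);
    int printed = 0;
    for (int k = 0; k < count; k++){
        /* ptr[k] has type char *, exactly what %s wants.
           *ptr[k] would be the first character of the token (a char),
           which is why the compiler rejects it on line 44. */
        printf("%s \n", ptr[k]);
        printed++;
    }
    return printed;
}
```

Because the loop is bounded by the number of pointers actually stored, the array does not need a terminating entry at all.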

      



#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void myToc(char * str){
    int spcCount = 0;
    int ptrIndex = 0;

    int n = strlen(str);

    for(int i = 0; i < n; i++){
        if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
            spcCount++;
        }
    }

    char **ptr = calloc(spcCount+2, sizeof(char*));
    //ptr[spcCount+1] = '\0';//0 initialized by calloc 
    char temp = ' ';//can simplify the code

    for(int j = 0; j < n; j++){
        if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
        } else if(str[j] != '\0' && temp == ' '){//can omit `str[j] != ' ' &&`
            ptr[ptrIndex++] = &str[j];
            temp = str[j];
        }
    }

    int k = 0;
    while(ptr[k] != NULL){//better use NULL
        printf("%s \n", ptr[k++]);
    }
    free(ptr);
}

int main(){
    char test1[] = "a b c";
    myToc(test1);
    char test2[] = "hello world";
    myToc(test2);
    return 0;
}

      





Update: I tried this at http://www.compileonline.com/compile_c99_online.php with fixes for lines 25, 35 and 44 and with a main function that called myToc() twice. I initially ran into segfaults when trying to write null characters into str[], but that was only because the strings I was passing in were (apparently unmodifiable) literals. The code below worked as desired when I allocated a text buffer and copied the strings into it before passing them in. This version could also be modified to return the array of pointers, which would then point to the tokens.

(The code below would also work even if the string parameter were unmodifiable, provided myToc() made a local copy of the string; but that would not have the desired effect if the purpose of the function is to return a list of tokens rather than just print them.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void myToc(char * str){
   int spcCount = 0;
   int ptrIndex = 0;

   int n = strlen(str);

   for(int i = 0; i < n; i++){
      if(i != 0 && str[i] == ' ' && str[i-1] != ' '){
         spcCount++;
      }
   }

   //Pointer array;  +1 for one word more than number of spaces
   char** ptr = (char**) calloc(spcCount+2, sizeof(char*));
   //Used to differentiate separating spaces from unnecessary ones
   char temp;

   for(int j = 0; j < n; j++){
      if(j == 0){
         ptr[ptrIndex] = &str[j];
         temp = str[j];
         ptrIndex++;
      }
      else{
         if(str[j] == ' '){
            temp = str[j];
            str[j] = '\0';
         }
         else if(str[j] != ' ' && str[j] != '\0' && temp == ' '){
            ptr[ptrIndex] = &str[j];
            temp = str[j];
            ptrIndex++;
         }
      }
   }

   for (int k = 0; k < ptrIndex; ++k){
      printf("%s \n", ptr[k]);
   }
   free(ptr); //Release the pointer array; the tokens themselves live in str
}

int main (int n, char** v)
{
  char text[256];
  strcpy(text, "a b c");
  myToc(text);
  printf("-----\n");
  strcpy(text, "hello world");
  myToc(text);
}
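
As a rough sketch of the local-copy idea mentioned in the parenthetical note above: the function below duplicates the string and tokenizes the copy, so a string literal can be passed safely. The name myTocCopy and the returned count are assumptions added for illustration. Since the copy is freed before returning, this variant can only print or count the tokens, not hand them back.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant: tokenizes a private copy of str, prints each token,
   and returns the number of tokens (or -1 if allocation fails).
   The caller's string is never written to. */
int myTocCopy(const char *str){
    assert(str != NULL);
    size_t n = strlen(str);
    char *copy = malloc(n + 1);
    if (copy == NULL) return -1;
    memcpy(copy, str, n + 1);   /* n + 1 also copies the terminating '\0' */

    /* Pass 1: turn every space into a terminator. */
    for (size_t j = 0; j < n; j++){
        if (copy[j] == ' ') copy[j] = '\0';
    }

    /* Pass 2: a token starts at each remaining character that is either
       first in the buffer or directly follows a terminator. */
    int count = 0;
    for (size_t j = 0; j < n; j++){
        if (copy[j] != '\0' && (j == 0 || copy[j-1] == '\0')){
            printf("%s \n", &copy[j]);
            count++;
        }
    }

    free(copy);
    return count;
}
```

With this version a call such as myTocCopy("a b c") does not write to the literal, because only the heap copy is modified.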

      

I would prefer simpler code. Basically you want a pointer to the first nonblank character in str[], then a pointer to every nonblank character (except the first) that is preceded by a blank. Your first loop almost gets the idea, except that it looks for blanks that are preceded by nonblanks. (Also, you could start that loop at i = 1 and avoid testing i != 0 on every iteration.)

I might just allocate an array of char* of size sizeof(char*) * (n + 1)/2 to hold the pointers, rather than iterate over the string twice (that is, I would skip the first loop, which serves only to size the array). In any case, if str[0] is nonblank, I would write its address into the array; then, looping for (int j = 1; j < n; ++j), I would write the address of str[j] into the array if str[j] is nonblank but str[j - 1] is blank. That is basically what you do, but with fewer ifs and fewer helper variables. Less code means less opportunity for error, as long as the code is clean and makes sense.
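
One possible way to write the simpler approach described above is sketched here. The name splitWords and the count output parameter are made up for illustration. It folds the str[0] case into the loop with a j == 0 test instead of handling it separately, and because blanks are overwritten with '\0' as it goes, "preceded by a blank" is checked as "preceded by '\0'".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: splits str in place on spaces.
   Returns a malloc'd array of pointers to the tokens and stores the
   number of tokens in *count. The caller frees the array, but not the
   tokens, which point into str. Returns NULL if allocation fails. */
char **splitWords(char *str, int *count){
    assert(str != NULL && count != NULL);
    size_t n = strlen(str);

    /* n characters hold at most (n + 1)/2 tokens;
       the extra slot avoids a zero-size request for an empty string. */
    char **ptr = malloc(sizeof(char*) * ((n + 1)/2 + 1));
    *count = 0;
    if (ptr == NULL) return NULL;

    for (size_t j = 0; j < n; ++j){
        if (str[j] == ' '){
            str[j] = '\0';               /* terminate the previous token */
        } else if (j == 0 || str[j-1] == '\0'){
            ptr[(*count)++] = &str[j];   /* nonblank preceded by a former blank */
        }
    }
    return ptr;
}
```

The string must be modifiable (an array or allocated buffer, not a literal), and the caller iterates with for (k = 0; k < count; ++k) before calling free on the returned array.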

Previous notes:

int *ptr = declares an array of int. For an array of pointers to char, you want

char** ptr = (char**) calloc(spcCount+2, sizeof(char*));

      

The comment before this line also seems to indicate some confusion. There is no terminating null in your array of pointers, and you don't need to allocate space for one, so spcCount+2 could be spcCount + 1.

This is also suspicious:

while(ptr[k] != '\0')

      

It looks like it would work given the way you used calloc (you do need spcCount+2 to make this work), but I would feel more confident writing something like this:

for (k = 0; k < ptrIndex; ++k)

      

That is not what caused the segfault; I am just uneasy about comparing a pointer (ptr[k]) to \0 (which you would usually compare to a char).
