Array values suddenly jump

I am trying to calculate the frequency of the first letters of the words in a dictionary that contains about 140,000 words. I store the frequencies in an array count: count[0] for a, count[1] for b, and so on. However, when I sum the count array, the result is not equal to the total number of words in the dictionary. I found out that if I reduce the size of the dictionary to 95137 words, the numbers are equal, but once the dictionary has more than 95137 words, the values from count[0] to count[4] suddenly become very large. I have no idea why. Here is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("testdic.txt", "r");
    int count[26];
    char buffer[30];
    for (int i = 0; i < 26; i++)
        count[i] = 0;
    int total = 0;
    while (1)
    {
        fscanf(fp, "%s", buffer);
        if (feof(fp))
            break;
        count[buffer[0]-97] ++;
        total++;
        if (count[0] > total)            // I used this to find out where the jump occurs
            break;
    }
    printf("%d ", total);
    for (int i = 0; i < 26; i++)
        printf("%d " , count[i]);

}

      

+3




4 answers


It's hard to tell why this code produces strange output, because it is missing several basic checks:

  • feof should only be used after a read function has failed;
  • you are not checking the return value of fopen;
  • you are not checking the return value of fscanf;
  • you are not checking the value of buffer[0];
  • you are not limiting the length written to buffer by %s.
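As a rough sketch of how those checks could fit together (not the asker's code; count_words and count_file are hypothetical helper names, and the 'a'..'z' test assumes an ASCII-compatible character set):

```c
#include <stdio.h>

/* Hypothetical helper: tallies the first letters of whitespace-separated
   words read from fp and returns the number of words counted. */
long count_words(FILE *fp, long count[26])
{
    char buffer[30];
    long total = 0;

    for (int i = 0; i < 26; i++)
        count[i] = 0;

    /* %29s leaves room for the terminating '\0' in buffer[30].
       The loop stops when fscanf fails, so feof is not polled. */
    while (fscanf(fp, "%29s", buffer) == 1) {
        char c = buffer[0];
        if (c >= 'a' && c <= 'z') {   /* index only with a valid letter */
            count[c - 'a']++;
            total++;
        }
    }
    return total;
}

/* Hypothetical wrapper: returns -1 if the file cannot be opened. */
long count_file(const char *path, long count[26])
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long total = count_words(fp, count);
    fclose(fp);
    return total;
}
```

Words that do not start with a lowercase letter are skipped here, so the sum of count always equals the returned total. A word longer than 29 characters would be read in several pieces, so a larger buffer may be needed for real data.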
+3




In the statement count[buffer[0]-97]++; you compute the index by taking the ASCII value of the first letter and subtracting 97, which is the ASCII value of 'a'. I'm not sure whether you handle words that start with an uppercase letter, for example "Ascii", where buffer[0] is 65 and the expression buffer[0] - 97 evaluates to a negative integer. Writing to a negative index lands outside the array and can corrupt the stack.
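One possible way to avoid the negative index, assuming uppercase words should be counted under the same letter as lowercase ones (that is an assumption, and first_letter_index is a hypothetical name):

```c
#include <ctype.h>

/* Hypothetical helper: maps the first character of word to 0..25,
   folding uppercase to lowercase; returns -1 for anything else. */
int first_letter_index(const char *word)
{
    /* Convert through unsigned char so tolower never sees a negative value. */
    unsigned char c = (unsigned char)word[0];
    int lower = tolower(c);

    if (lower < 'a' || lower > 'z')
        return -1;
    return lower - 'a';
}
```

With this, "Ascii" maps to index 0 instead of 65 - 97 = -32, and the caller can skip any word for which the function returns -1.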



+1




I don't know if this is the problem, but your code should take care with odd characters. Just doing

count[buffer[0]-97]

is a little rash, if you ask me! I would make sure that buffer[0] >= 97 && buffer[0] < 97+26 before executing this line. Otherwise, who knows what you are incrementing!

Maybe your 95138th word starts with a funny character?
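To test that guess, a small diagnostic along these lines could report the first word that fails the range check. This is only a sketch: first_bad_word is a hypothetical name, and the position it reports shifts if any word is longer than 29 characters.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic: returns the 1-based position of the first word
   whose first character is outside 'a'..'z' and copies that word into bad.
   Returns 0 if every word passes the check. */
long first_bad_word(FILE *fp, char bad[30])
{
    char buffer[30];
    long position = 0;

    while (fscanf(fp, "%29s", buffer) == 1) {
        position++;
        if (buffer[0] < 'a' || buffer[0] > 'z') {
            strcpy(bad, buffer);   /* buffer holds at most 29 chars + '\0' */
            return position;
        }
    }
    return 0;
}
```

Running it on the dictionary file would show whether the word near position 95138 is the one that breaks the counts.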

0




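Your variable total is declared right after your array count, so when you index outside the array you can overwrite it or other nearby variables. A character less than 'a' is especially bad, and frankly the first capital letter is enough. Using count[(buffer[0]-'a')%26]++ keeps indexes that are too large inside the array, but note that in C the % operator gives a negative result when its left operand is negative, so a capital letter still produces a negative index unless you add 26 and reduce again. It's a kludge, but once it works you can start looking for the bad characters.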
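A minimal sketch of a wrap that stays in range for negative values too (wrap26 is a hypothetical name; C99 and later define % as truncating toward zero, which is why the extra step is needed):

```c
/* Hypothetical helper: wraps any int into the range 0..25.
   In C, x % 26 is negative when x is negative, so 26 is added
   before reducing a second time. */
int wrap26(int x)
{
    return ((x % 26) + 26) % 26;
}
```

For example, ('A' - 'a') % 26 is -6, while wrap26('A' - 'a') is 20. That keeps the write inside count, but it files "A" words under the bucket for 'u', so it only hides the bad input while you look for it.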

0








