Array values suddenly jump
I am trying to count how often each letter starts a word in a dictionary that contains about 140,000 words. I store the frequencies in an array count: count[0] for a, count[1] for b, and so on. However, when I sum the count array, the result is not equal to the total number of words in the dictionary. I found that if I reduce the dictionary to 95137 words, the numbers are equal, but once the dictionary has more than 95137 words, the values from count[0] to count[4] suddenly become very large. I have no idea why. Here is my code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp = fopen("testdic.txt", "r");
    int count[26];
    char buffer[30];
    for (int i = 0; i < 26; i++)
        count[i] = 0;
    int total = 0;
    while (1)
    {
        fscanf(fp, "%s", buffer);
        if (feof(fp))
            break;
        count[buffer[0]-97] ++;
        total++;
        if (count[0] > total) // I used this to find out where the jump occurs
            break;
    }
    printf("%d ", total);
    for (int i = 0; i < 26; i++)
        printf("%d " , count[i]);
}
It's hard to see why this code produces odd results, since you have left out several debugging checks:
- feof should only be used after a read function has failed;
- you are not checking the return value of fopen;
- you are not checking the return value of scanf;
- you are not checking the value of buffer[0];
- you are not limiting the length of buffer in %s.
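The checks listed above could be combined roughly as follows. This is a minimal sketch, not the asker's program: the helper name count_initials and its skipped counter are made up for illustration, it assumes an ASCII-compatible character set, and it folds uppercase initials into the lowercase buckets, which the question does not ask for.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts the words in fp by first letter.
   Returns the number of words read; *skipped receives the number of
   words whose first character is not a letter in a..z or A..Z. */
long count_initials(FILE *fp, long count[26], long *skipped)
{
    char buffer[30];
    long total = 0;

    for (int i = 0; i < 26; i++)
        count[i] = 0;
    *skipped = 0;

    /* %29s leaves room for the terminating '\0' in buffer[30]; a longer
       word is read in several pieces and counted more than once.
       The loop ends when fscanf cannot convert a word (EOF or error),
       so feof is not needed to drive it. */
    while (fscanf(fp, "%29s", buffer) == 1) {
        int c = tolower((unsigned char)buffer[0]);
        total++;
        if (c >= 'a' && c <= 'z')
            count[c - 'a']++;   /* index is guaranteed to be 0..25 */
        else
            (*skipped)++;       /* digit, punctuation, non-ASCII byte, ... */
    }
    return total;
}
```

A caller would still compare the result of fopen against NULL before passing the stream in, and could then verify that the 26 counts plus skipped add up to the returned total.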
In the statement count[buffer[0]-97]++; you compute the index by taking the ASCII value of the first letter and subtracting 97, which is the ASCII value of a. I'm not sure whether you intend to handle words that start with an uppercase letter, for example Ascii, where buffer[0] is 65 and the expression buffer[0] - 97 evaluates to a negative integer. Writing to a negative index can corrupt the stack.
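To make the arithmetic concrete, here is a small sketch that assumes ASCII. Both function names are hypothetical: raw_index reproduces the expression from the question, and letter_index is one possible alternative that folds case and reports -1 for anything that is not a letter.

```c
#include <ctype.h>

/* Index as computed in the question: negative for any character below 'a'. */
int raw_index(char first)
{
    return first - 97;
}

/* Hypothetical alternative: fold case first, return -1 when not in a..z. */
int letter_index(char first)
{
    int c = tolower((unsigned char)first);
    return (c >= 'a' && c <= 'z') ? c - 'a' : -1;
}
```

With ASCII, raw_index('A') is 65 - 97 = -32, so count[-32] refers to memory well before the start of the array, while letter_index('A') is 0.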
I don't know if this is the problem, but your code should take care of words that start with odd characters. Just doing count[buffer[0]-97] is a little rash if you ask me! I would make sure that buffer[0] >= 97 && buffer[0] < 97+26 before executing this line. Otherwise, who knows what you are incrementing!
- maybe your 95138th word starts with a funny character?
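The range check described above might look like this. It is only a sketch: the helper name count_if_lowercase is invented for illustration, and the constants assume ASCII just as the question does.

```c
#include <stdbool.h>

/* Hypothetical helper: increments count[] only when first is in 'a'..'z'.
   Returns true if the word was counted, false if it was rejected. */
bool count_if_lowercase(int count[26], char first)
{
    if (first >= 97 && first < 97 + 26) {
        count[first - 97]++;
        return true;
    }
    return false;
}
```

Inside the reading loop, printing total and buffer whenever this returns false would show which word is the offending one.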
Your variable total is declared right after your array count, so when you index outside the array, you corrupt other data. A character less than 'a' would be especially bad, but frankly, an initial capital letter is enough. Use count[((buffer[0]-'a')%26+26)%26]++ - the modulo keeps you inside the array (the extra +26 is needed because % in C can return a negative result when its left operand is negative). It's a kludge, but if it works, you can start looking for the bad characters.
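One detail is worth checking before relying on the modulo trick: since C99, integer division truncates toward zero, so the remainder of a negative value is itself negative. The sketch below compares a plain remainder with a version that is shifted back into range. Both function names are hypothetical and ASCII is assumed.

```c
/* Plain remainder: can be negative when first is below 'a'. */
int naive_wrap(char first)
{
    return (first - 'a') % 26;
}

/* Adds 26 and takes the remainder again, so the result is always 0..25. */
int safe_wrap(char first)
{
    return ((first - 'a') % 26 + 26) % 26;
}
```

For 'A', naive_wrap gives -32 % 26 = -6, which is still outside the array, while safe_wrap gives 20. The wrapped index lands in an unrelated bucket, so this only stops the memory corruption; it does not make the counts meaningful for such words.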