Why does my spell checker print every word, not just the typos (Python)?

Basically, I'm trying to write a simple spell checker that prompts for an input file, checks each word in it for possible spelling errors (using a binary search to see whether the word is in the dictionary), and writes those errors to an output file. However, it currently outputs every word in the input file, not just the misspelled ones. My code looks like this:

import re

with open('DICTIONARY1.txt', 'r') as file:
    content = file.readlines()
    dictionary = []
    for line in content:
        line = line.rstrip()
        dictionary.append(line)

def binary_search(array, target, low, high):
    mid = (low + high) // 2
    if low > high:
        return -1
    elif array[mid] == target:
        return mid
    elif target < array[mid]:
        return binary_search(array, target, low, mid-1)
    else:
        return binary_search(array, target, mid+1, high)

filename = input("Please enter file name of file to be analyzed: ")
infile = open(filename, 'r')
contents = infile.readlines()
text = []
for line in contents:
    for word in line.split():
        word = re.sub(r"[^a-z ']+", " ", word.lower())
        text.append(word)
infile.close()
outfile = open('TYPO.txt', 'w')
for data in text:
    if data.strip() == '':
        pass
    elif binary_search(dictionary, data, 0, len(data)) == -1:
        outfile.write(data + "\n")
    else:
        pass

file.close()
outfile.close()

I can't figure out what happened. :( Any help would be very much appreciated! Thanks. :)

1 answer


I tried replacing len(data) with len(dictionary), as it made more sense to me, and it seems to work in my very limited tests.
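A caveat on that fix: binary_search treats high as an inclusive index, so the last valid index is len(dictionary) - 1. Passing len(dictionary) works for most words, but when the target sorts after every dictionary entry, mid eventually reaches len(dictionary) and array[mid] raises IndexError. A minimal sketch comparing the three bounds, using a made-up six-word dictionary (the word list is purely illustrative, and binary search only works if the list is sorted):

```python
def binary_search(array, target, low, high):
    # Same recursive search as in the question; low and high are inclusive indices.
    if low > high:
        return -1
    mid = (low + high) // 2
    if array[mid] == target:
        return mid
    elif target < array[mid]:
        return binary_search(array, target, low, mid - 1)
    else:
        return binary_search(array, target, mid + 1, high)


# Hypothetical sample dictionary; a real word list must also be sorted.
dictionary = sorted(["ant", "bee", "cat", "cow", "dog", "eel"])

# Original call: the bound is the word's length (3), so "dog" at index 4 is never reached.
print(binary_search(dictionary, "dog", 0, len("dog")))  # -1

# Corrected call: the last valid index is len(dictionary) - 1.
print(binary_search(dictionary, "dog", 0, len(dictionary) - 1))  # 4

# With len(dictionary) as the bound, a word that sorts after every entry
# pushes mid to len(dictionary), which is out of range.
try:
    binary_search(dictionary, "zebra", 0, len(dictionary))
except IndexError:
    print("IndexError")
```

The standard-library bisect module (bisect.bisect_left) performs the same search on a sorted list without having to manage the bounds by hand.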



I think you were passing the length of the word being checked as the upper bound of the dictionary search. So if you were looking up the word "dog", only the first four dictionary entries (indices 0 through 3) were ever examined, and since your dictionary is probably very large, almost no word was found (which is why every word ended up in the output file).
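Even with the bound fixed, the cleanup step in the question can still report correctly spelled words. re.sub replaces each run of disallowed characters with a space, so a token such as "dog," becomes "dog " with a trailing space, which never equals a dictionary entry (the blank check strips the word, but the search uses the unstripped version). A minimal sketch that extracts words with re.findall instead, assuming, as in the question, that a word is a run of lowercase letters and apostrophes after lowercasing, and that the dictionary is sorted and lowercase. The helper names extract_words, is_in_dictionary and find_typos are hypothetical, and the lookup uses the standard-library bisect module in place of the hand-written search:

```python
import bisect
import re


def extract_words(line):
    # Lowercase the line and keep only runs of letters and apostrophes,
    # so punctuation never ends up attached to (or padding) a word.
    return re.findall(r"[a-z']+", line.lower())


def is_in_dictionary(sorted_words, word):
    # bisect_left returns the index where word would be inserted;
    # the word is present only if that slot already holds it.
    i = bisect.bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word


def find_typos(lines, sorted_words):
    # Collect every extracted word that is missing from the dictionary.
    return [
        word
        for line in lines
        for word in extract_words(line)
        if not is_in_dictionary(sorted_words, word)
    ]


# Hypothetical sample data for illustration.
dictionary = sorted(["a", "barked", "dog", "loudly", "the"])
text = ["The dog, barked loudly!", "A dgo barked."]
print(find_typos(text, dictionary))  # ['dgo']
```

In the real program, lines would come from infile.readlines() and each typo would be written to TYPO.txt as before.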
