How to stop a Java spell checker from reporting duplicate words

I have implemented a program that does the following:

  • Scan all words in a webpage into a string (using jsoup)
  • Filter out all HTML markup and code
  • Run these words through a spell checker and offer suggestions.

The spell checker loads the dictionary.txt file into an array and compares the string input with the words inside the dictionary.

My current problem is that when the input contains the same misspelled word multiple times, like "teh program is teh worst", the code prints out

You entered 'teh', did you mean 'the'?
You entered 'teh', did you mean 'the'?

      

Sometimes a website repeats the same words over and over, and the output gets messy.

If possible, printing each word along with how many times it was misspelled would be ideal, but limiting how often each word is printed would be good enough.

My program has multiple methods and two classes, but the spell check method is below:

Note: the source code contains some if statements that remove punctuation marks, but I've removed them for clarity.

static boolean suggestWord;

public static String checkWord(String wordToCheck) {
    String wordCheck;
    String word = wordToCheck.toLowerCase();

    if ((wordCheck = (String) dictionary.get(word)) != null) {
        suggestWord = false; // no need to ask for suggestion for a correct
                             // word.
        return wordCheck;
    }

    // If after all of these checks a word could not be corrected, return as
    // a misspelled word.
    return word;
}

      

TEMPORARY EDIT: As requested, complete code:

Class 1:

public class ParseCleanCheck {

        static Hashtable<String, String> dictionary;// To store all the  words of the
        // dictionary
        static boolean suggestWord;// To indicate whether the word is spelled
                                    // correctly or not.

        static Scanner urlInput = new Scanner(System.in);
        public static String cleanString;
        public static String url = "";
        public static boolean correct = true;


        /**
         * PARSER METHOD
         */
        public static void PageScanner() throws IOException {
            System.out.println("Pick an english website to scan.");

            // This do-while loop allows the user to try again after a mistake
            do {
                try {
                    System.out.println("Enter a URL, starting with http://");
                    url = urlInput.nextLine();
                    // This creates a document out of the HTML on the web page
                    Document doc = Jsoup.connect(url).get();
                    // This converts the document into a string to be cleaned
                    String htmlToClean = doc.toString();
                    cleanString = Jsoup.clean(htmlToClean, Whitelist.none());


                    correct = false;
                } catch (Exception e) {
                    System.out.println("Incorrect format for a URL. Please try again.");
                }
            } while (correct);
        }

        /**
         * SPELL CHECKER METHOD
         */
        public static void SpellChecker() throws IOException {
            dictionary = new Hashtable<String, String>();
            System.out.println("Searching for spelling errors ... ");

            try {
                // Read and store the words of the dictionary
                BufferedReader dictReader = new BufferedReader(new FileReader("dictionary.txt"));

                while (dictReader.ready()) {
                    String dictInput = dictReader.readLine();
                    String[] dict = dictInput.split("\\s"); // create an array of
                                                            // dictionary words

                    for (int i = 0; i < dict.length; i++) {
                        // key and value are identical
                        dictionary.put(dict[i], dict[i]);
                    }
                }
                dictReader.close();
                String user_text = "";

                // Initializing a spelling suggestion object based on probability
                SuggestSpelling suggest = new SuggestSpelling("wordprobabilityDatabase.txt");

                // get user input for correction
                {

                    user_text = cleanString;
                    String[] words = user_text.split(" ");

                    int error = 0;

                    for (String word : words) {
                        if(!dictionary.contains(word)) {
                            checkWord(word);


                            dictionary.put(word, word);
                        }
                        suggestWord = true;
                        String outputWord = checkWord(word);

                        if (suggestWord) {
                            System.out.println("Suggestions for " + word + " are:  " + suggest.correct(outputWord) + "\n");
                            error++;
                        }
                    }

                    if (error == 0) {
                        System.out.println("No mistakes found");
                    }
                }

            } catch (IOException e) {
                e.printStackTrace();
                System.exit(-1);
            }
        }

        /**
         * METHOD TO SPELL CHECK THE WORDS IN A STRING. IS USED IN SPELL CHECKER
         * METHOD THROUGH THE "WORD" STRING
         */

        public static String checkWord(String wordToCheck) {
            String wordCheck;
            String word = wordToCheck.toLowerCase();

            if ((wordCheck = (String) dictionary.get(word)) != null) {
                suggestWord = false; // no need to ask for suggestion for a correct
                                     // word.
                return wordCheck;
            }

            // If after all of these checks a word could not be corrected, return as
            // a misspelled word.
            return word;
        }
}

      

There is a second class (SuggestSpelling.java) that contains a probability calculator, but it is not relevant here unless you plan to run the code yourself.


1 answer


Use a HashSet to detect duplicates:

Set<String> wordSet = new HashSet<>();

      

Store every word of the input sentence in it. If a word is already in the HashSet, do not call checkWord(String wordToCheck) for that word. Something like this:

String[] words = user_text.split(" "); // split input sentence into words
for(String word: words) {
    if(!wordSet.contains(word)) {
        checkWord(word);
        // do stuff
        wordSet.add(word);
    }
}
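The idea above can be tried in isolation with a small, self-contained sketch. The class name DedupSketch, the method uniqueMisspellings, and the tiny in-memory dictionary are hypothetical stand-ins for illustration; they are not part of the question's code. The sketch also lowercases each word before the duplicate check, on the assumption that "Teh" and "teh" should count as the same word, since checkWord lowercases its input as well.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class DedupSketch {

    // Returns each word that is missing from the dictionary exactly once,
    // in the order it was first seen. Names here are illustrative only.
    static List<String> uniqueMisspellings(String text, Set<String> dictionary) {
        Set<String> seen = new HashSet<>();
        List<String> misspelled = new ArrayList<>();

        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (word.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            // Set.add returns false when the word was already in the set,
            // so a repeated word is skipped without a separate contains() call.
            if (!seen.add(word)) {
                continue;
            }
            if (!dictionary.contains(word)) {
                misspelled.add(word);
            }
        }
        return misspelled;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of dictionary.txt
        Set<String> dictionary = Set.of("the", "program", "is", "worst");

        for (String word : uniqueMisspellings("teh program is Teh worst", dictionary)) {
            System.out.println("You entered '" + word + "'");
        }
    }
}
```

In the real program, the suggestion lookup from the question would go where the sketch prints the word.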

      



Edit

// ....
{

    user_text = cleanString;
    String[] words = user_text.split(" ");
    Set<String> wordSet = new HashSet<>();

    int error = 0;

    for (String word : words) {
        // wordSet is a separate data structure used only for duplicate
        // checking; don't mix it up with the dictionary
        if(!wordSet.contains(word)) {

            // put all your logic here

            wordSet.add(word);
        }
    }

    if (error == 0) {
        System.out.println("No mistakes found");
    }
}
// .... 
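The question also asks for each misspelled word to be shown with the number of times it occurred. One possible way to do that, sketched here under assumptions, is to replace the set with a map from word to count. MisspellingCounter, countMisspellings, and the small in-memory dictionary are hypothetical names used only for this example; a LinkedHashMap is chosen so the words come out in first-seen order.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class MisspellingCounter {

    // Maps each word that is missing from the dictionary to the number of
    // times it appears in the text. Names here are illustrative only.
    static Map<String, Integer> countMisspellings(String text, Set<String> dictionary) {
        Map<String, Integer> counts = new LinkedHashMap<>();

        for (String raw : text.split("\\s+")) {
            String word = raw.toLowerCase(Locale.ROOT);
            if (word.isEmpty() || dictionary.contains(word)) {
                continue; // skip empty tokens and correctly spelled words
            }
            // merge stores 1 for a new word, otherwise adds 1 to the old count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of dictionary.txt
        Set<String> dictionary = Set.of("the", "program", "is", "worst");

        Map<String, Integer> counts =
                countMisspellings("teh program is teh wrost", dictionary);

        // One line per distinct misspelling, with its count
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("You entered '" + entry.getKey() + "' "
                    + entry.getValue() + " time(s)");
        }
    }
}
```

With this shape, the suggestion lookup only needs to run once per distinct word, after the whole text has been counted.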

      

You have other errors as well: you are passing String wordCheck as an argument to checkWord() and then re-declaring String wordCheck; inside it, which is not correct. Also check the other parts of your code.
