How to find out whether a word exists in English using NLTK

This question has been asked many times, but I have not found a suitable answer. I need to use a corpus in NLTK to determine whether a word is an English word.

I tried to do:

wordnet.synsets(word)


This returns no synsets for many common words. Using an English word list and doing a file search is not an option, and using Enchant is not an option either. If there is another library that can do this, please point me to its API. If not, tell me which corpus in NLTK contains all English words.



2 answers


NLTK includes several corpora that are nothing more than word lists. The Words Corpus is the /usr/share/dict/words file from Unix, used by some spell checkers. We can use it to find unusual or misspelled words in a text, as shown below:

import nltk

def unusual_words(text):
    # Words that appear in the text but not in NLTK's Words Corpus
    text_vocab = set(w.lower() for w in text.split() if w.isalpha())
    english_vocab = set(w.lower() for w in nltk.corpus.words.words())
    unusual = text_vocab - english_vocab
    return sorted(unusual)
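The same set-difference idea can be sketched without downloading any NLTK data; here a tiny inline vocabulary stands in for `nltk.corpus.words.words()` (an assumption for illustration only; use the real corpus in practice):

```python
# Minimal sketch of the set-difference check. The inline "vocab" set is a
# stand-in for nltk.corpus.words.words(); swap the real corpus in for real use.
def unusual_words(text, vocab):
    text_vocab = set(w.lower() for w in text.split() if w.isalpha())
    return sorted(text_vocab - vocab)

vocab = {"the", "cat", "sat", "on", "mat"}
print(unusual_words("The catt sat on the matt", vocab))
# prints ['catt', 'matt']
```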




You can then check a word's membership in english_vocab:

>>> import nltk
>>> english_vocab = set(w.lower() for w in nltk.corpus.words.words())
>>> 'a' in english_vocab
True
>>> 'this' in english_vocab
True
>>> 'nothing' in english_vocab
True
>>> 'nothingg' in english_vocab
False
>>> 'corpus' in english_vocab
True
>>> 'Terminology'.lower() in english_vocab
True
>>> 'sorted' in english_vocab
True




I tried the above approach, but it was missing many words that should exist, so I tried WordNet instead. I think it has a more comprehensive vocabulary.



from nltk.corpus import wordnet

if wordnet.synsets(word):
    pass  # Do something
else:
    pass  # Do some other thing
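The WordNet check can be wrapped in a small predicate. This is a sketch, not part of the answer above: the name `is_english_word` is my own, and the lookup function is injected so the example runs without NLTK data; in practice you would pass `nltk.corpus.wordnet.synsets` as the lookup.

```python
def is_english_word(word, synsets_fn):
    # True when the lookup returns at least one synset for the word
    return bool(synsets_fn(word.lower()))

# Stand-in lookup so the sketch runs without NLTK installed; in practice,
# use nltk.corpus.wordnet.synsets here instead.
fake_synsets = {"dog": ["dog.n.01"], "nothing": ["nothing.n.01"]}
lookup = lambda w: fake_synsets.get(w, [])

print(is_english_word("Dog", lookup))   # prints True
print(is_english_word("dogg", lookup))  # prints False
```

Injecting the lookup also makes it easy to combine sources later (e.g. fall back to the Words Corpus when WordNet has no synsets).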

