Lemmatize plural nouns using nltk and wordnet

Question

I want to lemmatize text with the following code:

from nltk import word_tokenize, sent_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lmtzr = WordNetLemmatizer()

def get_wordnet_pos(treebank_tag):
    # Map a Penn Treebank tag to the WordNet POS constant the lemmatizer understands.
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        # Fall back to noun for any other tag.
        return wordnet.NOUN

# text is a list of tokens (its definition is not shown in the question).
POS = pos_tag(text)
for i in range(len(text)):
    lemma = lmtzr.lemmatize(text[i], get_wordnet_pos(POS[i][1]))

The problem is that the POS tagger correctly tags "procaspases" as "NNS", but how do I convert NNS to a WordNet POS? "procaspases" is still "procaspases" even after lemmatization.

python nltk lemmatization wordnet

user99889 · June 24, 2015 at 2:22 am

2 answers

justhelping · Answer 1 · 2016-12-08T00:41:35+0000

NLTK's WordNet lemmatizer handles most plurals, including irregular ones, rather than just stripping the ending.

import nltk
from nltk.stem.wordnet import WordNetLemmatizer

Lem = WordNetLemmatizer()

phrase = 'cobblers ants women boys needs finds binaries hobbies busses wolves'

words = phrase.split()
for word in words :
  lemword = Lem.lemmatize(word)
  print(lemword)

Output: cobbler ant woman boy need find binary hobby bus wolf

Charles J. Daniels · Answer 2 · 2015-07-03T17:46:35+0000

I can lemmatize things easily using wordnet.morphy:

>>> from nltk.corpus import wordnet
>>> wordnet.morphy('cats')
u'cat'

Note that "procaspases" is not in WordNet ("caspases" is, however, and morphy will give "caspase" as its lemma), so your lemmatizer most likely just doesn't recognize it. If you aren't having trouble lemmatizing other words, the word is probably simply unknown to the implementation.
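If out-of-vocabulary terms like this need handling anyway, one option is to fall back to a crude rule when the dictionary lookup finds nothing. The sketch below is only an illustration: `lemmatize_noun` and the toy dictionary are hypothetical names, not part of NLTK, and the "strip one trailing s" rule is an assumption that will be wrong for many words. The `lookup` argument can be any callable that returns a lemma or None, for example a thin wrapper around wordnet.morphy.

```python
def lemmatize_noun(word, lookup):
    """Return lookup(word) if the vocabulary knows the word, else a crude guess.

    `lookup` is any callable returning a lemma or None, e.g.
    lambda w: wordnet.morphy(w, wordnet.NOUN) with nltk.corpus.wordnet.
    This helper is hypothetical and not part of NLTK.
    """
    lemma = lookup(word)
    if lemma is not None:
        return lemma
    # Out-of-vocabulary fallback (an assumption, not WordNet behaviour):
    # strip a single trailing "s", but leave short words and "-ss" words alone.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


# Toy stand-in for a real dictionary lookup, so the sketch runs without WordNet data.
toy_vocab = {"cats": "cat", "caspases": "caspase"}

known = lemmatize_noun("caspases", toy_vocab.get)       # found in the vocabulary
unknown = lemmatize_noun("procaspases", toy_vocab.get)  # falls back to the crude rule
print(known, unknown)
```

Because the fallback only runs when the lookup returns None, words the dictionary already handles (including irregular plurals) are unaffected.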
