Lemmatize plural nouns using nltk and wordnet

I want to lemmatize with:

from nltk import word_tokenize, pos_tag
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.corpus import wordnet

lmtzr = WordNetLemmatizer()

def get_wordnet_pos(treebank_tag):
    # map a Penn Treebank tag to the POS constant the lemmatizer understands
    if treebank_tag.startswith('J'):
        return wordnet.ADJ
    elif treebank_tag.startswith('V'):
        return wordnet.VERB
    elif treebank_tag.startswith('N'):
        return wordnet.NOUN
    elif treebank_tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

# text is the raw input string; pos_tag expects a list of tokens
POS = pos_tag(word_tokenize(text))
lemmas = [lmtzr.lemmatize(word, get_wordnet_pos(tag)) for word, tag in POS]

      

The problem is that the POS tagger tags "procaspases" as "NNS". How do I convert NNS to a WordNet POS, given that "procaspases" still comes out as "procaspases" (plural S intact) even after lemmatizing?
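The tag conversion itself should already be fine: every noun tag (NN, NNS, NNP, NNPS) starts with "N", so `get_wordnet_pos` returns `wordnet.NOUN` for NNS, and removing the plural is then up to the lemmatizer. The sketch below re-creates the mapping using the one-letter values that NLTK's `wordnet.ADJ`, `wordnet.VERB`, `wordnet.NOUN` and `wordnet.ADV` hold ('a', 'v', 'n', 'r'); those literals are assumed here so it runs without NLTK or its corpora.

```python
# One-letter values of NLTK's wordnet.ADJ, VERB, NOUN, ADV,
# hard-coded here so the sketch runs without NLTK installed.
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

def get_wordnet_pos(treebank_tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS letter (same logic as the question)."""
    if treebank_tag.startswith("J"):
        return ADJ
    if treebank_tag.startswith("V"):
        return VERB
    if treebank_tag.startswith("R"):
        return ADV
    # NN, NNS, NNP, NNPS and any unrecognized tag all map to noun
    return NOUN

for tag in ["NN", "NNS", "NNP", "NNPS", "VBZ", "JJ", "RB"]:
    print(tag, "->", get_wordnet_pos(tag))
```

So if "procaspases" survives lemmatization unchanged, the cause lies in the lemmatizer's dictionary, not in the tag mapping.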





2 answers


NLTK handles most plurals, and it does more than just strip the ending:

from nltk.stem.wordnet import WordNetLemmatizer

Lem = WordNetLemmatizer()

phrase = 'cobblers ants women boys needs finds binaries hobbies busses wolves'

words = phrase.split()
for word in words:
    # pos defaults to noun ('n'), which suits this list of plural nouns
    lemword = Lem.lemmatize(word)
    print(lemword)

      



Output (one word per line): cobbler ant woman boy need find binary hobby bus wolf
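Why this beats stripping "-s": WordNet's lemmatizer (via `morphy`) first checks an exception list for irregular forms, then tries a handful of suffix rules and keeps only candidates that are actual WordNet entries. Below is a toy, pure-Python sketch of that idea. It is not NLTK's code; `LEXICON`, `EXCEPTIONS` and `toy_lemmatize` are hypothetical stand-ins for WordNet's data files and lookup.

```python
# Toy WordNet-style noun lemmatizer. LEXICON and EXCEPTIONS are tiny
# hypothetical stand-ins for WordNet's real dictionary and exception list.
LEXICON = {"cobbler", "ant", "woman", "boy", "need", "find",
           "binary", "hobby", "bus", "wolf", "caspase"}
EXCEPTIONS = {"busses": "bus", "wolves": "wolf"}

# Noun suffix rules in the style of WordNet's morphology: (ending, replacement).
NOUN_RULES = [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
              ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")]

def toy_lemmatize(word: str) -> str:
    if word in LEXICON:
        return word
    if word in EXCEPTIONS:  # irregular plurals are looked up, not derived
        return EXCEPTIONS[word]
    candidates = []
    for ending, replacement in NOUN_RULES:
        if word.endswith(ending):
            stem = word[: -len(ending)] + replacement
            if stem in LEXICON:  # keep only forms the dictionary knows
                candidates.append(stem)
    # Return the shortest known candidate, or the word unchanged if none is known.
    return min(candidates, key=len) if candidates else word

phrase = "cobblers ants women boys needs finds binaries hobbies busses wolves procaspases"
print(" ".join(toy_lemmatize(w) for w in phrase.split()))
# cobbler ant woman boy need find binary hobby bus wolf procaspases
```

Because "procaspase" is not in the lexicon, no rule yields a known word and the input is returned as-is, the same symptom described in the question.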





I can lemmatize things easily using wordnet.morphy:

>>> from nltk.corpus import wordnet
>>> wordnet.morphy('cats')
u'cat'

      



Note that "procaspases" is not in WordNet ("caspases" is, though, and morphy will give "caspase" as its lemma), so your lemmatizer probably just doesn't recognize it. If you aren't having trouble lemmatizing other words, this word is probably just unknown to the implementation.
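If you do need a singular form for domain terms like this, one option is to apply a crude rule only when WordNet finds nothing. `wordnet.morphy` returns `None` in that case, which makes the failure easy to detect. The sketch below assumes a hypothetical fallback (drop a final "s" for NNS/NNPS words not ending in "ss"); that rule is not part of NLTK, and `lemmatize_noun` and `stub_morph` are illustrative names.

```python
from typing import Callable, Optional

# Any function shaped like nltk.corpus.wordnet.morphy(form, pos): returns a lemma or None.
Morph = Callable[[str, str], Optional[str]]

def lemmatize_noun(word: str, treebank_tag: str, morph: Morph) -> str:
    """Try WordNet first; fall back to a naive rule for plural nouns it doesn't know."""
    lemma = morph(word, "n")  # "n" is the value of wordnet.NOUN
    if lemma is not None:
        return lemma
    # Hypothetical heuristic, not NLTK behaviour: for NNS/NNPS tags, drop a
    # final "s" unless the word ends in "ss" (e.g. "glass").
    if treebank_tag in ("NNS", "NNPS") and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

# Stand-in for wordnet.morphy so the sketch runs without the WordNet corpus.
def stub_morph(form: str, pos: Optional[str] = None) -> Optional[str]:
    return {"cats": "cat", "caspases": "caspase"}.get(form)

for word, tag in [("cats", "NNS"), ("caspases", "NNS"), ("procaspases", "NNS"), ("glass", "NN")]:
    print(word, "->", lemmatize_noun(word, tag, stub_morph))
```

With NLTK and the WordNet corpus installed you would pass the real function instead of the stub, e.g. `lemmatize_noun("procaspases", "NNS", wordnet.morphy)`. Trying WordNet first keeps irregular plurals such as "women" correct; the suffix rule only fires for unknown words, so it will still be wrong for unknown plurals like "-ies" forms.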









