Extending the wordnet class in nltk - python
The goal is to add functionality to the wordnet class in nltk, for example:
```python
from nltk.corpus import wordnet
class WN(wordnet):
    def foobar(self):
        print 'foobar'
x = WN
WN.foobar()
```
but it gives this error:
```
Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 5, in <module>
    class WN(wordnet):
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/util.py", line 44, in __init__
    assert issubclass(reader_cls, CorpusReader)
TypeError: Error when calling the metaclass bases
    issubclass() arg 1 must be a class
```
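The metaclass error arises because `nltk.corpus.wordnet` is not a class but an instance of `LazyCorpusLoader` (defined in `nltk/corpus/util.py`, the file in the traceback). When a base "class" is really an instance, Python uses that instance's type as the metaclass, so `class WN(wordnet)` ends up calling the loader's own constructor with the bases tuple as `reader_cls`. The nltk-free sketch below reproduces this with hypothetical stand-ins `CorpusReader` and `LazyLoader`; the real loader differs in details.

```python
class CorpusReader:
    """Hypothetical stand-in for nltk.corpus.reader.CorpusReader."""


class LazyLoader:
    """Hypothetical stand-in for nltk's LazyCorpusLoader.

    Its constructor checks that reader_cls is a CorpusReader subclass,
    like the assert shown in the traceback.
    """

    def __init__(self, name, reader_cls, *args, **kwargs):
        if not issubclass(reader_cls, CorpusReader):
            raise TypeError("reader_cls must be a CorpusReader subclass")
        self.name = name
        self.reader_cls = reader_cls


# Like nltk.corpus.wordnet, this name refers to an *instance*, not a class.
wordnet = LazyLoader("wordnet", CorpusReader)

try:
    # For `class WN(wordnet)`, Python picks type(wordnet), i.e. LazyLoader,
    # as the metaclass and calls LazyLoader("WN", (wordnet,), namespace).
    # The bases tuple lands in reader_cls, and issubclass() rejects it
    # because a tuple instance is not a class.
    class WN(wordnet):
        def foobar(self):
            return "foobar"
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = ""

print(error_message)
```

This is why subclassing has to start from an actual class such as `WordNetCorpusReader`.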
So, I tried using `nltk.corpus.reader.WordNetCorpusReader` (http://www.nltk.org/_modules/nltk/corpus/reader/wordnet.html#WordNetCorpusReader):
```python
from nltk.corpus.reader import WordNetCorpusReader
class WN(WordNetCorpusReader):
    def __init__(self):
        self = WN.__init__()
    def foobar(self):
        return "foobar"
x = WN
x.foobar()
```
It seems to me that if I use `WordNetCorpusReader`, I need to create an instance of it, but I got:
```
Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 13, in <module>
    x.foobar()
TypeError: unbound method foobar() must be called with WN instance as first argument (got nothing instead)
```
Then I tried:
```python
from nltk.corpus.reader import WordNetCorpusReader
class WN(WordNetCorpusReader):
    def foobar(self):
        return "foobar"
x = WN
for i in x.all_synsets():
    print i
```
[output]:
```
Traceback (most recent call last):
  File "/home/alvas/workspace/pybabel-fresh/babelnet/utils/pybabel_WordNet.py", line 10, in <module>
    for i in x.all_synsets():
TypeError: unbound method all_synsets() must be called with WN instance as first argument (got nothing instead)
```
How do I extend the nltk wordnet API with new features? Note: the goal is to create a new class with new functionality.
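Your second try seems to be the closest. First, note that `x = WN` only binds the class itself and does not create an instance, which is why calling `foobar()` or `all_synsets()` on it gives the "unbound method" errors. The other problem is your constructor: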
```python
class WN(WordNetCorpusReader):
    def __init__(self):
        self = WN.__init__()  # needs an instance as the first argument, recursive, and no need to assign to self
```
The method `__init__` needs an instance as its first argument (here `self`), and furthermore, you are calling the `__init__` method of the wrong class: `WN.__init__` is the very method you are defining, so even if you passed `self`, it would keep calling itself until you get `RuntimeError: maximum recursion depth exceeded`. Finally, you just want to call the parent's method; you don't need to assign its result to `self`.
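Both problems can be reproduced without nltk. The sketch below is a minimal illustration that uses a hypothetical `Reader` class in place of `WordNetCorpusReader` and runs on Python 3, where the messages differ from Python 2's: a "missing 1 required positional argument: 'self'" `TypeError` instead of "unbound method", and `RecursionError` (a subclass of `RuntimeError`).

```python
class Reader:
    """Hypothetical stand-in for WordNetCorpusReader."""

    def __init__(self, root):
        self.root = root

    def all_synsets(self):
        # Placeholder data; the real reader yields Synset objects.
        return ["run.n.01", "test.n.05"]


class BrokenWN(Reader):
    def __init__(self, root):
        # Calls *itself* again instead of the parent constructor.
        BrokenWN.__init__(self, root)


class WN(Reader):
    def __init__(self, root):
        # Delegate to the parent class so it can set up its own state.
        super().__init__(root)

    def foobar(self):
        return "foobar"


# 1. `x = WN` binds the class, so an instance method gets no `self`.
x = WN
try:
    x.foobar()
except TypeError as exc:
    unbound_error = str(exc)

# 2. The self-referencing constructor recurses until the limit is hit.
try:
    BrokenWN("/some/root")
except RecursionError:
    recursion_hit = True
else:
    recursion_hit = False

# 3. An instance of the subclass has both the new and inherited methods.
wn = WN("/some/root")
print(wn.foobar())       # foobar
print(wn.all_synsets())  # ['run.n.01', 'test.n.05']
```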
I think you wanted to do this instead:
```python
from nltk.corpus.reader import WordNetCorpusReader
import nltk
class WN(WordNetCorpusReader):
    def __init__(self, *args):
        WordNetCorpusReader.__init__(self, *args)
    def foobar(self):
        return "foobar"
```
The catch is that you need to pass the arguments required by `WordNetCorpusReader.__init__` to your new class when you create an instance. In my version of nltk, this means passing a `root` argument, like this:
```python
>>> x = WN(nltk.data.find('corpora/wordnet'))
>>> x.foobar()
'foobar'
>>> x.synsets('run')
[Synset('run.n.01'), Synset('test.n.05'), ...]
```
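Forwarding `*args` (and `**kwargs`) means the subclass accepts whatever the parent constructor accepts, but the parent's required arguments are still required. The nltk-free sketch below illustrates this with a hypothetical `Reader` whose `root` is mandatory and whose `extra` stands in for any further optional argument; it uses Python 3's `super()` in place of the explicit `WordNetCorpusReader.__init__(self, ...)` call, which behaves the same way here.

```python
class Reader:
    """Hypothetical stand-in for WordNetCorpusReader: `root` is required,
    `extra` stands in for any further optional constructor argument."""

    def __init__(self, root, extra=None):
        self.root = root
        self.extra = extra


class WN(Reader):
    def __init__(self, *args, **kwargs):
        # Forward whatever the caller passed straight to the parent.
        super().__init__(*args, **kwargs)

    def foobar(self):
        return "foobar"


x = WN("/path/to/corpora/wordnet")  # hypothetical path
print(x.foobar(), x.root)

try:
    WN()  # the parent still needs `root`, so this raises TypeError
except TypeError as exc:
    missing_root = str(exc)
print(missing_root)
```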
A more efficient approach
Instead of passing `root` every time, you can look it up once and supply it inside the constructor:
```python
import nltk
from nltk.corpus.reader import WordNetCorpusReader

class WN(WordNetCorpusReader):
    root = nltk.data.find('corpora/wordnet')  # class variable: the path is looked up only once, when the class is defined
    def __init__(self, *args, **kwargs):
        WordNetCorpusReader.__init__(self, WN.root, *args, **kwargs)  # add root yourself here, so no arguments are required
    def foobar(self):
        return "foobar"
```
Now test it:
```python
>>> x = WN()
>>> x.foobar()
'foobar'
>>> x.synsets('run')
[Synset('run.n.01'), Synset('test.n.05'), ...]
```
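The class-variable trick works because a class body runs once, when the `class` statement executes, not on every instantiation. The sketch below checks this with a hypothetical `find_corpus` function standing in for `nltk.data.find` (it only records its calls) and a hypothetical `Reader` standing in for `WordNetCorpusReader`. Note that only the path lookup is shared; each `WN()` call still runs the parent constructor.

```python
calls = []


def find_corpus(name):
    """Hypothetical stand-in for nltk.data.find that records each lookup."""
    calls.append(name)
    return "/path/to/" + name


class Reader:
    """Hypothetical stand-in for WordNetCorpusReader."""

    def __init__(self, root, *args, **kwargs):
        self.root = root


class WN(Reader):
    # Evaluated once, when this class statement runs, not per instance.
    root = find_corpus("corpora/wordnet")

    def __init__(self, *args, **kwargs):
        # Supply the cached root ourselves, so WN() needs no arguments.
        super().__init__(WN.root, *args, **kwargs)

    def foobar(self):
        return "foobar"


a, b = WN(), WN()
print(len(calls))        # 1
print(a.root == b.root)  # True
```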
By the way, I loved seeing your work on the nltk tag.