SklearnClassifier object has no _vectorizer attribute

I got a brand new laptop and installed the latest NLTK and scikit-learn. I ran an old sentiment analysis script that loads a classifier I pickled earlier this year, and I got the error below. It looks like NLTK's SklearnClassifier wrapper now uses a _vectorizer attribute that did not exist before.

  File "c:\users\yoprado\pycharmprojects\gnip_sentiment\gnip_sentiment\main.py", line 64, in mongoaddsentiment
    MongoSentiment(mongo_server, mongo_port, dbname, colname, pickle_file)
  File "c:\users\yoprado\pycharmprojects\gnip_sentiment\gnip_sentiment\MongoSentiment.py", line 61, in MongoSentiment
    senti = classifier_eng.classify(get_features(cleanedBody.split()))
  File "C:\Python27\lib\site-packages\nltk-3.0.0-py2.7-win32.egg\nltk\classify\api.py", line 54, in classify
    return self.classify_many([featureset])[0]
  File "C:\Python27\lib\site-packages\nltk-3.0.0-py2.7-win32.egg\nltk\classify\scikitlearn.py", line 84, in classify_many
    X = self._vectorizer.transform(featuresets)
AttributeError: 'SklearnClassifier' object has no attribute '_vectorizer'

      

I reran the same script that originally created the classifier, and the newly generated pickle works fine. It seems that something in the code changed in a recent update. Is there a way to convert the old pickle to the new format?

Thanks



2 answers


This kind of problem is a known issue with sklearn. I had the same general problem with trained sklearn models after updating to the latest package. Compatibility between versions is often not good enough for you to reliably unpickle a model trained with a previous version. When you originally pickled the trained classifier, pickle stored the object's state together with a reference to its class (module and class name), not the class's code itself. So when you unpickle it, the old state is attached to the new version of that class, whose methods no longer expect the same attributes (in your case, classify_many looks for _vectorizer, which the old object never had).

You have two options: (1) retrain the model with the newer version and pickle it again, or (2) reinstall the versions you originally used instead of the latest ones. Note that the traceback points at NLTK's nltk\classify\scikitlearn.py, so the NLTK version matters here as well as the scikit-learn one.
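A minimal, self-contained sketch of the mechanism described above, using only the standard library. The module name toy_wrapper and both versions of Classifier are hypothetical stand-ins, not NLTK's real SklearnClassifier; the classes are built inside a throwaway module so the example behaves the same however it is run. It shows that pickle saves instance state plus a reference to the class, so an "upgraded" class is applied to the old state on load.

```python
import pickle
import sys
import types

# Two hypothetical versions of the same class in the same module.
# Version 1 keeps its table in _feature_index; version 2 expects _vectorizer.
OLD_SOURCE = """
class Classifier:
    def __init__(self):
        self._feature_index = {"good": 0, "bad": 1}

    def classify(self, word):
        return self._feature_index.get(word)
"""

NEW_SOURCE = """
class Classifier:
    def __init__(self):
        self._vectorizer = {"good": 0, "bad": 1}

    def classify(self, word):
        return self._vectorizer.get(word)
"""


def install_version(source):
    """Create (or replace) the hypothetical module 'toy_wrapper' from source code."""
    module = types.ModuleType("toy_wrapper")
    exec(source, module.__dict__)
    sys.modules["toy_wrapper"] = module
    return module


# "Old laptop": pickle an instance of version 1.
old_module = install_version(OLD_SOURCE)
blob = pickle.dumps(old_module.Classifier())

# "New laptop": the library was upgraded, so the module now holds version 2.
new_module = install_version(NEW_SOURCE)
restored = pickle.loads(blob)

# pickle looked the class up by module and name, so the old state is now a version-2 object...
print(type(restored) is new_module.Classifier)  # True
print(sorted(vars(restored)))  # ['_feature_index']

# ...whose methods expect an attribute the old state never had.
try:
    restored.classify("good")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # the message names the missing '_vectorizer' attribute
```

Note that pickle.loads does not call __init__ here, so nothing fills in the missing attribute and the failure only shows up when classify runs, just as in the question's traceback.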


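Option (2) assumes you know which versions produced the old pickle. One way to make that easy next time is to store the installed package versions in the same file as the model and compare them before unpickling. This is a minimal sketch, not part of NLTK or scikit-learn: the helpers installed_versions, version_mismatches, save_with_versions and load_with_versions are hypothetical, and importlib.metadata requires Python 3.8+, unlike the Python 2.7 setup in the question.

```python
import os
import pickle
import tempfile
import warnings
from importlib.metadata import PackageNotFoundError, version

# Distributions whose versions decide whether an old pickle still works.
PACKAGES = ("nltk", "scikit-learn")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version (None if not installed)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def version_mismatches(saved, current):
    """Return {name: (saved_version, current_version)} for every package that differs."""
    return {
        name: (saved_version, current.get(name))
        for name, saved_version in saved.items()
        if current.get(name) != saved_version
    }


def save_with_versions(model, path):
    """Write a small version header, then the pickled model, to the same file."""
    with open(path, "wb") as f:
        pickle.dump(installed_versions(), f)  # plain dict: loadable even if the model is not
        pickle.dump(model, f)


def load_with_versions(path):
    """Read the version header first and warn on mismatches, then unpickle the model."""
    with open(path, "rb") as f:
        saved = pickle.load(f)
        mismatches = version_mismatches(saved, installed_versions(saved))
        if mismatches:
            warnings.warn(f"pickle was created with different package versions: {mismatches}")
        return pickle.load(f)


# Usage with a stand-in "model"; a trained classifier object would be saved the same way.
path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
save_with_versions({"weights": [0.1, 0.2]}, path)
restored_model = load_with_versions(path)
print(restored_model)  # {'weights': [0.1, 0.2]}
```

Writing the header as a separate pickle means you can still read which versions were used even when the model itself no longer unpickles, which is exactly what you need to pin the old versions.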



If you had used dill instead of pickle to serialize the sklearn model, you would be able to restore your classifier even across a version change. With pickle, serializing a class instance saves only some of the relevant state and then relies on a reference to the class definition... so if the definition changes, you're out of luck with old pickles. By default, dill will serialize the class definition along with the class instance... and that way, even if the class definition changes, you can unpickle your saved instance and hopefully recover from the old instance. For example, you can pass the state from the old classifier object to a new classifier object and get going with a shiny new object. The only caveat is that you have to plan ahead and use dill for serialization in the first place; if you didn't, you're out of luck.
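The step of passing the state from an old classifier object to a new one can be sketched in plain Python once the old object is back in memory, however it was restored. Everything here is hypothetical: OldClassifier, NewClassifier, migrate, and the rename from _feature_index to _vectorizer are toy stand-ins, not NLTK's actual internals. A real migration has to rebuild whatever the new class actually expects, which may be more than a simple rename.

```python
class OldClassifier:
    """Hypothetical old version: keeps its word -> label table in _feature_index."""

    def __init__(self, feature_index):
        self._feature_index = feature_index


class NewClassifier:
    """Hypothetical new version: its methods expect the table in _vectorizer."""

    def __init__(self, vectorizer):
        self._vectorizer = vectorizer

    def classify(self, word):
        return self._vectorizer.get(word, "unknown")


def migrate(old):
    """Move an old object's state onto a fresh NewClassifier, renaming attributes as needed."""
    new = NewClassifier.__new__(NewClassifier)  # bypass __init__; state is copied by hand
    state = dict(vars(old))  # shallow copy, so popping keys does not alter the old object
    state["_vectorizer"] = state.pop("_feature_index")  # old name -> name the new code uses
    new.__dict__.update(state)
    return new


old = OldClassifier({"good": "pos", "bad": "neg"})
new = migrate(old)
print(new.classify("good"), new.classify("meh"))  # pos unknown
```

Copying the attribute dictionary keeps any other state the old object carried; when the new class's constructor takes exactly the data you have, calling it directly (here NewClassifier(old._feature_index)) is the simpler alternative.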


