MemoryError with RandomForest in scikit-learn

I am following the Python example given in "For Beginners - Bag of Words". However, the following code segment fails with a MemoryError. What can cause this error?

forest = forest.fit(train_data_features, train["sentiment"])

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo4.py", line 60, in <module>
    forest = forest.fit(train_data_features, train["sentiment"])
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\ensemble\forest.py", line 195, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\utils\validation.py", line 341, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError

      



2 answers


MemoryError, as the name says, means you have run out of free memory.

If you are following the tutorial's code example, a few things can help:



  • Delete variables with del when you no longer need them (e.g. clean_train_reviews is not needed after line 62).
  • After line 42, only train["sentiment"] is required; the rest of train can be discarded to free memory.
  • Don't read both the training and test sets at the beginning. The test set is only needed after the forest has been created, and by then nothing else from the training set is required.
  • Wrap the whole training part in a function that returns the forest; once it returns, every local reference that is no longer needed is dropped automatically.
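
The points above can be combined into one rough sketch. The function name train_forest, the tiny stand-in data, and the use of CountVectorizer with max_features=5000 are assumptions made for illustration, not code taken from this thread; adapt them to your own script.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer


def train_forest(reviews, labels, max_features=5000, n_estimators=100):
    """Vectorize the reviews, fit a forest, and return only what is needed later.

    Hypothetical helper: the name and parameters are illustrative.
    """
    vectorizer = CountVectorizer(analyzer="word", max_features=max_features)
    # Dense copy of the bag-of-words matrix; this is the large intermediate.
    features = vectorizer.fit_transform(reviews).toarray()
    forest = RandomForestClassifier(n_estimators=n_estimators)
    forest.fit(features, labels)
    # `features` is a local name, so its reference is dropped on return.
    return vectorizer, forest


# Tiny stand-in data; the cleaned training reviews would go here instead.
clean_train_reviews = ["great movie", "terrible plot", "great acting", "terrible movie"]
sentiment = [1, 0, 1, 0]

vectorizer, forest = train_forest(clean_train_reviews, sentiment, n_estimators=10)
del clean_train_reviews  # not needed once the forest has been trained

# Only now is the test data prepared, after the training data has been released.
test_features = vectorizer.transform(["great plot"]).toarray()
prediction = forest.predict(test_features)
```

Returning the vectorizer together with the forest keeps the vocabulary available for transforming the test set, while everything else created during training goes out of scope.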


In the example above, the bag of words contains 5000 features, which requires significant memory. One solution is to reduce the number of features, though this may affect the model's performance. Another is to switch from 32-bit Python to 64-bit.
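
To get a feel for the numbers, here is a back-of-the-envelope estimate of the dense feature matrix size. It assumes that DTYPE in the traceback is a 32-bit float (4 bytes per value) and that the training set has 25000 reviews; the review count is not stated in this thread, so substitute your own. The helper dense_matrix_bytes is made up for this illustration.

```python
def dense_matrix_bytes(n_samples, n_features, bytes_per_value=4):
    """Size in bytes of a dense n_samples x n_features array.

    bytes_per_value=4 assumes 32-bit floats.
    """
    return n_samples * n_features * bytes_per_value


n_reviews = 25000  # assumed training-set size, not given in the question

for n_features in (5000, 2000, 1000):
    mib = dense_matrix_bytes(n_reviews, n_features) / 2.0 ** 20
    print("%d features: about %.0f MiB" % (n_features, mib))
```

The fit call may also need a second copy of this matrix while converting it, which is where the np.array call in the traceback fails. A 32-bit process has a limited address space, so a few copies of an array this size can exhaust it even when the machine has free RAM.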


