MemoryError of Randomforest in scikit-learn

Question

I am following the Python example given in "For Beginners - Bag of Words". However, the following code segment fails with a MemoryError. What can cause this error?

forest = forest.fit(train_data_features, train["sentiment"])

Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo4.py", line 60, in <module>
    forest = forest.fit(train_data_features, train["sentiment"])
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\ensemble\forest.py", line 195, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\utils\validation.py", line 341, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError

+3

python python-2.7 scikit-learn machine-learning

user785099 Apr 17 At 4:18 am

2 answers

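In the above example, the bag of words contains 5000 features, and a dense matrix of that width requires significant memory. One solution is to reduce the number of features, although this may affect the performance of the model. Another solution is to switch from 32-bit Python to 64-bit, since a 32-bit process can only address a few gigabytes of memory no matter how much RAM is installed.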
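To get a feel for the numbers, here is a rough back-of-the-envelope sketch. The row count of 25,000 reviews is an assumption (substitute your own), as are the dtypes: I am assuming the dense count matrix holds 64-bit integers and that the copy made in check_array (the dtype=DTYPE call in the traceback) is 32-bit float. The helper dense_bytes is made up for this illustration.

```python
# Rough size of a dense bag-of-words matrix; all figures here are assumptions.
BYTES_INT64 = 8    # assumed dtype of the dense count matrix
BYTES_FLOAT32 = 4  # assumed dtype of the copy made inside fit()


def dense_bytes(n_rows, n_cols, itemsize):
    """Return the bytes needed for a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize


n_reviews = 25000   # assumption: number of training reviews
n_features = 5000   # feature count mentioned in the answer

counts_bytes = dense_bytes(n_reviews, n_features, BYTES_INT64)
copy_bytes = dense_bytes(n_reviews, n_features, BYTES_FLOAT32)
peak_bytes = counts_bytes + copy_bytes  # both arrays exist while the copy is made

print("count matrix: %.0f MB" % (counts_bytes / 1e6))
print("float32 copy: %.0f MB" % (copy_bytes / 1e6))
print("both at once: %.0f MB" % (peak_bytes / 1e6))
```

Under these assumptions the two arrays together need about 1.5 GB, before counting the raw reviews and everything else the script keeps around, which is already tight for a 32-bit process. Cutting the feature count shrinks all of these figures proportionally.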

0

Andrey Teterin Jun 21 '15 at 15:22

mata · Accepted Answer · 2015-04-17T10:09:36+0000

MemoryError, as the name says, means you're out of free memory.

If you are following the code example from the tutorial, there are a few things that can help you:

- Delete variables using del when you no longer need them (e.g. clean_train_reviews is not needed after line 62).
- After line 42 only train["sentiment"] is required; the rest of train can be discarded to free memory.
- Don't read both the training and test sets at the beginning. The test set is only needed after the forest has been created, and at that point nothing else from the training set is required.
- The whole training part can be wrapped in a function that returns the forest. When the function returns, all references that are no longer needed are released automatically.
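The del and function-scope suggestions above can be sketched roughly as follows. Blob, train_and_return_model and the dict standing in for the fitted forest are hypothetical placeholders, not part of the tutorial or of scikit-learn. The weakref probes are only there to show that the large objects really are released.

```python
import gc
import weakref


class Blob(object):
    """Hypothetical stand-in for a large object such as the raw training data."""

    def __init__(self, n):
        self.payload = [0] * n


# 1) Explicit del: drop the last reference once the object is no longer needed.
clean_reviews = Blob(100000)          # stand-in for clean_train_reviews
del_probe = weakref.ref(clean_reviews)
del clean_reviews                     # no references remain after this line
gc.collect()                          # not required on CPython, harmless elsewhere
freed_by_del = del_probe() is None


# 2) Function scope: intermediates are released when the function returns.
def train_and_return_model(n):
    """Do all training work locally and return only the fitted result."""
    raw = Blob(n)                              # stand-in for the training set
    probe = weakref.ref(raw)                   # lets the caller verify the release
    model = {"n_samples": len(raw.payload)}    # stand-in for the fitted forest
    return model, probe                        # raw goes out of scope here


model, scope_probe = train_and_return_model(100000)
gc.collect()
freed_by_scope = scope_probe() is None

print(freed_by_del, freed_by_scope)
```

In the real script the function body would read the training file, build the features and fit the forest, and the test set would only be loaded after the function has returned.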
