MemoryError of Randomforest in scikit-learn
I am following the Python example given in "For Beginners - Bag of Words". However, the following code segment raises a MemoryError. What can cause this error?
forest = forest.fit(train_data_features, train["sentiment"])
Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo4.py", line 60, in <module>
    forest = forest.fit(train_data_features, train["sentiment"])
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\ensemble\forest.py", line 195, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\utils\validation.py", line 341, in check_array
    array = np.array(array, dtype=dtype, order=order, copy=copy)
MemoryError
+3
user785099
2 answers
A MemoryError, as the name says, means you have run out of free memory. If you are following the code example from the tutorial, a few things can help:

- Delete variables with del when you no longer need them (e.g. clean_train_reviews is not needed after line 62).
- After line 42 only train["sentiment"] is required; the rest of train can be discarded to free memory.
- Don't read both the training and test sets at the beginning. The test set is only needed after the forest has been built, and nothing from it is required to train the forest.
- The whole training part can be wrapped in a function that returns the forest; references to everything created inside the function are then released automatically when it returns.
+4
mata
In the above example, the bag of words contains 5000 features, which requires significant memory. One solution is to reduce the number of features, although this may affect the performance of the model. Another solution is to switch from 32-bit Python to 64-bit, which removes the small per-process address-space limit of a 32-bit interpreter.
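To illustrate reducing the feature count: the tutorial builds its matrix with CountVectorizer(max_features=5000) and then calls .toarray(), so memory scales with n_samples × max_features. Lowering max_features shrinks that dense array. The short review list below is a hypothetical stand-in for the tutorial's cleaned reviews:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical small corpus standing in for clean_train_reviews
reviews = [
    "the movie was great",
    "the movie was terrible",
    "great acting and a great plot",
    "terrible plot",
]

# The tutorial uses max_features=5000; a smaller cap bounds the
# width of the dense matrix produced by .toarray() below.
vectorizer = CountVectorizer(analyzer="word", max_features=1000)
train_data_features = vectorizer.fit_transform(reviews).toarray()
```

The trade-off is that rarer words are dropped from the vocabulary, which can cost some accuracy.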
0
Andrey Teterin