Mallet: java.lang.OutOfMemoryError with 1024 GB Memory Allocation
I am trying to use Mallet to run topic modeling on a 1 GB text file with 11403956 lines. From the mallet directory, I cd into bin and raise the memory allocation to 1024 GB:
set MALLET_MEMORY=1024G
Then I try to run the command:
bin/mallet import-file --input combined_bios.txt --output dh_size.mallet --keep-sequence --remove-stopwords
However, this causes a memory error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at gnu.trove.TObjectIntHashMap.rehash(TObjectIntHashMap.java:170)
at gnu.trove.THash.postInsertHook(THash.java:359)
at gnu.trove.TObjectIntHashMap.put(TObjectIntHashMap.java:155)
at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:115)
at cc.mallet.types.Alphabet.lookupIndex(Alphabet.java:123)
at cc.mallet.types.FeatureSequence.add(FeatureSequence.java:131)
at cc.mallet.pipe.TokenSequence2FeatureSequence.pipe(TokenSequence2FeatureSequence.java:44)
at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:294)
at cc.mallet.pipe.Pipe$SimplePipeInstanceIterator.next(Pipe.java:282)
at cc.mallet.types.InstanceList.addThruPipe(InstanceList.java:267)
at cc.mallet.classify.tui.Csv2Vectors.main(Csv2Vectors.java:290)
Is there a workaround for situations like this? Any help others can offer would be greatly appreciated!
If you are on Linux or OS X, I think you may be changing the wrong variable. The one you are changing is in bin/mallet.bat, but you want to change the one in the executable bin/mallet (i.e. the file without the .bat extension):
MEMORY=1g
This is also described in the Big Data Problems section of this Mallet tutorial:
http://programminghistorian.org/lessons/topic-modeling-and-mallet
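As a rough sketch of that edit, the snippet below rewrites the MEMORY line in the launcher script and prints the result. The helper name set_mallet_memory and the value 4g are assumptions for illustration, not part of Mallet; choose a heap size that fits the RAM your machine actually has. It also assumes the script contains a line starting with MEMORY=, as shown above.

```shell
# Hypothetical helper (not part of Mallet): rewrite the MEMORY=... line
# of a Mallet launcher script.
#   $1 = path to the launcher, e.g. bin/mallet
#   $2 = new heap size, e.g. 4g (illustrative value)
set_mallet_memory() {
    script="$1"
    heap="$2"
    # -i.bak edits in place and keeps a backup copy as <script>.bak;
    # this attached-suffix form works with both GNU and BSD (OS X) sed.
    sed -i.bak "s/^MEMORY=.*/MEMORY=${heap}/" "$script"
    # Print the resulting line so the change can be checked.
    grep '^MEMORY=' "$script"
}

# Example usage from the mallet directory:
# set_mallet_memory bin/mallet 4g
```

After the change, rerun the same bin/mallet import-file command as before. If the backup file bin/mallet.bak is not wanted, it can be deleted once the import works.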