Fastest way to create a trie (JSON) from a 4GB file using only 1GB of RAM?

Perhaps I am doing it wrong:

I have a 4GB file (33 million lines of text) with one word per line.

I am trying to build a trie from it. The algorithm itself works; the problem is that Node.js has a ~1.4GB process memory limit, so the process crashes around 5.5 million rows.

To get around this, I tried the following:

Instead of one trie, I create many tries, each covering a range of the alphabet. For example: aTrie -> all words starting with a, bTrie -> all words starting with b, ... etc.

But the problem is that I still can't keep all of those objects in memory while reading the file, so every time I read a line, I load/unload the corresponding trie from disk. When there is a change, I delete the old file and write the updated trie from memory back to disk.

This is SUPER SLOW, even on my MacBook Pro with an SSD.

I considered writing this in Java, but then there is the problem of converting Java objects to JSON (the same problem applies to C++, etc.).

Any suggestions?



2 answers


Instead of using 26 tries, you can use a hash function to create an arbitrary number of sub-tries. That way, the amount of data you have to read from disk is bounded by the size of a single sub-trie, which you control. You can also cache recently used sub-tries in memory and flush changes to disk asynchronously in the background if I/O is still a problem.





You can increase the memory limit of the Node process with the flag below.

Note: the size is in MB.



node --max_old_space_size=4096

      

See the V8 flags reference for more information: https://github.com/thlorenz/v8-flags/blob/master/flags-0.11.md
