Clojure: parallel processing using multiple computers
I have 500 directories with 1000 files each (each file about 3-4k lines). I want to run the same Clojure program (already written) on every one of these files. I have 4 octa-core servers. What is a good way to distribute the processing across these machines and cores? Cascalog (Hadoop + Clojure)?
Basically, the program reads a file, uses a third-party Java jar to do the calculations, and inserts the results into the DB.
Please note that: 1. the ability to use third-party Java libraries/jars is required; 2. there are no requests of any kind.
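The per-file pipeline described above (read file, calculate, insert into DB) can be sketched like this. `calculate` and `insert-results!` are placeholder stubs standing in for the third-party jar call and the DB insert; everything else is plain core Clojure:

```clojure
(require '[clojure.java.io :as io])

;; Placeholder stubs: swap `calculate` for the real call into the
;; third-party jar, and `insert-results!` for the actual DB insert.
(defn calculate [lines]
  (count lines))

(defn insert-results! [result]
  (println "stored:" result))

(defn process-file [f]
  ;; read all lines eagerly so the file handle can be closed
  (let [lines (with-open [r (io/reader f)]
                (doall (line-seq r)))]
    (insert-results! (calculate lines))))

(defn process-dir! [dir]
  ;; pmap runs the work on a pool of roughly (cores + 2) threads,
  ;; a reasonable fit for CPU-bound jobs on an octa-core box
  (dorun (pmap process-file
               (filter (fn [^java.io.File f] (.isFile f))
                       (file-seq (io/file dir))))))
```

With independent files and no reduce step, `pmap` over the files of one directory already keeps all cores of a single server busy.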
Since there is no "reduce" step in my overall process, from what I understand it makes sense to put 125 directories on each server and then spend the remaining time speeding up that program itself. Up to the point where you saturate the DB, of course.
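The static split mentioned above (500 directories, 125 per server) is a one-liner with `partition-all`. The server names here are hypothetical:

```clojure
;; Sketch: divide the directory list evenly across the 4 servers.
(def servers ["srv1" "srv2" "srv3" "srv4"])  ; hypothetical hostnames

(defn assign-dirs [dirs]
  ;; one batch of (count dirs)/(count servers) directories per server;
  ;; assumes the directory count divides evenly, as 500/4 does
  (zipmap servers
          (partition-all (quot (count dirs) (count servers)) dirs)))
```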
Most of the "big data" tools available (Hadoop, Storm) focus on processes that need both very powerful map and reduce operations, possibly in several stages each. In your case, all you really need is a decent way to keep track of which jobs have succeeded and which have not. I am as bad as anyone (and worse than many) at estimating development time, but in this case I would say that even with the uncertainty involved, rewriting your process on top of one of the map-reduce tools will take more time than adding a monitoring process that keeps track of which jobs completed and which failed, so you can rerun the failed ones later (preferably automatically).
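The bookkeeping suggested above can be sketched with a simple append-only "done" log: record a file only after it succeeds, skip already-done files on a rerun, and catch failures so they can be retried. `process-file` is the per-file work function (a hypothetical name, not a library API):

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn load-done [log]
  ;; set of file names already completed in a previous run
  (if (.exists (io/file log))
    (set (str/split-lines (slurp log)))
    #{}))

(defn run-tracked! [process-file files log]
  (let [done (load-done log)]
    (doseq [f files
            :when (not (done (str f)))]
      (try
        (process-file f)
        ;; append to the log only after the job succeeds
        (spit log (str f "\n") :append true)
        (catch Exception e
          (println "failed:" (str f) (.getMessage e)))))))
```

Rerunning `run-tracked!` with the same log retries only the files that failed, which is exactly the recovery behavior the answer recommends building.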