Clojure: Performing computations on multiple futures does not speed up my program
I am just getting started in Clojure and have written the following code to estimate pi using monte carlo simulation. I basically want to create X streams, each one counting the number of random points that fall within the circle unit and returning them. My main thread then sums them all up and calculates Pi.
However, running all samples in one thread is faster than splitting the computation across multiple threads via futures. Why?
(defn sumFutures [workerFutures acc i] (if (= -1 i) acc (recur workerFutures (+ acc @(nth workerFutures i)) (dec i)))) (defn getResults [workerFutures numSamples] (* 4.0 (/ (sumFutures workerFutures 0 (dec (count workerFutures))) numSamples))) (defn isInCircle  (let [x (rand) y (rand)] (if (<= (+ (* x x) (* y y)) 1) 1 0))) (defn countInCircle [remaining acc] (if (zero? remaining) acc (recur (dec remaining) (+ acc (isInCircle))))) (defn getWorker [samplesPerWorker] (future (countInCircle samplesPerWorker 0))) (defn addWorker [workers samplesPerWorker] (conj workers (getWorker samplesPerWorker))) (defn getWorkers [workers samplesPerWorker remWorkers] (if (not (zero? remWorkers)) (recur (addWorker workers samplesPerWorker) samplesPerWorker (dec remWorkers)) (doall workers))) (defn main [numSamples numWorkers] (getResults (getWorkers  (quot numSamples numWorkers) numWorkers) numSamples)) ;; Run all in 1 thread (main 1000000 1) ;; Split among 100 futures (at least 8 threads) ;; SLOWER (main 1000000 100)
A few other notes based on the debug results:
The correct number of futures is generated
Each future calculates the correct number of simulations
This works on multiple threads and processor cores
source to share
I think this will be easier to work with if you have idiomatic code.
effectively overrides Clojure's definition
using recursion directly, rather than Clojures's optimized implementation
. Here's an alternative simpler (and most likely quicker) definition:
(defn sum-futures [workers] (apply + (map deref workers)))
it's much easier to read now - and I'll catch the spot where rational separation is happening - if we don't need rational solutions, making one operand double will save a lot of work.
(defn get-results [ ] (* 4.0 (/ (sum-futures workers) (double num-samples))))
can be more clearly represented using Clojure
(defn count-in-circle [ ] (apply + (repeatedly n isInCircle)))
again does primitive-recursive work instead of using Clojure abstractions. If we use
, we can eliminate definitions
without diminishing clarity, modularity, or efficiency (in fact, in such a case where you don't need an index search and the result is consumed sequentially, the lazy seq version should perform better than the vector. This is also now cantidat for refactoring along with
for a more efficient converter based version).
(defn get-workers [ ] (repeatedly num-workers
And finally it
(defn main [get-results (get-workers (quot num-samples num-workers) num-workers) num-workers))] (
source to share
What happens if you drop the number of futures in the second run to 2 (and up to 8)? Perhaps the time spent managing flows and futures is greater than the time saved by dividing the work.
In general, it is counterproductive to spawn more threads than the hardware can support (since threads will have to share cores, and there is a cost to that). If you are not getting the speed gain with only 2 threads (versus 1), there might be something wrong somewhere else.
source to share