Clojure: Performing computations on multiple futures does not speed up my program

I am just getting started in Clojure and have written the following code to estimate pi using monte carlo simulation. I basically want to create X streams, each one counting the number of random points that fall within the circle unit and returning them. My main thread then sums them all up and calculates Pi.

However, running all samples in one thread is faster than splitting the computation across multiple threads via futures. Why?

(defn sumFutures [workerFutures acc i]
  (if (= -1 i)
    acc
    (recur workerFutures (+ acc @(nth workerFutures i)) (dec i))))

(defn getResults [workerFutures numSamples]
  (* 4.0 (/ (sumFutures workerFutures 0 (dec (count workerFutures))) numSamples)))

(defn isInCircle []
  (let [x (rand) y (rand)]
    (if (<= (+ (* x x) (* y y)) 1)
      1
      0)))

(defn countInCircle [remaining acc]
  (if (zero? remaining)
    acc
    (recur (dec remaining) (+ acc (isInCircle)))))

(defn getWorker [samplesPerWorker]
  (future
    (countInCircle samplesPerWorker 0)))

(defn addWorker [workers samplesPerWorker]
  (conj workers (getWorker samplesPerWorker)))

(defn getWorkers [workers samplesPerWorker remWorkers]
  (if (not (zero? remWorkers))
    (recur (addWorker workers samplesPerWorker) samplesPerWorker (dec remWorkers))
    (doall workers)))

(defn main [numSamples numWorkers]
  (getResults (getWorkers [] (quot numSamples numWorkers) numWorkers) numSamples))

;; Run all in 1 thread
(main 1000000 1)

;; Split among 100 futures (at least 8 threads)
;; SLOWER 
(main 1000000 100)

      

A few other notes based on the debug results:

  • The correct number of futures is generated

  • Each future calculates the correct number of simulations

  • This works on multiple threads and processor cores

+3


source to share


2 answers


I think this will be easier to work with if you have idiomatic code.

sumFutures

effectively overrides Clojure's definition +

using recursion directly, rather than Clojures's optimized implementation reduce

. Here's an alternative simpler (and most likely quicker) definition:

(defn sum-futures
  [workers]
  (apply + (map deref workers)))

      

getResults

it's much easier to read now - and I'll catch the spot where rational separation is happening - if we don't need rational solutions, making one operand double will save a lot of work.

(defn get-results
  [workers num-samples]
  (* 4.0 (/ (sum-futures workers)
            (double num-samples))))

      

countInCircle

can be more clearly represented using Clojure +

.



(defn count-in-circle
  [n]
  (apply + (repeatedly n isInCircle)))

      

getWorkers

again does primitive-recursive work instead of using Clojure abstractions. If we use repeatedly

, we can eliminate definitions addWorker

and getWorker

without diminishing clarity, modularity, or efficiency (in fact, in such a case where you don't need an index search and the result is consumed sequentially, the lazy seq version should perform better than the vector. This is also now cantidat for refactoring along with sum-futures

for a more efficient converter based version).

(defn get-workers
  [num-workers samples-per-worker]
  (repeatedly num-workers
              #(future (count-in-circle samples-per-worker))

      

And finally it main

will become:

(defn main
  [num-samples num-workers]
  (get-results (get-workers (quot num-samples num-workers)
                            num-workers)
               num-workers))

      

+4


source


What happens if you drop the number of futures in the second run to 2 (and up to 8)? Perhaps the time spent managing flows and futures is greater than the time saved by dividing the work.



In general, it is counterproductive to spawn more threads than the hardware can support (since threads will have to share cores, and there is a cost to that). If you are not getting the speed gain with only 2 threads (versus 1), there might be something wrong somewhere else.

0


source







All Articles