Average brightness with Clojure is very slow

Being new to Clojure, I would like to compute the average luminance of (many) JPEG images. To do this, I load the image into memory using ImageIO/read from Java, fetch the byte buffer behind it, and compute the average.

(defn brightness
  "Computes the average brightness of an image."
  [^File file]
  (-> file
    ImageIO/read
    .getRaster
    .getDataBuffer
    .getData
    byteaverage))

      

Here is the averaging function:

(defn byteaverage
  [numbers]
  (/ (float (->> numbers
                 (map bytetoint)
                 (apply +)))
     (count numbers)))

      

Be aware that Java bytes are signed, so each one must first be converted to a wider integer type:

(defn bytetoint
  [b]
  (bit-and b 0xFF))
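
As a quick illustration of why the mask is needed, here is a small sketch. The name unsigned-byte is made up for this example and does the same thing as bytetoint above; the value 200 is just an arbitrary pixel value.

```clojure
;; Hypothetical helper, same idea as bytetoint above.
(defn unsigned-byte
  "Maps a signed Java byte (-128..127) to its unsigned value (0..255)."
  [b]
  (bit-and b 0xFF))

;; unchecked-byte wraps 200 around to the signed byte -56, which is how
;; a pixel value of 200 shows up inside a Java byte array.
(def raw (unchecked-byte 200))

;; Print the signed view next to the unsigned view.
(println raw (unsigned-byte raw))
```

Without the mask, bright pixels (128 and above) would be summed as negative numbers and drag the average down.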

      

While this gives the correct results, it is very slow. For 20MP images, it takes about 10-20 seconds. The disk is not the bottleneck. From experimenting with time, the culprit appears to be the bytetoint conversion. Merely mapping bytetoint over the byte array eats 8 GB of memory and never finishes in the REPL.

Why is this and what can be done about it?

PS: I know it is possible to use other programming languages, libraries, multithreading, or change the algorithm. My point is that the above Clojure code should be much faster and I would like to understand why it is not.



4 answers


Basically, you are doing a lot of plumbing work inside a very tight loop: boxing, type conversion, overly lazy sequences, and so on. Many of the benefits you get from modern CPUs, such as cache-line prefetching and branch prediction, go right out the window.

This kind of loop (computing a sum) is much better expressed as a more direct form of computation, such as the Clojure loop construct, something like:

(defn get-sum [^bytes data]
  (let [m (alength data)]
    (loop [idx 0 sum 0]
      (if (< idx m)
        (recur (inc idx) (unchecked-add sum (bit-and (aget data idx) 0xff)))
        (/ sum m)))))

      

This is untested, so you might need to adapt it, but it shows a few things:



  • Using type-hinted array access
  • Using a forward loop, which is very efficient
  • Using "integer" (long) math for the actual loop and dividing only at the end
  • Using unchecked math, which greatly improves performance in tight loops

Edit

You can also use other forms that might work even better, such as dotimes with internal mutable state (like a long array of size 1), if you really need to squeeze out performance. But by then, you might as well write a small method in Java ;)
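
For what it's worth, the dotimes variant mentioned in the edit might look roughly like this. It is only a sketch: the name get-sum-dotimes is made up, it returns a double instead of an exact ratio, and whether it actually beats loop/recur would have to be measured.

```clojure
(defn get-sum-dotimes
  "Average of the bytes in data, each read as an unsigned value (0..255).
  The running sum lives in a one-element long array used as mutable state.
  Hypothetical name; assumes data is non-empty."
  [^bytes data]
  (let [n          (alength data)
        ^longs acc (long-array 1)]   ; acc[0] holds the running sum
    (dotimes [i n]
      (aset acc 0 (unchecked-add (aget acc 0)
                                 (bit-and (aget data i) 0xff))))
    ;; Long math inside the loop, a single floating-point division at the end.
    (/ (double (aget acc 0)) n)))
```

For example, (get-sum-dotimes (byte-array [(unchecked-byte 200) (byte 100)])) averages the unsigned values 200 and 100.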



In addition to @shlomi's answer:

You can also make it less verbose (and probably a little faster) with the function areduce:



(defn get-sum-2 [^bytes data]
  (/ (areduce data i res 0 
              (unchecked-add res (bit-and (aget data i) 0xff)))
     (alength data)))
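
To connect this back to the brightness function from the question, something along these lines could work. It is a sketch under assumptions: the names average-unsigned, image-brightness and file-brightness are made up, the average is returned as a double, and the decoded image is assumed to have a byte-backed raster (DataBufferByte), which is not guaranteed for every image type. The type hints let the interop calls compile without reflection.

```clojure
(import '(java.awt.image BufferedImage DataBufferByte)
        '(java.io File)
        '(javax.imageio ImageIO))

(defn average-unsigned
  "Average of the bytes in data, each read as an unsigned value (0..255).
  Assumes data is non-empty."
  [^bytes data]
  (/ (double (areduce data i res 0
                      (unchecked-add res (bit-and (aget data i) 0xff))))
     (alength data)))

(defn image-brightness
  "Average over all raw samples of a byte-backed image."
  [^BufferedImage img]
  (let [buf (.. img getRaster getDataBuffer)]
    (if (instance? DataBufferByte buf)
      (average-unsigned (.getData ^DataBufferByte buf))
      ;; Other buffer types (int, short, ...) are out of scope for this sketch.
      (throw (IllegalArgumentException. "expected a byte-backed raster")))))

(defn file-brightness
  "Decodes the file with ImageIO and averages its samples."
  [^File file]
  (image-brightness (ImageIO/read file)))
```

Like the original, this averages all channel samples together rather than computing a weighted luminance.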

      



If you want to do it very quickly in Java, you can use these options (it's best to use all of them):

  • Use a Java wrapper for libjpeg-turbo as the JPEG decompression library; it's 30x faster than ImageIO.
  • Don't average all pixels in the image. Sample 1% to 10% of the pixels, evenly spaced across the image (use a hash function to select pseudo-random pixels, or simply step through the pixels in a for loop with a stride greater than one, depending on how many pixels you want to hit). An average calculated this way is much faster. The more pixels you use, the more accurate the result, but 5% of evenly spaced pixels is more than enough to get very good results.
  • Use multithreading.
  • Avoid floating-point calculations where possible and use integer calculations instead. Floating-point calculations can be up to 3-4 times slower.
  • Don't load all images into memory at once. Images often use a lot of memory, which puts pressure on the garbage collector and slows your application down. It is better to load each image when you need it, let it be GC-ed afterwards, and calculate the average step by step.
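
In Clojure terms, the strided sampling and integer-math points could be sketched like this. The name sampled-average and the stride parameter are made up for the example, and the stride is assumed to be positive.

```clojure
(defn sampled-average
  "Average of every stride-th byte in data, each read as an unsigned
  value (0..255). Hypothetical helper; stride must be positive."
  [^bytes data ^long stride]
  (let [n (alength data)]
    (loop [idx 0 sum 0 cnt 0]
      (if (< idx n)
        (recur (unchecked-add idx stride)
               (unchecked-add sum (bit-and (aget data idx) 0xff))
               (unchecked-inc cnt))
        ;; Long math in the loop, one floating-point division at the end.
        (if (zero? cnt) 0.0 (/ (double sum) cnt))))))
```

A stride of 20 reads 5% of the bytes. With interleaved channel data (for example three bytes per pixel), a stride that is a multiple of the channel count would sample only one channel, so a stride like 19 or 20 may be a safer choice than 21.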

Regarding negative byte values: don't convert the color value to byte, convert it directly to int, for example:

int rgb = somePixelColor;
int b = rgb & 0xFF;
int g = (rgb>>8) & 0xFF;
int r = (rgb>>16) & 0xFF;

// "silly" because each channel should be weighted when calculating brightness; there are several models for that
int sillyBrightness = (r + g + b) / 3;
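
One such weighting model is Rec. 601 luma, which weights red, green and blue by 0.299, 0.587 and 0.114. A sketch in Clojure, sticking to integer math by scaling the weights by 1000; the name luma-601 is made up, and the pixel is assumed to be packed as 0xRRGGBB like in the snippet above:

```clojure
(defn luma-601
  "Approximate brightness (0..255) of a packed 0xRRGGBB pixel using
  Rec. 601 weights scaled by 1000, so all arithmetic stays in longs.
  Hypothetical helper; any bits above the low 24 (e.g. alpha) are ignored."
  [^long rgb]
  (let [r (bit-and (bit-shift-right rgb 16) 0xFF)
        g (bit-and (bit-shift-right rgb 8) 0xFF)
        b (bit-and rgb 0xFF)]
    (quot (+ (* 299 r) (* 587 g) (* 114 b)) 1000)))
```

Green contributes the most and blue the least, so a pure green pixel comes out noticeably brighter than a pure blue one.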

      



In addition to the useful information above, you may be interested in the HipHip library for manipulating arrays of primitive values from Clojure: https://github.com/plumatic/hiphip

Here's an example from the README regarding the mean and standard deviation of a primitive array:

(defn std-dev [xs]
  (let [mean (dbl/amean xs)
        square-diff-sum (dbl/asum [x xs] (Math/pow (- x mean) 2))]
    (/ square-diff-sum (dbl/alength xs))))

(defn covariance [xs ys]
  (let [ys-mean (dbl/amean ys)
        xs-mean (dbl/amean xs)
        diff-sum (dbl/asum [x xs y ys] (* (- x xs-mean) (- y ys-mean)))]
    (/ diff-sum (dec (dbl/alength xs)))))

(defn correlation [xs ys std-dev1 std-dev2]
  (/ (covariance xs ys) (* std-dev1 std-dev2)))

      
