Computing average image brightness in Clojure is very slow
I'm new to Clojure and would like to compute the average luminance of (many) JPEG images. To do this, I load each image into memory using Java's ImageIO/read, fetch the byte buffer behind it, and compute the average.
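(import '(java.io File)
        '(javax.imageio ImageIO))

(declare byteaverage bytetoint) ; both defined below

(defn brightness
  "Computes the average brightness of an image."
  [^File file]
  (-> file
      ImageIO/read
      .getRaster
      .getDataBuffer
      .getData
      byteaverage))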
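For readers coming from Java, the threading macro above corresponds roughly to the call chain below. This is a sketch under an assumption: the decoded image is backed by a DataBufferByte (typical for 8-bit JPEGs read by ImageIO, but not guaranteed). Note that DataBuffer itself has no getData method; the Clojure version works only because reflection finds getData on the DataBufferByte subclass at runtime. The class name RasterBytes is hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferByte;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class RasterBytes {
    // Returns the raw bytes behind an image's raster: the same
    // getRaster -> getDataBuffer -> getData chain as the Clojure code.
    static byte[] rasterBytes(BufferedImage img) {
        DataBuffer buf = img.getRaster().getDataBuffer();
        if (!(buf instanceof DataBufferByte)) {
            // Assumption of this sketch: only byte-backed images are handled.
            throw new IllegalArgumentException("not a byte-backed image: " + buf.getClass());
        }
        return ((DataBufferByte) buf).getData();
    }

    // Reads one file. ImageIO.read returns null when no registered reader can decode it.
    static byte[] rasterBytes(File file) throws IOException {
        BufferedImage img = ImageIO.read(file);
        if (img == null) {
            throw new IOException("unsupported image format: " + file);
        }
        return rasterBytes(img);
    }
}
```

For an RGB image of type TYPE_3BYTE_BGR the returned array holds three bytes per pixel, so a 20 MP photo yields about 60 million bytes.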
Here is the averaging function:
(defn byteaverage
  [numbers]
  (/ (float
      (->> numbers
           (map bytetoint)
           (apply +)))
     (count numbers)))
Be aware that Java bytes are signed, so each one must first be converted to a non-negative integer:
(defn bytetoint
  [b]
  (bit-and b 0xFF))
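To see why the mask is needed, here is a small Java illustration; Java's byte is the same signed 8-bit type that Clojure sees in the array. The class and method names are hypothetical.

```java
final class SignedBytes {
    // A byte holding the pixel value 200 (0xC8) reads back as -56,
    // because Java bytes are signed two's-complement values in [-128, 127].
    static int signedValue(byte b) {
        return b; // widening to int sign-extends: 0xC8 -> -56
    }

    // Masking with 0xFF after widening keeps only the low 8 bits,
    // giving the unsigned value in [0, 255]: 0xC8 -> 200.
    // Since Java 8, Byte.toUnsignedInt(b) does the same thing.
    static int unsignedValue(byte b) {
        return b & 0xFF;
    }
}
```

For example, signedValue((byte) 200) is -56 while unsignedValue((byte) 200) is 200.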
While this gives the correct results, it is very slow: for 20 MP images it takes about 10-20 seconds. Disk access is not the problem. From experimenting, the culprit appears to be the bytetoint conversion. Merely mapping bytetoint over the byte array eats 8 GB of memory and never finishes in the REPL.
Why is this and what can be done about it?
PS: I know I could use other programming languages, libraries, or multithreading, or change the algorithm. My point is that the Clojure code above should be much faster, and I would like to understand why it is not.
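Basically, you are doing a lot of plumbing inside a very tight loop: boxing, conversions, overly lazy sequences, and so on. For a 20 MP RGB image that is roughly 60 million bytes, each turned into a boxed number inside a lazy sequence (and when the REPL prints the mapped result, it holds on to the head of the sequence, so all of it stays in memory). Many of the benefits of modern CPUs, such as cache-line prefetching and branch prediction, go right out the window.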
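The same kind of overhead can be reproduced in plain Java, which may make the difference easier to see. This is only an analogy, not what Clojure compiles to: the first method pushes every byte through a list of boxed Integer objects (loosely comparable to a fully realized sequence of boxed numbers), while the second is a primitive loop. Both return the same sum; only the second avoids per-element objects. The class name BoxingDemo is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class BoxingDemo {
    // Boxed pipeline: wraps each unsigned byte value in an Integer and keeps
    // them all in a List, then unboxes each one again to sum.
    static long boxedSum(byte[] data) {
        List<Integer> values = IntStream.range(0, data.length)
                .mapToObj(i -> data[i] & 0xFF) // autoboxes the int to Integer
                .collect(Collectors.toList());
        long sum = 0;
        for (Integer v : values) {
            sum += v; // unboxes each element
        }
        return sum;
    }

    // Primitive loop: no per-element objects, just a long accumulator.
    static long primitiveSum(byte[] data) {
        long sum = 0;
        for (byte b : data) {
            sum += b & 0xFF;
        }
        return sum;
    }
}
```

On a 60-million-element array, the list alone needs one reference per element on top of the Integer objects, which is where the memory goes.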
This kind of loop (a numeric sum) is much better expressed as a more direct computation, such as Clojure's loop construct, something like:
(defn get-sum [^bytes data]
  (let [m (alength data)]
    (loop [idx 0 sum 0]
      (if (< idx m)
        (recur (inc idx) (unchecked-add sum (bit-and (aget data idx) 0xff)))
        ;; divide once at the end; (double ...) avoids returning a Ratio
        (/ (double sum) m)))))
This is untested, so you might need to adapt it, but it shows a few things:
- Typed (type-hinted) array access
- A straightforward loop/recur, which compiles to a very efficient loop
- Integer (long) math inside the loop, with the division done only once at the end
- Unchecked arithmetic (unchecked-add), which skips overflow checks and can greatly improve performance in tight loops
Edit
You can also use other forms that might work even better, such as dotimes with internal mutable state (like a long array of size 1), if you really need to squeeze out performance; but by then, you might as well write a small method in Java ;)
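As the edit suggests, the same loop is only a few lines as a Java method. The sketch below mirrors get-sum: typed array access, a long accumulator, masking to undo the sign, and a single floating-point division at the end. The class and method names are hypothetical; once compiled and imported, a static method like this could be called from Clojure as (ByteMath/meanUnsigned data).

```java
final class ByteMath {
    // Mean of the bytes interpreted as unsigned values in [0, 255].
    // Returns NaN for an empty array (0.0 / 0).
    static double meanUnsigned(byte[] data) {
        // A 20 MP RGB image sums to at most about 1.5e10, well within long range.
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i] & 0xFF; // widen to int and drop the sign bits
        }
        return (double) sum / data.length; // divide once, at the end
    }
}
```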
If you want to do this very quickly in Java, you can use the following options (ideally all of them):
- Use a Java wrapper for libjpeg-turbo as the JPEG decompression library; it's 30x faster than ImageIO...
- Don't average all the pixels in the image; use 1% to 10% of them, evenly spaced across the image (either use a hash function to select pseudo-random pixels, or simply step through the pixels in a for loop with a stride greater than one, depending on how many pixels you want to hit). An average calculated this way is much faster to compute. The more pixels you use, the more accurate the result, but 5% of evenly spaced pixels is more than enough to get very good results.
- Multithreading.
- Avoid floating-point calculations and use integer arithmetic where possible; floating-point calculations can be up to 3-4 times slower.
- Don't load all the images into memory at once. Images use a lot of memory, which puts pressure on the garbage collector and slows your application down. Load each image only when you need it, let it be garbage-collected afterwards, and compute the averages one image at a time.
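A sketch of the sampling and multithreading suggestions above, with integer accumulation throughout. Assumptions: the buffer holds interleaved pixels with a fixed number of bytes per pixel and no row padding (3 for a typical RGB JPEG decoded by ImageIO), every channel is weighted equally, and the names PixelSampling, sampledMean and parallelMean are hypothetical. Sampling whole pixels rather than every k-th byte matters: with 3 bytes per pixel, a byte stride that is a multiple of 3 would always hit the same channel.

```java
import java.util.stream.IntStream;

final class PixelSampling {
    // Averages every `stride`-th pixel; with stride = 20, about 5% of the
    // pixels are visited. All channel bytes of a visited pixel are included.
    static double sampledMean(byte[] data, int bytesPerPixel, int stride) {
        if (bytesPerPixel <= 0 || stride <= 0) {
            throw new IllegalArgumentException("bytesPerPixel and stride must be positive");
        }
        long pixels = data.length / bytesPerPixel; // trailing partial pixel is ignored
        long sum = 0;
        long count = 0;
        for (long p = 0; p < pixels; p += stride) {
            int base = (int) (p * bytesPerPixel);
            for (int c = 0; c < bytesPerPixel; c++) {
                sum += data[base + c] & 0xFF;
            }
            count += bytesPerPixel;
        }
        return (double) sum / count; // integer sums, one division at the end
    }

    // Full average split into `chunks` ranges summed on the common fork-join
    // pool. Each chunk uses a plain primitive loop; only the per-chunk long
    // totals are combined.
    static double parallelMean(byte[] data, int chunks) {
        if (chunks <= 0) {
            throw new IllegalArgumentException("chunks must be positive");
        }
        int n = data.length;
        long sum = IntStream.range(0, chunks).parallel().mapToLong(k -> {
            int from = (int) ((long) n * k / chunks);
            int to = (int) ((long) n * (k + 1) / chunks);
            long s = 0;
            for (int i = from; i < to; i++) {
                s += data[i] & 0xFF;
            }
            return s;
        }).sum();
        return (double) sum / n;
    }
}
```

The two ideas combine naturally: each parallel chunk could itself use a stride. Only the integer totals cross threads, so no synchronization is needed beyond the stream's own sum.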
Regarding negative byte values: don't convert the color value to a byte at all; extract the channels directly as ints, for example:
int rgb = somePixelColor; // packed 0xAARRGGBB value, e.g. from BufferedImage.getRGB(x, y)
int b = rgb & 0xFF;
int g = (rgb >> 8) & 0xFF;
int r = (rgb >> 16) & 0xFF;
// Naive average: a proper brightness model weights each channel differently (there are several such models).
int sillyBrightness = (r + g + b) / 3;
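To make the weighting comment concrete: one common model is the ITU-R BT.601 luma formula, Y = 0.299 R + 0.587 G + 0.114 B. The sketch below uses the integer approximation (77 R + 150 G + 29 B) >> 8, in keeping with the integer-arithmetic advice above; the weights sum to 256, so white maps to 255. It applies the weights directly to gamma-encoded sRGB values, which is the usual quick approximation rather than true linear luminance. The class name Luma is hypothetical.

```java
final class Luma {
    // Extracts the channels from a packed 0xAARRGGBB int (the format returned
    // by BufferedImage.getRGB) and returns BT.601 luma in [0, 255].
    static int luma601(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        // 77/256 is about 0.301, 150/256 about 0.586, 29/256 about 0.113;
        // the weights sum to 256, so the shift keeps the result in [0, 255].
        return (77 * r + 150 * g + 29 * b) >> 8;
    }
}
```

For example, pure red (0xFF0000) gives 76 and pure green gives 149, reflecting how much brighter green appears than red at the same channel value.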
In addition to the useful information above, you may be interested in the HipHip library for manipulating arrays of primitive values from Clojure: https://github.com/plumatic/hiphip
Here's an example from the README computing the standard deviation, covariance, and correlation of primitive double arrays:
(require '[hiphip.double :as dbl])

(defn std-dev [xs]
  (let [mean (dbl/amean xs)
        square-diff-sum (dbl/asum [x xs] (Math/pow (- x mean) 2))]
    ;; square root of the (population) variance
    (Math/sqrt (/ square-diff-sum (dbl/alength xs)))))

(defn covariance [xs ys]
  (let [ys-mean (dbl/amean ys)
        xs-mean (dbl/amean xs)
        diff-sum (dbl/asum [x xs y ys] (* (- x xs-mean) (- y ys-mean)))]
    (/ diff-sum (dec (dbl/alength xs)))))

(defn correlation [xs ys std-dev1 std-dev2]
  (/ (covariance xs ys) (* std-dev1 std-dev2)))