Python floating point error
When manually calculating the variance of a list of floats, I used reduce, but found the result to be a little off from what I expected (as given by numpy.var). I then recalculated it using a list comprehension and got exactly the value I expected.
sumSqrdReduce = reduce((lambda total, val: total+(val - mean)**2), lst)
sumSqrdComprehension = sum([(val-mean)**2 for val in lst])
An example list demonstrating this problem:
lst = [0.53839998, 4.36650467, 3.64258786, 3.62987329, -0.33371547, 10.16436997, 3.11141481, 4.62991016, 0.72292498, -2.9477603, 4.0144724, 7.14428721, -3.05925725, 4.83175576, 5.55112354, 5.03295696, -2.40226829, 1.87662003, -1.02187228, 5.25553533, 1.54985611, 2.71460086, 0.83336707, -3.3935002, 3.88551682, -2.47155389, 1.76985117, 3.57110149, -5.17191153, 4.80879124, -0.97037815, 0.99500531, -0.22062183, 9.96261967, 3.31320864, 0.39606156, -2.71492665, 0.31085669, -1.82838686, 0.38113291, 2.7265862, 6.46300302, 3.11995554, 0.15073258, 12.03547416, 4.82310128, 2.43649615, 3.2195886, 2.84891094, 9.75191341]
With the above list (average value = 2.4008863134):
sumSqrdReduce = 671.241430925
sumSqrdComprehension = 674.171886287
Am I using reduce correctly? Or is this a general "floating-point accumulation error", and if so, why don't these two methods reproduce the same floating-point inaccuracies? I would expect any inaccuracies to be the same for every method, and certainly not this different.
You are actually using reduce incorrectly. Because you pass no initial value, reduce takes the first element of lst as the starting total, without squaring its deviation from the mean.
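To confirm that this is a skipped-term bug rather than rounding, here is a minimal sketch on a short, made-up list (not the data from the question). The gap between the two results is exactly (lst[0] - mean)**2 - lst[0]: the squared term that was never computed, minus the raw value reduce used in its place.

```python
from functools import reduce

# Hypothetical sample values (not the list from the question).
lst = [0.5, 4.25, -1.75, 3.0]
mean = sum(lst) / len(lst)

# Buggy: with no initializer, reduce() seeds the accumulator with lst[0] itself,
# so the first element is never centered and squared.
buggy = reduce(lambda total, val: total + (val - mean) ** 2, lst)
correct = sum((val - mean) ** 2 for val in lst)

# The gap is the skipped squared deviation minus the raw first value.
gap = correct - buggy
expected_gap = (lst[0] - mean) ** 2 - lst[0]
print(gap, expected_gap)  # prints 0.5 0.5
```

With the list from the question, (0.53839998 - 2.4008863134)**2 - 0.53839998 is about 2.9305, which matches 674.171886287 - 671.241430925.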
The (val - mean)**2 part of your reduce is conceptually a better fit for map, if you really want to use the traditional functional-programming tools:
from functools import reduce  # needed in Python 3; reduce was a builtin in Python 2
reduce(lambda x, y: x+y, map(lambda x: (x-mean)**2, lst))
Or you can give an initial accumulator value of 0.0:
reduce((lambda total, val: total+(val - mean)**2), lst, 0.0)
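A quick sketch (again with hypothetical values) checking that both fixes agree with the list comprehension. via_map and via_init perform the same additions in the same order, so they should match bit for bit. The comparison against sum() uses a tolerance because CPython 3.12 and later use compensated summation for floats inside sum(), which can change the last few bits.

```python
import math
from functools import reduce

# Hypothetical sample values; any list of floats works.
lst = [0.1, 2.7, -1.3, 4.4, 0.9]
mean = sum(lst) / len(lst)

# Fix 1: map to squared deviations, then reduce with plain addition.
via_map = reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))

# Fix 2: keep the original lambda but seed the accumulator with 0.0.
via_init = reduce(lambda total, val: total + (val - mean) ** 2, lst, 0.0)

# Reference: the list-comprehension version from the question.
via_sum = sum([(val - mean) ** 2 for val in lst])

print(via_map, via_init, via_sum)
print(math.isclose(via_map, via_sum, rel_tol=1e-12))
```

Any remaining disagreement is ordinary rounding, many orders of magnitude smaller than the error caused by the missing initializer.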
Since you tagged your question NumPy, here's how you would do it for a NumPy array arr, if for some reason you wanted to avoid the built-in numpy.var:
numpy.sum((arr-mean)**2)
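A minimal sketch tying this back to numpy.var, assuming the population variance is what you want: numpy.var uses ddof=0 by default, i.e. it divides the sum of squared deviations by N (pass ddof=1 for the sample variance). The data below is made up.

```python
import numpy as np

# Hypothetical sample data; in the question this would be np.array(lst).
arr = np.array([0.5, 4.25, -1.75, 3.0])
mean = arr.mean()

sum_sq = np.sum((arr - mean) ** 2)  # sum of squared deviations
variance = sum_sq / arr.size        # population variance (ddof=0)

# Should agree with numpy.var's default up to rounding.
print(sum_sq, variance, np.var(arr))
```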