# Matlab sum (XY) vs sum (X) - sum (Y)

`sum(x-y)`

Works a little faster on my machine for small arrays, but `sum(x)-sum(y)`

much faster for large arrays. For comparison, I am using MATLAB R2015a on a Windows 7 computer with 32GB memory.

```
n = ceil(logspace(0,4,25));
for i = 1:numel(n)
x = rand(n(i));
y = rand(n(i));
t1(i) = timeit(@()sum(x-y));
t2(i) = timeit(@()sum(x)-sum(y));
clear x y
end
figure; hold on
plot(n, t1)
plot(n, t2)
legend({'sum(x-y)', 'sum(x)-sum(y)'})
xlabel('n'); ylabel('time')
set(gca, 'XScale', 'log', 'YScale', 'log')
```

source to share

I'm all curious about you, and I decided to run some kind of benchmark. By the time I finished it seems like knedlsepp has this right as it `sum(X-Y)`

gets pretty slow for large matrices .

The crossover seems to be going around the elements `10^3`

.

```
%% // Benchmark code
nElem = (1:9).'*(10.^(1:6)) ; nElem = nElem(:) ; %'//damn pretifier
nIter = numel(nElem) ;
res = zeros(nIter,2) ;
for ii=1:nIter
X = rand(nElem(ii) ,1) ;
Y = rand(nElem(ii) ,1) ;
f1 = @() sum(X-Y) ;
f2 = @() sum(X)-sum(Y) ;
res(ii,1) = timeit( f1 ) ;
res(ii,2) = timeit( f2 ) ;
clear f1 f2 X Y
end
loglog(nElem,res,'DisplayName','nElem')
```

I ran this several times and the results on my machine are quite consistent. I blew my memory trying to go above 10 ^ 7 elements. Feel free to extend the test, but I don't think the relationship will change much.

Specifications: Windows 8.1 Pro / Matlab R2013a:

source to share

Assuming both x and y have N x M = K elements, then For the sum (x) -sum (y) you have:

K memory access to read x, K memory access to read y, one memory access to write the result → 2K + 1 memory access (Assuming the sub-sum inside the sum function will be held in the CPU register).

Sum operations 2K + subtraction 1 → 2k + 1 CPU operations.

For sum (x - y): Matlab will compute x - y and store the result and then compute the sum, so we have K memory access to read x, K memory access to read y, K memory access to write the result of the subtraction , memory access K to read subtraction again the result for the sum function, then 1 memory access to write the result of the sum → 4K + 1 memory operations.

k subtractions + k summations → 2K operations.

As we can see, the sum (x - y) consumes a lot of memory, so in a large number of elements it can consume a higher time, but I have no explanation why it is faster for a small number of elements.

source to share