Summing dense and sparse vectors

I need to sum dense and sparse vectors in Matlab and a naive way to do it, that is:

w = rand(1e7,1);
d = sprand(1e7,1,.001);
tic
for k = 1 : 100
    w = w + d;
end
toc

      

takes about 3.5 seconds, which is about 20 times slower than I expect Matlab to implement behind the hood:

for k = 1 : 100
    ind = find(d);
    w(ind) = w(ind) + d(ind);
end

      

(of course the timing of this faster version depends on sparsity).

So why doesn't Matlab do it the "quick way"? My experience with Matlab so far is that it is well used in resolution.

More importantly, are there other "sparse" operations that I should suspect as ineffective?

+3


source to share


1 answer


I don't know exactly the answer, but I will give you my guess as to what is going on. I don't know Fortran, but from a C ++ perspective, what you show makes sense, we just need to parse this statement.

A pseudocode translation a = b + c

where it is a,b

filled and c

is sparse would look like a.operator= ( b.operator+ (c) )

.

In all likelihood, full matrix containers in Matlab should have specialized arithmetic operators for working with sparse inputs, that is, something like full full::operator+ ( const sparse& )

. The main point to note here is that the result of the mixed full / sparse arithmetic operation must be filled. Therefore, we will need to create a new full container to store the result, although there are some updated values. [ Note: the returned full container is temporary, so an assignment a.operator= ( ... )

can avoid a full additional copy, for example with full& full::operator= ( full&& )

. ]



Unfortunately, there is no way to return a new full container because there are no arithmetic compound operations (i.e. operator +=

) in Matlab . This is why Matlab cannot use the fact that in your example is the a

same with b

(try timing with a loop instead x = w + d

, no runtime difference) and this is where the overhead comes from IMO. [ Note: even if there is no explicit assignment, for example a b+c;

shared response variable is assigned ans

. ]

Interestingly, there is a noticeable difference between full full::operator+ ( const sparse& )

and full sparse::operator+ ( const full& )

, which is between a = b + c

and a = c + b

; I can't say any more why this is, but the latter seems to be faster.

In any case, my short answer is "because Matlab has no arithmetic compound operators", which is unfortunate. If you know you are going to be doing these operations a lot, it shouldn't be too hard to implement custom optimized versions like the ones you suggested. Hope this helps!

+1


source







All Articles