MATLAB: Fast correlation calculation for all indices in 2 vectors

I have two vectors A and B, each of length 10000. For each ind = 1:10000, I want to calculate the Pearson correlation of A(1:ind) and B(1:ind). When I do this in a for loop, it takes too long. parfor doesn't work with more than two workers on my machine. Is there a way to perform this operation quickly and store the results in a vector C (of length 10000, where the first element is NaN)? I found a question on Fast Moving Correlation in Matlab, but that is slightly different from what I need.

2 answers


This method can be used to calculate the cumulative correlation coefficient:



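function result = cumcor(x,y)
    % Cumulative Pearson correlation of x(1:k) and y(1:k) for every k
    n = reshape(1:numel(x),size(x));   % sample count at each index
    sumx = cumsum(x);
    sumy = cumsum(y);
    sumx2 = cumsum(x.^2);
    sumy2 = cumsum(y.^2);
    sumxy = cumsum(x.*y);
    % result(1) is 0/0 = NaN: the correlation of a single sample is undefined
    result = (n.*sumxy-sumx.*sumy)./(sqrt((sumx.^2-n.*sumx2).*(sumy.^2-n.*sumy2)));
end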
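
For reference, the last line of cumcor appears to correspond to the standard sum-based form of the Pearson coefficient over the first n samples. The symbols S_x, S_y, S_xx, S_yy and S_xy are shorthand introduced for this note only; they stand for the cumulative sums sumx, sumy, sumx2, sumy2 and sumxy in the code.

```latex
r_n = \frac{n\,S_{xy} - S_x S_y}
           {\sqrt{\left(n\,S_{xx} - S_x^{2}\right)\left(n\,S_{yy} - S_y^{2}\right)}},
\qquad
S_x = \sum_{i=1}^{n} x_i,\quad
S_y = \sum_{i=1}^{n} y_i,\quad
S_{xx} = \sum_{i=1}^{n} x_i^{2},\quad
S_{yy} = \sum_{i=1}^{n} y_i^{2},\quad
S_{xy} = \sum_{i=1}^{n} x_i y_i
```

The code writes both factors under the square root with the opposite sign, (S_x^2 - n S_xx) and (S_y^2 - n S_yy), which leaves their product unchanged. For n = 1 the expression evaluates to 0/0, so the first element of the result is NaN.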

      


Solution

I suggest the following approach:

  • Pearson's correlation can be calculated using the following formula: rho(X,Y) = (E[XY] - E[X]*E[Y]) / (sqrt(E[X^2] - E[X]^2) * sqrt(E[Y^2] - E[Y]^2)).

  • The cumulative mean of each of the quantities above (X, Y, XY, X^2, Y^2) is easy to compute.

  • Given the cumulative means from the previous step, we can compute the cumulative standard deviations of X and Y.

  • Given the cumulative standard deviations of X and Y and the cumulative means above, we can compute the cumulative Pearson coefficient.

Code

% Define inputs
N = 10000;
X = rand(N,1);
Y = rand(N,1);

% Cumulative means of X, Y, X^2, Y^2 and XY
EX = accumMean(X);
EY = accumMean(Y);
EX2 = accumMean(X.^2);
EY2 = accumMean(Y.^2);
EXY = accumMean(X.*Y);

% Cumulative Pearson correlation. The first element is NaN because the
% correlation of a single sample is undefined.
accumPearson = nan(N,1);
for ii = 2:N
    stdX = sqrt(EX2(ii) - EX(ii)^2);
    stdY = sqrt(EY2(ii) - EY(ii)^2);
    accumPearson(ii) = (EXY(ii) - EX(ii)*EY(ii)) / (stdX*stdY);
end

% Cumulative mean function, to be defined in a separate file accumMean.m
function result = accumMean(vec)
result = zeros(size(vec));
result(1) = vec(1);
for ii = 2:length(vec)
    % Running-mean update: rescale the previous mean and add the new sample
    result(ii) = (result(ii-1)*(ii-1) + vec(ii)) / ii;
end
end

      

Runtime

For N = 10,000:



Elapsed time is 0.002096 seconds.

      

For N = 1,000,000:

Elapsed time is 0.240669 seconds.

      

Correctness

The code above can be validated by computing the cumulative Pearson coefficient with the corr function, one prefix at a time, and comparing it with the result of the code above:

%ground truth for correctness comparison
gt = zeros(N,1);
for z=1:N
    gt(z) = corr(X(1:z),Y(1:z));
end

      

Unfortunately, I don't have the Statistics and Machine Learning Toolbox, so I cannot run this check myself. I do think this is a good starting point, and you can continue from here :)
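
If the toolbox is not available, one possible way to run the check is with corrcoef, which ships with base MATLAB. The following is only a sketch and not part of the original answer: the seed, the reduced N = 500, the vectorized cumsum form of the cumulative means, and the variable name maxAbsDiff are all assumptions made for illustration.

```matlab
% Sketch: validate a cumulative Pearson computation against corrcoef
% (base MATLAB, no toolbox required). Sizes and names are illustrative.
rng(0);                 % fixed seed so the run is repeatable
N = 500;                % small N keeps the reference loop fast
X = rand(N,1);
Y = rand(N,1);

% Cumulative means via cumsum, then the cumulative Pearson coefficient
n   = (1:N)';
EX  = cumsum(X)    ./ n;
EY  = cumsum(Y)    ./ n;
EX2 = cumsum(X.^2) ./ n;
EY2 = cumsum(Y.^2) ./ n;
EXY = cumsum(X.*Y) ./ n;
accumPearson = (EXY - EX.*EY) ./ sqrt((EX2 - EX.^2) .* (EY2 - EY.^2));

% Reference values: one corrcoef call per prefix, starting at 2 samples
gt = nan(N,1);
for z = 2:N
    R = corrcoef(X(1:z), Y(1:z));   % 2-by-2 correlation matrix
    gt(z) = R(1,2);                 % off-diagonal entry is corr(X,Y)
end

% Largest deviation between the two results, ignoring the first element
maxAbsDiff = max(abs(gt(2:end) - accumPearson(2:end)));
fprintf('Largest absolute difference: %g\n', maxAbsDiff);
```

A difference near floating-point round-off would suggest the two computations agree. Note that forming variances as E[X^2] - E[X]^2 can lose precision when the data have a large mean relative to their spread, so the tolerance that counts as "agreement" depends on the input.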
