Large Matrix Correlation

I have a matrix with 100,000 columns (variables) and 100 rows (observations). I need to correlate (Pearson) every column with every other column. I am using corrcoef because I found it much faster than corr. With a matrix of 25,000 columns, the operation takes 15 seconds. However, when I increase the size to 50,000 columns, after a few minutes MATLAB's memory use climbs to 16 GB and MATLAB (and Windows along with it) starts to freeze. Any suggestions? Is there a way to split the computation into parts? Computing column by column is extremely inefficient...
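
A quick estimate of why the freeze appears between 25,000 and 50,000 columns. This assumes double precision (8 bytes per element) and counts only the p-by-p matrix that corrcoef returns, so the real peak is at least this large. The input matrix itself is tiny by comparison.

```matlab
% Rough memory estimate, assuming double precision (8 bytes per element).
n = 100;                          % observations (rows)
p = [25000 50000 100000];         % variables (columns)
inputGB  = n * p * 8 / 1e9        % the 100-by-p data matrix itself
outputGB = p.^2 * 8 / 1e9         % the full p-by-p correlation matrix
```

This gives about 5 GB of output for 25,000 columns, 20 GB for 50,000 and 80 GB for 100,000, while the input stays under 0.1 GB. That is consistent with the 50,000-column case exceeding 16 GB of RAM.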

Thanks for your help, Vadim

Brute-force computation on an array this large is not possible without a 64-bit version of MATLAB plus enough memory to hold the array, or some other way of storing it. For example, you can keep the array offline (on disk) and load only the part you need at the time you need it.
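
One way to use "only what you need when you need it" is to compute the correlation matrix in blocks of columns, so the full p-by-p result never has to exist at once. The sketch below is an assumption-laden illustration, not the answerer's method: it uses placeholder random data, a hypothetical blockSize, and a hypothetical use of each block (counting pairs with |r| above a threshold) where you might instead write the block to disk. It standardizes each column once, so every block is a single matrix product divided by n - 1.

```matlab
% Sketch: all-pairs Pearson correlation computed one block of columns at a time.
% Placeholder data and block size; in practice X would be 100-by-100000.
X = rand(100, 2000);              % n observations x p variables
[n, p] = size(X);
blockSize = 500;                  % hypothetical; tune to available memory
threshold = 0.9;                  % hypothetical use: count strongly correlated pairs
nHigh = 0;

% Standardize each column once (zero mean, unit sample standard deviation),
% so that a block of correlations is just a matrix product divided by n - 1.
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X, 1)), std(X, 0, 1));

for i = 1:blockSize:p
    ii = i:min(i + blockSize - 1, p);
    for j = i:blockSize:p                       % upper triangle only: C is symmetric
        jj = j:min(j + blockSize - 1, p);
        C = (Z(:, ii)' * Z(:, jj)) / (n - 1);   % correlations of columns ii vs jj
        if i == j
            C = triu(C, 1);                     % drop self-pairs and duplicate pairs
        end
        nHigh = nHigh + nnz(abs(C) > threshold);
        % C could instead be saved to disk here; it is overwritten next iteration
    end
end
fprintf('%d pairs with |r| > %.2f\n', nHigh, threshold);
```

Each block holds blockSize^2 doubles (about 2 MB for 500), so peak memory stays small. Because each block is one matrix product rather than one column at a time, it should avoid most of the overhead of a column-by-column loop.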

Also, if the numbers are always going to be small integers, use uint8, int8, or a logical array, or even a single array; all of these reduce the memory requirements compared to double arrays. Better yet, if the array is sparse, use sparse array operations.
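
To see the savings concretely, the sketch below stores the same 100-by-1000 block of small integers (placeholder data) in three classes and reports the bytes each uses. It assumes you would convert each block back to single or double before computing correlations.

```matlab
% Compare storage for the same 100-by-1000 data in different classes.
X = randi([-100 100], 100, 1000);  % small integers, stored as double by default
classes = {'double', 'single', 'int8'};
for k = 1:numel(classes)
    Y = cast(X, classes{k});      % same values, different storage class
    info = whos('Y');
    fprintf('%-7s %8d bytes\n', classes{k}, info.bytes);
end
```

Here single halves the footprint of double, and int8 cuts it to one eighth without losing any values in the range -128 to 127.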

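An alternative is to use Parallel Computing Toolbox (and MATLAB Distributed Computing Server) to work with the memory of multiple machines simultaneously. This will allow you to write:

% Open a pool of workers; <a large number> stands for the pool size you need.
% In newer releases, matlabpool has been replaced by parpool.
matlabpool open <a large number>
% Create an array whose parts are stored in the memory of the workers.
x = distributed.zeros( 100000, 100 );
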
See also this thread for working with large matrices ...
