MATLAB: Finding a Faster / More Elegant Way to Calculate Moving Median Absolute Deviation for Very Long Time Series

I have a very large array with five channels and approximately 6 million records (5 x 6,000,000). My goal is to traverse the array using a 7-point window and remove "bursts", which are defined as some scaled amount greater than the mean absolute deviation (MAD).

I am testing the code running only 10000 start points of the time series. It currently takes me about 3 seconds to launch the first 10,000 points. I am using a relatively old 32-bit Dell laptop with a 2.30GHz processor and 4GB of RAM. Obviously, if I was using a newer computer, I could complete the task very quickly. For example, my more powerful desktop does the same task in 0.7 seconds. However, I need to run the code on a laptop and cannot afford to wait 35-40 minutes every time I need to run the code. I'm looking for help finding inefficiencies and places where I can make the code faster.

Below is the code. Any suggestions on how to improve the speed are appreciated. I noticed that the calculation for "MAD" is the most time consuming (it takes about 1.9 seconds or more than half of the total time).

load('data.mat') % data (approx 5 channels x 6000000 data points (int32))
nscans = length(data); %number of data points in each channel

nwide = 7; %number of data points in the window

% Rejection parameters (not so important for the question)
iscale = 50; %scale factor for MAD
minmad = 2;
mincrit = [100 100 100 500 500];

nfixed=zeros(1,5);
L = floor(nwide/2); %half of window (odd window length only)

%Padding for beginning and end of data
data = [repmat(data(:,1),[1 L]) data repmat(data(:,end),[1 L])];

nfixed = zeros(1,5); %initialize counter
tic 
for n = L+1:10000
    idata = data(:,n-L:n+L)'; % temporary window

    % compute median of window
    med=median(idata);

    %compute median absolute deviation (MAD)
    % Note: mad = median(abs(X - median(X)))
    mad = median(abs(idata-repmat(median(idata),[nwide 1])));
    mad = max([mad;minmad*ones(1,5)]); %minmad threshold added

    %compute rejection threshold
    icrit=max([iscale*mad;mincrit]);

    for i = 1:5 %loop over channels
        if abs(data(i,n)-med(i)) > icrit(i) %if threshold is exceeded
            data(i,n)=med(i); %then replace with median value
            nfixed(i)=nfixed(i)+1; %count number of replacements
        end
    end

end
toc

data = data(:,L+1:end-L)'; %remove padding

      

I feel like there is probably a more elegant way to execute the "repmat" command.

Any ideas are appreciated.

Greetings

+3


source to share


2 answers


Any suggestions on how to improve the speed are greatly appreciated.

You can tighten up your code a bit without repeating the second call median(idata)

.

Change this:

mad = median(abs(idata-repmat(median(idata),[nwide 1])));

:

mad = median(abs(idata-repmat(med,[nwide 1])));



Alternatively, you can get more mileage from MATLAB's mad function , which originated before 2006. You will need to change your variable names.

For example, you can change your code:

mad = median(abs(idata-repmat(median(idata),[nwide 1])));
mad = max([mad;minmad*ones(1,5)]); %minmad threshold added

      

to

madV = max(mad(idata);[2 2 2 2 2]);

      

I just put vector 2 there since nothing was showing in the code minmad

.

+2


source


Can you use movmedian

to get the moving median?

med = movmedian(data,7,2,'Endpoints','discard');

      

Try to avoid loops if at all possible.



For example:

data = randn([5,1E6]);tic;
med = movmedian(data,7,2); %moving median
dev = abs(data-med); %deviation from median
thres = median(dev,2); %threshold
rep = dev>thres; %points to replace
data(rep) = med(rep); %replace data with median
toc
>>Elapsed time is 0.285828 seconds.
memory
>>Memory used by MATLAB:      1629 MB (1.708e+09 bytes)

      

0


source







All Articles