How do I limit the width of the correlation window in Numpy?

I am learning numpy / scipy coming from MATLAB background. The xcorr function in Matlab has an optional "maxlag" argument that limits the range from -maxlag to maxlag. This is very useful if you are looking at cross-correlation between two very long time series, but are only interested in correlation over a specific time interval. The performance is tremendous considering that cross-correlation is incredibly expensive to compute.

In numpy / scipy there seems to be several options for calculating cross-correlation. numpy.correlate , numpy.convolve , scipy.signal.fftconvolve . If anyone wants to explain the difference between the two, I'd love to hear, but my main concern is that none of them have a maxlag function. This means that even if I only want to see correlations between two time series with delays between -100 and +100 ms, for example, it will still calculate the correlation for each lag between -20,000 and +20,000 ms (this is the length of the time series). This gives you a 200x performance improvement! Do I have to manually recode the cross-correction feature to enable this feature?

+3


source to share


6 answers


matplotlib.pyplot

provides Matlab syntax for calculating and plotting cross-correlation, automatic correlation, etc.

You can use xcorr

which allows you to define a parameter maxlags

.



    import matplotlib.pyplot as plt


    import numpy  as np


    data = np.arange(0,2*np.pi,0.01)


    y1 = np.sin(data)


    y2 = np.cos(data)


    coeff = plt.xcorr(y1,y2,maxlags=10)

    print(*coeff)


[-10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7
   8   9  10] [ -9.81991753e-02  -8.85505028e-02  -7.88613080e-02  -6.91325329e-02
  -5.93651264e-02  -4.95600447e-02  -3.97182508e-02  -2.98407146e-02
  -1.99284126e-02  -9.98232812e-03  -3.45104289e-06   9.98555430e-03
   1.99417667e-02   2.98641953e-02   3.97518558e-02   4.96037706e-02
   5.94189688e-02   6.91964864e-02   7.89353663e-02   8.86346584e-02
   9.82934198e-02] <matplotlib.collections.LineCollection object at 0x00000000074A9E80> Line2D(_line0)

      

+4


source


Here are a couple of functions for calculating automatic and cross-correlation with limited delays. The order of multiplication (and conjugation in the complex case) was chosen according to the corresponding behavior numpy.correlate

.

import numpy as np
from numpy.lib.stride_tricks import as_strided


def _check_arg(x, xname):
    x = np.asarray(x)
    if x.ndim != 1:
        raise ValueError('%s must be one-dimensional.' % xname)
    return x

def autocorrelation(x, maxlag):
    """
    Autocorrelation with a maximum number of lags.

    `x` must be a one-dimensional numpy array.

    This computes the same result as
        numpy.correlate(x, x, mode='full')[len(x)-1:len(x)+maxlag]

    The return value has length maxlag + 1.
    """
    x = _check_arg(x, 'x')
    p = np.pad(x.conj(), maxlag, mode='constant')
    T = as_strided(p[maxlag:], shape=(maxlag+1, len(x) + maxlag),
                   strides=(-p.strides[0], p.strides[0]))
    return T.dot(p[maxlag:].conj())


def crosscorrelation(x, y, maxlag):
    """
    Cross correlation with a maximum number of lags.

    `x` and `y` must be one-dimensional numpy arrays with the same length.

    This computes the same result as
        numpy.correlate(x, y, mode='full')[len(a)-maxlag-1:len(a)+maxlag]

    The return vaue has length 2*maxlag + 1.
    """
    x = _check_arg(x, 'x')
    y = _check_arg(y, 'y')
    py = np.pad(y.conj(), 2*maxlag, mode='constant')
    T = as_strided(py[2*maxlag:], shape=(2*maxlag+1, len(y) + 2*maxlag),
                   strides=(-py.strides[0], py.strides[0]))
    px = np.pad(x, maxlag, mode='constant')
    return T.dot(px)

      

For example,



In [367]: x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])

In [368]: autocorrelation(x, 3)
Out[368]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [369]: np.correlate(x, x, mode='full')[7:11]
Out[369]: array([ 20.5,   5. ,  -3.5,  -1. ])

In [370]: y = np.arange(8)

In [371]: crosscorrelation(x, y, 3)
Out[371]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

In [372]: np.correlate(x, y, mode='full')[4:11]
Out[372]: array([  5. ,  23.5,  32. ,  21. ,  16. ,  12.5,   9. ])

      

(It would be nice to have such a function in numpy itself.)

+4


source


As long as numpy does not fulfill maxlag argument, you can use the function ucorrelate

of pycorrelate package . ucorrelate

works with numpy arrays and has a keyword maxlag

. It implements correlation using a for loop and optimizes execution speed with numba.

An example is autocorrelation with 3 time delays:

import numpy as np
import pycorrelate as pyc

x = np.array([2, 1.5, 0, 0, -1, 3, 2, -0.5])
c = pyc.ucorrelate(x, x, maxlag=3)
c

      

Result:

Out[1]: array([20,  5, -3])

      

The pycorrelate documentation contains a notepad showing the perfect match between pycorrelate.ucorrelate

and numpy.correlate

:

enter image description here

+1


source


I think I found a solution as I ran into the same problem:

If you have two vectors x

and y

any length N and cross-correlation with a fixed len window is required m

, you can do:

x = <some_data>
y = <some_data>

# Trim your variables
x_short = x[window:]
y_short = y[window:]

# do two xcorrelations, lagging x and y respectively
left_xcorr = np.correlate(x, y_short)  #defaults to 'valid'
right_xcorr = np.correlate(x_short, y) #defaults to 'valid'

# combine the xcorrelations
# note the first value of right_xcorr is the same as the last of left_xcorr
xcorr = np.concatenate(left_xcorr, right_xcorr[1:])

      

Remember that you may need to normalize the variables if you want to constrain the correlation

0


source


Here's another answer, obtained from here , seems to be faster rather than np.correlate

and has the advantage of returning a normalized correlation:

def rolling_window(self, a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def xcorr(self, x,y):

    N=len(x)
    M=len(y)
    meany=np.mean(y)
    stdy=np.std(np.asarray(y))
    tmp=self.rolling_window(np.asarray(x),M)
    c=np.sum((y-meany)*(tmp-np.reshape(np.mean(tmp,-1),(N-M+1,1))),-1)/(M*np.std(tmp,-1)*stdy)

    return c        

      

0


source


as I answered here fooobar.com/questions/1885686 / ... matplotlib.xcorr

has a maxlags parameter. It is actually a shell numpy.correlate

, so there is no performance savings. However, it gives exactly the same result obtained by Matlab's cross-correlation function. Below I edited the code from matplotlib so that it only returns the correlation. The reason is that if we use it matplotlib.corr

as is, it will return the plot as well. The problem is that if we put complex data types in it, we get a "casting complex to real datatype" warning when matplotlib tries to draw a plot.

<!-- language: python -->

import numpy as np
import matplotlib.pyplot as plt

def xcorr(x, y, maxlags=10):
    Nx = len(x)
    if Nx != len(y):
        raise ValueError('x and y must be equal length')

    c = np.correlate(x, y, mode=2)

    if maxlags is None:
        maxlags = Nx - 1

    if maxlags >= Nx or maxlags < 1:
        raise ValueError('maxlags must be None or strictly positive < %d' % Nx)

    c = c[Nx - 1 - maxlags:Nx + maxlags]

    return c

      

0


source







All Articles