Improving moving-window computation in memory consumption and speed

Is it possible to get better performance (both in memory consumption and speed) in this moving-window computation? I have a 1000x1000 numpy array, take 16x16 windows across the whole array, and finally apply some function to each window (a discrete cosine transform in this case).

import numpy as np
from scipy.fftpack import dct
from skimage.util import view_as_windows

X = np.arange(1000*1000, dtype=np.float32).reshape(1000,1000)
window_size = 16
windows = view_as_windows(X, (window_size,window_size))
dcts = np.zeros(windows.reshape(-1,window_size, window_size).shape, dtype=np.float32)
for idx, window in enumerate(windows.reshape(-1,window_size, window_size)):
    dcts[idx, :, :] = dct(window)
dcts = dcts.reshape(windows.shape)

      

This code uses too much memory (in the example above, consumption is not actually that bad: windows uses 1 GB and dcts also needs 1 GB) and takes 25 seconds. I'm a little unsure what I am doing wrong, because this should be a simple calculation (like filtering an image). Is there a better way to do this?

UPDATE:

I was initially worried that the arrays produced by Kington's solution and my original approach were very different, but the difference is limited to the boundaries, so it is unlikely to cause major problems for most applications. The only remaining problem is that both solutions are very slow: currently the first one takes 1 minute 10 seconds and the second one 59 seconds.

UPDATE 2:

I noticed that the biggest culprits are dct and np.mean. Even generic_filter works decently (8.6 seconds) when using the "cythonized" version of mean from bottleneck:

import scipy.ndimage
import bottleneck as bp
from scipy.fftpack import dct

def func(window, shape):
    window = window.reshape(shape)
    #return np.abs(dct(dct(window, axis=1), axis=0)).mean()
    return bp.nanmean(dct(window))

result = scipy.ndimage.generic_filter(X, func, (16, 16),
                                      extra_arguments=([16, 16],))

      

I am currently reading up on how to wrap C code with numpy in order to replace scipy.fftpack.dct. If anyone knows how to do this, I would appreciate some help.
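For reference, a minimal sketch of what such a wrapper might look like with ctypes; the library name libmydct.so and the routine dct_rows(in, out, nrows, ncols) are purely hypothetical placeholders for whatever C code ends up being wrapped:

import ctypes
import numpy as np

# Hypothetical shared library and function; adjust to the actual C code.
lib = ctypes.CDLL("./libmydct.so")
lib.dct_rows.argtypes = [
    np.ctypeslib.ndpointer(np.float32, flags="C_CONTIGUOUS"),  # input
    np.ctypeslib.ndpointer(np.float32, flags="C_CONTIGUOUS"),  # output
    ctypes.c_int,  # number of rows
    ctypes.c_int,  # number of columns
]
lib.dct_rows.restype = None

def my_dct(a):
    # Row-wise transform, intended as a drop-in for scipy.fftpack.dct(a).
    a = np.ascontiguousarray(a, dtype=np.float32)
    out = np.empty_like(a)
    lib.dct_rows(a, out, a.shape[0], a.shape[1])
    return out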

+3




2 answers


Since scipy.fftpack.dct computes separate transforms along the last axis of the input array, you can replace your loop with:

windows = view_as_windows(X, (window_size,window_size))
dcts = dct(windows)
result1 = dcts.mean(axis=(2,3))

      

Here the dcts array still requires a lot of memory, but windows remains just a view into X. And because the DCT is computed with a single function call, it is also much faster. However, due to the overlapping windows a lot of recomputation takes place. This can be avoided by computing the DCT only once for each sub-row, followed by a windowed mean:

ws = window_size
row_dcts = dct(view_as_windows(X, (1, ws)))
cs = row_dcts.squeeze().sum(axis=-1).cumsum(axis=0)
result2 = np.vstack((cs[ws-1], cs[ws:]-cs[:-ws])) / ws**2

      

What is gained in efficiency is lost in code clarity... But basically the approach is to compute the DCTs first and then take the window mean by summing over the 2D window and dividing by the number of elements in it. The DCTs are already computed over row-wise moving windows, so we take a regular sum over those. However, we still need a sliding-window sum over the columns to arrive at the proper 2D window sums. To do this efficiently we use a cumsum trick, where:

sum(A[p:q])  # q-p == window_size

      

is equivalent to:



cs = cumsum(A)
cs[q-1] - cs[p-1]

      

This avoids adding the same numbers over and over again. Unfortunately it does not work for the first window (where p == 0), so for that one we have to take just cs[q-1] and stack it together with the other window sums. Finally, we divide by the number of elements to arrive at the 2D window mean.
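As a small sanity check of that trick (an illustration on a toy 1D array, not part of the original answer):

import numpy as np

A = np.arange(10, dtype=float)
w = 3
cs = np.cumsum(A)
sliding = np.concatenate(([cs[w - 1]], cs[w:] - cs[:-w]))            # cumsum trick
naive = np.array([A[i:i + w].sum() for i in range(len(A) - w + 1)])  # direct sums
assert np.allclose(sliding, naive)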

If you actually want the 2D DCT, this second approach becomes less interesting, because you eventually need the full 985 x 985 x 16 x 16 array before you can take the mean.
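For completeness, a sketch of that memory-heavy 2D variant, reusing windows and dct from the snippets above and mirroring the commented-out line in the question (an illustration, not a recommendation):

# windows = view_as_windows(X, (16, 16)), dct imported as above
dcts2d = dct(dct(windows, axis=-1), axis=-2)  # 2D DCT of every 16x16 window
result_2d = np.abs(dcts2d).mean(axis=(2, 3))  # mean absolute DCT per window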


The two approaches above (result1 and result2) should be equivalent, but it may be a good idea to do the arithmetic in 64-bit floats:

np.allclose(result1, result2, atol=1e-6)
# False
np.allclose(result1, result2, atol=1e-5)
# True
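
A sketch of redoing the cheaper row-wise variant in 64-bit floats, assuming ws, dct and view_as_windows from the snippets above (the input is simply recast, nothing else changes):

X64 = X.astype(np.float64)
row_dcts64 = dct(view_as_windows(X64, (1, ws)))
cs64 = row_dcts64.squeeze().sum(axis=-1).cumsum(axis=0)
result2_64 = np.vstack((cs64[ws - 1], cs64[ws:] - cs64[:-ws])) / ws**2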

      

+3




skimage.util.view_as_windows uses striding tricks to create an array of overlapping "windows" without using any additional memory.

However, when you create a new array of that shape, it will take roughly 32 times (16 x 16) the memory that your original array X or the windows array uses.
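For what it's worth, a quick way to see that the view itself copies nothing is to check that it shares memory with X (a small illustration, not from the original answer):

import numpy as np
from skimage.util import view_as_windows

X = np.arange(1000*1000, dtype=np.float32).reshape(1000, 1000)
w = view_as_windows(X, (16, 16))
print(w.shape)                 # (985, 985, 16, 16)
print(np.shares_memory(w, X))  # True: a strided view, no data copied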

Based on your comment, your end result is dcts.reshape(windows.shape).mean(axis=2).mean(axis=2), i.e. taking the mean of the dct for each window. So it would be more memory-efficient (and at least as fast) to take the mean inside the loop rather than storing a huge intermediate array:



import numpy as np
from scipy.fftpack import dct
from skimage.util import view_as_windows

X = np.arange(1000*1000, dtype=np.float32).reshape(1000,1000)
window_size = 16
windows = view_as_windows(X, (window_size, window_size))
dcts = np.zeros(windows.shape[:2], dtype=np.float32).ravel()
for idx, window in enumerate(windows.reshape(-1, window_size, window_size)):
    dcts[idx] = dct(window).mean()
dcts = dcts.reshape(windows.shape[:2])

      


Another option is scipy.ndimage.generic_filter. It won't help much performance-wise (the bottleneck is the Python function call in the inner loop), but you will have many more boundary-condition options, and it will be fairly memory-efficient:

import numpy as np
from scipy.fftpack import dct
import scipy.ndimage

X = np.arange(1000*1000, dtype=np.float32).reshape(1000,1000)

def func(window, shape):
    window = window.reshape(shape)
    return dct(window).mean()

result = scipy.ndimage.generic_filter(X, func, (16, 16),
                                      extra_arguments=([16, 16],))
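
For example (just to illustrate the boundary handling, not part of the original answer), the mode argument selects how the edges are treated:

# Replicate the nearest edge value, or pad with zeros, outside the image.
result_nearest = scipy.ndimage.generic_filter(X, func, (16, 16),
                                              mode='nearest',
                                              extra_arguments=([16, 16],))
result_zeros = scipy.ndimage.generic_filter(X, func, (16, 16),
                                            mode='constant', cval=0.0,
                                            extra_arguments=([16, 16],))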

      

+3

