High performance variable blur in very large images using Python

I have a large collection of large images (e.g. 15000x15000 pixels) that I would like to blur. I need to blur the images based on a distance function, so that the further I move out of certain areas of the image, the stronger the blur becomes. I have a distance map describing how far a given pixel is from those areas.

Due to the large number of images I have to consider performance. I looked at NumPy / SciPy; they have great functions, but they seem to use a fixed kernel size, and I need to decrease or increase the kernel size depending on the distance to the previously mentioned areas.
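For concreteness, the per-pixel kernel size could be derived from the distance map with something like the following (just a sketch; distance_map, the scale factor and the cap are placeholders for whatever mapping fits the data):

import numpy as np

# Hypothetical distance map: 0 inside the sharp areas, growing outward.
distance_map = np.random.rand(1024, 1024) * 50

# Map distance to a kernel half-width, e.g. linearly, capped at a maximum.
# The scale factor (0.5) and the cap (25) are arbitrary placeholders.
kernel_size = np.clip((distance_map * 0.5).astype(np.intc), 1, 25)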

How can I solve this problem in Python?


UPDATE: My solution so far is based on rth's answer:

# cython: boundscheck=False
# cython: cdivision=True
# cython: wraparound=False

import numpy as np
cimport numpy as np

def variable_average(int [:, ::1] data, int [:, ::1] kernel_size):
    cdef int height, width, i, j, ii, jj
    height = data.shape[0]
    width = data.shape[1]
    cdef double [:, ::1] data_blurred = np.empty([height, width])
    cdef double res
    cdef int sigma, weight

    for i in range(height):
        for j in range(width):
            weight = 0
            res = 0
            # Half-width of the averaging window for this pixel.
            sigma = kernel_size[i, j]
            for ii in range(i - sigma, i + sigma + 1):
                for jj in range(j - sigma, j + sigma + 1):
                    # Skip window positions falling outside the image.
                    if ii < 0 or ii >= height or jj < 0 or jj >= width:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res / weight

    return data_blurred

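For reference, the extension can be built with a minimal setup script along these lines (a sketch; it assumes the code above is saved as variable_blur.pyx):

# setup.py -- build with: python setup.py build_ext --inplace
from setuptools import setup
from Cython.Build import cythonize
import numpy as np

setup(
    ext_modules=cythonize("variable_blur.pyx"),
    include_dirs=[np.get_include()],
)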

Test:

# cast to C int to match the memoryview signature of variable_average
data = np.random.randint(256, size=(1024, 1024)).astype(np.intc)
kernel = np.random.randint(256, size=(1024, 1024)).astype(np.intc) + 1
result = np.asarray(variable_average(data, kernel))

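A simple way to measure the run time (a minimal sketch):

import time

start = time.perf_counter()
result = np.asarray(variable_average(data, kernel))
print("elapsed: %.1f s" % (time.perf_counter() - start))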

The method using the above settings takes about 186 seconds to run. Is this what I can expect to ultimately squeeze out of the method, or are there optimizations I can use to further improve performance (while still using Python)?





1 answer


As you noted, the related scipy functions do not support a variable-size blur. You could implement this in pure Python with loops, and then use Cython, Numba, or PyPy to get C-like performance.

Below is a low-level pure-Python implementation that uses NumPy only to store the data,

import numpy as np

def variable_blur(data, kernel_size):
    """ Blur with a variable window size
    Parameters:
      - data: 2D ndarray of floats or integers
      - kernel_size: 2D ndarray of integers, same shape as data
    Returns:
      2D ndarray
    """
    data_blurred = np.empty(data.shape)
    Ni, Nj = data.shape
    for i in range(Ni):
        for j in range(Nj):
            res = 0.0
            weight = 0
            sigma =  kernel_size[i, j]
            for ii in range(i - sigma, i+sigma+1):
                for jj in range(j - sigma, j+sigma+1):
                    if ii<0 or ii>=Ni or jj < 0 or jj >= Nj:
                        continue
                    res += data[ii, jj]
                    weight += 1
            data_blurred[i, j] = res/weight
    return data_blurred

data = np.random.rand(50, 20)
kernel_size = 3*np.ones((50, 20), dtype=int)
variable_blur(data, kernel_size)




which calculates the arithmetic mean of the pixels within a variable-size window. It is a poor implementation as far as NumPy is concerned, in the sense that it is not vectorized. However, that also makes it convenient to port to other high-performance solutions:

  • Cython: simply statically typing the variables and compiling should give you C-like performance,

    def variable_blur(double [:, ::1] data, long [:, ::1] kernel_size):
        cdef double [:, ::1] data_blurred = np.empty((data.shape[0], data.shape[1]))
        cdef Py_ssize_t Ni, Nj, i, j
        Ni = data.shape[0]
        Nj = data.shape[1]
        for i in range(Ni):
            # [...] etc.
    

    see this post for a complete example as well as compilation notes.

  • Numba: wrapping the above function with a @jit decorator should mostly be sufficient (see the sketch after this list).

  • PyPy: installing PyPy + the experimental numpy branch could be another alternative worth trying, though you would then have to use PyPy for all your code, which may not be possible at the moment.
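For the Numba option, a minimal sketch of wrapping the pure-Python function defined above (assuming numba is installed; variable_blur, data and kernel_size are the objects from the example earlier in this answer):

from numba import jit

# JIT-compile the pure-Python implementation; nopython mode compiles the
# tight nested loops to machine code on the first call.
variable_blur_jit = jit(nopython=True)(variable_blur)

blurred = variable_blur_jit(data, kernel_size)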

Once you have a fast implementation, you can use multiprocessing, etc. to process different images in parallel if necessary, or even parallelize the outer for loop with OpenMP in Cython.
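As a rough illustration of the per-image parallelism (a sketch only; the image list and sizes are placeholders, and variable_blur is whichever implementation you end up with, importable by the worker processes):

from multiprocessing import Pool

import numpy as np

def blur_one(args):
    # Worker: blur a single image given its per-pixel kernel-size map.
    image, kernel_size = args
    return variable_blur(image, kernel_size)

if __name__ == "__main__":
    # Placeholder inputs; in practice these would be the real images and
    # the kernel-size maps derived from the distance maps.
    images = [np.random.rand(500, 500) for _ in range(4)]
    kernels = [np.full((500, 500), 3, dtype=np.int64) for _ in range(4)]

    with Pool(processes=4) as pool:
        blurred = pool.map(blur_one, list(zip(images, kernels)))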









