Understanding MATLAB convn behavior

I am doing convolution of some tensors.

Here's a little test in MATLAB:

    ker = rand(3,4,2);
    a = rand(5,7,2);
    c = convn(a, ker, 'valid');
    c11 = sum(sum(a(1:3,1:4,1).*ker(:,:,1))) + sum(sum(a(1:3,1:4,2).*ker(:,:,2)));
    c(1,1) - c11   % not equal!

The third line does the N-D convolution with convn, and I want to compare the result at the first row, first column of the convn output with the value calculated manually. However, my calculation does not equal convn's result.

So what is convn actually computing? Is my understanding of tensor convolution wrong?


2 answers


You almost have it right. There are two things slightly off in your understanding:

  • You have selected valid as the flag for the convolution. This means that the output returned from the convolution is sized so that the kernel, as it sweeps over the matrix, must fit entirely inside the matrix itself. Therefore, the first "valid" output that is returned is actually computed at location (2,2,1) of your matrix: that is where you can comfortably place your kernel, and it corresponds to position (1,1) of the output. To demonstrate, this is what a and ker look like for me, using your code:

    >> a
    
    a(:,:,1) =
    
    0.9930    0.2325    0.0059    0.2932    0.1270    0.8717    0.3560
    0.2365    0.3006    0.3657    0.6321    0.7772    0.7102    0.9298
    0.3743    0.6344    0.5339    0.0262    0.0459    0.9585    0.1488
    0.2140    0.2812    0.1620    0.8876    0.7110    0.4298    0.9400
    0.1054    0.3623    0.5974    0.0161    0.9710    0.8729    0.8327
    
    
    a(:,:,2) =
    
    0.8461    0.0077    0.5400    0.2982    0.9483    0.9275    0.8572
    0.1239    0.0848    0.5681    0.4186    0.5560    0.1984    0.0266
    0.5965    0.2255    0.2255    0.4531    0.5006    0.0521    0.9201
    0.0164    0.8751    0.5721    0.9324    0.0035    0.4068    0.6809
    0.7212    0.3636    0.6610    0.5875    0.4809    0.3724    0.9042
    
    >> ker
    
    ker(:,:,1) =
    
    0.5395    0.4849    0.0970    0.3418
    0.6263    0.9883    0.4619    0.7989
    0.0055    0.3752    0.9630    0.7988
    
    
    ker(:,:,2) =
    
    0.2082    0.4105    0.6508    0.2669
    0.4434    0.1910    0.8655    0.5021
    0.7156    0.9675    0.0252    0.0674

    As you can see, at position (2,2,1) in the matrix a, ker fits comfortably inside the matrix, and if you recall how convolution works, the output is simply a sum of element-by-element products between the kernel and the subset of the matrix, of the same size as your kernel, located at position (2,2,1) (in fact you need to do one more thing to the kernel, which I will save for my next point). Hence, the coefficient you are calculating is actually the output at (2,2,1), not at (1,1,1). From the gist of your question you seem to know this already, but I wanted to put it out there in case you didn't.

  • You are forgetting that for N-D convolution you need to flip the mask in every dimension. If you remember from 1D convolution, the mask needs to be flipped horizontally. By flipped I mean that you simply place the elements in reverse order: an array [1 2 3 4], for example, becomes [4 3 2 1]. In 2D convolution you must flip both horizontally and vertically: take each row of your matrix and reverse it, treating each row as a 1D signal, then take that intermediate result, treat each column as a 1D signal, and flip again.

    Now, in your 3D case, you must flip horizontally, vertically and temporally. That means performing the 2D flip for each slice of your matrix independently, then grabbing single columns along the third dimension and treating those as 1D signals. In MATLAB syntax, you would take ker(1,1,:), treat it as a 1D signal, and flip it; then repeat for ker(1,2,:), ker(1,3,:) and so on until you finish the first slice. Bear in mind that we do not then move on to the second slice and repeat: because you are grabbing a 3D section of your matrix, you inherently operate over all of the slices for each 3D column you extract. So you only need to sweep over a single slice, and you need to do this to your kernel before computing the convolution:

    ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);

    flipdim flips an array along the specified axis. In our case we flip vertically, then take the result and flip horizontally, and then flip temporally. You would then use ker_flipped in your summation instead. Note that it does not matter in which order you do the flipping: flipdim operates on each dimension independently, so as long as you remember to flip all dimensions, the result will be the same (see the quick check after this list).
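
Since the claim above is easy to test, here is a quick check (my own addition, not part of the original answer) that the flip order does not matter; in newer MATLAB releases, flip(ker, dim) is the replacement for flipdim:

    % Flip in the order 1,2,3 and in the order 3,2,1: the results agree,
    % because flips along different dimensions commute.
    k1 = flipdim(flipdim(flipdim(ker, 1), 2), 3);
    k2 = flipdim(flipdim(flipdim(ker, 3), 2), 1);
    isequal(k1, k2)   % returns 1 (true)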


To demonstrate, here is what the output looks like with convn:

c =

    4.1837    4.1843    5.1187    6.1535
    4.5262    5.3253    5.5181    5.8375
    5.1311    4.7648    5.3608    7.1241

Now, to determine what c(1,1) is by hand, you need to do the computation on the flipped kernel:

    ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
    c11 = sum(sum(a(1:3,1:4,1).*ker_flipped(:,:,1))) + sum(sum(a(1:3,1:4,2).*ker_flipped(:,:,2)));

The output we get is:

c11 =

    4.1837

As you can see, this matches what MATLAB computes with convn. If you want to compare more digits of precision, use format long and compare both:

    >> format long;
    >> disp(c11)

       4.183698205668000

    >> disp(c(1,1))

       4.183698205668001

As you can see, all of the digits are the same except for the last one. This is due to numerical round-off. To be absolutely sure:

    >> disp(abs(c11 - c(1,1)));

       8.881784197001252e-16

... I think a difference on the order of 10^-16 is good enough to show that they are equal, right?
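
Finally, if you want to verify every entry of the "valid" output rather than just c(1,1), a short script along these lines (my own sketch, reusing the variables above) sweeps the flipped kernel over all valid positions:

    % Manually compute the full 'valid' convolution: correlate the flipped
    % kernel with every 3x4x2 patch of a that fits entirely inside it.
    ker_flipped = flipdim(flipdim(flipdim(ker, 1), 2), 3);
    c_manual = zeros(3, 4);                    % valid size: (5-3+1) x (7-4+1)
    for r = 1:3
        for s = 1:4
            patch = a(r:r+2, s:s+3, :);        % same size as ker
            c_manual(r, s) = sum(patch(:) .* ker_flipped(:));
        end
    end
    max(abs(c_manual(:) - c(:)))               % ~1e-16, i.e. round-off only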


Yes, your understanding of convolution is wrong. Your formula for c11 is not a convolution: you just multiplied the entries at matching indices and summed them. That is more of a dot-product operation (on tensors cropped to the same size). I will try to explain, starting with one dimension.

1-dimensional arrays

The command conv([4 5 6], [2 3]) returns [8 22 27 18]. I find it easiest to think of this in terms of multiplying polynomials:

(4 + 5x + 6x^2) * (2 + 3x) = 8 + 22x + 27x^2 + 18x^3

Use the entries of each array as the coefficients of a polynomial, multiply the polynomials, collect like terms, and read the result off the coefficients. The powers of x are only there to keep track of what gets multiplied and added. Note that the coefficient of x^n sits in the (n+1)th entry, since powers of x start at 0 while indices start at 1.
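
To see this in MATLAB itself, here is a quick numerical check (a sketch of mine, not part of the original answer): evaluating the polynomials at an arbitrary point must give the same value either way. fliplr is only there because polyval expects coefficients in descending order:

    u = [4 5 6];  v = [2 3];      % ascending coefficients: 4+5x+6x^2 and 2+3x
    w = conv(u, v);               % [8 22 27 18], the product polynomial
    x0 = 1.7;                     % arbitrary test point
    polyval(fliplr(w), x0) - polyval(fliplr(u), x0)*polyval(fliplr(v), x0)
    % ~0 up to round-off: convolution really is polynomial multiplication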

2-dimensional arrays

The command conv2([2 3; 3 1], [4 5 6; 0 -1 1]) returns the matrix

     8  22  27  18
    12  17  22   9
     0  -3   2   1

Again, this can be interpreted as multiplying polynomials, but now we need two variables, say x and y. The coefficient of x^n y^m sits in the (m+1, n+1) entry. The above output means that



(2 + 3x + 3y + xy) * (4 + 5x + 6x^2 + 0y - xy + x^2 y) = 8 + 22x + 27x^2 + 18x^3 + 12y + 17xy + 22x^2 y + 9x^3 y - 3xy^2 + 2x^2 y^2 + x^3 y^2
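
As a spot check (my own snippet, not from the original answer), the rule "the coefficient of x^n y^m sits at entry (m+1, n+1)" can be tested directly:

    c2 = conv2([2 3; 3 1], [4 5 6; 0 -1 1]);
    c2(2, 2)   % 17: the coefficient of x*y, stored at entry (m+1, n+1) = (2, 2)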

3-dimensional arrays

Same story: you can think of the entries as the coefficients of a polynomial in the variables x, y and z. The polynomials get multiplied, and the coefficients of the product are the result of the convolution.
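
For a tiny concrete example (my own, with hypothetical arrays A and B): multiplying by 1 + xyz just copies A and adds a copy of A shifted by one in every dimension, and convn reproduces exactly that:

    A = cat(3, [1 2; 3 4], [5 6; 7 8]);   % a 2x2x2 coefficient array
    B = cat(3, [1 0; 0 0], [0 0; 0 1]);   % the polynomial 1 + x*y*z
    C = convn(A, B);                      % full convolution, size 3x3x3
    C(2,2,2)                              % A(2,2,2) + A(1,1,1) = 8 + 1 = 9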

"valid" parameter

This keeps only the central part of the convolution: those coefficients to which all terms of the second factor contribute. For this to be non-empty, the second array must be no larger than the first in every dimension. (This is unlike the default setting, for which the order of the convolved arrays does not matter.) Example:

The command conv([4 5 6], [2 3], 'valid') returns [22 27] (compare with the 1D example above). This corresponds to the fact that in

(4 + 5x + 6x^2) * (2 + 3x) = 8 + 22x + 27x^2 + 18x^3

the middle terms, 22x and 27x^2, received contributions from both 2 and 3x.
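
In code (my sketch again), this means the 'valid' result is literally a crop of the full one:

    u = [4 5 6];  v = [2 3];
    full_c = conv(u, v);                                        % [8 22 27 18]
    isequal(conv(u, v, 'valid'), full_c(numel(v):numel(u)))    % true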
