Don't get what "spatial weights" are for in HOG

Question


I am using HOG for sunflower detection. I understand most of what HOG does now, but there are a few things in the final stages that I don't understand. (I am reviewing the MATLAB code from MathWorks.)

Let's assume we are using the Dalal-Triggs setup: 8x8 pixels make up 1 cell, 2x2 cells make up 1 block, blocks overlap by 50% in both directions, and the histograms are quantized into 9 unsigned orientation bins (0 to 180 degrees). Our image is 64x128 pixels.
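A quick Python sketch of the arithmetic behind this setup (Python stands in for MATLAB here, and `hog_layout` is a hypothetical helper). It assumes that "50% overlap" of 2x2-cell blocks means the block window slides by one cell at a time.

```python
def hog_layout(img_w, img_h, cell=8, block_cells=2, stride_cells=1, bins=9):
    """Return (cells_x, cells_y, blocks_x, blocks_y, descriptor_length)."""
    cells_x, cells_y = img_w // cell, img_h // cell
    # Number of block positions along each axis when sliding by stride_cells
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    # Each block concatenates one histogram per cell: 2 * 2 * 9 = 36 values
    per_block = block_cells * block_cells * bins
    return cells_x, cells_y, blocks_x, blocks_y, blocks_x * blocks_y * per_block


print(hog_layout(64, 128))  # (8, 16, 7, 15, 3780)
```

For the 64x128 window this gives 8x16 cells, 7x15 block positions of 36 values each, and a 3780-element descriptor.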

Let's say we are on the first block. This block has 4 cells. I understand that each pixel's orientation vote is weighted by its gradient magnitude. I also understand that the votes are weighted further by a Gaussian window centered on the block.
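A minimal Python sketch of those two weightings, assuming a 16x16-pixel block, pixel (r, c) centered at (r + 0.5, c + 0.5), and σ = 0.5 × block width as suggested by Dalal and Triggs; the MathWorks code may choose σ differently. `block_gaussian` and `weighted_vote` are hypothetical names.

```python
import math


def block_gaussian(block_px=16, sigma=None):
    """Gaussian spatial window over a block_px x block_px block.

    Assumes pixel (r, c) is centered at (r + 0.5, c + 0.5) and, if sigma is
    not given, uses sigma = 0.5 * block width (the Dalal-Triggs choice).
    """
    if sigma is None:
        sigma = 0.5 * block_px
    center = block_px / 2.0
    two_sigma_sq = 2.0 * sigma * sigma
    window = []
    for r in range(block_px):
        dy = (r + 0.5) - center
        window.append([
            math.exp(-(((c + 0.5) - center) ** 2 + dy * dy) / two_sigma_sq)
            for c in range(block_px)
        ])
    return window


def weighted_vote(magnitude, r, c, window):
    """A pixel's vote: its gradient magnitude scaled by the block window."""
    return magnitude * window[r][c]


g = block_gaussian()
# Pixels near the block center keep almost their full vote;
# corner pixels are down-weighted.
print(weighted_vote(1.0, 7, 7, g), weighted_vote(1.0, 0, 0, g))
```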

So far so good.

However, in the MATLAB implementation, they have an extra step whereby they create "spatial" weights:

[Image: the MATLAB code that creates the spatial weights]

If we dive into this function, it looks like this:

[Image: the body of the spatial-weight function]

Finally, the computeLowerHistBin function looks like this:

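function [x1, b1] = computeLowerHistBin(x, binWidth)
% For each position x, find the bin center x1 at or below x
% (x1 <= x < x1 + binWidth) and its 1-based index b1.

% Bin index
width    = single(binWidth);
invWidth = 1./width;
% Subtracting 0.5 makes floor() return the lower bin center,
% not the bin that contains x; bin is -1 when x < binWidth/2
bin      = floor(x.*invWidth - 0.5);

% Bin center x1
x1 = width * (bin + 0.5);

% add 2 to get to 1-based indexing (bin can be -1, which maps to 1)
b1 = int32(bin + 2);
end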

Now I believe these "spatial" weights are used later in the trilinear interpolation step, but I don't get how exactly they are calculated or the logic behind this code. I'm completely lost here.

Note: I understand the need for trilinear interpolation and (I think) how it works. What I don't understand is why we need these "spatial weights" and the logic behind how they are calculated here.

Thanks.


image-processing matlab computer-vision feature-extraction matlab-cvst

Spacey, Oct 13 '14 at 16:34


2 answers

The idea here is that each pixel contributes to some degree not only to its own histogram cell but also to its neighboring cells. These contributions are weighted differently depending on how close the pixel is to the edge of its cell: the closer it is to the edge, the more it contributes to the neighboring cell on that side and the less it contributes to its own cell.
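A one-dimensional Python sketch of this idea, assuming 8-pixel cells; `split_between_cells` is a hypothetical helper that takes the signed distance from a pixel to the center of its own cell and splits the vote linearly between the two nearest cell centers.

```python
def split_between_cells(offset, cell_width=8.0):
    """Split a pixel's vote between its own cell and the nearest neighbor.

    offset is the signed distance from the pixel to its own cell center
    (|offset| <= cell_width / 2). Linear interpolation between the two
    cell centers gives the own cell 1 - |offset| / cell_width and the
    neighbor |offset| / cell_width.
    """
    frac = abs(offset) / cell_width
    return 1.0 - frac, frac  # (own cell, neighboring cell)


print(split_between_cells(0.0))  # (1.0, 0.0): at the cell center
print(split_between_cells(4.0))  # (0.5, 0.5): at the cell edge
```

At the cell center the whole vote stays in the pixel's own cell; at the cell edge it is split evenly with the neighbor.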


Dima, Oct 13 '14 at 18:44


bpatel · Accepted Answer · 2014-10-16T15:10:59+0000

This code precomputes the spatial weights for trilinear interpolation. Take a look at the equation here for trilinear interpolation:

HOG Trilinear interpolation of histograms
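The linked equation is not reproduced here, so, as an assumption, a standard way to write the trilinear update in this answer's notation follows. A pixel at (x, y) with orientation θ and vote m (its gradient magnitude, possibly already multiplied by the Gaussian block window) has lower bin centers x1, y1, θ1 and upper centers x2 = x1 + bx, y2 = y1 + by, θ2 = θ1 + bθ. The four updates for θ1 are:

```latex
\begin{aligned}
h(x_1, y_1, \theta_1) &\leftarrow h(x_1, y_1, \theta_1) + m\left(1-\frac{x-x_1}{b_x}\right)\left(1-\frac{y-y_1}{b_y}\right)\left(1-\frac{\theta-\theta_1}{b_\theta}\right)\\
h(x_1, y_2, \theta_1) &\leftarrow h(x_1, y_2, \theta_1) + m\left(1-\frac{x-x_1}{b_x}\right)\left(\frac{y-y_1}{b_y}\right)\left(1-\frac{\theta-\theta_1}{b_\theta}\right)\\
h(x_2, y_1, \theta_1) &\leftarrow h(x_2, y_1, \theta_1) + m\left(\frac{x-x_1}{b_x}\right)\left(1-\frac{y-y_1}{b_y}\right)\left(1-\frac{\theta-\theta_1}{b_\theta}\right)\\
h(x_2, y_2, \theta_1) &\leftarrow h(x_2, y_2, \theta_1) + m\left(\frac{x-x_1}{b_x}\right)\left(\frac{y-y_1}{b_y}\right)\left(1-\frac{\theta-\theta_1}{b_\theta}\right)
\end{aligned}
```

The four θ2 updates are the same with the last factor replaced by (θ - θ1)/bθ. The x and y factors are the spatial weights that the MATLAB code precomputes.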

Here you see things like (x-x1) / bx, (y-y1) / by, (1 - (x-x1) / bx), etc. In the code, wx1 and wy1 correspond to:

wx1 = (1 - (x-x1)/bx)
wy1 = (1 - (y-y1)/by)

Here x1 and y1 are the centers of the lower histogram bins in the X and Y directions. It's easier to explain this in 1D. In 1D, x lies between two bin centers, x1 <= x < x2. Which of the two bins x "belongs" to doesn't matter; what matters is the fraction of x's vote that goes to x1, with the rest going to x2. Dividing the distance from x to x1 by the bin width gives the fractional distance, and 1 minus that is the fraction that goes to bin 1. So if x == x1, wx1 is 1, and if x == x2, wx1 is 0, because x2 - x1 == bx (the bin width).
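The 1D reasoning above as a small Python function (`lower_weight` is a hypothetical name), using bins of width 20 with centers at -10 and 10 as an assumed example:

```python
def lower_weight(x, x1, bx):
    """Fraction of the vote at x that goes to the lower bin center x1.

    Assumes x1 <= x <= x1 + bx; the remaining 1 - lower_weight goes to
    the upper center x2 = x1 + bx.
    """
    return 1.0 - (x - x1) / bx


bx, x1 = 20.0, -10.0  # centers at -10 (x1) and 10 (x2)
for x in (-10.0, 0.0, 2.0, 10.0):
    w1 = lower_weight(x, x1, bx)
    print(x, w1, 1.0 - w1)  # position, share to x1, share to x2
```

At x == x1 the whole vote goes to x1, at the midpoint it splits evenly, and at x == x2 nothing is left for x1.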

Going back to the code that creates the four matrices: it just precomputes all the weight products needed to interpolate every pixel in the HOG block. That is why each one is a matrix of weights: each element corresponds to one pixel in the block.

For example, if you look at the equation for the weights of h(x1, y2, ~), you will see these two weights for x and y (ignoring the orientation, or z, component):

(1 - (x-x1)/bx) * ((y-y1)/by)

Coming back to the code, this multiplication is precomputed for each pixel in the block using:

weights.x1y2 = (1-wy1)' * wx1;

Where

(1-wy1) == (y - y1)/by

The same applies to the other weighting matrices.
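A pure-Python sketch of that precomputation for a whole block, assuming a 16x16-pixel block of 8x8-pixel cells, pixel i centered at i + 0.5, and rows indexing y as in `(1-wy1)' * wx1`. The names `spatial_weights`, `lower_center`, `outer` and `complement` are hypothetical; `lower_center` follows the same rule as computeLowerHistBin.

```python
import math


def lower_center(p, width):
    """Bin center at or below p (same rule as computeLowerHistBin)."""
    return width * (math.floor(p / width - 0.5) + 0.5)


def outer(col, row):
    """Outer product col' * row, returned as a list of rows."""
    return [[a * b for b in row] for a in col]


def complement(v):
    """Elementwise 1 - v."""
    return [1.0 - a for a in v]


def spatial_weights(block_px=16, cell_px=8):
    """Precompute the four spatial weight matrices for one block.

    Rows index y and columns index x, mirroring (1-wy1)' * wx1.
    Assumes square blocks and cells, so wx1 and wy1 are the same vector.
    """
    coords = [i + 0.5 for i in range(block_px)]
    # Share of each pixel's vote that goes to the lower cell center
    w1 = [1.0 - (p - lower_center(p, cell_px)) / cell_px for p in coords]
    wx1 = wy1 = w1
    return {
        "x1y1": outer(wy1, wx1),
        "x1y2": outer(complement(wy1), wx1),
        "x2y1": outer(wy1, complement(wx1)),
        "x2y2": outer(complement(wy1), complement(wx1)),
    }


weights = spatial_weights()
print(weights["x1y1"][4][4])  # 0.87890625
```

At every pixel the four weights sum to 1, so spatial interpolation redistributes a vote without changing its total. With this convention, pixels 0 to 3 get x1 = -4, a center outside the block; how the MathWorks code treats votes toward that center is not shown in the question.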

As for computeLowerHistBin, it just finds x1 in the trilinear interpolation equation such that x1 <= x < x2 (and likewise y1). Given the pixel location x and the bin width bx, there are probably many ways to compute this; any of them works as long as it satisfies x1 <= x < x2.

For example, below "|" marks the bin edges and "o" marks the bin centers (bin width 20):

-20             0              20               40
 |------o-------|-------o-------|-------o-------|
       -10              10              30

If x = [2 9 11], the lower bin centers x1 are [-10 -10 10].
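A direct Python port of computeLowerHistBin reproduces this example (Python in place of MATLAB; the `single` precision cast is not reproduced, and `compute_lower_hist_bin` is a hypothetical name):

```python
import math


def compute_lower_hist_bin(xs, bin_width):
    """Python port of computeLowerHistBin for a list of positions.

    Returns (x1, b1): the bin center at or below each x, and the
    1-based index the MATLAB code produces (bin + 2).
    """
    inv_width = 1.0 / bin_width
    x1, b1 = [], []
    for x in xs:
        # Subtracting 0.5 makes floor() pick the lower bin center;
        # b is -1 when x < bin_width / 2
        b = math.floor(x * inv_width - 0.5)
        x1.append(bin_width * (b + 0.5))
        b1.append(b + 2)
    return x1, b1


print(compute_lower_hist_bin([2, 9, 11], 20))  # ([-10.0, -10.0, 10.0], [1, 1, 2])
```

Here b1 is 1 for x = 2 and x = 9 (lower center -10, bin = -1 before the +2 offset) and 2 for x = 11 (lower center 10).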
