Defining the (initial) set of Haar-like features

When it comes to cascade classifiers (using Haar-like features), I always read that techniques like AdaBoost are used to select the "best" features for detection. However, this only works if there is some initial set of features to start boosting from.

Given a 24x24 pixel image, there are 162,336 possible Haar features. I may be wrong, but I don't think libraries like OpenCV evaluate all of these features first.

So my question is: how is the original feature set selected, or how is it generated? Are there any guidelines on the starting number of features?

And if 162,336 features are used initially, how are they generated?



4 answers


I assume you are familiar with the Viola-Jones paper on this topic.

First, you choose the feature type (e.g. Rectangle A). This gives you a mask with which you can train your weak classifiers. To avoid moving pixel by pixel and retraining (which would take a huge amount of time for no better accuracy), you can specify how far the feature mask is moved in the x and y directions per trained weak classifier. The size of your jumps depends on your data. The goal is for the mask to be able to move in and out of the detected object. The size of the feature can also be variable.

Once you have trained a few classifiers with a given feature (i.e. mask position), you continue with AdaBoost and cascade training as usual.



The number of features / weak classifiers depends heavily on your data and experimental setup (i.e. the kind of classifier used). You will need to test extensively to find out which feature type works best (rectangle / circle / tetris-like shapes, etc.). I worked on this two years ago, and it took us quite a while to evaluate which features and feature-generation heuristics gave the best results.

If you want a starting point, just take one of the four original Viola-Jones features and train a classifier with it anchored at (0,0). Train the next classifier with (x, 0), the next with (2x, 0), ..., then (0, y), (0, 2y), (0, 4y), ..., then (x, y), (x, 2y), etc., and see what happens. Most likely you will see that fewer weak classifiers suffice, i.e. you can increase the x / y step values that determine how the mask is shifted. You can also enlarge the mask or do other things to save time. The reason this "lazy" feature generation works for AdaBoost is that, as long as these features make the classifiers slightly better than random guessing, AdaBoost will combine them into a meaningful classifier.



From your question, I understand that you want to know what the 162,336 features are.

Starting from the 4 original Viola-Jones features ( http://en.wikipedia.org/wiki/Viola%E2%80%93Jones_object_detection_framework ):

We can generate 162,336 features by varying the size of the 4 original features and their position within the 24x24 input image.

For example, consider one of the original features, which has two rectangles adjacent to each other horizontally, and let each rectangle be 1 pixel wide and 1 pixel high. If the feature is placed at (0,0) of the 24x24 image, that is one feature; move it one pixel to the right, to (1,0), and it counts as a second feature, since its position has changed. You can move it horizontally up to (22,0), giving 23 positions. Likewise, moving along the vertical axis from (0,0) to (0,23) gives 24 positions. So, covering every position in the image ((0,0), (1,0), ..., (22,23)), you can generate 23 * 24 = 552 features.

Now consider each rectangle being 2 pixels wide and 1 pixel high. Placing the feature at (0,0) and moving it along the horizontal axis up to (20,0), as above, gives 21 positions; since its height is unchanged, moving along the vertical axis from (0,0) to (0,23) gives 24 positions. So, covering every position in the image, we get 21 * 24 = 504 features.

So, if we keep increasing the width of each rectangle by one pixel (keeping the height at 1 pixel) and cover the full image each time, so that the width of each rectangle goes from 1 to 12 pixels (i.e. the full mask from 2 to 24 pixels wide), we get: no. of features = 24 * (23 + 21 + 19 + ... + 3 + 1)

Now consider each rectangle being 1 pixel wide and 2 pixels high. Placing the feature at (0,0) and moving it along the horizontal axis up to (22,0) gives 23 positions, since the full mask is 2 pixels wide; and since its height is 2 pixels, moving along the vertical axis from (0,0) to (0,22) gives 23 positions. So, covering every position in the image, we get 23 * 23 = 529 features.

Likewise, increasing the width of each rectangle by one pixel while keeping the height at 2 pixels, covering the full image each time so that the rectangle width goes from 1 to 12 pixels, we get: no. of features = 23 * (23 + 21 + 19 + ... + 3 + 1)

Now, if we keep increasing the height of each rectangle by 1 pixel (varying the width from 1 to 12 pixels at every height) until the height of each rectangle reaches 24 pixels, then

no. of features = 24 * (23 + 21 + 19 + ... + 3 + 1) + 23 * (23 + 21 + 19 + ... + 3 + 1) + 22 * (23 + 21 + 19 + ... + 3 + 1) + ... + 2 * (23 + 21 + 19 + ... + 3 + 1) + 1 * (23 + 21 + 19 + ... + 3 + 1)

                = 43,200 features

Now consider the 2nd original Viola-Jones feature, which has two rectangles with one above the other (the rectangles stacked vertically). Since this is symmetric to the 1st original Viola-Jones feature, it will likewise have



no. of features = 43,200

Similarly, following the same process for the 3rd original Viola-Jones feature, which has 3 rectangles side by side horizontally, we get

no. of features = 24 * (22 + 19 + 16 + ... + 4 + 1) + 23 * (22 + 19 + 16 + ... + 4 + 1) + 22 * (22 + 19 + 16 + ... + 4 + 1) + ... + 2 * (22 + 19 + 16 + ... + 4 + 1) + 1 * (22 + 19 + 16 + ... + 4 + 1)

                = 27,600

Now, for the feature that has 3 rectangles stacked vertically (i.e. one rectangle on top of another), we get

no. of features = 27,600 (since this is symmetric to the 3rd original Viola-Jones feature)

Finally, for the 4th original Viola-Jones feature, which has 4 rectangles, we get

no. of features = 23 * (23 + 21 + 19 + ... + 3 + 1) + 21 * (23 + 21 + 19 + ... + 3 + 1) + 19 * (23 + 21 + 19 + ... + 3 + 1) + ... + 3 * (23 + 21 + 19 + ... + 3 + 1) + 1 * (23 + 21 + 19 + ... + 3 + 1)

                = 20,736

Now, summing up all of these features, we get: 43,200 + 43,200 + 27,600 + 27,600 + 20,736

                = 162,336 features

So, from these 162,336 features, AdaBoost selects a subset to form a strong classifier.



I would like to point to a recent work on this topic, published in February 2015:

César Cobos-May, Víctor Uc-Cetina, Carlos Brito-Loeza and Anabel Martín-González, "Automatic Convex Set Algorithm to Generate Haar-like Features", Comput. Sci. Appl., Vol. 2, No. 2, pp. 64-70, 2015.

It is presented as "a first step towards automating the design of Haar-like features".

An algorithm is proposed that "extracts" a Haar-like feature from a segmented image representing the object to be detected. The basic idea is to find the largest rectangle entirely contained in a convex region.

First, a representative image of the object of interest is selected and processed with the K-means algorithm (with K = 2). The resulting segmented image is then fed to an algorithm that finds the largest rectangle for each (some?) convex region in the segmented image, to obtain the final feature.

Actually, quoting the article:

Later [after segmentation] our method to automatically create a template [a Haar-like feature] is applied to the regions of the segmented image.

So, if I'm not mistaken, some amount of subjective judgment is still needed to decide which regions of the segmented image are used as input to the algorithm.

Note that features obtained this way tend to be somewhat more complex than the traditional ones.

The methodology is tested in a specific case study against the traditional features, with apparently good results.



It seems to me that there is a bit of confusion here.
Even the accepted answer seems wrong to me (maybe I didn't understand it well). The original Viola-Jones algorithm, the major subsequent improvements such as the Lienhart-Maydt algorithm, and the OpenCV implementation all evaluate every feature of the feature set in turn. You can check the OpenCV source code (or any implementation you prefer).
At the end of the function void CvHaarEvaluator::generateFeatures() you have numFeatures, which is exactly 162,336 for BASIC mode and a 24x24 window size.
And they are all evaluated in turn when the entire feature set is passed as featureEvaluator (source):

bool isStageTrained = tempStage->train( (CvFeatureEvaluator*)featureEvaluator, curNumSamples,
  _precalcValBufSize, _precalcIdxBufSize, *((CvCascadeBoostParams*)stageParams) );

      

Each weak classifier is built by evaluating every feature and choosing the one that gives the best result at that point (with decision trees the process is similar).
After this selection, the sample weights are changed accordingly, so that in the next round a different feature will be selected from the full feature set. Evaluating a single feature is cheap, but multiplied by numFeatures it adds up. The whole cascade training can take weeks, yet the bottleneck is not the feature evaluation process but the negative sample gathering in the later stages. From the Wikipedia link you provided, I read:

in a standard 24x24 pixel sub-window, there are a total of M = 162,336 possible features, and it would be prohibitively expensive to evaluate them all when testing an image.

Don't be misled: this means that after the long training phase your detection algorithm will be very fast, because it only needs to evaluate a few features (the ones that were selected during training).







