Improving human detection with OpenCV

I am testing a human detection sample in OpenCV. After running it on an image (original image available here), this is my output:

[Image: detector output on the test image]

I'm using the people detection sample that comes bundled with OpenCV (slightly modified to work around Visual Studio issues). This is the code that gets executed:

// opencv-sample.cpp : Defines the entry point for the console application.

#include "stdafx.h"

#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/highgui/highgui.hpp"

#include <stdio.h>
#include <string.h>
#include <ctype.h>

using namespace cv;
using namespace std;

// static void help()
// {
//     printf(
//             "\nDemonstrate the use of the HoG descriptor using\n"
//             "  HOGDescriptor::hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());\n"
//             "Usage:\n"
//             "./peopledetect (<image_filename> | <image_list>.txt)\n\n");
// }

int main(int argc, char** argv)
{
    Mat img;
    FILE* f = 0;
    char _filename[1024];

    if (argc == 1)
    {
        printf("Usage: peopledetect (<image_filename> | <image_list>.txt)\n");
        return 0;
    }
    img = imread(argv[1]);

    if (img.data)
    {
        // The argument is a readable image: process just that file.
        strcpy_s(_filename, argv[1]);
    }
    else
    {
        // Otherwise treat the argument as a text file listing image paths.
        fopen_s(&f, argv[1], "rt");
        if (!f)
        {
            fprintf(stderr, "ERROR: the specified file could not be loaded\n");
            return -1;
        }
    }

    HOGDescriptor hog;
    // Load the default people-detector SVM coefficients; detection needs them.
    hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
    namedWindow("people detector", 1);

    for (;;)
    {
        char* filename = _filename;
        if (f)
        {
            // Read the next image path from the list; stop at end of file.
            if (!fgets(filename, (int)sizeof(_filename) - 2, f))
                break;
            //while(*filename && isspace(*filename))
            //  ++filename;
            if (filename[0] == '#')
                continue;   // lines starting with '#' are comments
            // Strip the trailing newline and any other trailing whitespace.
            int l = (int)strlen(filename);
            while (l > 0 && isspace((unsigned char)filename[l - 1]))
                --l;
            filename[l] = '\0';
            img = imread(filename);
        }
        printf("%s:\n", filename);
        if (!img.data)
        {
            if (!f)
                break;      // single-image mode: nothing left to try
            continue;       // list mode: skip the unreadable entry
        }

        vector<Rect> found, found_filtered;
        double t = (double)getTickCount();
        // run the detector with default parameters. to get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        hog.detectMultiScale(img, found, 0, Size(8, 8), Size(32, 32), 1.05, 2);
        t = (double)getTickCount() - t;
        printf("detection time = %gms\n", t*1000. / cv::getTickFrequency());
        size_t i, j;
        // Drop every rectangle that lies completely inside another one.
        for (i = 0; i < found.size(); i++)
        {
            Rect r = found[i];
            for (j = 0; j < found.size(); j++)
                if (j != i && (r & found[j]) == r)
                    break;
            if (j == found.size())
                found_filtered.push_back(r);
        }
        for (i = 0; i < found_filtered.size(); i++)
        {
            Rect r = found_filtered[i];
            // the HOG detector returns slightly larger rectangles than the real objects.
            // so we slightly shrink the rectangles to get a nicer output.
            r.x += cvRound(r.width*0.1);
            r.width = cvRound(r.width*0.8);
            r.y += cvRound(r.height*0.07);
            r.height = cvRound(r.height*0.8);
            rectangle(img, r.tl(), r.br(), cv::Scalar(0, 255, 0), 3);
        }
        imshow("people detector", img);
        imwrite("detected_ppl.jpg", img);
        int c = waitKey(0) & 255;
        if (c == 'q' || c == 'Q' || !f)
            break;
    }
    if (f)
        fclose(f);
    return 0;
}


I would like to improve on this result so that at least 9 of the 11 people in this image are detected. How can I do that? Do I need to train a separate SVM? Is there a better library I can use? Or do I need to resort to deep learning?



1 answer

[Image: improved detection result]

This is the improvement I got after spending a lot of time on the sample code.

What I did:

- tweaked some of the parameters of detectMultiScale
- adjusted the filter to eliminate highly overlapping rectangles
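
To illustrate the second point, here is a minimal sketch of an overlap filter. It is not the code used for the result above. `Box`, `intersectionArea`, `filterOverlapping` and the `maxOverlap` threshold are hypothetical names, and `Box` is a plain stand-in for `cv::Rect` so that the snippet compiles without OpenCV. The bundled sample only drops a rectangle when it lies entirely inside another one. This variant drops it when more than a chosen fraction of its area is covered by a larger detection.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for cv::Rect (assumption: top-left corner plus size).
struct Box {
    int x, y, width, height;
    int area() const { return width * height; }
};

// Area of the intersection of two boxes, or 0 if they do not overlap.
int intersectionArea(const Box& a, const Box& b) {
    int w = std::min(a.x + a.width, b.x + b.width) - std::max(a.x, b.x);
    int h = std::min(a.y + a.height, b.y + b.height) - std::max(a.y, b.y);
    return (w > 0 && h > 0) ? w * h : 0;
}

// Keep a box unless more than `maxOverlap` (0..1) of its own area is
// covered by a dominating box. A box dominates another if it is larger,
// or equally large and earlier in the list, so two identical boxes
// cannot suppress each other.
std::vector<Box> filterOverlapping(const std::vector<Box>& found, double maxOverlap) {
    assert(maxOverlap >= 0.0 && maxOverlap <= 1.0);
    std::vector<Box> kept;
    for (std::size_t i = 0; i < found.size(); ++i) {
        bool suppressed = false;
        for (std::size_t j = 0; j < found.size() && !suppressed; ++j) {
            if (j == i || found[i].area() == 0) continue;
            bool jDominates = found[j].area() > found[i].area() ||
                              (found[j].area() == found[i].area() && j < i);
            if (!jDominates) continue;
            double covered = double(intersectionArea(found[i], found[j])) / found[i].area();
            suppressed = covered > maxOverlap;
        }
        if (!suppressed) kept.push_back(found[i]);
    }
    return kept;
}
```

With real detections you could copy each `cv::Rect` into a `Box`, or rewrite the function over `cv::Rect` directly using `(a & b).area()`. Any particular threshold, such as 0.7, is only a starting guess to tune on your own data.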

I would say I get 9/11 hits, with one false positive and two false negatives.

Which is all very well, but this is one static image. Tweaking the parameters to work with a single sample results in overfitting: you get exactly the answer you are after for this one example, but poor generalization.
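
One way to check generalization is to score the detector over many labelled images rather than one. The sketch below is only an illustration: `ImageCounts`, `Score` and `evaluate` are hypothetical names, and it assumes you have already matched detections against hand-labelled people to obtain the per-image counts.

```cpp
#include <cassert>
#include <vector>

// Per-image counts from comparing detections with hand-labelled people.
struct ImageCounts {
    int hits;         // people correctly detected (true positives)
    int falseAlarms;  // detections that are not people (false positives)
    int misses;       // people the detector did not find (false negatives)
};

struct Score {
    double precision; // hits / (hits + falseAlarms)
    double recall;    // hits / (hits + misses)
};

// Pool the counts over all images and compute precision and recall.
// A ratio with an empty denominator is reported as 0.
Score evaluate(const std::vector<ImageCounts>& results) {
    long hits = 0, falseAlarms = 0, misses = 0;
    for (const ImageCounts& r : results) {
        assert(r.hits >= 0 && r.falseAlarms >= 0 && r.misses >= 0);
        hits += r.hits;
        falseAlarms += r.falseAlarms;
        misses += r.misses;
    }
    Score s;
    s.precision = (hits + falseAlarms) > 0 ? double(hits) / (hits + falseAlarms) : 0.0;
    s.recall = (hits + misses) > 0 ? double(hits) / (hits + misses) : 0.0;
    return s;
}
```

For the single image discussed here the counts would be 9 hits, 1 false alarm and 2 misses, which gives a precision of 0.9 and a recall of 9/11. The numbers only become meaningful once many images are pooled.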

I highly recommend that you get to know the OpenCV algorithms inside out before abandoning them for "better" libraries and "deep learning" approaches. If you do not know the strengths and weaknesses of this algorithm, you will not be able to compare it with the approaches offered by other libraries.

This is the code I used to achieve the result. It is closely based on the peopledetect.cpp OpenCV sample. You will need to make a few changes, as I am using a custom image reader function which will not be relevant to you.

I've added a slider for the scaleFactor parameter so you can easily see the effect of changing it. detectMultiScale runs the detection window over the image in several passes at different scales. The scaleFactor parameter, which controls the scale step between passes, makes a huge difference to the output even with small changes. However, it is rather pointless to tune these parameters on a single still image; you really need to run the detector on a representative test set drawn from your target data to assess the suitability of this (or any other) algorithm.
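
To see why small changes to scaleFactor matter, it helps to count how many passes it produces. The sketch below is a simplified model, not OpenCV's actual implementation: `pyramidScales` is a hypothetical helper, and it assumes the image is shrunk by scaleFactor on each pass until the default 64x128 people-detector window no longer fits, with a cap on the number of levels.

```cpp
#include <cassert>
#include <vector>

// Scales at which a winW x winH window still fits into an imgW x imgH image
// when the image is shrunk by `scaleFactor` on every pass.
// Assumption: a simplified model of the multi-scale search.
std::vector<double> pyramidScales(int imgW, int imgH, double scaleFactor,
                                  int winW = 64, int winH = 128,
                                  int maxLevels = 64) {
    assert(scaleFactor > 1.0);  // a factor of 1 or less would never shrink the image
    std::vector<double> scales;
    double scale = 1.0;
    while (static_cast<int>(scales.size()) < maxLevels &&
           imgW / scale >= winW && imgH / scale >= winH) {
        scales.push_back(scale);
        scale *= scaleFactor;
    }
    return scales;
}
```

Calling `pyramidScales(640, 480, 1.05).size()` and `pyramidScales(640, 480, 1.2).size()` shows that the smaller factor searches noticeably more levels. In this model, that means finer coverage of person sizes at the cost of more computation and more candidate rectangles to filter.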


