Boosting with OpenCV

facedetect.cpp

A boosting-based detector for objects (e.g. faces, persons, or other rigid objects) comes with OpenCV.

In OpenCV 2.4.9 the corresponding sample program is called facedetect.cpp, but you can detect arbitrary objects with it. It can be found in \samples\c\.

This program demonstrates the cascade classifier, which can use either Haar or LBP features.
It can recognize many kinds of rigid objects, once the appropriate classifier has been trained.
Its best-known use is face detection.
Usage:
./facedetect [--cascade=<cascade_path> this is the primary trained classifier such as frontal face]
   [--nested-cascade[=nested_cascade_path this an optional secondary classifier such as eyes]]
   [--scale=<image scale greater or equal to 1, try 1.3 for example>]
   [--try-flip]
   [filename|camera_index]

example run of facedetect.cpp

Example call:

facedetect
 --scale=1.0
 --cascade=W:\20_src\90_thirdparty_libs\opencv_trunk\data\haarcascades\haarcascade_frontalface_alt.xml
 W:\01_data\07_testimgs\person_detection\CIMG0389.JPG

Example result:

Debugging one run of facedetect.cpp

Taking the stairway down to the core of the boosting detector

Have a look at the detectAndDraw() function in facedetect.cpp:

void detectAndDraw( Mat& img, CascadeClassifier& cascade,
                    CascadeClassifier& nestedCascade,
                    double scale, bool tryflip )
{
    vector<Rect> faces, faces2;
    ...
    Mat gray, smallImg( cvRound (img.rows/scale), cvRound(img.cols/scale), CV_8UC1 ); // [1]
    cvtColor( img, gray, CV_BGR2GRAY );
    resize( gray, smallImg, smallImg.size(), 0, 0, INTER_LINEAR );
    equalizeHist( smallImg, smallImg );
 
    cascade.detectMultiScale( smallImg, faces,
        1.1, 2, 0
        //|CV_HAAR_FIND_BIGGEST_OBJECT
        //|CV_HAAR_DO_ROUGH_SEARCH
        |CV_HAAR_SCALE_IMAGE
        ,
        Size(30, 30) );
    ...
}

[1] Note that the scale factor scale means a down-scaling if scale > 1.0, since the number of rows/columns is divided by scale!
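For orientation before we descend into the internals, here is a minimal, self-contained sketch of the same pipeline outside the sample program (a sketch assuming the OpenCV 2.4 C++ API; the cascade and image paths are placeholders you have to adapt):

#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>
#include <vector>

int main()
{
    cv::CascadeClassifier cascade;
    if( !cascade.load("haarcascade_frontalface_alt.xml") ) // placeholder path
        return -1;

    cv::Mat img = cv::imread("test.jpg");                  // placeholder path
    if( img.empty() )
        return -1;

    // same preprocessing as in detectAndDraw(): gray conversion + histogram equalization
    cv::Mat gray;
    cv::cvtColor(img, gray, CV_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 2, CV_HAAR_SCALE_IMAGE, cv::Size(30, 30));

    for( size_t i = 0; i < faces.size(); i++ )
        printf("face %d: x=%d y=%d w=%d h=%d\n", (int)i,
               faces[i].x, faces[i].y, faces[i].width, faces[i].height);
    return 0;
}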

All of the actual detection work is done in CascadeClassifier::detectMultiScale():

void CascadeClassifier::detectMultiScale( const Mat& image, vector<Rect>& objects,
                                          double scaleFactor, int minNeighbors,
                                          int flags, Size minObjectSize, Size maxObjectSize)
{
    vector<int> fakeLevels;
    vector<double> fakeWeights;
    detectMultiScale( image, objects, fakeLevels, fakeWeights, scaleFactor,
        minNeighbors, flags, minObjectSize, maxObjectSize, false );
}

… which calls another overloaded version of CascadeClassifier::detectMultiScale():

void CascadeClassifier::detectMultiScale( const Mat& image, vector<Rect>& objects,
                                          vector<int>& rejectLevels,
                                          vector<double>& levelWeights,
                                          double scaleFactor, int minNeighbors,
                                          int flags, Size minObjectSize, Size maxObjectSize,
                                          bool outputRejectLevels )
{
    const double GROUP_EPS = 0.2;
 
    CV_Assert( scaleFactor > 1 && image.depth() == CV_8U );
 
    if( empty() )
        return;
 
    if( isOldFormatCascade() )
    {
        MemStorage storage(cvCreateMemStorage(0));
        CvMat _image = image;
        CvSeq* _objects = cvHaarDetectObjectsForROC( &_image, oldCascade, storage, rejectLevels, levelWeights, scaleFactor,
                                              minNeighbors, flags, minObjectSize, maxObjectSize, outputRejectLevels );
        vector<CvAvgComp> vecAvgComp;
        Seq<CvAvgComp>(_objects).copyTo(vecAvgComp);
        objects.resize(vecAvgComp.size());
        std::transform(vecAvgComp.begin(), vecAvgComp.end(), objects.begin(), getRect());
        return;
    }

Now it depends on whether the cascade is stored in the old or the new format. We assume the old format and continue with cvHaarDetectObjectsForROC(), which can be found in haar.cpp:
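Which of the two branches is taken can be checked directly; a small sketch (the XML path is a placeholder; the classic haarcascade files shipped with OpenCV typically load as old-format cascades):

#include <opencv2/objdetect/objdetect.hpp>
#include <cstdio>

int main()
{
    cv::CascadeClassifier cascade;
    if( !cascade.load("haarcascade_frontalface_alt.xml") ) // placeholder path
        return -1;
    // old-format cascades take the cvHaarDetectObjectsForROC() branch discussed here
    printf("old format cascade: %s\n", cascade.isOldFormatCascade() ? "yes" : "no");
    return 0;
}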

Image pyramid

CvSeq*
cvHaarDetectObjectsForROC( const CvArr* _img,
                     CvHaarClassifierCascade* cascade, CvMemStorage* storage,
                     std::vector<int>& rejectLevels, std::vector<double>& levelWeights,
                     double scaleFactor, int minNeighbors, int flags,
                     CvSize minSize, CvSize maxSize, bool outputRejectLevels )
{
...
     for( factor = 1; ; factor *= scaleFactor )
        {
            CvSize winSize = { cvRound(winSize0.width*factor),
                                cvRound(winSize0.height*factor) };
            CvSize sz = { cvRound( img->cols/factor ), cvRound( img->rows/factor ) };
            CvSize sz1 = { sz.width - winSize0.width + 1, sz.height - winSize0.height + 1 };
 
            CvRect equRect = { icv_object_win_border, icv_object_win_border,
                winSize0.width - icv_object_win_border*2,
                winSize0.height - icv_object_win_border*2 };
 
            CvMat img1, sum1, sqsum1, norm1, tilted1, mask1;
            CvMat* _tilted = 0;
 
            if( sz1.width <= 0 || sz1.height <= 0 )
                break; // Jürgen: exit image pyramid, if sliding window size (winSize0) is larger than re-scaled image (sz)
            if( winSize.width > maxSize.width || winSize.height > maxSize.height )
                break;
            if( winSize.width < minSize.width || winSize.height < minSize.height )
                continue;
 
            img1 = cvMat( sz.height, sz.width, CV_8UC1, imgSmall->data.ptr );
            sum1 = cvMat( sz.height+1, sz.width+1, CV_32SC1, sum->data.ptr );
            sqsum1 = cvMat( sz.height+1, sz.width+1, CV_64FC1, sqsum->data.ptr );
            if( tilted )
            {
                tilted1 = cvMat( sz.height+1, sz.width+1, CV_32SC1, tilted->data.ptr );
                _tilted = &tilted1;
            }
            norm1 = cvMat( sz1.height, sz1.width, CV_32FC1, normImg ? normImg->data.ptr : 0 );
            mask1 = cvMat( sz1.height, sz1.width, CV_8UC1, temp->data.ptr );
 
            cvResize( img, &img1, CV_INTER_LINEAR );
            cvIntegral( &img1, &sum1, &sqsum1, _tilted );
 
            int ystep = factor > 2 ? 1 : 2;
            const int LOCS_PER_THREAD = 1000;
            int stripCount = ((sz1.width/ystep)*(sz1.height + ystep-1)/ystep + LOCS_PER_THREAD/2)/LOCS_PER_THREAD;
            stripCount = std::min(std::max(stripCount, 1), 100);
            ...
            cv::parallel_for(cv::BlockedRange(0, stripCount),
                         cv::HaarDetectObjects_ScaleImage_Invoker(cascade,
                                (((sz1.height + stripCount - 1)/stripCount + ystep-1)/ystep)*ystep,
                                factor, cv::Mat(&sum1), cv::Mat(&sqsum1), &_norm1, &_mask1,
                                cv::Rect(equRect), allCandidates, rejectLevels, levelWeights, outputRejectLevels));
}

[1] Note that the scaleFactor here is fixed to 1.1 in the call from detectAndDraw() and is different from the image rescale factor scale used in this earlier line:

smallImg( cvRound (img.rows/scale), cvRound(img.cols/scale), CV_8UC1 ); // [1]

So what happens here?

For each level of the image pyramid we

  • compute a re-scaled version of the original image img with the re-size factor factor (the termination of this loop is sketched below)
  • store the re-scaled version in img1
  • compute the corresponding integral image sum1 (and the squared-sum image sqsum1)
  • and then start a parallel detection over the sliding-window positions
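To get a feeling for how many pyramid levels this loop produces, here is a small sketch that mimics its termination condition (the image and window sizes are example values, not taken from the run above):

#include <cstdio>

int main()
{
    const double scaleFactor = 1.1;   // as passed from detectAndDraw()
    const int imgW = 640, imgH = 480; // example image size (assumption)
    const int winW = 20, winH = 20;   // training window size of haarcascade_frontalface_alt

    int level = 0;
    for( double factor = 1.0; ; factor *= scaleFactor )
    {
        int w = (int)(imgW/factor + 0.5); // cvRound-like rounding
        int h = (int)(imgH/factor + 0.5);
        if( w - winW + 1 <= 0 || h - winH + 1 <= 0 )
            break; // sliding window no longer fits into the re-scaled image
        printf("level %2d: factor=%.3f re-scaled image: %dx%d\n", level++, factor, w, h);
    }
    return 0;
}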

The call of cv::HaarDetectObjects_ScaleImage_Invoker then finally brings us to its operator() (also in haar.cpp):

Sliding Window

void operator()( const BlockedRange& range ) const
{
ystep = factor > 2 ? 1 : 2; // Jürgen: sliding window step size is set to 1 or 2 depending on image down-scale factor
...
for( y = y1; y < y2; y += ystep )
    for( x = 0; x < ssz.width; x += ystep )
    {
        double gypWeight;
        int result = cvRunHaarClassifierCascadeSum( cascade, cvPoint(x,y), gypWeight, 0 );
        if( rejectLevels )
        {
            if( result == 1 )
                result = -1*cascade->count;
            if( cascade->count + result < 4 )
            {
                vec->push_back(Rect(cvRound(x*factor), cvRound(y*factor),
                                winSize.width, winSize.height));
                rejectLevels->push_back(-result);
                levelWeights->push_back(gypWeight);
            }
        }
        else
        {
            if( result > 0 )
                vec->push_back(Rect(cvRound(x*factor), cvRound(y*factor),
                                winSize.width, winSize.height));
        }
    }
...
}

where the actual classification per window is done using cvRunHaarClassifierCascadeSum():

At the core of the boosting detector

Here (in haar.cpp) we can find how OpenCV decides for a single sliding window whether there is a face (object) or not:

static int
cvRunHaarClassifierCascadeSum( const CvHaarClassifierCascade* _cascade,
                               CvPoint pt, double& stage_sum, int start_stage )
{
  ...
  else if( cascade->isStumpBased )
  {
  ...
  for( i = start_stage; i < cascade->count; i++ )
            {
                stage_sum = 0.0;
                if( cascade->stage_classifier[i].two_rects )
                {
                    for( j = 0; j < cascade->stage_classifier[i].count; j++ )
                    {
                        CvHidHaarClassifier* classifier = cascade->stage_classifier[i].classifier + j;
                        CvHidHaarTreeNode* node = classifier->node;
                        double t = node->threshold*variance_norm_factor; // Jürgen: feature normalization here!
                        double sum = calc_sum(node->feature.rect[0],p_offset) * node->feature.rect[0].weight; // Jürgen: rectangle #1
                        sum += calc_sum(node->feature.rect[1],p_offset) * node->feature.rect[1].weight; // Jürgen: rectangle #2
                        stage_sum += classifier->alpha[sum >= t];
                    }
                }
                else
                {
                    for( j = 0; j < cascade->stage_classifier[i].count; j++ )
                    {
                        CvHidHaarClassifier* classifier = cascade->stage_classifier[i].classifier + j;
                        CvHidHaarTreeNode* node = classifier->node;
                        double t = node->threshold*variance_norm_factor; // Jürgen: feature normalization here!
                        double sum = calc_sum(node->feature.rect[0],p_offset) * node->feature.rect[0].weight; // Jürgen: rectangle #1
                        sum += calc_sum(node->feature.rect[1],p_offset) * node->feature.rect[1].weight; // Jürgen: rectangle #2
                        if( node->feature.rect[2].p0 )
                            sum += calc_sum(node->feature.rect[2],p_offset) * node->feature.rect[2].weight; // Jürgen: rectangle #3
                        stage_sum += classifier->alpha[sum >= t];
                    }
                }
                if( stage_sum < cascade->stage_classifier[i].threshold )
                    return -i; // Jürgen: exit cascade! no object at sliding window location <pt>!
            }
  ...
  return 1; // Jürgen: object detected at sliding window location <pt>!
}

where calc_sum() gives us the sum of pixel intensities in a rectangular region, using the pre-computed integral image:

#define calc_sum(rect,offset) \
    ((rect).p0[offset] - (rect).p1[offset] - (rect).p2[offset] + (rect).p3[offset])
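The same four-corner lookup can be reproduced with the C++ API; a small sketch using cv::integral() (the test data is arbitrary):

#include <opencv2/imgproc/imgproc.hpp>
#include <cstdio>

int main()
{
    // 4x4 test image with constant intensity 1, so any rectangle sum equals its area
    cv::Mat img = cv::Mat::ones(4, 4, CV_8UC1);
    cv::Mat sum; // integral image, (rows+1) x (cols+1), CV_32S
    cv::integral(img, sum, CV_32S);

    // sum over the 2x2 rectangle with top-left corner (1,1):
    // p0 - p1 - p2 + p3, exactly as in the calc_sum macro
    int x = 1, y = 1, w = 2, h = 2;
    int s = sum.at<int>(y, x)     - sum.at<int>(y, x + w)
          - sum.at<int>(y + h, x) + sum.at<int>(y + h, x + w);
    printf("rectangle sum = %d (expected %d)\n", s, w*h);
    return 0;
}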

In our case

  • the face detector is a cascade of strong classifiers
  • the cascade has 22 stages (cascade->count == 22), i.e. 22 strong classifiers
  • each strong classifier is a sum of (a varying number of) decision stumps
  • each decision stump (weak classifier) thresholds a single Haar feature
  • the strong classifier learned for the 1st stage uses 3 decision stumps (with 2-rectangle Haar features)
  • the 2nd stage has 16 decision stumps (with 3-rectangle Haar features)
  • the 3rd stage has 21 decision stumps (with 3-rectangle Haar features)
  • etc.

Note that at each stage i of the cascade, the function returns -i (the negated stage index) if stage_sum is below the stage-specific threshold.

Only if the window passes all 22 stages does the function return 1, i.e. say “yes, at the current location pt there is a face!”
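Stripped of the integral image lookups, the control flow of the cascade boils down to the following (a simplified, self-contained sketch with made-up numbers, not the actual OpenCV code):

#include <cstdio>
#include <vector>

// one decision stump: picks alpha1 if the feature response passes its threshold
struct Stump { double alpha0, alpha1, featureValue, threshold; };
struct Stage { std::vector<Stump> stumps; double threshold; };

int runCascade( const std::vector<Stage>& stages )
{
    for( size_t i = 0; i < stages.size(); i++ )
    {
        double stageSum = 0.0;
        for( size_t j = 0; j < stages[i].stumps.size(); j++ )
        {
            const Stump& s = stages[i].stumps[j];
            stageSum += (s.featureValue >= s.threshold) ? s.alpha1 : s.alpha0;
        }
        if( stageSum < stages[i].threshold )
            return -(int)i; // early exit: window rejected at stage i
    }
    return 1; // survived all stages: object detected
}

int main()
{
    Stump pass = {0.1, 0.9, 5.0, 3.0}; // feature fires    -> contributes 0.9
    Stump fail = {0.1, 0.9, 1.0, 3.0}; // feature too weak -> contributes 0.1
    Stage s0, s1;
    s0.stumps.push_back(pass); s0.threshold = 0.5;
    s1.stumps.push_back(fail); s1.threshold = 0.5;
    std::vector<Stage> stages;
    stages.push_back(s0); stages.push_back(s1);
    printf("result = %d\n", runCascade(stages)); // prints -1: rejected at stage 1
    return 0;
}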

Normalization of features

Instead of normalizing the image patch as a whole, each individual Haar feature is normalized by the variance of the current search window, by adapting its decision threshold:

double t = node->threshold*variance_norm_factor;

where variance_norm_factor is computed earlier in cvRunHaarClassifierCascadeSum():

    p_offset = pt.y * (cascade->sum.step/sizeof(sumtype)) + pt.x;
    pq_offset = pt.y * (cascade->sqsum.step/sizeof(sqsumtype)) + pt.x;
    mean = calc_sum(*cascade,p_offset)*cascade->inv_window_area;
    variance_norm_factor = cascade->pq0[pq_offset] - cascade->pq1[pq_offset] -
                           cascade->pq2[pq_offset] + cascade->pq3[pq_offset];
    variance_norm_factor = variance_norm_factor*cascade->inv_window_area - mean*mean;
    if( variance_norm_factor >= 0. )
        variance_norm_factor = sqrt(variance_norm_factor);
    else
        variance_norm_factor = 1.;

Note: the variance is the average squared distance to the mean; since the square root is taken at the end, variance_norm_factor is actually the standard deviation of the window.
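The identity used here is Var(X) = E[X²] − (E[X])², and both expectations come for free from the two integral images sum and sqsum. A small sketch that reproduces the computation for one window with the C++ API (random test data):

#include <opencv2/imgproc/imgproc.hpp>
#include <cstdio>
#include <cmath>

// sum of an integral image over the rectangle [x, x+w) x [y, y+h)
static double rectSum( const cv::Mat& ii, int x, int y, int w, int h )
{
    return ii.at<double>(y, x)     - ii.at<double>(y, x + w)
         - ii.at<double>(y + h, x) + ii.at<double>(y + h, x + w);
}

int main()
{
    cv::Mat img(64, 64, CV_8UC1);
    cv::randu(img, cv::Scalar(0), cv::Scalar(256)); // random test data

    cv::Mat sum, sqsum; // sum of pixels and sum of squared pixels
    cv::integral(img, sum, sqsum, CV_64F);

    int x = 10, y = 10, w = 20, h = 20;
    double n    = (double)w*h;
    double mean = rectSum(sum, x, y, w, h) / n;
    double var  = rectSum(sqsum, x, y, w, h) / n - mean*mean; // E[X^2] - (E[X])^2
    double norm = var >= 0. ? std::sqrt(var) : 1.;            // as in the OpenCV code
    printf("window mean = %.2f, stddev = %.2f\n", mean, norm);
    return 0;
}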

Fusion of multiple detections (groupRectangles)

Grouping of similar rectangles starts after the detections on all scales have been collected, in cvHaarDetectObjectsForROC() in haar.cpp:

 ...
 if( minNeighbors != 0 || findBiggestObject )
    {
        if( outputRejectLevels )
        {
            groupRectangles(rectList, rejectLevels, levelWeights, minNeighbors, GROUP_EPS );
        }
        else
        {
            groupRectangles(rectList, rweights, std::max(minNeighbors, 1), GROUP_EPS); // here!
        }
    }
 ...

We continue in cascadedetect.cpp:

void groupRectangles(vector<Rect>& rectList, vector<int>& weights, int groupThreshold, double eps)
{
    groupRectangles(rectList, groupThreshold, eps, &weights, 0);
}

Here is the method groupRectangles():

The function is a wrapper for the generic function partition(). It clusters all the input rectangles using the rectangle equivalence criteria that combines rectangles with similar sizes and similar locations. The similarity is defined by eps. When eps = 0, no clustering is done at all. If eps → +∞, all the rectangles are put in one cluster. Then, the small clusters containing less than or equal to groupThreshold rectangles are rejected. In each other cluster, the average rectangle is computed and put into the output rectangle list.

(from the OpenCV documentation of the groupRectangles() function)

void groupRectangles(vector<Rect>& rectList, int groupThreshold, double eps, vector<int>* weights, vector<double>* levelWeights)
{
    if( groupThreshold <= 0 || rectList.empty() )
    {
        if( weights )
        {
            size_t i, sz = rectList.size();
            weights->resize(sz);
            for( i = 0; i < sz; i++ )
                (*weights)[i] = 1;
        }
        return;
    }
 
    vector<int> labels;
 
    //
    // Jürgen: most important step is done here!
    //
    int nclasses = partition(rectList, labels, SimilarRects(eps)); 
 
    vector<Rect> rrects(nclasses);
    vector<int> rweights(nclasses, 0);
    vector<int> rejectLevels(nclasses, 0);
    vector<double> rejectWeights(nclasses, DBL_MIN);
    int i, j, nlabels = (int)labels.size();
 
    //
    // Jürgen: compute average location (x,y) and (width,height)
    //         for all rectangles that are assigned to the same class / label
    //         i.e. are considered equivalent concerning the SimilarRects()
    //         equivalence predicate (part I)
    //
    for( i = 0; i < nlabels; i++ )
    {
        int cls = labels[i];
        rrects[cls].x += rectList[i].x;
        rrects[cls].y += rectList[i].y;
        rrects[cls].width += rectList[i].width;
        rrects[cls].height += rectList[i].height;
 
        //
        // Jürgen: rweights[<labelNr>] counts how many detections are assigned to the same
        //         final rectangle
        //         --> could be used as confidence value!
        //
        rweights[cls]++; 
    }
    if ( levelWeights && weights && !weights->empty() && !levelWeights->empty() )
    {
        for( i = 0; i < nlabels; i++ )
        {
            int cls = labels[i];
            if( (*weights)[i] > rejectLevels[cls] )
            {
                rejectLevels[cls] = (*weights)[i];
                rejectWeights[cls] = (*levelWeights)[i];
            }
            else if( ( (*weights)[i] == rejectLevels[cls] ) && ( (*levelWeights)[i] > rejectWeights[cls] ) )
                rejectWeights[cls] = (*levelWeights)[i];
        }
    }
 
 
    // 
    // Jürgen: part II of computing average rectangles
    //         we have to divide by the nr of equivalent rectangles for each class
    //         (rweights[<labelNr>] counter)
    for( i = 0; i < nclasses; i++ )
    {
        Rect r = rrects[i];
        float s = 1.f/rweights[i];
        rrects[i] = Rect(saturate_cast<int>(r.x*s),
             saturate_cast<int>(r.y*s),
             saturate_cast<int>(r.width*s),
             saturate_cast<int>(r.height*s));
    }
 
    // 
    // Jürgen: clear rectList - the final rectangle list!
    //
    rectList.clear();
    if( weights )
        weights->clear();
    if( levelWeights )
        levelWeights->clear();
 
    //
    // Jürgen: if there are not at least groupThreshold
    //         raw detections that lead to the final
    //         detection rectangle rrects[i], we discard it!
    for( i = 0; i < nclasses; i++ )
    {
        Rect r1 = rrects[i];
        int n1 = levelWeights ? rejectLevels[i] : rweights[i];
        double w1 = rejectWeights[i];
        if( n1 <= groupThreshold )
            continue;
        // filter out small face rectangles inside large rectangles
        for( j = 0; j < nclasses; j++ )
        {
            int n2 = rweights[j];
 
            if( j == i || n2 <= groupThreshold )
                continue;
            Rect r2 = rrects[j];
 
            int dx = saturate_cast<int>( r2.width * eps );
            int dy = saturate_cast<int>( r2.height * eps );
 
            if( i != j &&
                r1.x >= r2.x - dx &&
                r1.y >= r2.y - dy &&
                r1.x + r1.width <= r2.x + r2.width + dx &&
                r1.y + r1.height <= r2.y + r2.height + dy &&
                (n2 > std::max(3, n1) || n1 < 3) )
                break;
        }
 
        //
        // Jürgen: put rectangle r1 into final rectangle list rectList?
        //
        if( j == nclasses )
        {
            rectList.push_back(r1);
            if( weights )
                weights->push_back(n1);
            if( levelWeights )
                levelWeights->push_back(w1);
        }
    }
}

The actual grouping of rectangles is done by the function partition(). It computes a cluster label for each detection rectangle:

  • in the debugging session shown above there were 14 raw detection rectangles but only 7 clusters, i.e. the 14 raw detection rectangles are merged into 7 final detection bounding boxes

So what happens in partition()?

It's a generic template function located in core\operations.hpp:

// This function splits the input sequence or set into one or more equivalence classes and
// returns the vector of labels - 0-based class indexes for each element.
// predicate(a,b) returns true if the two sequence elements certainly belong to the same class.
//
// The algorithm is described in "Introduction to Algorithms"
// by Cormen, Leiserson and Rivest, the chapter "Data structures for disjoint sets"
template<typename _Tp, class _EqPredicate> int
partition( const vector<_Tp>& _vec, vector<int>& labels,
           _EqPredicate predicate=_EqPredicate())
{
    int i, j, N = (int)_vec.size();
    const _Tp* vec = &_vec[0];
 
    const int PARENT=0;
    const int RANK=1;
 
    vector<int> _nodes(N*2);
    int (*nodes)[2] = (int(*)[2])&_nodes[0];
 
    // The first O(N) pass: create N single-vertex trees
    for(i = 0; i < N; i++)
    {
        nodes[i][PARENT]=-1;
        nodes[i][RANK] = 0;
    }
 
    // The main O(N^2) pass: merge connected components
    for( i = 0; i < N; i++ )
    {
        int root = i;
 
        // find root
        while( nodes[root][PARENT] >= 0 )
            root = nodes[root][PARENT];
 
        for( j = 0; j < N; j++ )
        {
            if( i == j || !predicate(vec[i], vec[j]))
                continue;
            int root2 = j;
 
            while( nodes[root2][PARENT] >= 0 )
                root2 = nodes[root2][PARENT];
 
            if( root2 != root )
            {
                // unite both trees
                int rank = nodes[root][RANK], rank2 = nodes[root2][RANK];
                if( rank > rank2 )
                    nodes[root2][PARENT] = root;
                else
                {
                    nodes[root][PARENT] = root2;
                    nodes[root2][RANK] += rank == rank2;
                    root = root2;
                }
                assert( nodes[root][PARENT] < 0 );
 
                int k = j, parent;
 
                // compress the path from node2 to root
                while( (parent = nodes[k][PARENT]) >= 0 )
                {
                    nodes[k][PARENT] = root;
                    k = parent;
                }
 
                // compress the path from node to root
                k = i;
                while( (parent = nodes[k][PARENT]) >= 0 )
                {
                    nodes[k][PARENT] = root;
                    k = parent;
                }
            }
        }
    }
 
    // Final O(N) pass: enumerate classes
    labels.resize(N);
    int nclasses = 0;
 
    for( i = 0; i < N; i++ )
    {
        int root = i;
        while( nodes[root][PARENT] >= 0 )
            root = nodes[root][PARENT];
        // re-use the rank as the class label
        if( nodes[root][RANK] >= 0 )
            nodes[root][RANK] = ~nclasses++;
        labels[i] = ~nodes[root][RANK];
    }
 
    return nclasses;
}

The generic OpenCV function partition() implements an O(N^2) algorithm for splitting a set of N elements into one or more equivalence classes, using the disjoint-set (union-find) data structure referenced in the comment above.
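A minimal usage sketch of partition() with a toy equivalence predicate (integers that differ by at most 1 belong together) shows the labeling behavior, including the transitive merging performed by the union-find:

#include <opencv2/core/core.hpp>
#include <cstdio>
#include <cstdlib>
#include <vector>

// toy predicate: two ints belong to the same class if they differ by at most 1
struct CloseInts
{
    bool operator()( int a, int b ) const { return std::abs(a - b) <= 1; }
};

int main()
{
    int raw[] = { 1, 2, 3, 10, 11, 50 };
    std::vector<int> values(raw, raw + 6);

    std::vector<int> labels;
    int nclasses = cv::partition(values, labels, CloseInts());

    // expected: {1,2,3} -> class 0 (1~2 and 2~3 are merged transitively),
    //           {10,11} -> class 1, {50} -> class 2
    printf("nclasses = %d\n", nclasses);
    for( size_t i = 0; i < values.size(); i++ )
        printf("value %2d -> label %d\n", values[i], labels[i]);
    return 0;
}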

The equivalence predicate of partition() that is used here is the SimilarRects functor:

class SimilarRects
{
public:
    SimilarRects(double _eps) : eps(_eps) {}
    inline bool operator()(const Rect& r1, const Rect& r2) const
    {
        double delta = eps*(std::min(r1.width, r2.width) + std::min(r1.height, r2.height))*0.5;
        return std::abs(r1.x - r2.x) <= delta &&                       // Jürgen: similar location (x-coord)
               std::abs(r1.y - r2.y) <= delta &&                       // Jürgen: similar location (y-coord)
               std::abs(r1.x + r1.width - r2.x - r2.width) <= delta && // Jürgen: similar right edge -> similar width
               std::abs(r1.y + r1.height - r2.y - r2.height) <= delta; // Jürgen: similar bottom edge -> similar height
    }
    double eps;
};

i.e. two detection rectangles are considered equivalent (the same) if:

  • they have a similar location (x/y)
  • they have similar right and bottom edges, i.e. a similar width/height

where the allowed difference delta = eps * (min(w1,w2) + min(h1,h2)) / 2 scales with the sizes of the two rectangles.
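Putting the pieces together, here is a small usage sketch of groupRectangles() on hand-made raw detections: three near-identical rectangles and one isolated outlier (with groupThreshold = 1, clusters with a single member are discarded):

#include <opencv2/objdetect/objdetect.hpp>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<cv::Rect> rects;
    rects.push_back(cv::Rect(100, 100, 50, 50)); // three near-identical "raw detections"
    rects.push_back(cv::Rect(102,  98, 52, 49));
    rects.push_back(cv::Rect( 99, 101, 50, 51));
    rects.push_back(cv::Rect(300, 300, 40, 40)); // isolated outlier, no similar neighbors

    // groupThreshold = 1: clusters with <= 1 member are rejected; eps = 0.2 as in haar.cpp
    cv::groupRectangles(rects, 1, 0.2);

    // expected: one averaged rectangle survives, the outlier is dropped
    for( size_t i = 0; i < rects.size(); i++ )
        printf("final rect: x=%d y=%d w=%d h=%d\n",
               rects[i].x, rects[i].y, rects[i].width, rects[i].height);
    return 0;
}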

Conclusions

  • Haar features are normalized by the standard deviation of the pixel values in the search window (variance_norm_factor)
  • the image pyramid is computed by simple image rescaling; the original image is only down-scaled (no up-scaling)
  • the sliding window step width is 2 (for down-scale factors ≤ 2.0) or 1 (for down-scale factors > 2.0), see ystep
  • stage_sum, or simply the number of stages a sliding window passes before being rejected, could be used as a confidence value for detections (see the sketch below)
  • a confidence value for the detections can also be derived by counting how many similar raw rectangles were grouped into a final one
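The reject-level idea from the last two bullets is exposed in the 2.4 API through the detectMultiScale() overload with outputRejectLevels shown earlier; a hedged usage sketch (paths are placeholders):

#include <opencv2/objdetect/objdetect.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <cstdio>
#include <vector>

int main()
{
    cv::CascadeClassifier cascade;
    if( !cascade.load("haarcascade_frontalface_alt.xml") ) // placeholder path
        return -1;
    cv::Mat gray = cv::imread("test.jpg", 0);              // placeholder path, load as gray
    if( gray.empty() )
        return -1;
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    std::vector<int> rejectLevels;    // how deep each candidate got into the cascade
    std::vector<double> levelWeights; // the last stage_sum, a confidence-like value
    cascade.detectMultiScale(gray, faces, rejectLevels, levelWeights,
                             1.1, 3, CV_HAAR_SCALE_IMAGE,
                             cv::Size(30, 30), cv::Size(), true);

    for( size_t i = 0; i < faces.size(); i++ )
        printf("detection %d: level=%d weight=%.3f\n",
               (int)i, rejectLevels[i], levelWeights[i]);
    return 0;
}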
 