Thursday, 14 January 2016

Viola & Jones Object Detection Algorithm Explained with a Simple Example




            The Viola-Jones face detection algorithm [1] uses Haar features, as shown in Figure 1. The steps in the algorithm are:
  1. Integral Image Calculation
  2. Feature Computation
  3. Adaboost Feature Selection
  4. Classifier Cascade

            All the Haar features (shown in Figure 1) at different scales and positions together produce approximately 180,000 features. Viola-Jones uses a 24x24 window as the base window size for evaluating the Haar features.

 Figure 1. Haar Features Used in Viola Jones [1]
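As a rough sanity check on that count, here is a small Python sketch (my own enumeration, not code from the paper) that counts every position and scale of five rectangle-feature base shapes in a 24x24 window. This particular enumeration gives 162,336, in the same ballpark as the figure quoted above; the exact total depends on which shapes and scalings are included.

```python
def count_positions(W, H, fw, fh):
    """Count every placement of a feature whose base shape is fw x fh,
    scaled by integer multiples, inside a W x H detection window."""
    total = 0
    for w in range(fw, W + 1, fw):              # feature widths fw, 2*fw, ...
        for h in range(fh, H + 1, fh):          # feature heights fh, 2*fh, ...
            total += (W - w + 1) * (H - h + 1)  # shift positions of a w x h box
    return total

# Base shapes: two two-rectangle, two three-rectangle, one four-rectangle feature
shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
total = sum(count_positions(24, 24, fw, fh) for fw, fh in shapes)
print(total)  # 162336 with this enumeration
```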
1. Integral Image Calculation
            Consider an image with pixel intensities as shown in Figure 1.1

5   2   5   2
3   6   3   6
5   2   5   2
3   6   3   6

Figure 1.1 Original Image

The integral image of the above image is calculated as follows.
            The value of the integral image at any pixel is the sum of all the pixel intensities above and to the left of that pixel, including the pixel itself. Consider the top-left 2x2 block of the original image:

5   2
3   6

            The value of the first pixel remains the same, 5. The value at the first row, second column changes from 2 to 5+2 = 7. Likewise, the value at the second row, first column changes from 3 to 5+3 = 8. In place of 6 we get 6+5+2+3 = 16, the sum of the entire 2x2 block.
            We calculate this for all the pixels in the image, and the resultant image is called the integral image.
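The computation above can be sketched in Python (a minimal version using plain nested lists rather than an image library):

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows 0..y and cols 0..x (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                         # running sum of this row
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

img = [[5, 2, 5, 2],
       [3, 6, 3, 6],
       [5, 2, 5, 2],
       [3, 6, 3, 6]]
ii = integral_image(img)
# ii[0][1] == 7 and ii[1][1] == 16, matching the worked example above
```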
2. Feature Computation
            A Haar classifier, as shown in Figure 1.2, is run over the image: we sum all the pixels under the black region and subtract that from the sum of all the pixels under the white region. If the difference is above some threshold, the feature matches. This computation becomes cheap once the integral image is calculated, because the sum of any rectangle can be obtained with only four lookups.





  Figure 1.2 Haar Classifier
5    7   12   14
8   16   24   32
13  23   36   46
16  32   48   64

Figure 1.3 Haar Classifier When Run On Integral Image
            Sum of pixels under the black region = 5+32-(7+16) = 14 (same as 6+2+6 = 14 in the original image). Sum of pixels under the white region = 7+48-(12+32) = 11 (same as 3+5+3 = 11 in the original image).
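Using the integral image, the sum of any rectangle needs only four array lookups. A small sketch reproducing the two sums above (the 0-based inclusive indexing is my own convention for this illustration):

```python
def rect_sum(ii, top, left, bottom, right):
    """Sum of original pixels in rows top..bottom, cols left..right
    (inclusive, 0-based), using at most four integral-image lookups."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]      # strip above the rectangle
    if left > 0:
        total -= ii[bottom][left - 1]    # strip left of the rectangle
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]   # corner subtracted twice, add back
    return total

ii = [[5, 7, 12, 14],
      [8, 16, 24, 32],
      [13, 23, 36, 46],
      [16, 32, 48, 64]]
black = rect_sum(ii, 1, 1, 3, 1)  # rows 2-4, column 2 of original: 6+2+6
white = rect_sum(ii, 1, 2, 3, 2)  # rows 2-4, column 3 of original: 3+5+3
print(black, white)  # 14 11
```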
3. AdaBoost
            AdaBoost is used to train a strong classifier, which is a linear combination of weak classifiers. It also decides whether a feature is relevant or not. The steps in AdaBoost are:
  1. Take a training set of positive and negative examples (e.g., face and non-face images).
  2. Initially, each of the p positive training images is given a weight of 1/(2p) and each of the n negative training images a weight of 1/(2n).
  3. All 180,000 Haar features (weak classifiers) are run on the training images.
  4. For each feature, a good threshold is determined (e.g., by a decision stump) such that any image above the threshold is classified as a face and any image below it as a non-face.
  5. The error rate of each weak classifier is calculated as the sum of the weights of the images it misclassifies. Of the 180,000 error rates, choose the weak classifier with the lowest one.
            The chosen weak classifier is added to the strong classifier. Then the weights of the misclassified images are increased and those of the correctly classified images decreased, and all the weights are normalized. Steps 3 to 5 are repeated for the desired number of rounds: in each round all 180,000 Haar features are run on the images with the updated weights, and the selected weak classifier is added to the linear combination that forms the final strong classifier. The output of a weak classifier is 1 or 0, classifying the image as face or non-face.
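One round of this selection-and-reweighting loop can be sketched in Python. This is a simplified toy, not the paper's exact training code: the fixed threshold of 0 (instead of a per-feature optimal threshold) and the tiny example data are my own choices.

```python
import math

def adaboost_round(feature_values, labels, weights):
    """One round of AdaBoost feature selection (steps 3-5 above).

    feature_values[f][i] is the response of Haar feature f on image i;
    each weak classifier here simply thresholds that response at 0.
    labels[i] is 1 for a face image and 0 for a non-face image.
    """
    # Normalize the weights so they form a distribution
    s = sum(weights)
    weights = [w / s for w in weights]

    # Run every weak classifier; keep the one with the lowest weighted error
    best_f, best_err, best_preds = None, float("inf"), None
    for f, values in enumerate(feature_values):
        preds = [1 if v > 0 else 0 for v in values]
        err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
        if err < best_err:
            best_f, best_err, best_preds = f, err, preds

    # Reweight: correctly classified images are multiplied by
    # beta = err / (1 - err) < 1, so mistakes gain relative weight
    beta = max(best_err, 1e-10) / (1 - best_err)
    new_weights = [w * (beta if p == y else 1.0)
                   for w, p, y in zip(weights, best_preds, labels)]
    alpha = math.log(1 / beta)  # this weak classifier's vote in the strong one
    return best_f, alpha, new_weights

# Toy run: feature 1 misclassifies only the last image, feature 0 is worse
feature_values = [[1, -1, 1, 1],
                  [1, 1, -1, 1]]
labels = [1, 1, 0, 0]
best_f, alpha, new_weights = adaboost_round(feature_values, labels,
                                            [0.25, 0.25, 0.25, 0.25])
print(best_f)  # 1
```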
4. Cascading Of Stages
            After all the rounds of AdaBoost, we have built a strong classifier that is a linear combination of the selected weak classifiers (say, 2,000 of them). Instead of running all 2,000 weak classifiers on every 24x24 window of a test image, we build a cascade of classifiers. This reduces computation cost, because Stage 1 immediately rejects windows that are clearly non-faces.
Figure 1.4. Cascade of Stages to Reject Non-Face Windows Immediately [1]
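The run-time behaviour of the cascade, stopping at the first stage that rejects, can be sketched as follows. The data shapes and the scalar toy "window" are hypothetical, chosen only for illustration:

```python
def cascade_detect(stages, window):
    """Run a window through a cascade of strong classifiers.

    stages: list of (weak_classifiers, threshold) pairs, where each weak
    classifier is a (feature_fn, alpha) pair. A window must pass every
    stage to be reported as a face.
    """
    for weaks, threshold in stages:
        score = sum(alpha * feature_fn(window) for feature_fn, alpha in weaks)
        if score < threshold:
            return False  # rejected early: no later stage is evaluated
    return True

# Toy cascade on a scalar "window": stage 1 checks > 10, stage 2 checks > 20
stages = [
    ([(lambda win: 1 if win > 10 else 0, 1.0)], 0.5),
    ([(lambda win: 1 if win > 20 else 0, 1.0)], 0.5),
]
print(cascade_detect(stages, 25))  # True
print(cascade_detect(stages, 5))   # False, rejected at stage 1
```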
To train a cascade, we must choose:
  • The number of stages (strong classifiers) in the cascade
  • The number of weak classifiers in each strong classifier (which is decided by AdaBoost)
For this we do manual tweaking, a heuristic procedure for training the cascade:
  1. Select the maximum acceptable false positive rate per stage.
  2. Select the minimum acceptable true positive rate per stage.
  3. The threshold for each strong classifier is decided by AdaBoost.
The user also selects a target overall false positive rate for the whole cascade:

Until the target overall false positive rate is met:
    Add a new stage
    Until the maximum acceptable false positive rate and the minimum acceptable
    true positive rate for this stage are met:
        Keep adding weak classifiers and train the strong classifier using AdaBoost.
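The training loop above can be sketched as a Python skeleton. The helper `add_weak_classifier` and the toy rates it returns are hypothetical stand-ins for the real AdaBoost training and validation-set measurements:

```python
def train_cascade(max_fp_per_stage, min_tp_per_stage, target_fp,
                  add_weak_classifier):
    """Skeleton of the cascade-training loop: add_weak_classifier(stage)
    grows the current stage by one weak classifier and returns that stage's
    measured (false_positive_rate, true_positive_rate)."""
    cascade = []
    overall_fp = 1.0
    while overall_fp > target_fp:
        stage = []                       # new strong classifier
        fp, tp = 1.0, 1.0
        while fp > max_fp_per_stage or tp < min_tp_per_stage:
            fp, tp = add_weak_classifier(stage)
        cascade.append(stage)
        overall_fp *= fp                 # per-stage rates multiply
    return cascade

def toy_adder(stage):
    """Pretend each added weak classifier halves the stage's FP rate."""
    stage.append("weak")
    return 0.5 ** len(stage), 0.99

cascade = train_cascade(max_fp_per_stage=0.3, min_tp_per_stage=0.95,
                        target_fp=0.01, add_weak_classifier=toy_adder)
print(len(cascade))  # 4 stages of 2 weak classifiers each, with these toy rates
```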
You can watch a video explanation at the link below:
https://www.youtube.com/watch?v=WfdYYNamHZ8
References
[1] Viola, Paul, and Michael Jones. "Rapid object detection using a boosted cascade of simple features." Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Vol. 1. IEEE, 2001.
