
Online learning and detection of faces with low human supervision

  • Original Article
  • The Visual Computer

Abstract

We present an efficient, online, and interactive approach for computing a face learning and detection classifier, called Wild Lady Ferns (WiLFs), that requires only a small amount of human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (random ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human–machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance for difficult samples as a function of the classifier confidence; the second is memory-based learning, which uses \(\kappa\) exemplar-based nearest neighbors (\(\kappa\text{ENN}\)) to assist the classifier automatically. A pretrained convolutional neural network provides the high-level feature descriptors on which \(\kappa\text{ENN}\) operates. The proposed approach is therefore fast (WiLFs run at 1 FPS with code that is not fully optimized), accurate (we obtain detection rates over \(82\%\) on complex datasets), and labor-saving (human assistance percentages below \(20\%\)). As a by-product, we demonstrate that WiLFs also perform semiautomatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in input images that are subsequently used for online training of the classifier. The advantages of our approach are demonstrated on synthetic data and publicly available databases, showing detection rates comparable to offline approaches that require much larger amounts of hand-labeled training data.



Notes

  1. https://www.mturk.com/mturk/welcome.

  2. https://www.flickr.com/.

  3. The term drifting refers to the deterioration of the classifier over time because it is updated with noisy and misclassified samples.

  4. http://www.hardyferns.org/.

  5. In this learning methodology, each training sample is fed to the classifier exactly once and is never revisited.

  6. http://www.vlfeat.org/matconvnet/pretrained/.

  7. The indicator function \(\mathbb {I}(e)=1\) if e is true, and 0 otherwise.

  8. We use these terms interchangeably to express that the system discovers new face instances in images.

  9. Although the classifier is able to predict their class labels.

  10. BEP (break-even point) is the point on the precision–recall curve where precision equals recall.

  11. The squared Hellinger distance between two normal distributions P and Q is defined as \(H^2(P,Q) = 1 - \sqrt{k_1/k_2}\,\exp(-0.25\,k_3/k_2)\), with \(k_1 = 2 \sigma _P \sigma _Q\), \(k_2=\sigma _P^2 + \sigma _Q^2\), and \(k_3 =(\mu _P - \mu _Q)^2\); see the numeric check after these notes.

  12. http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html.

  13. Training times refer to the time spent computing the classifier from all training samples, whereas run times refer to the time spent evaluating the classifier on a test sample.

  14. We assume knowing the true class labels of all samples (ground truth labels) for evaluation purposes.

  15. For these 2D experiments, we assume that the human labels correspond to the ground truth labels of the class distributions.

  16. Note that although the classifier is computed from both kinds of samples, the annotation cost corresponds only to human labels, since machine labels are produced automatically.

  17. http://www.vlfeat.org/matconvnet/pretrained/.

  18. https://github.com/mahyarnajibi/SSH.


Acknowledgements

This work is partially supported by the Spanish Ministry of Economy and Competitiveness under projects HuMoUR TIN2017-90086-R, ColRobTransp DPI2016-78957 and María de Maeztu Seal of Excellence MDM-2016-0656.

Author information

Correspondence to Michael Villamizar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Villamizar, M., Sanfeliu, A. & Moreno-Noguer, F. Online learning and detection of faces with low human supervision. Vis Comput 35, 349–370 (2019). https://doi.org/10.1007/s00371-018-01617-y
