The Visual Computer, Volume 35, Issue 3, pp 349–370

Online learning and detection of faces with low human supervision

  • Michael Villamizar
  • Alberto Sanfeliu
  • Francesc Moreno-Noguer
Original Article

Abstract

We present an efficient, online, and interactive approach for computing a classifier, called Wild Lady Ferns (WiLFs), for face learning and detection with little human supervision. On the one hand, WiLFs combine online boosting and extremely randomized trees (random ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human–machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance on difficult samples as a function of the classifier confidence. The second strategy is memory-based learning, which uses \(\kappa \) exemplar-based nearest neighbors (\(\kappa \text {ENN}\)) to assist the classifier automatically. A pretrained convolutional neural network is used to perform \(\kappa \text {ENN}\) with high-level feature descriptors. The proposed approach is therefore fast (WiLFs run at 1 FPS using code that is not fully optimized), accurate (we obtain detection rates over \(82\%\) on complex datasets), and labor-saving (human assistance percentages of less than \(20\%\)). As a by-product, we demonstrate that WiLFs also perform semiautomatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in the input images, which are subsequently used to train the classifier online. The advantages of our approach are demonstrated on synthetic and publicly available databases, showing detection rates comparable to those of offline approaches that require larger amounts of manually annotated training data.
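The interplay between the two learning strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`knn_vote`, `choose_label`), the thresholds (`margin`, `min_agreement`), the order of the checks, and the use of plain Euclidean distance over short feature tuples are all assumptions made for illustration. In the paper the descriptors come from a pretrained convolutional neural network and the confidence comes from the boosted random ferns classifier.

```python
import math
from collections import Counter


def knn_vote(query, exemplars, k=3):
    """Return (label, agreement) from the k nearest stored exemplars.

    exemplars is a list of (feature_vector, label) pairs; agreement is the
    fraction of the retrieved neighbors that share the winning label.
    """
    ranked = sorted(exemplars, key=lambda e: math.dist(query, e[0]))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def choose_label(confidence, feature, exemplars, ask_human,
                 margin=0.2, min_agreement=1.0, k=3):
    """Decide who labels a sample: the classifier, the memory, or a human.

    confidence is assumed to lie in [0, 1], with 0.5 meaning "undecided".
    margin and min_agreement are hypothetical thresholds.
    """
    # Confident prediction: the classifier labels the sample by itself.
    if abs(confidence - 0.5) >= margin:
        return (1 if confidence > 0.5 else 0), "classifier"
    # Uncertain prediction: first consult the exemplar memory.
    if exemplars:
        label, agreement = knn_vote(feature, exemplars, k)
        if agreement >= min_agreement:
            return label, "memory"
    # Memory is empty or inconclusive: request human assistance.
    return ask_human(feature), "human"
```

Under these assumptions, the human is queried only when the classifier is uncertain and the stored exemplars disagree, which is the mechanism by which the required supervision would shrink as the memory grows.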

Notes

Acknowledgements

This work is partially supported by the Spanish Ministry of Economy and Competitiveness under projects HuMoUR TIN2017-90086-R, ColRobTransp DPI2016-78957 and María de Maeztu Seal of Excellence MDM-2016-0656.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Michael Villamizar (1), email author
  • Alberto Sanfeliu (2)
  • Francesc Moreno-Noguer (2)

  1. Idiap Research Institute, Martigny, Switzerland
  2. Institut de Robotica i Informatica Industrial, Barcelona, Spain
