
Online learning and detection of faces with low human supervision

  • Original Article
  • The Visual Computer

Abstract

We present an efficient, online, and interactive approach for computing a face learning and detection classifier, called Wild Lady Ferns (WiLFs), that requires only a small amount of human supervision. More precisely, on the one hand, WiLFs combine online boosting and extremely randomized trees (random ferns) to progressively compute an efficient and discriminative classifier. On the other hand, WiLFs use an interactive human–machine approach that combines two complementary learning strategies to considerably reduce the degree of human supervision during learning. The first strategy is query-by-boosting active learning, which requests human assistance for difficult samples as a function of the classifier confidence; the second is memory-based learning, which uses \(\kappa\) exemplar-based nearest neighbors (\(\kappa\text{ENN}\)) to assist the classifier automatically. A pretrained convolutional neural network provides the high-level feature descriptors on which \(\kappa\text{ENN}\) operates. The proposed approach is therefore fast (WiLFs run at 1 FPS with code that is not fully optimized), accurate (we obtain detection rates over \(82\%\) on complex datasets), and labor-saving (human assistance percentages below \(20\%\)). As a by-product, we demonstrate that WiLFs also perform semiautomatic annotation during learning: while the classifier is being computed, WiLFs discover face instances in input images that are subsequently used for online training of the classifier. The advantages of our approach are demonstrated on synthetic data and publicly available databases, showing detection rates comparable to offline approaches that require much larger amounts of hand-labeled training data.



Notes

  1. https://www.mturk.com/mturk/welcome.

  2. https://www.flickr.com/.

  3. The term drifting refers to the deterioration of the classifier over time because it is updated with noisy and misclassified samples.

  4. http://www.hardyferns.org/.

  5. In this learning methodology, each training sample is fed to the classifier exactly once and is never revisited.

  6. http://www.vlfeat.org/matconvnet/pretrained/.

  7. The indicator function \(\mathbb {I}(e)=1\) if e is true, and 0 otherwise.

  8. We use these terms interchangeably to express that the system discovers new face instances in images.

  9. Although the classifier is able to predict their class labels.

  10. BEP (break-even point) is the point on the precision–recall curve where precision equals recall.

  11. The squared Hellinger distance between two normal distributions P and Q is defined as \(H^2(P,Q) = 1 - \sqrt{k_1/k_2}\,\exp(-0.25\,k_3/k_2)\), with \(k_1 = 2 \sigma _P \sigma _Q\), \(k_2=\sigma _P^2 + \sigma _Q^2\), and \(k_3 =(\mu _P - \mu _Q)^2\); see the numeric check after these notes.

  12. http://people.csail.mit.edu/torralba/shortCourseRLOC/boosting/boosting.html.

  13. Training times refer to the time spent computing the classifier from all training samples, whereas run times refer to the time spent evaluating the classifier on a test sample.

  14. We assume knowing the true class labels of all samples (ground truth labels) for evaluation purposes.

  15. For these 2D experiments, we assume that the human labels correspond to the ground truth labels of the class distributions.

  16. Note that although the classifier is computed from both kinds of samples, the annotation cost corresponds only to human labels, since machine labels are produced automatically.

  17. http://www.vlfeat.org/matconvnet/pretrained/.

  18. https://github.com/mahyarnajibi/SSH.


Acknowledgements

This work is partially supported by the Spanish Ministry of Economy and Competitiveness under projects HuMoUR TIN2017-90086-R, ColRobTransp DPI2016-78957 and María de Maeztu Seal of Excellence MDM-2016-0656.

Author information

Correspondence to Michael Villamizar.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Villamizar, M., Sanfeliu, A. & Moreno-Noguer, F. Online learning and detection of faces with low human supervision. Vis Comput 35, 349–370 (2019). https://doi.org/10.1007/s00371-018-01617-y
