Robust facial landmark extraction scheme using multiple convolutional neural networks

Kim, Hyungjoon; Park, Jisoo; Kim, HyeonWoo; Hwang, Eenjun; Rho, Seungmin

doi:10.1007/s11042-018-6482-7

Published: 23 August 2018

Volume 78, pages 3221–3238 (2019)

Multimedia Tools and Applications

Hyungjoon Kim¹,
Jisoo Park¹,
HyeonWoo Kim¹,
Eenjun Hwang¹ &
Seungmin Rho²


A Correction to this article was published on 05 September 2018

This article has been updated

Abstract

Facial landmarks are features of the human face that can be distinguished with the naked eye. Typical facial landmarks include the eyes, eyebrows, nose, and mouth. Landmarks play an important role in human-related image analysis. For example, they can be used to determine whether an image contains a human being, to identify who the person is, or to recognize the orientation of a face when taking a photograph. General techniques for detecting facial landmarks fall into two groups. One is based on traditional image processing techniques, such as Haar cascade classifiers and edge detection. The other is based on machine learning techniques, in which landmarks are detected by a neural network trained on facial features. However, such techniques have shown low accuracy, especially under challenging conditions such as low luminance and overlapped faces. To overcome these problems, we proposed in our previous work a facial landmark extraction scheme using deep learning and semantic segmentation, and demonstrated that, even with a small dataset, our scheme could achieve reasonable facial landmark extraction performance under such conditions. Nevertheless, on a more extensive dataset, we found several exceptional cases where the scheme could not detect facial landmarks precisely. Hence, in this paper, we revise our facial landmark extraction scheme using a deep learning model called Faster R-CNN and show how the revised scheme improves overall performance by handling such exceptional cases appropriately. We also show how to expand the training dataset by using image filters and image operations such as rotation for more robust landmark detection.
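The dataset expansion mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which filters or rotation angles were used, so the brightness-scaling filter, the fixed 90-degree rotation, and the names `darken`, `rotate90`, and `augment` are all assumptions made for illustration. The sketch shows that a rotation must move the landmark labels together with the pixels, while an intensity filter leaves them unchanged.

```python
from typing import List, Tuple

Image = List[List[int]]   # grayscale image, row-major, values 0..255
Point = Tuple[int, int]   # landmark position as (row, col)


def darken(img: Image, factor: float) -> Image:
    """Scale pixel intensities to imitate low luminance (assumed filter)."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def rotate90(img: Image, pts: List[Point]) -> Tuple[Image, List[Point]]:
    """Rotate 90 degrees clockwise and move the landmark labels accordingly."""
    height = len(img)
    # Reversing the rows and transposing gives a clockwise quarter turn.
    rotated = [list(col) for col in zip(*img[::-1])]
    # A pixel at (r, c) ends up at (c, height - 1 - r).
    moved = [(c, height - 1 - r) for r, c in pts]
    return rotated, moved


def augment(img: Image, pts: List[Point]) -> List[Tuple[Image, List[Point]]]:
    """Return the original sample plus one filtered and one rotated copy."""
    return [
        (img, pts),
        (darken(img, 0.5), pts),   # intensity change: labels stay in place
        rotate90(img, pts),        # geometric change: labels move
    ]


if __name__ == "__main__":
    image = [[10, 200], [30, 40]]
    landmarks = [(0, 1)]           # hypothetical landmark on the bright pixel
    for sample_img, sample_pts in augment(image, landmarks):
        print(sample_img, sample_pts)
```

Each input image yields three training samples here. A practical pipeline would more likely use an image library and arbitrary rotation angles, which requires interpolating pixels and transforming the label coordinates with the same rotation matrix.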


Change history

05 September 2018
In the original publication, the author name “Seungmin Rho” was incorrectly spelled as “Seumgmin Rho”. The correct author name is given above. The original article has been corrected.


Acknowledgements

This work was supported by the Korea Environment Industry & Technology Institute (KEITI) through the Public Technology Program based on Environmental Policy, funded by the Korea Ministry of Environment (MOE) (2017000210001).

Author information

Authors and Affiliations

School of Electrical Engineering, Korea University, Seoul, Republic of Korea
Hyungjoon Kim, Jisoo Park, HyeonWoo Kim & Eenjun Hwang
Department of Media Software, Sungkyul University, Anyang, Republic of Korea
Seungmin Rho


Corresponding author

Correspondence to Eenjun Hwang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: The author name “Seungmin Rho” was incorrectly spelled as “Seumgmin Rho”.


About this article

Cite this article

Kim, H., Park, J., Kim, H. et al. Robust facial landmark extraction scheme using multiple convolutional neural networks. Multimed Tools Appl 78, 3221–3238 (2019). https://doi.org/10.1007/s11042-018-6482-7


Received: 20 March 2018
Revised: 01 June 2018
Accepted: 29 July 2018
Published: 23 August 2018
Issue Date: February 2019
DOI: https://doi.org/10.1007/s11042-018-6482-7
