Optimizing facial feature extraction and localization using YOLOv5: An empirical analysis of backbone architectures with data augmentation for precise facial region detection

Chanda, Srishti; Kumar, Yachika N.; Srivastava, Shrankhla; Rani, Ritu; Shree, Manu; Mohapatra, A. K.

doi:10.1007/s11042-024-19284-8

1232: Human-centric Multimedia Analysis
Published: 03 May 2024

Multimedia Tools and Applications (2024)

Abstract

Object detection in computer vision is the task of identifying objects within images or videos. Face detection is a subtask of object detection that focuses on human faces. Within face detection, an important research area is facial feature detection, with applications ranging from facial recognition to emotion detection and facial expression analysis. The crucial step in facial feature detection is identifying and localizing key facial features such as the eyes, eyebrows, nose, mouth, and chin, a step also called facial region detection. Facial region detection can be done in two ways: landmark detection and bounding-box-based detection. Bounding boxes offer computational benefits such as increased speed and efficiency, and they are preferable when the objective is to accurately detect and locate an object or face in an image or video frame. Whereas most existing bounding-box-based algorithms for facial feature detection treat the eyes as a single entity, our approach using YOLOv5 detects the left and right eyes separately. In this study, we conducted experiments using YOLOv5, which provides bounding box predictions. We used a subset of the Labeled Faces in the Wild (LFW) dataset, which we augmented using GFP-GAN, Gaussian noise, image sharpening, and CLAHE. We explored the effectiveness of different backbone architectures when applied to YOLOv5 for the task of facial region detection. We evaluated three popular backbone networks: EfficientNet-B0, GhostNet, and CSP-Darknet53. Our objective was to identify the backbone architecture that yields the most accurate detection of facial features, including the left eye, right eye, nose, and lips.
Our experiments show that GhostNet, when used as the backbone in the YOLOv5 architecture, produces superior results for the detection and classification of features compared with the other backbones. We present a detailed evaluation of our findings, including discussions of the experimental results using different IoU thresholds and backbone combinations. Our proposed methodology and findings contribute to the field of facial feature extraction and provide insights into the potential and performance of YOLOv5 for detecting and localizing key facial elements.
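
The abstract refers to evaluation at different IoU (intersection over union) thresholds. As a minimal sketch of what such a threshold means for a single predicted box, the Python below computes the standard IoU of two axis-aligned boxes and checks it against a threshold. The `(x_min, y_min, x_max, y_max)` box convention, the function names `iou` and `is_match`, the example coordinates, and the threshold values are all assumptions made for illustration; none of them are taken from the paper.

```python
from typing import Tuple

# Hypothetical box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Count a prediction as correct when its IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Illustrative (made-up) boxes for one facial region, e.g. the left eye.
truth_box: Box = (10.0, 10.0, 30.0, 20.0)
pred_box: Box = (12.0, 10.0, 32.0, 20.0)

score = iou(pred_box, truth_box)
loose = is_match(pred_box, truth_box, threshold=0.5)
strict = is_match(pred_box, truth_box, threshold=0.9)
```

With these made-up boxes, the same prediction is accepted at the looser threshold and rejected at the stricter one, which is why detection results are usually reported per threshold.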

Data availability

The datasets analyzed during the current study are available with the authors and may be provided on request.

Author information

Authors and Affiliations

Indira Gandhi Delhi Technical University for Women, Kashmere Gate, Delhi, 110006, New Delhi, India
Srishti Chanda, Yachika N. Kumar, Shrankhla Srivastava, Ritu Rani, Manu Shree & A. K. Mohapatra

Corresponding author

Correspondence to Ritu Rani.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chanda, S., Kumar, Y.N., Srivastava, S. et al. Optimizing facial feature extraction and localization using YOLOv5: An empirical analysis of backbone architectures with data augmentation for precise facial region detection. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19284-8

Received: 24 May 2023
Revised: 12 April 2024
Accepted: 22 April 2024
Published: 03 May 2024
DOI: https://doi.org/10.1007/s11042-024-19284-8
