Face detection and alignment method for driver on highroad based on improved multi-task cascaded convolutional networks

  • Yang Zhang
  • Peihua Lv
  • Xiaobo Lu
  • Jun Li


Detecting and aligning drivers' faces in unconstrained environments is a challenging problem in Intelligent Transportation Systems (ITS), and solving it is conducive to supervising traffic order and maintaining public safety. This paper proposes an improved Multi-task Cascaded Convolutional Network (ITS-MTCNN) for accurate face region detection and facial feature alignment of drivers on highways, predicting face and feature locations in a coarse-to-fine manner. ITS-MTCNN also introduces an improved regularization method and an effective online hard sample mining technique. The model is trained, and contrast experiments are conducted, on our self-built traffic driver face database. The effectiveness of ITS-MTCNN is validated by comparative experiments and verified under various complex highway conditions. Average alignment errors are also reported for the left eye, right eye, nose, left mouth corner, and right mouth corner. Experimental results show that ITS-MTCNN performs satisfactorily compared with other state-of-the-art techniques for driver face detection and alignment, and remains robust to occlusion, varying pose, and extreme illumination on highways.
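The abstract names two ideas without giving formulas: online hard sample mining and per-landmark average alignment error. The following is only a minimal sketch of how these are commonly computed, not the authors' implementation. The function names, the `keep_ratio` default of 0.7, and the use of a per-face normalising length (for example the inter-ocular distance) are all illustrative assumptions that do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# The five landmarks named in the abstract, in an assumed fixed order.
LANDMARKS = ("left_eye", "right_eye", "nose", "left_mouth", "right_mouth")


def hard_sample_indices(losses: Sequence[float], keep_ratio: float = 0.7) -> List[int]:
    """Return indices of the highest-loss samples in one mini-batch.

    Only these "hard" samples would contribute to the gradient update.
    keep_ratio is a hypothetical hyperparameter, not a value from the paper.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not losses:
        return []
    n_keep = max(1, int(len(losses) * keep_ratio))
    # Rank sample indices by loss, largest first, then keep the top n_keep.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:n_keep])


def mean_landmark_errors(
    predicted: Sequence[Sequence[Point]],
    ground_truth: Sequence[Sequence[Point]],
    norm_lengths: Sequence[float],
) -> List[float]:
    """Average normalised Euclidean error for each landmark over all faces.

    norm_lengths holds one assumed normalising length per face.
    """
    if not predicted or not (len(predicted) == len(ground_truth) == len(norm_lengths)):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = [0.0] * len(LANDMARKS)
    for pred_face, true_face, norm in zip(predicted, ground_truth, norm_lengths):
        for k, ((px, py), (tx, ty)) in enumerate(zip(pred_face, true_face)):
            totals[k] += math.hypot(px - tx, py - ty) / norm
    return [total / len(predicted) for total in totals]
```

In a training loop, `hard_sample_indices` would be applied to the per-sample losses of each mini-batch before back-propagation, while `mean_landmark_errors` would be run once over a test set to produce one error value per landmark.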


Intelligent transportation system; Face detection and alignment; Multi-task; Convolutional networks; Deep learning



We thank the National Natural Science Foundation of China (Nos. 61871123 and 61374194), the National Key Science and Technology Pillar Program of China (No. 2014BAG01B03), and the Key Research and Development Program of Jiangsu Province (No. BE2016739) for funding. We also thank the Public Security Department of Jiangsu Province for providing the PSD-HIGHROAD database.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. School of Automation, Southeast University, Nanjing, China
  2. Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing, China
  3. Faculty of Engineering and Information Technology, University of Technology Sydney, Ultimo, Australia
