Deep Learning Architectures for Face Recognition in Video Surveillance

  • Saman Bashbaghi
  • Eric Granger
  • Robert Sabourin
  • Mostafa Parchami


Face recognition (FR) systems for video surveillance (VS) applications attempt to accurately detect the presence of target individuals over a distributed network of cameras. In video-based FR systems, facial models of target individuals are designed a priori during enrollment using a limited number of reference still images or video data. These facial models are typically not representative of faces observed during operations due to large variations in illumination, pose, scale, occlusion, blur, and camera interoperability. Specifically, in still-to-video FR applications, a single high-quality reference still image, captured with a still camera under controlled conditions, is employed to generate a facial model that is later matched against lower-quality faces captured with video cameras under uncontrolled conditions. Current video-based FR systems perform well in controlled scenarios, but their performance degrades in uncontrolled scenarios, mainly because of the differences between the source (enrollment) and target (operational) domains. Most efforts in this area have therefore been directed toward the design of robust video-based FR systems for unconstrained surveillance environments. This chapter presents an overview of recent advances in the still-to-video FR scenario through deep convolutional neural networks (CNNs). In particular, deep learning architectures proposed in the literature based on the triplet-loss function (e.g., cross-correlation matching CNN, trunk-branch ensemble CNN, and HaarNet) and on supervised autoencoders (e.g., canonical face representation CNN) are reviewed and compared in terms of accuracy and computational complexity.
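The triplet-loss architectures surveyed in the chapter all train an embedding network on (anchor, positive, negative) face triplets: the anchor is pulled toward a face of the same identity and pushed away from a face of a different identity by at least a fixed margin. A minimal NumPy sketch of that objective is shown below; the function names, toy embeddings, and margin value are illustrative only, not the chapter's implementation.

```python
import numpy as np

def normalize(v):
    """Project an embedding onto the unit hypersphere, as is common
    for face-embedding networks."""
    return v / np.linalg.norm(v)

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on L2-normalized embeddings:
    penalize the triplet unless the negative is farther from the
    anchor than the positive by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance, same identity
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance, different identity
    return max(d_pos - d_neg + margin, 0.0)

# Toy embeddings: anchor and positive nearly aligned, negative orthogonal.
a = normalize(np.array([1.0, 0.0, 0.0]))
p = normalize(np.array([0.9, 0.1, 0.0]))
n = normalize(np.array([0.0, 1.0, 0.0]))

print(triplet_loss(a, p, n))  # → 0.0 (the margin is already satisfied)
```

In practice, the loss is averaged over mini-batches of mined triplets and backpropagated through the CNN, so that the learned embedding space separates identities regardless of capture conditions.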



This work was supported by the Fonds de Recherche du Québec – Nature et Technologies and MITACS.



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Saman Bashbaghi (1)
  • Eric Granger (1)
  • Robert Sabourin (1)
  • Mostafa Parchami (2)
  1. Laboratoire d'imagerie, de vision et d'intelligence artificielle, École de technologie supérieure, Université du Québec, Montreal, Canada
  2. Computer Science and Engineering Department, University of Texas at Arlington, Arlington, USA
