
Large-Scale Bisample Learning on ID Versus Spot Face Recognition

Published in: International Journal of Computer Vision

Abstract

In real-world face recognition applications, there is a tremendous amount of data with only two images per person: an ID photo for face enrollment and a probe photo captured on the spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples per class), and they face great challenges on ID versus Spot (IvS) data, including under-represented intra-class variations and an excessive demand on computing resources. In this paper, we propose a deep-learning-based large-scale bisample learning (LBL) method for IvS face recognition. To tackle the bisample problem, in which each class has only two samples, a classification–verification–classification training strategy is proposed to progressively enhance IvS performance. In addition, a dominant prototype softmax is incorporated to make deep learning scalable to large-scale classes. We conduct LBL on an IvS face dataset with more than two million identities. Experimental results show that the proposed method achieves superior performance over previous ones, validating the effectiveness of LBL for IvS face recognition.
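The abstract only sketches the dominant prototype softmax, so the snippet below is a rough illustration of the general idea rather than the authors' implementation: keep one prototype (classifier weight) per identity, but at each training step restrict the softmax to the classes present in the mini-batch plus a limited number of the most competitive ("dominant") prototypes. The class and parameter names, the cosine-similarity selection heuristic, and the fixed logit scale are all assumptions introduced for this sketch.

```python
# Minimal PyTorch-style sketch (an assumption, not the paper's code) of a softmax
# loss restricted to a small set of "dominant" class prototypes, so that training
# stays feasible when the number of identities reaches the millions.
import torch
import torch.nn.functional as F


class SampledPrototypeSoftmax(torch.nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, num_dominant: int = 1024):
        super().__init__()
        # One prototype (classifier weight vector) per identity.
        self.prototypes = torch.nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.num_dominant = num_dominant

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between batch features and every prototype.
        feats = F.normalize(features, dim=1)
        protos = F.normalize(self.prototypes, dim=1)
        sims = feats @ protos.t()                               # (B, num_classes)

        # Always keep the prototypes of classes appearing in this mini-batch,
        # plus the hardest competing classes ("dominant" prototypes).
        positives = torch.unique(labels)
        k = min(self.num_dominant, sims.size(1))
        hard = sims.max(dim=0).values.topk(k).indices
        active = torch.unique(torch.cat([positives, hard]))     # global class ids

        # Remap global labels to positions inside the active subset.
        remap = torch.full((sims.size(1),), -1, dtype=torch.long, device=labels.device)
        remap[active] = torch.arange(active.numel(), device=labels.device)
        local_labels = remap[labels]

        # Cross-entropy over the reduced set of prototypes only.
        logits = 30.0 * sims[:, active]    # fixed scale, an assumed hyper-parameter
        return F.cross_entropy(logits, local_labels)


# Hypothetical usage with random data:
if __name__ == "__main__":
    loss_fn = SampledPrototypeSoftmax(num_classes=100_000, feat_dim=128)
    feats = torch.randn(32, 128)
    labels = torch.randint(0, 100_000, (32,))
    print(loss_fn(feats, labels).item())
```

Note that this sketch still computes a dense similarity pass over all prototypes to pick the hard competitors; a truly large-scale system would presumably select candidates more cheaply (e.g., with an approximate nearest-neighbour index over prototypes), which is omitted here to keep the example short.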



Acknowledgements

This work was supported by the Chinese National Natural Science Foundation Projects #61876178 and #61806196, the National Key Research and Development Plan (Grant No. 2016YFC0801002), and AuthenMetric R&D Funds. Zhen Lei is the corresponding author.

Author information


Corresponding author

Correspondence to Zhen Lei.

Additional information

Communicated by Dr. Rama Chellappa, Dr. Xiaoming Liu, Dr. Tae-Kyun Kim, Dr. Fernando De la Torre and Dr. Chen Change Loy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, X., Liu, H., Lei, Z. et al. Large-Scale Bisample Learning on ID Versus Spot Face Recognition. Int J Comput Vis 127, 684–700 (2019). https://doi.org/10.1007/s11263-019-01162-8
