
Large-Scale Bisample Learning on ID Versus Spot Face Recognition

Published in: International Journal of Computer Vision

Abstract

In real-world face recognition applications, there is a tremendous amount of data with only two images per person: an ID photo for face enrollment and a probe photo captured on the spot. Most existing methods are designed for training data with limited breadth (a relatively small number of classes) and sufficient depth (many samples per class), and they face great challenges on ID versus Spot (IvS) data, including under-represented intra-class variations and an excessive demand on computing resources. In this paper, we propose a deep-learning-based large-scale bisample learning (LBL) method for IvS face recognition. To tackle the bisample problem, in which each class has only two samples, a classification–verification–classification training strategy is proposed to progressively enhance IvS performance. In addition, a dominant prototype softmax is incorporated to make deep learning scalable to large-scale classes. We conduct LBL on an IvS face dataset with more than two million identities. Experimental results show that the proposed method achieves superior performance over previous ones, validating the effectiveness of LBL for IvS face recognition.
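The abstract only sketches the dominant prototype softmax, so the snippet below is a rough illustration of the general idea rather than the authors' implementation: keep one prototype (classifier weight) per identity, but at each training step restrict the softmax to the classes present in the mini-batch plus a limited number of the most competitive ("dominant") prototypes. The class and parameter names, the cosine-similarity selection heuristic, and the fixed logit scale are all assumptions introduced for this sketch.

```python
# Minimal PyTorch-style sketch (an assumption, not the paper's code) of a softmax
# loss restricted to a small set of "dominant" class prototypes, so that training
# stays feasible when the number of identities reaches the millions.
import torch
import torch.nn.functional as F


class SampledPrototypeSoftmax(torch.nn.Module):
    def __init__(self, num_classes: int, feat_dim: int, num_dominant: int = 1024):
        super().__init__()
        # One prototype (classifier weight vector) per identity.
        self.prototypes = torch.nn.Parameter(0.01 * torch.randn(num_classes, feat_dim))
        self.num_dominant = num_dominant

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between batch features and every prototype.
        feats = F.normalize(features, dim=1)
        protos = F.normalize(self.prototypes, dim=1)
        sims = feats @ protos.t()                               # (B, num_classes)

        # Always keep the prototypes of classes appearing in this mini-batch,
        # plus the hardest competing classes ("dominant" prototypes).
        positives = torch.unique(labels)
        k = min(self.num_dominant, sims.size(1))
        hard = sims.max(dim=0).values.topk(k).indices
        active = torch.unique(torch.cat([positives, hard]))     # global class ids

        # Remap global labels to positions inside the active subset.
        remap = torch.full((sims.size(1),), -1, dtype=torch.long, device=labels.device)
        remap[active] = torch.arange(active.numel(), device=labels.device)
        local_labels = remap[labels]

        # Cross-entropy over the reduced set of prototypes only.
        logits = 30.0 * sims[:, active]    # fixed scale, an assumed hyper-parameter
        return F.cross_entropy(logits, local_labels)


# Hypothetical usage with random data:
if __name__ == "__main__":
    loss_fn = SampledPrototypeSoftmax(num_classes=100_000, feat_dim=128)
    feats = torch.randn(32, 128)
    labels = torch.randint(0, 100_000, (32,))
    print(loss_fn(feats, labels).item())
```

Note that this sketch still computes a dense similarity pass over all prototypes to pick the hard competitors; a truly large-scale system would presumably select candidates more cheaply (e.g., with an approximate nearest-neighbour index over prototypes), which is omitted here to keep the example short.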



Acknowledgements

This work was supported by the Chinese National Natural Science Foundation Projects #61876178 and #61806196, the National Key Research and Development Plan (Grant No. 2016YFC0801002), and AuthenMetric R&D Funds. Zhen Lei is the corresponding author.

Author information


Corresponding author

Correspondence to Zhen Lei.

Additional information

Communicated by Dr. Rama Chellappa, Dr. Xiaoming Liu, Dr. Tae-Kyun Kim, Dr. Fernando De la Torre and Dr. Chen Change Loy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Zhu, X., Liu, H., Lei, Z. et al. Large-Scale Bisample Learning on ID Versus Spot Face Recognition. Int J Comput Vis 127, 684–700 (2019). https://doi.org/10.1007/s11263-019-01162-8
