Soft-Margin Softmax for Deep Classification

  • Xuezhi Liang
  • Xiaobo Wang (corresponding author)
  • Zhen Lei
  • Shengcai Liao
  • Stan Z. Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


In deep classification, the softmax loss (Softmax) is arguably one of the most commonly used components for training deep convolutional neural networks (CNNs). However, this widely used loss is limited because it does not explicitly encourage discriminative features. Recently, the large-margin softmax loss (L-Softmax [1]) was proposed to explicitly enhance feature discrimination, but it imposes a hard margin and requires complex forward and backward computation. In this paper, we propose a novel soft-margin softmax (SM-Softmax) loss to improve the discriminative power of features. Specifically, SM-Softmax modifies only the forward pass of Softmax by introducing a non-negative real number m, without changing the backward pass. It can therefore not only adjust the desired continuous soft margin but also be easily optimized by typical stochastic gradient descent (SGD). Experimental results on three benchmark datasets demonstrate the superiority of SM-Softmax over the baseline Softmax, the alternative L-Softmax, and several state-of-the-art competitors.
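As a rough illustration of the forward computation, the sketch below assumes that the margin m is subtracted from the target-class logit before the usual softmax cross-entropy is evaluated. The abstract does not spell out the formula, so this form, the function name `sm_softmax_loss`, and its parameters are assumptions made for illustration only. With m = 0 the sketch reduces to the standard softmax loss.

```python
import math


def sm_softmax_loss(logits, target, m=0.0):
    """Softmax cross-entropy with the target logit reduced by a margin m >= 0.

    Assumed form (not taken from the abstract):
        loss = -log( exp(z_y - m) / (exp(z_y - m) + sum_{j != y} exp(z_j)) )
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    # Subtract the margin from the target-class logit only.
    shifted = [z - m if j == target else z for j, z in enumerate(logits)]
    # Numerically stable log-sum-exp over the shifted logits.
    peak = max(shifted)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in shifted))
    # Negative log-probability of the target class.
    return log_norm - shifted[target]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]
    print(sm_softmax_loss(logits, target=0, m=0.0))  # plain softmax loss
    print(sm_softmax_loss(logits, target=0, m=1.0))  # same logits, margin 1
```

Under this assumed form, a larger m yields a larger loss for the same logits, so the target logit has to exceed the competing logits by roughly m more to reach the same loss value. Because m is a real number, the margin can be tuned continuously.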


Keywords: CNN, Softmax, L-Softmax, SM-Softmax, Classification



This work was supported by the National Key Research and Development Plan (Grant No. 2016YFC0801002), the Chinese National Natural Science Foundation Projects #61473291, #61572501, #61502491, #61572536, #61672521, and AuthenMetric R&D Funds.


  1. Liu, W., Wen, Y., Yu, Z.: Large-margin softmax loss for convolutional neural networks. In: ICML (2016)
  2. Wan, L., Zeiler, M., Zhang, S.: Regularization of neural networks using dropconnect. In: ICML (2013)
  3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
  4. He, K., Zhang, X.: Deep residual learning for image recognition. In: CVPR (2016)
  5. Sun, Y., Chen, Y., Wang, X.: Deep learning face representation by joint identification-verification. In: NIPS (2014)
  6. Taigman, Y., Yang, M., Ranzato, M.A.: Deepface: closing the gap to human-level performance in face verification. In: CVPR (2014)
  7. Szegedy, C., Liu, W., Jia, Y.: Going deeper with convolutions. In: CVPR (2015)
  8. He, K., Zhang, X., Ren, S.: Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: CVPR (2015)
  9. Srivastava, N., Hinton, G.E., Krizhevsky, A.: Dropout: a simple way to prevent neural networks from overfitting. JMLR (2014)
  10. Zeiler, M.D., Fergus, R.: Stochastic pooling for regularization of deep convolutional neural networks. arXiv preprint arXiv:1301.3557 (2013)
  11. Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: A unified embedding for face recognition and clustering. In: CVPR (2015)
  12. Tang, Y.: Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239 (2013)
  13. Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9911, pp. 499–515. Springer, Cham (2016). doi: 10.1007/978-3-319-46478-7_31
  14. Liu, W., Wen, Y., Yu, Z.: SphereFace: deep hypersphere embedding for face recognition. In: CVPR (2017)
  15. Martins, A., Astudillo, R.: From softmax to sparsemax: a sparse model of attention and multi-label classification. In: ICML (2016)
  16. Jarrett, K., Kavukcuoglu, K., LeCun, Y.: What is the best multi-stage architecture for object recognition? In: ICCV (2009)
  17. Lin, M., Chen, Q., Yan, S.: Network in network. In: ICLR (2014)
  18. Goodfellow, I.J., Warde-Farley, D., Mirza, M.: Maxout Networks. In: ICML (2013)
  19. Romero, A., Ballas, N.: Fitnets: Hints for thin deep nets. In: ICLR (2013)
  20. Lee, C.Y., Xie, S., Gallagher, P.W.: Deeply-supervised nets. AISTATS (2015)
  21. Springenberg, J.T., Dosovitskiy, A., Brox, T.: Striving for simplicity: The all convolutional net. In: ICLR (2015)
  22. Liang, M., Hu, X.: Recurrent convolutional neural network for object recognition. In: CVPR (2015)
  23. Lee, C.Y., Gallagher, P.W., Tu, Z.: Generalizing pooling functions in convolutional neural networks: mixed, gated, and tree. AISTATS (2016)
  24. LeCun, Y.: The MNIST database of handwritten digits (1998)
  25. Krizhevsky, A., Geoffrey, H.: Learning multiple layers of features from tiny images (2009)
  26. Jia, Y., Shelhamer, E., Donahue, J.: Caffe: Convolutional architecture for fast feature embedding. In: ACM (2014)
  27. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
  28. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. AISTATS (2011)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Xuezhi Liang (1, 2, 3)
  • Xiaobo Wang (1, 3), corresponding author
  • Zhen Lei (1, 3)
  • Shengcai Liao (1, 3)
  • Stan Z. Li (1, 2, 3)
  1. Center for Biometrics and Security Research and National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
  2. Center for Internet of Things, Chinese Academy of Sciences, Wuxi, China
  3. University of Chinese Academy of Sciences, Beijing, China
