Regularizing CNN via Feature Augmentation

  • Liechuan Ou
  • Zheng Chen
  • Jianwei Lu
  • Ye Luo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10635)


Very deep convolutional neural networks have strong representation power and have become the dominant models for complex image classification problems. Due to their huge number of parameters, overfitting is a primary concern when training a network without enough data. Data augmentation at the input layer is a commonly used regularization method that helps the trained model generalize better. In this paper, we propose that feature augmentation at intermediate layers can also be used to regularize the network. We implement a modified residual network by adding augmentation layers and train the model on CIFAR-10. Experimental results demonstrate that our method successfully regularizes the model: it significantly decreases the cross-entropy loss on the test set even though the training loss is higher than that of the original network, and the final recognition accuracy on the test set is also improved. In comparison with Dropout, our method cooperates better with batch normalization to produce a performance gain.
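The abstract does not specify the exact augmentation operation applied to the intermediate features. As a hypothetical sketch only, the following shows one common form of feature-space augmentation: an extra layer that perturbs intermediate activations with additive Gaussian noise during training and acts as the identity at test time (mirroring how Dropout is disabled at inference). The class name, noise distribution, and `std` parameter are illustrative assumptions, not the authors' method.

```python
import numpy as np

class FeatureAugment:
    """Hypothetical feature-augmentation layer: perturbs intermediate
    feature maps with additive Gaussian noise during training only."""

    def __init__(self, std=0.1, seed=0):
        self.std = std
        self.rng = np.random.default_rng(seed)

    def __call__(self, features, training=True):
        if not training:
            # Identity at test time, analogous to Dropout at inference.
            return features
        noise = self.rng.normal(0.0, self.std, size=features.shape)
        return features + noise

# Toy usage: apply the layer to a small batch of intermediate features.
aug = FeatureAugment(std=0.1)
x = np.ones((2, 4))                    # pretend these are mid-network activations
y_train = aug(x, training=True)        # perturbed features
y_test = aug(x, training=False)        # unchanged features
```

In a residual network such a layer would sit between residual blocks, so that each block downstream sees slightly perturbed versions of its inputs across epochs, which discourages the network from memorizing exact feature values.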


Deep learning · CNN · Overfitting · Model regularization



This work was supported by the General Program of National Natural Science Foundation of China under Grant No. 61572362 and No. 81571347.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Software Engineering, Tongji University, Shanghai, China
  2. Institute of Translational Medicine, Tongji University, Shanghai, China
  3. College of Architecture and Urban Planning, Tongji University, Shanghai, China
