Fissionable Deep Neural Network

  • DongXu Tan
  • JunMin Wu
  • HuanXin Zheng
  • Yan Yin
  • YaXin Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9950)

Abstract

Model combination nearly always improves the performance of machine learning methods, and averaging the predictions of multiple models further reduces the error rate. To obtain multiple high-quality models more quickly, this paper proposes a novel deep network architecture called the “Fissionable Deep Neural Network” (FDNN). Instead of merely adjusting the weights of a network with a fixed topology, an FDNN contains multiple branches with shared parameters and multiple Softmax layers. During training, the model repeatedly divides until the branches become separate models. FDNN not only reduces computational cost but also overcomes convergence interference between branches, giving branches that fall into a poor local optimum an opportunity to re-learn. It improves the performance of neural networks on supervised learning, as demonstrated on the MNIST and CIFAR-10 datasets.
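The fission idea outlined in the abstract, a trunk with shared parameters feeding several branches, each ending in its own Softmax classifier, which later split into independent models, might be implemented along the following lines. This is a minimal sketch, assuming a PyTorch-style implementation; the names (FDNNSketch, fission), the layer sizes, and the fission-by-deep-copy step are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of the architecture described in the abstract:
# a shared trunk, multiple branches with their own Softmax heads,
# and a "fission" step that splits them into independent models.
# Assumed PyTorch implementation; all names and sizes are illustrative.
import copy

import torch.nn as nn


class FDNNSketch(nn.Module):
    def __init__(self, num_classes=10, num_branches=2):
        super().__init__()
        # Trunk whose parameters are shared by every branch before fission.
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        # One classification head (logits for a Softmax layer) per branch.
        self.branches = nn.ModuleList(
            [nn.Linear(16 * 4 * 4, num_classes) for _ in range(num_branches)]
        )

    def forward(self, x):
        h = self.trunk(x)
        # One set of logits per branch; joint training would sum a
        # cross-entropy (Softmax) loss over all of them.
        return [branch(h) for branch in self.branches]


def fission(model):
    """Split the joint model into independent per-branch networks by
    deep-copying the shared trunk, so a branch stuck in a poor local
    optimum can re-learn without interfering with its siblings."""
    return [
        nn.Sequential(copy.deepcopy(model.trunk), copy.deepcopy(branch))
        for branch in model.branches
    ]
```

After fission, each returned child would be trained as an ordinary standalone network, and the children's Softmax predictions could be averaged at test time, in the spirit of the model combination the abstract describes.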

Keywords

Model combination · Neural network · Shared parameters · Fission


Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • DongXu Tan (3)
  • JunMin Wu (1, 2)
  • HuanXin Zheng (2)
  • Yan Yin (2)
  • YaXin Liu (3)
  1. Suzhou Institute for Advanced Study, University of Science and Technology of China, Suzhou, China
  2. Department of Computer Science and Technology, University of Science and Technology of China, Suzhou, China
  3. School of Software Engineering, University of Science and Technology of China, Suzhou, China
