A Pipelined Pre-training Algorithm for DBNs

  • Zhiqiang Ma
  • Tuya Li
  • Shuangtao Yang
  • Li Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10565)


Deep networks have been widely used in many domains in recent years. However, pre-training deep networks with the greedy layer-wise algorithm is time consuming, and the scalability of this algorithm is severely limited by its inherently sequential nature: only one hidden layer can be trained at a time. To speed up the training of deep networks, this paper focuses on the pre-training phase and proposes a pipelined pre-training algorithm that exploits a distributed cluster, significantly reducing pre-training time with no loss of recognition accuracy and using the computational cluster more efficiently than greedy layer-wise pre-training. Finally, we carry out a comparative experiment between the greedy layer-wise and pipelined pre-training algorithms on the TIMIT corpus. The results show that the pipelined algorithm makes efficient use of a distributed GPU cluster: we achieve speed-ups of 2.84 and 5.9 with no loss of recognition accuracy using 4 and 8 slaves respectively, with a parallelization efficiency close to 0.73.
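The pipelined scheme described in the abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's implementation: each pipeline stage (one per hidden layer, standing in for one slave node) trains its RBM with one step of contrastive divergence (CD-1) on each incoming mini-batch and forwards the resulting hidden activations to the next stage, so all layers train concurrently once the pipeline fills. All names here (`RBM`, `stage`, `pipelined_pretrain`) are hypothetical.

```python
# Sketch of pipelined layer-wise pre-training of a stack of RBMs.
# Threads and bounded queues stand in for slave nodes and inter-node
# message passing; this is an assumed illustration, not the paper's code.
import queue
import threading
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_vis, n_hid, lr=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def cd1(self, v0):
        # One step of contrastive divergence on a mini-batch.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b)   # reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

def stage(rbm, in_q, out_q):
    # Pipeline stage: train this layer's RBM on each incoming batch,
    # then pass the hidden activations downstream (if any stage follows).
    while True:
        batch = in_q.get()
        if batch is None:               # end-of-stream marker
            if out_q is not None:
                out_q.put(None)
            return
        rbm.cd1(batch)
        if out_q is not None:
            out_q.put(rbm.hidden_probs(batch))

def pipelined_pretrain(data_batches, layer_sizes):
    # One RBM (and one stage) per pair of adjacent layer sizes.
    rbms = [RBM(layer_sizes[i], layer_sizes[i + 1])
            for i in range(len(layer_sizes) - 1)]
    queues = [queue.Queue(maxsize=4) for _ in rbms]
    threads = []
    for i, rbm in enumerate(rbms):
        out_q = queues[i + 1] if i + 1 < len(rbms) else None
        t = threading.Thread(target=stage, args=(rbm, queues[i], out_q))
        t.start()
        threads.append(t)
    for batch in data_batches:          # feed raw data into the first stage
        queues[0].put(batch)
    queues[0].put(None)                 # sentinel propagates down the pipe
    for t in threads:
        t.join()
    return rbms
```

Because each stage only needs the previous stage's activations for the batches it has already received, the stages overlap in time; with k layers the ideal speed-up approaches k, which is consistent in spirit with the measured speed-ups above without reproducing them.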


Keywords: Deep networks · Pre-training · Greedy layer-wise · RBM · Pipelined



Funding: National Natural Science Foundation of China (61650205); Inner Mongolia Autonomous Region Natural Science Foundation (2014MS0608); Inner Mongolia University of Technology Key Fund (ZD201118).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. College of Information Engineering, Inner Mongolia University of Technology, Hohhot, China
