Training Restricted Boltzmann Machines with Multi-tempering: Harnessing Parallelization

  • Philemon Brakel
  • Sander Dieleman
  • Benjamin Schrauwen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7553)


Restricted Boltzmann Machines (RBMs) are unsupervised probabilistic neural networks that can be stacked to form Deep Belief Networks. Given the recent popularity of RBMs and the increasing availability of parallel computing architectures, it is worth investigating learning algorithms for RBMs that benefit from parallel computation. In this paper, we look at two extensions of the parallel tempering algorithm, a Markov Chain Monte Carlo method used to approximate the likelihood gradient. The first extension aims at a more effective exchange of information among the parallel sampling chains. The second extension estimates gradients by a weighted average over chains at different temperatures. We investigate the efficiency of the proposed methods and demonstrate their usefulness on the MNIST dataset. The weighted averaging in particular appears to benefit maximum likelihood learning.
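For orientation, the baseline that the two extensions build on can be sketched as follows. This is a minimal illustration of standard parallel tempering with neighbour swaps on a tiny binary RBM, not the multi-tempering method proposed in the paper. It assumes that the tempered distributions are obtained by scaling the RBM energy by an inverse temperature `beta`; all function names, the toy parameters, and the temperature ladder are hypothetical choices made for the example.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def energy(v, h, W, b, c):
    # Binary RBM energy: E(v, h) = -b.v - c.h - v^T W h
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    e -= sum(v[i] * W[i][j] * h[j]
             for i in range(len(v)) for j in range(len(h)))
    return e


def gibbs_step(v, W, b, c, beta, rng):
    # One block Gibbs sweep targeting p_beta(v, h) ~ exp(-beta * E(v, h)).
    # Assumption: tempering scales the whole energy by beta.
    n_v, n_h = len(b), len(c)
    h = []
    for j in range(n_h):
        act = c[j] + sum(v[i] * W[i][j] for i in range(n_v))
        h.append(1 if rng.random() < sigmoid(beta * act) else 0)
    new_v = []
    for i in range(n_v):
        act = b[i] + sum(W[i][j] * h[j] for j in range(n_h))
        new_v.append(1 if rng.random() < sigmoid(beta * act) else 0)
    return new_v, h


def swap_probability(e_a, e_b, beta_a, beta_b):
    # Metropolis acceptance probability for exchanging the states of two
    # chains: min(1, exp((beta_a - beta_b) * (E_a - E_b))).
    exponent = (beta_a - beta_b) * (e_a - e_b)
    return 1.0 if exponent >= 0 else math.exp(exponent)


def parallel_tempering(W, b, c, betas, n_sweeps, seed=0):
    # Run one chain per inverse temperature; after every sweep, propose
    # swaps between neighbouring temperatures.
    rng = random.Random(seed)
    vs = [[rng.randint(0, 1) for _ in b] for _ in betas]
    hs = [[0] * len(c) for _ in betas]
    for _ in range(n_sweeps):
        for k, beta in enumerate(betas):
            vs[k], hs[k] = gibbs_step(vs[k], W, b, c, beta, rng)
        for k in range(len(betas) - 1):
            e_k = energy(vs[k], hs[k], W, b, c)
            e_n = energy(vs[k + 1], hs[k + 1], W, b, c)
            p = swap_probability(e_k, e_n, betas[k], betas[k + 1])
            if rng.random() < p:
                vs[k], vs[k + 1] = vs[k + 1], vs[k]
                hs[k], hs[k + 1] = hs[k + 1], hs[k]
    return vs, hs


# Toy usage: 3 visible units, 2 hidden units, 3 temperatures.
W = [[0.5, -0.3], [0.2, 0.1], [-0.4, 0.6]]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
vs, hs = parallel_tempering(W, b, c, betas=[1.0, 0.5, 0.0], n_sweeps=20)
```

In plain parallel tempering, only the chain at `beta = 1.0` (here `vs[0]`, `hs[0]`) would supply the negative-phase samples for the gradient estimate; the hotter chains exist to help it move between modes.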


Keywords: Markov Chain Monte Carlo, Restricted Boltzmann Machines, Neural Networks, Machine Learning





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Philemon Brakel (1)
  • Sander Dieleman (1)
  • Benjamin Schrauwen (1)
  1. Department of Electronics and Information Systems, Ghent University, Gent, Belgium
