Boltzmann Machines

  • Ke-Lin Du
  • M. N. S. Swamy


Although the Boltzmann machine was invented in 1985, the machine learning community long regarded it as a model of mainly historical significance. The model began to gain popularity in 2006, when Hinton and collaborators achieved a breakthrough in deep learning in which the restricted Boltzmann machine is the main building block of the deep neural network. In this chapter, we introduce the Boltzmann machine and its reduced form, the restricted Boltzmann machine, together with their learning algorithms. Related topics are also treated.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2019

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Concordia University, Montreal, Canada
  2. Xonlink Inc., Hangzhou, China
