Quantitative Biology, Volume 5, Issue 2, pp 159–172

Elastic restricted Boltzmann machines for cancer data analysis

  • Sai Zhang
  • Muxuan Liang
  • Zhongjun Zhou
  • Chen Zhang
  • Ning Chen
  • Ting Chen
  • Jianyang Zeng
Research Article



Restricted Boltzmann machines (RBMs) can, in principle, model any (binary) joint distribution. Because of their restricted network structure, training RBMs also encounters fewer difficulties with approximation and inference. Yet little work has been done to fully exploit the capacity of these models for analyzing cancer data, e.g., cancer genomic, transcriptomic, proteomic and epigenomic data. Moreover, in cancer data analysis the number of features/predictors is usually much larger than the sample size. This is known as the “p ≫ N” problem and is also ubiquitous in other bioinformatics and computational biology fields. The “p ≫ N” problem makes the bias-variance trade-off especially important when designing statistical learning methods. To date, however, few RBM models have been designed specifically to address this issue.


We propose a novel RBM model, called elastic restricted Boltzmann machines (eRBMs), which incorporates an elastic regularization term into the likelihood function to balance model complexity and sensitivity. Building on the classic contrastive divergence (CD) algorithm, we develop the elastic contrastive divergence (eCD) algorithm, which can train eRBMs efficiently.
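As an illustration only, the idea of combining contrastive divergence with an elastic penalty can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the eCD algorithm itself: it assumes a binary RBM, a single Gibbs step (CD-1), and an elastic-net-style penalty l1·‖W‖₁ + ½·l2·‖W‖² applied to the weights only. The function name `elastic_cd1_step`, the parameters `l1`, `l2` and `lr`, and the toy data are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def elastic_cd1_step(W, b, c, v0, rng, lr=0.05, l1=1e-3, l2=1e-3):
    """One CD-1 update of a binary RBM with an elastic-net penalty on W.

    W: (n_visible, n_hidden) weights; b: visible biases; c: hidden biases.
    v0: (batch, n_visible) binary data.
    l1, l2: hypothetical penalty weights (assumed, not taken from the paper).
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)

    n = v0.shape[0]
    # CD-1 estimate of the log-likelihood gradient.
    grad_W = (v0.T @ ph0 - v1.T @ ph1) / n
    grad_b = (v0 - v1).mean(axis=0)
    grad_c = (ph0 - ph1).mean(axis=0)

    # Subtract the (sub)gradient of l1*|W|_1 + 0.5*l2*|W|^2.
    grad_W -= l1 * np.sign(W) + l2 * W

    # Gradient ascent on the penalized objective.
    return W + lr * grad_W, b + lr * grad_b, c + lr * grad_c


# Toy usage: far more features (50) than samples (10).
rng = np.random.default_rng(0)
n_visible, n_hidden = 50, 8
X = (rng.random((10, n_visible)) < 0.3).astype(float)
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
for _ in range(20):
    W, b, c = elastic_cd1_step(W, b, c, X, rng)
```

In this sketch the L1 term pushes small weights toward zero while the L2 term shrinks all weights smoothly; the balance between the two is controlled by the assumed `l1` and `l2` values, which would need tuning (for example by cross-validation) in any real use.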


We obtain several theoretical results on the rationality and properties of our model. We further evaluate the power of our model on a challenging task: predicting dichotomized survival time from the molecular profiling of tumors. The test results show that the prediction performance of eRBMs is superior to that of the state-of-the-art methods.


The proposed eRBMs are capable of dealing with “p ≫ N” problems and have superior modeling performance over traditional methods. Our model is a promising method for future cancer data analysis.


Keywords: RBMs; regularization; cancer data analysis; survival time prediction



This work was supported in part by the National Basic Research Program of China (Nos. 2011CBA00300 and 2011CBA00301), the National Natural Science Foundation of China (Nos. 61033001, 61361136003 and 61472205), China’s Youth 1000-Talent Program, and the Beijing Advanced Innovation Center for Structural Biology.



Copyright information

© Higher Education Press and Springer-Verlag GmbH 2017

Authors and Affiliations

  • Sai Zhang (1)
  • Muxuan Liang (2)
  • Zhongjun Zhou (1)
  • Chen Zhang (1)
  • Ning Chen (3)
  • Ting Chen (3, 4)
  • Jianyang Zeng (1)
  1. Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  2. Department of Statistics, University of Wisconsin-Madison, Madison, USA
  3. Bioinformatics Division, TNLIST, Department of Computer Science and Technology, Tsinghua University, Beijing, China
  4. Program in Computational Biology and Bioinformatics, University of Southern California, Los Angeles, USA
