Forest Learning Based on the Chow-Liu Algorithm and Its Application to Genome Differential Analysis: A Novel Mutual Information Estimation

  • Conference paper
Advanced Methodologies for Bayesian Networks (AMBN 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9505)

Abstract

This paper proposes a new mutual information estimator for discrete and continuous variables and uses it to construct a forest based on the Chow-Liu algorithm. The state-of-the-art method assumes a Gaussian model for the continuous case and ANOVA for the mixed discrete/continuous case. Given data, the proposed algorithm constructs several pairs of quantizers for X and Y such that each interval on both axes contains an equal number of samples, and estimates a mutual information value from the resulting discrete histogram for each pair. Among these mutual information values, we choose the maximum, a choice that is validated in terms of the minimum description length principle. Although strong consistency is not proved mathematically, the proposed method handles discrete and continuous values without distinguishing between them, and it detects independence correctly with probability one as the sample size grows. The resulting forest construction procedure is applied to genome differential analysis, in which a discrete variable (wild-type versus mutant phenotype) affects gene expression values.
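
The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the exact penalty or the set of quantizer resolutions, so the BIC/MDL-style penalty `(a - 1)(b - 1) log(n) / (2n)`, the resolutions 2, 4, 8, ... up to roughly the square root of n, the tie handling, and all function names (`equal_frequency_bins`, `plugin_mi`, `mdl_mi`, `chow_liu_forest`) are assumptions made for illustration.

```python
import bisect
import math
from collections import Counter


def equal_frequency_bins(values, bins):
    """Quantize so each bin holds about n/bins samples; tied values share a bin."""
    ordered = sorted(values)
    n = len(values)
    return [bisect.bisect_left(ordered, v) * bins // n for v in values]


def plugin_mi(xs, ys):
    """Plug-in (histogram) mutual information of two discrete sequences, in nats."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())


def mdl_mi(x, y, max_depth=None):
    """Maximum over resolutions of the penalized estimate, floored at zero.

    The penalty form and the resolution schedule are assumptions of this sketch.
    """
    n = len(x)
    if max_depth is None:
        max_depth = max(1, int(math.log2(n) / 2))  # assumed cap: about sqrt(n) bins
    best = 0.0  # zero is read as "independent"
    for k in range(1, max_depth + 1):
        qx = equal_frequency_bins(x, 2 ** k)
        qy = equal_frequency_bins(y, 2 ** k)
        a, b = len(set(qx)), len(set(qy))  # cells actually occupied on each axis
        penalty = (a - 1) * (b - 1) * math.log(n) / (2 * n)
        best = max(best, plugin_mi(qx, qy) - penalty)
    return best


def chow_liu_forest(columns):
    """Kruskal-style maximum-weight spanning forest over strictly positive edges."""
    p = len(columns)
    edges = sorted(((mdl_mi(columns[i], columns[j]), i, j)
                    for i in range(p) for j in range(i + 1, p)), reverse=True)
    parent = list(range(p))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    forest = []
    for weight, i, j in edges:
        if weight <= 0:
            break  # remaining pairs are treated as independent, so no edge
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append((i, j))
    return forest


# Toy data: x and y form a full 8 x 8 grid (independent); z is a function of x.
x = [i % 8 for i in range(64)]
y = [i // 8 for i in range(64)]
z = [3 * v + 1 for v in x]
print(chow_liu_forest([x, y, z]))
```

On this toy input the only edge retained links columns 0 and 2 (x and z), while y stays isolated. Flooring the estimate at zero is what turns the Chow-Liu tree into a forest: pairs whose penalized estimate is not positive contribute no edge.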

Acknowledgements

This work was partially supported by Advanced Research Networks A, Japan Society for the Promotion of Science (Takashi Suzuki, Osaka University).

Author information

Corresponding author

Correspondence to Joe Suzuki.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Suzuki, J. (2015). Forest Learning Based on the Chow-Liu Algorithm and Its Application to Genome Differential Analysis: A Novel Mutual Information Estimation. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_17

  • DOI: https://doi.org/10.1007/978-3-319-28379-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28378-4

  • Online ISBN: 978-3-319-28379-1

  • eBook Packages: Computer Science, Computer Science (R0)
