Forest Learning Based on the Chow-Liu Algorithm and Its Application to Genome Differential Analysis: A Novel Mutual Information Estimation

Suzuki, Joe

doi:10.1007/978-3-319-28379-1_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9505)

Included in the following conference series:

Workshop on Advanced Methodologies for Bayesian Networks

Abstract

This paper proposes a new mutual information estimator for discrete and continuous variables and uses it to construct a forest based on the Chow-Liu algorithm. The state-of-the-art method assumes Gaussian and ANOVA models for the continuous and the discrete/continuous cases, respectively. Given data, the proposed algorithm constructs several pairs of quantizers for X and Y such that each interval on both axes contains an equal number of samples, and estimates the mutual information values from the discrete data in the resulting histograms. Among these mutual information values, we choose the maximum one, which is validated in terms of the minimum description length principle. Although strong consistency is not proved mathematically, the proposed method does not distinguish discrete and continuous values when dealing with data, and independence is detected correctly with probability one as the sample size grows. The resulting forest construction procedure is applied to genome differential analysis, in which a discrete variable (wild and mutant phenotypes) affects gene expression values.
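
The procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's estimator: the abstract does not give the formula, so the sketch substitutes a plug-in (empirical) mutual information with a BIC/MDL-style penalty of (a-1)(b-1) log(n) / (2n) for an a-by-b histogram, tries 2, 4, 8, ... equal-frequency bins per axis, and keeps the largest penalized value, clipped at zero. The function names (`equal_frequency_bins`, `mi_estimate`, `chow_liu_forest`), the choice of bin counts, and the default depth are all assumptions made for illustration. The forest step is a Kruskal-style maximum-weight spanning forest that adds an edge only when its estimated mutual information is positive.

```python
import math
from collections import Counter


def equal_frequency_bins(values, m):
    """Assign each value to one of m bins with (nearly) equal sample counts.

    Tied values always share a bin, so a discrete variable keeps its levels.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    first_rank = 0
    for rank, i in enumerate(order):
        if rank == 0 or values[i] != values[order[rank - 1]]:
            first_rank = rank  # rank of the first sample carrying this value
        bins[i] = first_rank * m // n
    return bins


def empirical_mi(a, b):
    """Plug-in mutual information (in nats) of two discrete sequences."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (ca[u] * cb[v]))
               for (u, v), c in cab.items())


def mi_estimate(x, y, max_depth=None):
    """Largest penalized MI over several quantizer pairs (assumed scoring)."""
    n = len(x)
    if max_depth is None:
        max_depth = max(1, int(math.log2(n) / 2))  # assumed default
    best = 0.0  # a value of 0 is read as "independent"
    for k in range(1, max_depth + 1):
        a = equal_frequency_bins(x, 2 ** k)
        b = equal_frequency_bins(y, 2 ** k)
        ka, kb = len(set(a)), len(set(b))
        penalty = (ka - 1) * (kb - 1) * math.log(n) / (2 * n)
        best = max(best, empirical_mi(a, b) - penalty)
    return best


def chow_liu_forest(columns):
    """Kruskal-style forest: add edges by decreasing MI while it is positive."""
    p = len(columns)
    scored = [(mi_estimate(columns[i], columns[j]), i, j)
              for i in range(p) for j in range(i + 1, p)]
    parent = list(range(p))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for w, i, j in sorted(scored, reverse=True):
        if w <= 0:
            break  # remaining pairs are treated as independent
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

Because `equal_frequency_bins` keeps tied values together, the same code path handles a binary phenotype column and a continuous expression column, which mirrors the abstract's point that discrete and continuous values need not be distinguished. Stopping at non-positive scores is what turns the spanning tree into a forest.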

Acknowledgements

This work was partially supported by Advanced Research Networks A, Japan Society for the Promotion of Science (Takashi Suzuki, Osaka University).

Author information

Authors and Affiliations

Osaka University, Suita, Japan
Joe Suzuki

Corresponding author

Correspondence to Joe Suzuki.

Editor information

Editors and Affiliations

Osaka University, Osaka, Japan
Joe Suzuki
The University of Electro-Communications, Tokyo, Japan
Maomi Ueno

About this paper

Cite this paper

Suzuki, J. (2015). Forest Learning Based on the Chow-Liu Algorithm and Its Application to Genome Differential Analysis: A Novel Mutual Information Estimation. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_17

DOI: https://doi.org/10.1007/978-3-319-28379-1_17
Published: 08 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28378-4
Online ISBN: 978-3-319-28379-1
eBook Packages: Computer Science, Computer Science (R0)
