Abstract
This paper proposes a new mutual information estimator for discrete and continuous variables and uses it to construct a forest based on the Chow-Liu algorithm. State-of-the-art methods assume Gaussian distributions for the continuous case and ANOVA for the mixed discrete/continuous case. Given data, the proposed algorithm constructs several pairs of quantizers for X and Y such that each interval on both axes contains an equal number of samples, and estimates a mutual information value from the discretized data (histogram) for each pair. Among these mutual information values, we choose the maximum one, a choice that is validated in terms of the minimum description length principle. Although strong consistency is not proved mathematically, the proposed method does not need to distinguish discrete from continuous values when handling data, and it detects independence correctly with probability one as the sample size grows. The resulting forest construction procedure is applied to genome differential analysis, in which a discrete variable (wild and mutant phenotypes) affects gene expression values.
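The procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the candidate grids (2^k equal-frequency bins per axis), the default depth limit, and the MDL-style penalty (a-1)(b-1) log(n) / (2n) are assumptions made here, and the function names `equal_frequency_bins`, `plug_in_mi`, `penalized_mi`, and `chow_liu_forest` are hypothetical. Tied values are placed in the same bin so that discrete and continuous columns go through the same code path.

```python
import bisect
import math
from collections import Counter


def equal_frequency_bins(values, num_bins):
    """Assign bin indices so that bins hold roughly equal sample counts.

    Tied values share a bin, so a discrete column is never split.
    """
    n = len(values)
    sorted_vals = sorted(values)
    # bisect_left counts the samples strictly smaller than v.
    return [bisect.bisect_left(sorted_vals, v) * num_bins // n for v in values]


def plug_in_mi(xb, yb):
    """Plug-in (histogram) mutual information of two discretized columns, in nats."""
    n = len(xb)
    cx, cy, cxy = Counter(xb), Counter(yb), Counter(zip(xb, yb))
    return sum(c / n * math.log(c * n / (cx[a] * cy[b]))
               for (a, b), c in cxy.items())


def penalized_mi(x, y, max_depth=None):
    """Maximum over candidate grids of (plug-in MI - penalty), floored at 0.

    A return value of 0.0 is read as "X and Y are judged independent".
    The penalty form and the depth limit are assumptions of this sketch.
    """
    n = len(x)
    if max_depth is None:
        max_depth = max(1, int(math.log2(n) / 2))  # assumed depth limit
    best = 0.0
    for k in range(1, max_depth + 1):
        xb = equal_frequency_bins(x, 2 ** k)
        yb = equal_frequency_bins(y, 2 ** k)
        a, b = len(set(xb)), len(set(yb))  # occupied bins per axis
        penalty = (a - 1) * (b - 1) * math.log(n) / (2 * n)
        best = max(best, plug_in_mi(xb, yb) - penalty)
    return best


def chow_liu_forest(columns):
    """Kruskal-style Chow-Liu: add edges by decreasing score, skipping
    pairs judged independent (score 0) and pairs that would close a cycle."""
    p = len(columns)
    scores = sorted(((penalized_mi(columns[i], columns[j]), i, j)
                     for i in range(p) for j in range(i + 1, p)),
                    reverse=True)
    parent = list(range(p))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = []
    for score, i, j in scores:
        ri, rj = find(i), find(j)
        if score > 0 and ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data: a binary "phenotype", an "expression" column that depends on it,
# and an alternating column that is exactly independent of both.
phenotype = [0] * 32 + [1] * 32
expression = [float(i) for i in range(64)]
alternating = [i % 2 for i in range(64)]
forest = chow_liu_forest([phenotype, expression, alternating])
```

Because pairs with a score of zero never receive an edge, the result is a forest rather than a spanning tree: in the toy data, only the phenotype and expression columns are connected.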
Acknowledgements
This work was partially supported by Advanced Research Networks A, Japan Society for the Promotion of Science (Takashi Suzuki, Osaka University).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Suzuki, J. (2015). Forest Learning Based on the Chow-Liu Algorithm and Its Application to Genome Differential Analysis: A Novel Mutual Information Estimation. In: Suzuki, J., Ueno, M. (eds) Advanced Methodologies for Bayesian Networks. AMBN 2015. Lecture Notes in Computer Science, vol 9505. Springer, Cham. https://doi.org/10.1007/978-3-319-28379-1_17
DOI: https://doi.org/10.1007/978-3-319-28379-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28378-4
Online ISBN: 978-3-319-28379-1
eBook Packages: Computer Science, Computer Science (R0)