Abstract
Applying information entropy theory, we present a measure of dependence for multivariable relationships: the high-dimensional maximal mutual information coefficient (HMIC). It is a maximal information-based nonparametric exploration (MINE) statistic for identifying and classifying relationships in large data sets, and it generalizes the maximal information coefficient (MIC), which is defined for pairs of variables. To reduce the complexity of computing the HMIC, we propose an improved uniform grid based on the data grid idea. We also build an optimal single-axis partition algorithm (SAR) to ensure the feasibility of the HMIC measurement. Finally, we apply the HMIC to analyze physical measurement data sets of college students.
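To make the grid idea concrete, the following is a minimal sketch of a MIC-style score for two variables only, restricted to plain equal-width grids. It is not the HMIC, the improved uniform grid, or the SAR algorithm of this chapter, whose details the abstract does not give. The function names `grid_mutual_information` and `uniform_grid_mic` are hypothetical, and the default cell budget of n^0.6 follows the convention of the original MIC paper.

```python
import math
from collections import Counter


def grid_mutual_information(xs, ys, nx, ny):
    """Mutual information (in nats) of xs and ys binned on an nx-by-ny equal-width grid."""
    def bin_index(v, lo, hi, n):
        if hi == lo:
            return 0  # a constant variable occupies a single bin
        # Clamp so that the maximum value falls in the last bin.
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    cells = [(bin_index(x, x_lo, x_hi, nx), bin_index(y, y_lo, y_hi, ny))
             for x, y in zip(xs, ys)]
    n = len(cells)
    joint = Counter(cells)
    px = Counter(i for i, _ in cells)
    py = Counter(j for _, j in cells)
    mi = 0.0
    for (i, j), c in joint.items():
        # p(i,j) * log(p(i,j) / (p(i) * p(j))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def uniform_grid_mic(xs, ys, max_cells=None):
    """Largest normalized grid mutual information over grids with nx * ny <= max_cells.

    Assumption: only equal-width grids are searched, so this is a simplified
    stand-in for MIC, which optimizes the partition boundaries as well.
    """
    if max_cells is None:
        max_cells = max(4, int(len(xs) ** 0.6))
    best = 0.0
    for nx in range(2, max_cells // 2 + 1):
        for ny in range(2, max_cells // nx + 1):
            mi = grid_mutual_information(xs, ys, nx, ny)
            # Dividing by log(min(nx, ny)) bounds the score by 1.
            best = max(best, mi / math.log(min(nx, ny)))
    return best


xs = list(range(100))
linear_score = uniform_grid_mic(xs, [2 * x + 1 for x in xs])
constant_score = uniform_grid_mic(xs, [5] * 100)
```

In this sketch a noiseless linear relationship scores close to 1 and a constant second variable scores 0. Extending the grid to more than two variables multiplies the number of cells per added dimension, which is the cost that a cheaper grid construction is meant to contain.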
Acknowledgments
This research is supported by the NNSF of China under Grants No. 61273008 and No. 61104003, and by the Fund of the Hebei Education Department under No. z2014096. It is also supported by the Key Laboratory of Integrated Automation of Process Industry (Northeastern University). The authors are grateful to the anonymous referee for carefully checking the details and for helpful comments that allowed us to improve the manuscript.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Jiang, Ys., Zhang, DK., Wang, Xm., Zhu, Wy. (2016). Exploring High Dimension Large Data Correlation Analysis with Mutual Information and Application. In: Cao, BY., Wang, PZ., Liu, ZL., Zhong, YB. (eds) International Conference on Oriental Thinking and Fuzzy Logic. Advances in Intelligent Systems and Computing, vol 443. Springer, Cham. https://doi.org/10.1007/978-3-319-30874-6_34
DOI: https://doi.org/10.1007/978-3-319-30874-6_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30873-9
Online ISBN: 978-3-319-30874-6
eBook Packages: Engineering, Engineering (R0)