Exploring High Dimension Large Data Correlation Analysis with Mutual Information and Application

  • Conference paper
International Conference on Oriental Thinking and Fuzzy Logic

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 443)

Abstract

Applying information entropy theory, we present a measure of dependence for multivariate relationships: the high dimensional maximal mutual information coefficient (HMIC). It belongs to the family of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships in large data sets, and it generalizes the maximal information coefficient (MIC), which is defined for pairs of variables. To reduce the computational complexity of HMIC, an improved uniform grid is proposed, based on the data grid idea. In addition, an optimal single-axis partition algorithm (SAR) is built to ensure that the HMIC measurement is feasible. Finally, we apply HMIC to analyze data sets of physical measurements of college students.
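To make the grid idea concrete, the following is a minimal Python sketch of a uniform-grid approximation to the two-variable MIC that HMIC generalizes. It is not the HMIC or SAR algorithm of this paper, whose definitions are not given in the abstract. The function names are hypothetical, and the grid budget exponent `alpha=0.6` is an assumption borrowed from the value commonly used with MINE statistics. Because only equal-width grids are searched, the score can be lower than an MIC computed with optimized partitions.

```python
import math
from collections import Counter


def uniform_bins(values, k):
    """Assign each value to one of k equal-width bins (indices 0..k-1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into bin k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def grid_mutual_information(xs, ys, kx, ky):
    """Mutual information (in nats) of the data binned on a kx-by-ky uniform grid."""
    n = len(xs)
    bx, by = uniform_bins(xs, kx), uniform_bins(ys, ky)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij * log(p_ij / (p_i * p_j)), written with raw counts.
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi


def uniform_grid_mic(xs, ys, alpha=0.6):
    """Maximum normalized grid MI over grid shapes with kx * ky <= n**alpha.

    alpha is an assumed default, not a value taken from the paper.
    """
    n = len(xs)
    budget = n ** alpha
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        ky = 2
        while kx * ky <= budget:
            mi = grid_mutual_information(xs, ys, kx, ky)
            # Dividing by log(min(kx, ky)) bounds the score by 1.
            best = max(best, mi / math.log(min(kx, ky)))
            ky += 1
        kx += 1
    return best
```

Calling `uniform_grid_mic(xs, ys)` on two equal-length lists of numbers returns a score between 0 and 1 (up to floating-point rounding). A noiseless linear relationship should score close to 1, and data that fills a rectangular lattice evenly should score 0.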

Acknowledgments

This research is supported by the NNSF of China under Grants No. 61273008 and No. 61104003, and by the Fund of the Hebei Education Department under No. z2014096. It is also supported by the Key Laboratory of Integrated Automation of Process Industry (Northeastern University). The authors are grateful to the anonymous referee for carefully checking the details and for helpful comments that allowed us to improve the manuscript.

Author information

Corresponding author

Correspondence to Yu-shan Jiang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Jiang, Ys., Zhang, DK., Wang, Xm., Zhu, Wy. (2016). Exploring High Dimension Large Data Correlation Analysis with Mutual Information and Application. In: Cao, BY., Wang, PZ., Liu, ZL., Zhong, YB. (eds) International Conference on Oriental Thinking and Fuzzy Logic. Advances in Intelligent Systems and Computing, vol 443. Springer, Cham. https://doi.org/10.1007/978-3-319-30874-6_34

  • DOI: https://doi.org/10.1007/978-3-319-30874-6_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30873-9

  • Online ISBN: 978-3-319-30874-6

  • eBook Packages: Engineering, Engineering (R0)
