Machine Learning, Volume 107, Issue 3, pp. 509–549

The randomized information coefficient: assessing dependencies in noisy data

  • Simone Romano
  • Nguyen Xuan Vinh
  • Karin Verspoor
  • James Bailey

Abstract

When differentiating between strong and weak relationships using information-theoretic measures, the variance plays an important role: the higher the variance, the lower the chance of correctly ranking the relationships. We propose the randomized information coefficient (RIC), a mutual-information-based measure with low variance, to quantify the dependency between two sets of numerical variables. We first formally establish the importance of achieving low variance when comparing relationships using mutual information estimated with grids. Second, we experimentally demonstrate the effectiveness of RIC for (i) detecting noisy dependencies and (ii) ranking dependencies for the applications of genetic network inference and feature selection for regression. Across these tasks, RIC is highly competitive against 16 other state-of-the-art measures. Other prominent features of RIC include its simplicity and efficiency, making it a promising new method for dependency assessment.
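
The abstract describes the idea only at a high level. Below is a minimal sketch, assuming Python with NumPy and scikit-learn, of how a randomized, grid-based dependency coefficient of this kind can be computed: the score is the average of normalized mutual information over many random discretizations of the two variables. The function names (ric_sketch, random_grid_labels), the number of random grids, the maximum number of bins, and the use of scikit-learn's normalized_mutual_info_score are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from sklearn.metrics import normalized_mutual_info_score

    def random_grid_labels(x, max_bins, rng):
        """Discretize x with a random number of bins at random cut-points (assumed scheme)."""
        n_bins = rng.integers(2, max_bins + 1)
        cuts = np.sort(rng.choice(x, size=n_bins - 1, replace=False))
        return np.digitize(x, cuts)

    def ric_sketch(x, y, n_grids=200, max_bins=10, seed=0):
        """Average normalized mutual information over random grids (illustrative only)."""
        rng = np.random.default_rng(seed)
        scores = []
        for _ in range(n_grids):
            labels_x = random_grid_labels(x, max_bins, rng)
            labels_y = random_grid_labels(y, max_bins, rng)
            scores.append(normalized_mutual_info_score(labels_x, labels_y))
        return float(np.mean(scores))

    # Example: a noisy sinusoidal dependency scores higher than independent noise.
    rng = np.random.default_rng(1)
    x = rng.normal(size=500)
    y = np.sin(4 * x) + 0.3 * rng.normal(size=500)
    print(ric_sketch(x, y), ric_sketch(x, rng.normal(size=500)))

The intuition behind the averaging step is that an ensemble of random grids yields a lower-variance estimate than any single fixed discretization, which is the property the abstract emphasizes.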

Keywords

Dependency measures · Noisy relationships · Normalized mutual information · Randomized ensembles

Notes

Acknowledgements

Simone Romano’s work was supported by a Melbourne International Research Scholarship (MIRS). James Bailey’s work was supported by an Australian Research Council Future Fellowship. Experiments were carried out on the Amazon cloud, supported by an AWS in Education Grant Award.


Copyright information

© The Author(s) 2017

Authors and Affiliations

  1. School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
