Learning Correlations

  • Boris Mirkin
Part of the Undergraduate Topics in Computer Science book series (UTICS)


After a short introduction to the general concept of a decision rule relating input and target features, this chapter describes several generic and widely used methods for learning correlations over two or more features. Four of them pertain to quantitative targets (linear regression, canonical correlation, neural network, and regression tree), and seven to categorical ones (linear discrimination, support vector machine, naïve Bayes classifier, classification tree, contingency table, distance between partition and ranking relations, and correspondence analysis). Of these, classification trees are treated in the most detail, including a number of theoretical results that are not well known. These results establish firm relations between popular scoring functions and bivariate measures: Quetelet indexes in contingency tables and, rather unexpectedly, normalization options for dummy variables representing target categories. Some related concepts, such as Bayesian decision rules, the bag-of-words model in text analysis, VC-dimension, and kernels for non-linear classification, are introduced too. The chapter also outlines several important characteristics of summarization and correlation between two features and presents some of their properties. They are:
  • linear regression and correlation coefficient for two quantitative variables (Sect. 3.2);

  • tabular regression and correlation ratio for the mixed scale case (Sect. 3.8.3); and

  • contingency table, Quetelet index, statistical independence, and Pearson’s chi-squared for two nominal variables; the latter is treated as a summary correlation measure, in contrast to the conventional view of it as just a criterion of statistical independence (Sect. 3.6.1); moreover, a few lesser-known least-squares-based concepts are outlined, including canonical correlation and correspondence analysis.
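
The view of Pearson’s chi-squared as a summary correlation measure can be illustrated with a small sketch. It assumes the relative Quetelet index is defined as q_kl = p_kl / (p_k+ p_+l) − 1, in which case chi-squared divided by the number of observations equals the average of q_kl weighted by the joint frequencies p_kl. The function name `quetelet_and_chi2` and the counts are hypothetical, chosen only for illustration; the sketch also assumes that no row or column total is zero.

```python
def quetelet_and_chi2(counts):
    """Return (Quetelet index table, chi-squared, averaged Quetelet index).

    `counts` is a list of rows of non-negative co-occurrence counts.
    Assumes every row total and every column total is positive.
    """
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    rows, cols = range(len(row_tot)), range(len(col_tot))

    # Relative Quetelet index q_kl = p_kl / (p_k+ * p_+l) - 1,
    # written in terms of counts: n * n_kl / (n_k+ * n_+l) - 1.
    q = [[n * counts[k][l] / (row_tot[k] * col_tot[l]) - 1 for l in cols]
         for k in rows]

    # Conventional chi-squared: squared deviations of observed counts
    # from those expected under statistical independence.
    chi2 = 0.0
    for k in rows:
        for l in cols:
            expected = row_tot[k] * col_tot[l] / n
            chi2 += (counts[k][l] - expected) ** 2 / expected

    # Average Quetelet index, weighted by the joint frequencies p_kl.
    avg_q = sum(counts[k][l] / n * q[k][l] for k in rows for l in cols)
    return q, chi2, avg_q


# Made-up 2x2 table of counts, for demonstration only.
table = [[30, 10],
         [10, 50]]
q, chi2, avg_q = quetelet_and_chi2(table)
print(q[0][0])                  # relative change of p(col 0) given row 0
print(chi2, sum(map(sum, table)) * avg_q)  # the two values coincide
```

In this reading, each q_kl shows by how much the frequency of column category l changes, in relative terms, once row category k is known, and chi-squared summarizes these changes over the whole table.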



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
  2. Professor Emeritus, Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
