Learning Multivariate Correlations in Data

  • Boris Mirkin
Part of the Undergraduate Topics in Computer Science book series (UTICS)


After a short introduction to the general concept of a decision rule relating input and target features, this chapter describes the most popular methods for building decision rules. Two of them pertain to quantitative targets (linear regression, neural networks) and four to categorical ones (linear discrimination, support vector machine, naïve Bayes classifier and classification tree). Of these, classification trees are treated in the most detail, including a number of theoretical results that are not well known. These establish firm relations between popular scoring functions and, first, the bivariate measures described in Chapter 3 (above all, Quetelet indexes in contingency tables) and, second, normalization options for the dummy variables representing target categories. Some related concepts, such as the Bayes decision rule, the bag-of-words model in text analysis, VC complexity and kernels for non-linear classification, are also introduced.
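Among the scoring functions used to build classification trees, information gain (the entropy reduction achieved by a split) is one of the most widely used. The following is a minimal illustrative sketch, not code from the chapter; the function names and the toy class counts are assumptions for the example:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a vector of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts):
    """Entropy reduction when a node with class counts `parent_counts`
    is split into children whose class counts are listed in `child_counts`."""
    n = sum(parent_counts)
    weighted = sum(sum(ch) / n * entropy(ch) for ch in child_counts)
    return entropy(parent_counts) - weighted

# A perfectly pure split of a balanced two-class node recovers the full 1 bit:
gain = information_gain([10, 10], [[10, 0], [0, 10]])  # → 1.0
```

A split that leaves both children with the parent's class proportions yields zero gain, which is why tree-growing algorithms greedily pick the split maximizing this score at each node.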


Keywords (machine-generated): Support Vector Machine · Decision Rule · Hidden Neuron · Information Gain · Classification Tree



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  • Boris Mirkin
  1. Research University – Higher School of Economics, School of Applied Mathematics and Informatics, Moscow, Russia
  2. Department of Computer Science, Birkbeck University of London, London, UK
