Topics in Substance of Data Analysis

  • Boris Mirkin
Part of the Undergraduate Topics in Computer Science book series (UTICS)


This is an introductory chapter in which:

  (i) The goals of core data analysis, as a tool for enhancing and augmenting knowledge of the domain, are outlined. Since knowledge is represented by concepts and statements of relation between them, the two main pathways for data analysis are summarization, for developing and augmenting concepts, and correlation, for enhancing and establishing relations.

  (ii) A set of eight cases involving small datasets and related data analysis problems is presented. The datasets are taken from various fields such as monitoring market towns, computer security protocols, bioinformatics, and cognitive psychology.

  (iii) An overview of data visualization, its goals, and some of its techniques is given.

  (iv) A general view of the strengths and pitfalls of data analysis is provided.

  (v) An overview of the concept of classification, as a soft knowledge structure widely used in theory and practice, is given.




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
  2. Professor Emeritus, Department of Computer Science and Information Systems, Birkbeck University of London, London, UK
