Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

Before turning to multivariate summarization proper, this chapter first considers the concept of a feature and its summarization into histograms, density functions, and centers. Two perspectives, the probabilistic and the vector-space one, are introduced for defining feature centers and spreads. Current views on the types of measurement scales are also described, leading to the conclusion that binary scales are both quantitative and categorical. The core of the chapter presents the method of principal components (PCA) as a method for fitting a data-driven summarization model. The model assumes that the data entries are, up to errors, (sums of) products of hidden factor scores and feature loadings. Together with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) Latent Semantic Indexing. The conventional, and equivalent, formulation of PCA via the eigenvalues of covariance matrices is also described. The main difference between the two formulations is that the property of principal components being linear combinations of features is postulated in the conventional approach and derived in the SVD-based one. The interpretation of the results is discussed as well. A novel, promising approach based on a postulated linear model of stratification is presented via a project. The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length at the beginning. A powerful application of eigenvectors to scoring node importance in networks and pair comparison matrices, the Google PageRank approach, is described too.
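
As an illustration of the rank-one version of the model described above, the following is a minimal sketch, not the chapter's own code. It fits x[i][v] ≈ mu * z[i] * c[v] by alternating least-squares updates, which amounts to power iteration for the first singular triplet. The function name `first_principal_component`, the uniform starting vector, and the fixed iteration count are assumptions made here for illustration; the data matrix is taken as given, with no standardization applied.

```python
import math

def first_principal_component(X, iters=200):
    """Rank-one least-squares fit X[i][v] ~ mu * z[i] * c[v] (hypothetical helper)."""
    n, m = len(X), len(X[0])
    # Assumed starting point: uniform unit-norm loadings. A start orthogonal
    # to the leading singular vector would not converge to it.
    c = [1.0 / math.sqrt(m)] * m
    z = [0.0] * n
    mu = 0.0
    for _ in range(iters):
        # Factor scores given loadings: z is X c, rescaled to unit norm.
        z = [sum(X[i][v] * c[v] for v in range(m)) for i in range(n)]
        z_norm = math.sqrt(sum(t * t for t in z))
        if z_norm == 0.0:
            break
        z = [t / z_norm for t in z]
        # Loadings given scores: c is X^T z; its norm estimates the singular value mu.
        c = [sum(X[i][v] * z[i] for i in range(n)) for v in range(m)]
        mu = math.sqrt(sum(t * t for t in c))
        if mu == 0.0:
            break
        c = [t / mu for t in c]
    return mu, z, c

# Usage on a small rank-one matrix (illustrative data, not from the chapter).
X = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
mu, z, c = first_principal_component(X)
# Least-squares residual of the rank-one reconstruction.
residual = sum((X[i][v] - mu * z[i] * c[v]) ** 2
               for i in range(len(X)) for v in range(len(X[0])))
```

At convergence the loading vector c also satisfies (XᵀX)c = mu²·c, which is the link to the eigenvalue-based formulation mentioned in the abstract: the scores z = Xc/mu come out as linear combinations of the features rather than being assumed to be so.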

Author information

Corresponding author

Correspondence to Boris Mirkin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mirkin, B. (2019). Quantitative Summarization. In: Core Data Analysis: Summarization, Correlation, and Visualization. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-00271-8_2

  • DOI: https://doi.org/10.1007/978-3-030-00271-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00270-1

  • Online ISBN: 978-3-030-00271-8

  • eBook Packages: Computer Science, Computer Science (R0)
