Quantitative Summarization

Mirkin, Boris

doi:10.1007/978-3-030-00271-8_2

Boris Mirkin

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

Before going into the thick of multivariate summarization, this chapter first considers the concept of a feature and its summarization into histograms, density functions and centers. Two perspectives, the probabilistic one and the vector-space one, are used to define feature centers and spreads. Current views on the types of measurement scales are also described, leading to the conclusion that binary scales are both quantitative and categorical. The core of the chapter describes the method of principal components (PCA) as a method for fitting a data-driven summarization model. The model proposes that the data entries, up to errors, are (sums of) products of hidden factor scores and feature loadings. Together with the least-squares fitting criterion, this turns out to be equivalent to finding part of what is known in mathematics as the singular value decomposition (SVD) of a rectangular matrix. Three applications of the method are described: (1) scoring hidden aggregate factors, (2) visualization of the data, and (3) Latent Semantic Indexing. The conventional, and equivalent, formulation of PCA via covariance matrices and their eigenvalues is also described. The main difference between the two formulations is that the property of principal components being linear combinations of features is postulated in the conventional approach and derived in the SVD-based one. The interpretation of the results is discussed as well. A novel, promising approach based on a postulated linear model of stratification is presented via a project. The issue of data standardization in data summarization problems, which remains unsolved, is discussed at length at the beginning. A powerful application of eigenvectors to scoring node importance in networks and pair comparison matrices, the Google PageRank approach, is described too.
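The SVD-based summarization model described above can be illustrated with a minimal sketch, assuming a small data matrix given as a list of rows and centering by column means as the (hypothetical) standardization choice. It fits the single-component model x_iv = c_v z_i + e_iv by least squares through power iteration. The helper names `center`, `first_component` and `scatter_decomposition` are illustrative only, and putting the whole singular value into the scores (with unit-length loadings) is one normalization among several, not necessarily the one used in the chapter.

```python
import math


def center(X):
    """Subtract each column's mean (one possible standardization choice)."""
    n, m = len(X), len(X[0])
    means = [sum(row[v] for row in X) / n for v in range(m)]
    return [[row[v] - means[v] for v in range(m)] for row in X]


def first_component(X, iters=1000, tol=1e-12):
    """Least-squares fit of x_iv ~ z_i * c_v by power iteration on X^T X.

    Returns (z, c, mu): unit-length loadings c, scores z = X c and
    mu = ||z||, the first singular value of X. Assumes the starting
    vector is not orthogonal to the leading singular vector.
    """
    n, m = len(X), len(X[0])
    c = [1.0 / math.sqrt(m)] * m  # hypothetical starting loadings
    for _ in range(iters):
        z = [sum(X[i][v] * c[v] for v in range(m)) for i in range(n)]  # z = X c
        new_c = [sum(X[i][v] * z[i] for i in range(n)) for v in range(m)]  # X^T z
        norm = math.sqrt(sum(a * a for a in new_c))
        if norm == 0.0:  # X c = 0: all-zero data or a degenerate start
            break
        new_c = [a / norm for a in new_c]
        converged = max(abs(a - b) for a, b in zip(new_c, c)) < tol
        c = new_c
        if converged:
            break
    z = [sum(X[i][v] * c[v] for v in range(m)) for i in range(n)]
    mu = math.sqrt(sum(a * a for a in z))
    return z, c, mu


def scatter_decomposition(X, z, c):
    """Return (T, explained, residual): T is the data scatter, residual the
    least-squares error of the rank-one fit z c^T, explained = T - residual."""
    n, m = len(X), len(X[0])
    T = sum(x * x for row in X for x in row)
    residual = sum((X[i][v] - z[i] * c[v]) ** 2 for i in range(n) for v in range(m))
    return T, T - residual, residual


if __name__ == "__main__":
    Y = center([[1, 2], [3, 1], [0, 0], [2, 5]])
    z, c, mu = first_component(Y)
    T, explained, residual = scatter_decomposition(Y, z, c)
    print("loadings:", c, "contribution:", round(explained / T, 3))
```

Because the scores are z = Xc with unit-length c, the data scatter splits exactly as T = mu^2 + residual, so mu^2 / T can be read as the share of the data explained by the first component. Further components could be obtained by subtracting z c^T from the data and repeating the same procedure.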

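The PageRank application mentioned in the abstract rests on an eigenvector computation that can be sketched as power iteration toward the principal eigenvector of a damped link-transition matrix. This is a minimal sketch assuming a graph given as a dict of out-link lists, the commonly used damping factor d = 0.85, and uniform redistribution of the rank held by dangling nodes (nodes without out-links); the function name `pagerank` and these conventions are assumptions, not taken from the chapter.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank scores (a sketch, not a reference implementation).

    links: dict mapping each node to the list of nodes it links to.
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumed convention: rank held by dangling nodes is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                share = d * rank[src] / len(targets)  # split rank over out-links
                for t in targets:
                    new[t] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    print(pagerank({"A": ["B"], "B": ["A"], "C": ["A"]}))
```

For example, `pagerank({"A": ["B"], "B": ["A"], "C": ["A"]})` ranks A above B above C, since A collects links from both other nodes while C receives none.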
Author information

Authors and Affiliations

Boris Mirkin (Professor)
Department of Data Analysis and Artificial Intelligence, Faculty of Computer Science, National Research University Higher School of Economics, Moscow, Russia
Professor Emeritus, Department of Computer Science and Information Systems, Birkbeck University of London, London, UK

Corresponding author

Correspondence to Boris Mirkin.

About this chapter

Cite this chapter

Mirkin, B. (2019). Quantitative Summarization. In: Core Data Analysis: Summarization, Correlation, and Visualization. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-00271-8_2

DOI: https://doi.org/10.1007/978-3-030-00271-8_2
Published: 14 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00270-1
Online ISBN: 978-3-030-00271-8
eBook Packages: Computer Science, Computer Science (R0)
