The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics

Scippacercola, Sergio

doi:10.1007/978-3-642-03739-9_7

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper addresses the problem of finding an ultrametric whose distortion is close to optimal. We introduce the Minkowski ultrametric distances of the n statistical units, obtained by a hierarchical clustering method (single linkage). We consider the distortion matrix, which measures the difference between the initial dissimilarity and its ultrametric approximation. We propose an algorithm that, by applying the Minkowski ultrametrics, reaches a minimum of the approximation error. The convergence of the algorithm allows us to identify when the ultrametric approximation is at a local minimum. The validity of the algorithm is confirmed by its application to sets of real data.
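
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the chapter's progressive algorithm, which this preview does not describe. It assumes that the initial dissimilarity is a Minkowski distance of order p, and it computes the single linkage ultrametric as the minimax-path closure of that dissimilarity, which is the standard link between single linkage and minimum spanning trees (Gower & Ross, 1969). The function names and the four sample units are hypothetical.

```python
def minkowski(x, y, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dissimilarity_matrix(units, p=2.0):
    """Full n x n matrix of pairwise Minkowski distances (assumed input)."""
    n = len(units)
    return [[minkowski(units[i], units[j], p) for j in range(n)]
            for i in range(n)]


def single_linkage_ultrametric(d):
    """Single linkage ultrametric of a dissimilarity matrix d.

    u[i][j] is the smallest possible value of the largest step on any
    path from i to j, computed with a Floyd-Warshall style closure.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def distortion_matrix(d, u):
    """Elementwise difference between dissimilarity and ultrametric."""
    n = len(d)
    return [[d[i][j] - u[i][j] for j in range(n)] for i in range(n)]


# Hypothetical data: four units described by two variables.
units = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
d = dissimilarity_matrix(units, p=2.0)
u = single_linkage_ultrametric(d)
delta = distortion_matrix(d, u)
```

Because the single linkage ultrametric never exceeds the dissimilarity it approximates, every entry of `delta` is non-negative. Repeating the computation for several values of `p` gives one way to compare distortions across Minkowski orders, although the chapter's own selection rule may differ.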

Notes

1. This indicator shows the number of rooms that each person in a household has at their disposal, by tenure status of the household.
2. At current prices (% of total household consumption expenditure). Household final consumption expenditure consists of the expenditure, including imputed expenditure, incurred by resident households on individual consumption goods and services, including those sold at prices that are not economically significant.
3. The indicator gives the percentage change from one year to the next in the total number of employed persons on the economic territory of the country or the geographical area.
4. Apparent human consumption per capita is obtained by dividing human consumption by the number of inhabitants (resident population stated in official statistics as at 30 June).
5. A data set with 150 random samples of flowers from the iris species setosa, versicolor, and virginica collected by Anderson (1935). For each species there are 50 observations of sepal length, sepal width, petal length, and petal width, in centimeters.

References

Anderson, E. (1935). The irises of the Gaspé peninsula. Bulletin of the American Iris Society, 59, 2–5.
Bădoiu, M., Chuzhoy, J., Indyk, P., & Sidiropoulos, A. (2006). Embedding ultrametrics into low-dimensional spaces. In Proceedings of the twenty-second annual symposium on Computational Geometry SCG’06 (pp. 187–196). Sedona, AZ: ACM Press.
Bock, H. H. (1996). Probabilistic models in cluster analysis. Computational Statistics and Data Analysis, 23(1), 6–28.
Borg, I., & Lingoes, J. (1987). Multidimensional similarity structure analysis. Berlin: Springer.
Chandon, J. L., Lemaire, J., & Pouget, J. (1980). Construction de l’ultrametrique la plus proche d’une dissimilarité au sens des moindres carrés. R.A.I.R.O. Recherche Operationelle, 14, 157–170.
De Soëte, G. (1988). Tree representations of proximity data by least squares methods. In H. H. Bock (Ed.), Classification and related methods of data analysis (pp. 147–156). Amsterdam: North Holland.
Eurostat. (n.d.). General and regional statistics. http://epp.eurostat.ec.europa.eu
Gordon, A. D. (1996). A survey of constrained classification. Computational Statistics and Data Analysis, 21(1), 17–29.
Gower, J. C., & Ross, J. S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18, 54–64.
Hardy, G. H., Littlewood, J. E., & Polya, G. (1964). Inequalities. Cambridge: Cambridge University Press.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1989). Multivariate analysis. New York: Academic.
Prim, R. C. (1957). Shortest connection networks and some generalizations. Bell System Technical Journal, 36, 1389–1401.
Rizzi, A. (1985). Analisi dei dati. Rome: La Nuova Italia Scientifica.
Scippacercola, S. (2003). Evaluation of clusters stability based on Minkowski ultrametrics. Statistica Applicata – Italian Journal of Applied Statistics, 15(4), 483–489.
Scozzafava, P. (1995). Ultrametric spaces in statistics. In A. Rizzi (Ed.), Some relations between matrices and structures of multidimensional data analysis. Pisa: Giardini.
Sebert, D. M., Montgomery, D. C., & Rollier, D. A. (1998). A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis, 27(4), 461–484.

Author information

Authors and Affiliations

Dipartimento di Matematica e Statistica, Università degli studi di Napoli Federico II, Via Cinthia, 80126, Napoli, Italy
Sergio Scippacercola

Corresponding author

Correspondence to Sergio Scippacercola.

Editor information

Editors and Affiliations

Fac. Economia, Università Macerata, Via Crescimbeni 20, Macerata, 62100, Italy
Francesco Palumbo
Dipto. Matematica e Statistica, Università Federico II di Napoli, Via Cinthia (Monte S. Angelo), Napoli, 80126, Italy
Carlo Natale Lauro
Depto. Economía y Empresa, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, Barcelona, 08005, Spain
Michael J. Greenacre

About this paper

Cite this paper

Scippacercola, S. (2010). The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_7

DOI: https://doi.org/10.1007/978-3-642-03739-9_7
Published: 25 November 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03738-2
Online ISBN: 978-3-642-03739-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)