Abstract
This paper addresses the problem of finding an ultrametric whose distortion is close to optimal. We introduce the Minkowski ultrametric distances of the n statistical units obtained by a hierarchical clustering method (single linkage). We consider the distortion matrix, which measures the difference between the initial dissimilarity and the ultrametric approximation. We propose an algorithm that, by applying the Minkowski ultrametrics, reaches a minimum-distortion approximation. The convergence of the algorithm allows us to identify when the ultrametric approximation is at a local minimum. The validity of the algorithm is confirmed by its application to sets of real data.
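The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's progressive algorithm: it only computes, for one Minkowski order p, the pairwise dissimilarities, the single linkage ultrametric (obtained here as the minimax path distance, which coincides with the cophenetic distance of the single linkage dendrogram), and a distortion matrix. The function names, the toy data, and the choice of the elementwise difference as the distortion are assumptions made for illustration.

```python
from itertools import product


def minkowski(x, y, p):
    """Minkowski distance of order p >= 1 between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dissimilarity_matrix(units, p):
    """Pairwise Minkowski dissimilarities between the n statistical units."""
    return [[minkowski(u, v, p) for v in units] for u in units]


def single_linkage_ultrametric(d):
    """Minimax path distance of d, i.e. the single linkage ultrametric."""
    n = len(d)
    u = [row[:] for row in d]
    # Floyd-Warshall style update; product() varies k slowest (outer loop).
    for k, i, j in product(range(n), repeat=3):
        u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def distortion_matrix(d, u):
    """Assumed distortion: elementwise difference d - u (never negative)."""
    n = len(d)
    return [[d[i][j] - u[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy data: three units on a line, Euclidean case (p = 2).
units = [(0, 0), (1, 0), (5, 0)]
d = dissimilarity_matrix(units, 2)
u = single_linkage_ultrametric(d)
print(u)                        # ultrametric distances
print(distortion_matrix(d, u))  # where the ultrametric departs from d
```

In this toy case the dissimilarity between the first and third unit is 5, while the ultrametric assigns 4, the height at which single linkage merges the third unit with the other two; the distortion matrix is zero everywhere else.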
Notes
1. This indicator shows the number of rooms that each person in a household has at his or her disposal, by tenure status of the household.
2. At current prices (% of total household consumption expenditure). Household final consumption expenditure at current prices consists of the expenditure, including imputed expenditure, incurred by resident households on individual consumption goods and services, including those sold at prices that are not economically significant.
3. The indicator gives the percentage change from one year to the next in the total number of employed persons on the economic territory of the country or the geographical area.
4. Apparent human consumption per capita is obtained by dividing human consumption by the number of inhabitants (resident population stated in official statistics as at 30 June).
5. A data set with 150 random samples of flowers from the iris species setosa, versicolor, and virginica collected by Anderson (1935). For each species there are 50 observations of sepal length, sepal width, petal length, and petal width, in centimeters.
References
Anderson, E. (1935). The irises of the Gaspé peninsula. Bulletin of the American Iris Society, 59, 2–5.
Bădoiu, M., Chuzhoy, J., Indyk, P., & Sidiropoulos, A. (2006). Embedding ultrametrics into low-dimensional spaces. In Proceedings of twenty-second annual symposium on Computational Geometry SCG’06 (pp. 187–196), Sedona, AZ: ACM Press.
Bock, H. H. (1996). Probabilistic models in cluster analysis. Computational Statistics and Data Analysis, 23(1), 6–28.
Borg, I., & Lingoes, J. (1987). Multidimensional similarity structure analysis. Berlin: Springer.
Chandon, J. L., Lemaire, J., & Pouget, J. (1980). Construction de l’ultramétrique la plus proche d’une dissimilarité au sens des moindres carrés. R.A.I.R.O. Recherche Opérationnelle, 14, 157–170.
De Soëte, G. (1988). Tree representations of proximity data by least squares methods. In H. H. Bock (Ed.), Classification and related methods of data analysis (pp. 147–156). Amsterdam: North Holland.
Eurostat. (n.d.). General and regional statistics. http://epp.eurostat.ec.europa.eu
Gordon, A. D. (1996). A survey of constrained classification. Computational Statistics and Data Analysis, 21(1), 17–29.
Gower, J. C., & Ross, J. S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18, 54–64.
Hardy, G. H., Littlewood, J. E., & Polya, G. (1964). Inequalities. Cambridge: Cambridge University Press.
Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.
Mardia, K. V., Kent, J. T., & Bibby, J. M. (1989). Multivariate analysis. New York: Academic.
Prim, R. C. (1957). Shortest connection network and some generalizations. Bell System Technical Journal, 36, 1389–1401.
Rizzi, A. (1985). Analisi dei dati. Rome: La Nuova Italia Scientifica.
Scippacercola, S. (2003). Evaluation of clusters stability based on Minkowski ultrametrics. Statistica Applicata – Italian Journal of Applied Statistics, 15(4), 483–489.
Scozzafava, P. (1995). Ultrametric spaces in statistics. In A. Rizzi (Ed.), Some relations between matrices and structures of multidimensional data analysis. Pisa: Giardini.
Sebert, D. M., Montgomery, D. C., & Rollier, D. A. (1998). A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis, 27(4), 461–484.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scippacercola, S. (2010). The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_7
DOI: https://doi.org/10.1007/978-3-642-03739-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03738-2
Online ISBN: 978-3-642-03739-9
eBook Packages: Mathematics and Statistics (R0)