The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics

  • Conference paper
Data Analysis and Classification

Abstract

This paper addresses the problem of finding an ultrametric whose distortion is close to optimal. We introduce the Minkowski ultrametric distances among the n statistical units, obtained by a hierarchical clustering method (single linkage). We consider the distortion matrix, which measures the difference between the initial dissimilarity and its ultrametric approximation. We propose an algorithm that applies the Minkowski ultrametrics to reach an approximation of minimum distortion. The convergence of the algorithm makes it possible to identify when the ultrametric approximation has reached a local minimum. The validity of the algorithm is confirmed by applying it to real data sets.
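
The two ingredients named in the abstract, the single-linkage ultrametric and the distortion matrix, can be illustrated with a minimal Python sketch. It is not the progressive algorithm of the paper, which is not reproduced on this page. It assumes that the initial dissimilarity is a Minkowski distance of order p between the units and that distortion is the entrywise difference between dissimilarity and ultrametric; the paper may define both differently. All function names are hypothetical.

```python
from itertools import product


def minkowski_distance(x, y, p):
    """Minkowski distance of order p >= 1 between two equal-length points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dissimilarity_matrix(points, p):
    """Symmetric n x n matrix of pairwise Minkowski distances."""
    return [[minkowski_distance(x, y, p) for y in points] for x in points]


def single_linkage_ultrametric(d):
    """Single-linkage (subdominant) ultrametric of a dissimilarity matrix.

    u[i][j] is the smallest achievable 'largest step' over all paths from
    i to j, which is the level at which single linkage merges i and j.
    """
    n = len(d)
    u = [row[:] for row in d]
    # Floyd-Warshall-style minimax update; product() keeps k outermost.
    for k, i, j in product(range(n), repeat=3):
        u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def distortion_matrix(d, u):
    """Entrywise gap between the initial dissimilarity and the ultrametric
    (an assumed definition, chosen for illustration)."""
    return [[dij - uij for dij, uij in zip(dr, ur)] for dr, ur in zip(d, u)]


if __name__ == "__main__":
    units = [(0.0,), (1.0,), (3.0,)]        # three toy units on a line
    d = dissimilarity_matrix(units, p=1)
    u = single_linkage_ultrametric(d)
    print(distortion_matrix(d, u))
```

The cubic-time minimax update is chosen for brevity; a minimum spanning tree (Gower & Ross, 1969) gives the same ultrametric more efficiently. Because the single-linkage ultrametric never exceeds the initial dissimilarity, every entry of this distortion matrix is non-negative.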

Notes

  1. This indicator shows the number of rooms that each person in a household has at their disposal, by tenure status of the household.

  2. At current prices (% of total household consumption expenditure). Household final consumption expenditure consists of the expenditure, including imputed expenditure, incurred by resident households on individual consumption goods and services, including those sold at prices that are not economically significant.

  3. The indicator gives the percentage change from one year to the next in the total number of employed persons on the economic territory of the country or the geographical area.

  4. Apparent human consumption per capita is obtained by dividing human consumption by the number of inhabitants (resident population stated in official statistics as at 30 June).

  5. A data set with 150 random samples of flowers from the iris species setosa, versicolor, and virginica, collected by Anderson (1935). For each species there are 50 observations of sepal length, sepal width, petal length, and petal width, in centimeters.

References

  • Anderson, E. (1935). The irises of the Gaspé peninsula. Bulletin of the American Iris Society, 59, 2–5.

  • Bădoiu, M., Chuzhoy, J., Indyk, P., & Sidiropoulos, A. (2006). Embedding ultrametrics into low-dimensional spaces. In Proceedings of the twenty-second annual symposium on Computational Geometry SCG’06 (pp. 187–196), Sedona, AZ: ACM Press.

  • Bock, H. H. (1996). Probabilistic models in cluster analysis. Computational Statistics and Data Analysis, 23(1), 6–28.

  • Borg, I., & Lingoes, J. (1987). Multidimensional similarity structure analysis. Berlin: Springer.

  • Chandon, J. L., Lemaire, J., & Pouget, J. (1980). Construction de l’ultramétrique la plus proche d’une dissimilarité au sens des moindres carrés. R.A.I.R.O. Recherche Opérationnelle, 14, 157–170.

  • De Soëte, G. (1988). Tree representations of proximity data by least squares methods. In H. H. Bock (Ed.), Classification and related methods of data analysis (pp. 147–156). Amsterdam: North Holland.

  • Eurostat. (n.d.). General and regional statistics. http://epp.eurostat.ec.europa.eu

  • Gordon, A. D. (1996). A survey of constrained classification. Computational Statistics and Data Analysis, 21(1), 17–29.

  • Gower, J. C., & Ross, J. S. (1969). Minimum spanning trees and single linkage cluster analysis. Applied Statistics, 18, 54–64.

  • Hardy, G. H., Littlewood, J. E., & Polya, G. (1964). Inequalities. Cambridge: Cambridge University Press.

  • Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.

  • Kruskal, J. B. (1956). On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.

  • Mardia, K. V., Kent, J. T., & Bibby, J. M. (1989). Multivariate analysis. New York: Academic.

  • Prim, R. C. (1957). Shortest connection network and some generalizations. Bell System Technical Journal, 36, 1389–1401.

  • Rizzi, A. (1985). Analisi dei dati. Rome: La Nuova Italia Scientifica.

  • Scippacercola, S. (2003). Evaluation of clusters stability based on Minkowski ultrametrics. Statistica Applicata – Italian Journal of Applied Statistics, 15(4), 483–489.

  • Scozzafava, P. (1995). Ultrametric spaces in statistics. In A. Rizzi (Ed.), Some relations between matrices and structures of multidimensional data analysis. Pisa: Giardini.

  • Sebert, D. M., Montgomery, D. C., & Rollier, D. A. (1998). A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis, 27(4), 461–484.

Author information

Correspondence to Sergio Scippacercola.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Scippacercola, S. (2010). The Progressive Single Linkage Algorithm Based on Minkowski Ultrametrics. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_7
