A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data

Irpino, Antonio; Verde, Rosanna

doi:10.1007/3-540-34416-0_20

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization

Abstract

Symbolic Data Analysis (SDA) aims to describe and analyze complex and structured data extracted, for example, from large databases. Such data, which can be expressed as concepts, are modeled by symbolic objects described by multivalued variables. In this paper we present a new distance, based on the Wasserstein metric, for clustering a set of data described by distributions with finite continuous support, called “histograms” in SDA. The proposed distance lets us define a measure of inertia of the data with respect to a barycenter that satisfies the Huygens theorem of decomposition of inertia. We propose to use this measure for an agglomerative hierarchical clustering of histogram data based on the Ward criterion. An application to real data validates the procedure.
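
The ideas in the abstract can be illustrated with a small numerical sketch. It is not the authors' formula: it assumes the squared L2 Wasserstein distance between two one-dimensional distributions, i.e. the integral over (0, 1) of the squared difference of their quantile functions, and approximates that integral on a midpoint grid. It further assumes that each histogram has uniform density inside its bins and strictly positive bin weights. The names `quantiles`, `w2_sq`, `ward_cost`, the sample histograms, and the partition `labels` are all hypothetical. Under these assumptions the barycenter is the pointwise mean of the quantile functions, and the total inertia splits into within-cluster and between-cluster parts, as the Huygens theorem requires.

```python
import numpy as np

# Midpoint grid on (0, 1) used to approximate integrals over quantile levels.
T = (np.arange(1000) + 0.5) / 1000

def quantiles(edges, weights, t=T):
    """Quantile function of a histogram, sampled at levels t.

    Assumes uniform density inside each bin and strictly positive weights,
    so the CDF is piecewise linear and strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    w = np.asarray(weights, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(w) / w.sum()))
    return np.interp(t, cdf, edges)

def w2_sq(q1, q2):
    """Approximate squared L2 Wasserstein distance from sampled quantiles."""
    return float(np.mean((q1 - q2) ** 2))

def ward_cost(members_a, members_b):
    """Increase in within-cluster inertia caused by merging two clusters."""
    na, nb = len(members_a), len(members_b)
    return na * nb / (na + nb) * w2_sq(members_a.mean(axis=0),
                                       members_b.mean(axis=0))

# Hypothetical histograms given as (bin edges, bin weights).
histograms = [
    ([0, 1, 2, 3], [0.2, 0.5, 0.3]),
    ([0, 2, 4], [0.6, 0.4]),
    ([5, 6, 8], [0.5, 0.5]),
    ([4, 6, 7, 9], [0.1, 0.6, 0.3]),
]
Q = np.array([quantiles(e, w) for e, w in histograms])
labels = np.array([0, 0, 1, 1])  # hypothetical two-cluster partition

# The barycenter is the pointwise mean of the quantile functions.
g = Q.mean(axis=0)
total = sum(w2_sq(q, g) for q in Q)

within = 0.0
between = 0.0
for k in np.unique(labels):
    members = Q[labels == k]
    g_k = members.mean(axis=0)
    within += sum(w2_sq(q, g_k) for q in members)
    between += len(members) * w2_sq(g_k, g)

print(f"total={total:.6f}  within+between={within + between:.6f}")
print("Ward cost of merging the two clusters:",
      ward_cost(Q[labels == 0], Q[labels == 1]))
```

In an agglomerative procedure based on the Ward criterion, each step would merge the pair of clusters with the smallest `ward_cost`. With only two clusters, that cost equals the between-cluster inertia computed above.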

References

AITCHISON, J. (1986): The Statistical Analysis of Compositional Data, New York: Chapman Hall.
BOCK, H.H. and DIDAY, E. (2000): Analysis of Symbolic Data, Exploratory methods for extracting statistical information from complex data, Studies in Classification, Data Analysis and Knowledge Organisation, Springer-Verlag.
BILLARD, L. and DIDAY, E. (2003): From the Statistics of Data to the Statistics of Knowledge: Symbolic Data Analysis, Journal of the American Statistical Association, 98, 462, 470–487.
CHAVENT, M., DE CARVALHO, F.A.T., LECHEVALLIER, Y., and VERDE, R. (2003): Trois nouvelles méthodes de classification automatique des données symbolique de type intervalle, Revue de Statistique Appliquée, LI, 4, 5–29.
GIBBS, A.L. and SU, F.E. (2002): On choosing and bounding probability metrics, International Statistical Review, 70, 419.
IRPINO, A. and VERDE, R. (2005): A New Distance for Symbolic Data Clustering, CLADAG 2005, Book of short papers, MUP, 393–396.
MALLOWS, C.L. (1972): A note on asymptotic joint normality, Annals of Mathematical Statistics, 43(2), 508–515.
WARD, J.H. (1963): Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, vol. 58, 238–244.

Author information

Authors and Affiliations

Facoltà di Studi Politici e per l’Alta Formazione Europea e Mediterranea “Jean Monnet”, Seconda Università degli Studi di Napoli, Caserta, I-81020, Italy
Antonio Irpino & Rosanna Verde

Editor information

Editors and Affiliations

Department of Mathematics, FMF, University of Ljubljana, Jadranska 19, 1000, Ljubljana, Slovenia
Vladimir Batagelj
Institute of Statistics, RWTH Aachen University, 52056, Aachen, Germany
Hans-Hermann Bock
Faculty of Social Sciences, University of Ljubljana, Kardeljeva pl. 5, 1000, Ljubljana, Slovenia
Anuška Ferligoj & Aleš Žiberna

Cite this paper

Irpino, A., Verde, R. (2006). A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data. In: Batagelj, V., Bock, HH., Ferligoj, A., Žiberna, A. (eds) Data Science and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-34416-0_20

DOI: https://doi.org/10.1007/3-540-34416-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34415-5
Online ISBN: 978-3-540-34416-2
eBook Packages: Mathematics and Statistics (R0)
