Abstract
In this paper, we present a new distance for comparing data described by histograms. The distance generalizes the classical Mahalanobis distance, which is used for data described by correlated variables. We also show how to extend the classical concepts of inertia and codeviance from a set of points to a set of data described by histograms. The same results are presented for data described by continuous density functions (empirical or estimated). An application to real data illustrates the effects of the new distance within a dynamic clustering algorithm.
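The abstract does not state the formulas, so the following is only a minimal sketch under explicit assumptions, not the authors' implementation. It assumes that (i) a histogram is a list of `(lower, upper, weight)` bins with weights summing to 1 and uniform mass inside each bin; (ii) histograms are compared through their quantile functions with the L2 Wasserstein distance, d_W^2 = ∫_0^1 (Q_1(t) - Q_2(t))^2 dt; (iii) the codeviance between two histogram variables integrates products of deviations of quantile functions from their mean quantile function; and (iv) the Mahalanobis-type distance replaces the squared difference by d(t)^T Σ^{-1} d(t), where d(t) is the vector of quantile differences across variables. All function names (`quantile`, `wasserstein2`, `covariance`, `inverse`, `mahalanobis_wasserstein`, `uniform`) are hypothetical, and integrals are approximated with a midpoint rule on `n` points.

```python
import math

# ASSUMPTION (hypothetical representation): a histogram is a list of
# (lower, upper, weight) bins whose weights sum to 1, with uniform mass per bin.


def quantile(hist, t):
    """Quantile function (inverse CDF) of a histogram at level t in (0, 1)."""
    cum = 0.0
    for lo, hi, w in hist:
        if w > 0 and t <= cum + w:
            # Linear interpolation inside the bin that reaches level t.
            return lo + (t - cum) / w * (hi - lo)
        cum += w
    return hist[-1][1]  # guard against rounding when t is close to 1


def grid(n=1000):
    """Midpoints of n equal sub-intervals of (0, 1), for midpoint-rule integration."""
    return [(k + 0.5) / n for k in range(n)]


def wasserstein2(h1, h2, n=1000):
    """L2 Wasserstein distance: sqrt of the integral of (Q1(t) - Q2(t))^2 over (0, 1)."""
    ts = grid(n)
    return math.sqrt(sum((quantile(h1, t) - quantile(h2, t)) ** 2 for t in ts) / n)


def covariance(data, n=1000):
    """Covariance of p histogram variables; data[i][j] = histogram of variable j for object i.

    ASSUMPTION: deviations are taken from the mean quantile function of each
    variable (the quantile function of the Wasserstein barycenter).
    """
    ts = grid(n)
    N, p = len(data), len(data[0])
    q = [[[quantile(h, t) for t in ts] for h in obj] for obj in data]
    qbar = [[sum(q[i][j][k] for i in range(N)) / N for k in range(n)] for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for l in range(p):
            s = sum(
                (q[i][j][k] - qbar[j][k]) * (q[i][l][k] - qbar[l][k])
                for i in range(N)
                for k in range(n)
            )
            cov[j][l] = s / (N * n)  # average over objects and over the t-grid
    return cov


def inverse(a):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    size = len(a)
    m = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(a)]
    for c in range(size):
        piv = max(range(c, size), key=lambda r: abs(m[r][c]))
        if abs(m[piv][c]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[c], m[piv] = m[piv], m[c]
        pv = m[c][c]
        m[c] = [x / pv for x in m[c]]
        for r in range(size):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[size:] for row in m]


def mahalanobis_wasserstein(x, y, cov_inv, n=1000):
    """sqrt of the integral over t of d(t)^T cov_inv d(t), with d(t) = Q_x(t) - Q_y(t)."""
    total = 0.0
    for t in grid(n):
        d = [quantile(hx, t) - quantile(hy, t) for hx, hy in zip(x, y)]
        total += sum(d[j] * cov_inv[j][l] * d[l] for j in range(len(d)) for l in range(len(d)))
    return math.sqrt(total / n)


# Usage on toy data (hypothetical): four objects, two histogram variables each.
def uniform(a, b):
    return [(a, b, 1.0)]


shifts = [(0, 0), (1, 3), (2, 1), (3, 4)]
data = [[uniform(a, a + 1), uniform(b, b + 1)] for a, b in shifts]
cov_inv = inverse(covariance(data))
print(mahalanobis_wasserstein(data[0], data[1], cov_inv))  # approximately 2.0
```

In this toy data the two variables vary together across objects, and the inverse covariance matrix accounts for that correlation: the distance between the first two objects is about 2.0, whereas replacing `cov_inv` with the identity matrix gives sqrt(10), the plain Euclidean combination of the two per-variable Wasserstein distances.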
Copyright information
© 2008 Physica-Verlag Heidelberg
About this paper
Cite this paper
Verde, R., Irpino, A. (2008). Comparing Histogram Data Using a Mahalanobis–Wasserstein Distance. In: Brito, P. (eds) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_7
DOI: https://doi.org/10.1007/978-3-7908-2084-3_7
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2083-6
Online ISBN: 978-3-7908-2084-3
eBook Packages: Mathematics and Statistics (R0)