Comparing Histogram Data Using a Mahalanobis–Wasserstein Distance

  • Conference paper
COMPSTAT 2008

Abstract

In this paper, we present a new distance for comparing data described by histograms. The distance generalizes the classical Mahalanobis distance for data described by correlated variables. We extend the classical concepts of inertia and codeviance from a set of points to a set of data described by histograms. The same results are also presented for data described by continuous density functions (empirical or estimated). An application to real data, using dynamic clustering, illustrates the effects of the new distance.
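The abstract does not give the formula for the distance, so the following is only a minimal sketch of the general idea, not the authors' definition. It assumes that each histogram spreads its mass uniformly within each bin, that two histograms are compared through their quantile functions (the squared L2 Wasserstein distance in one dimension), and that a Mahalanobis-type version weights the cross-products of quantile differences by an inverse covariance-like matrix. The function names, the `inv_cov` argument, and the grid size `n_grid` are all hypothetical choices made for illustration.

```python
import numpy as np

def quantiles(edges, weights, t):
    """Quantile function of a histogram, assuming uniform mass inside each bin.

    Assumes strictly positive bin weights so that the CDF is strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(weights / weights.sum())))
    # The CDF is piecewise linear, so its inverse is a linear interpolation.
    return np.interp(t, cdf, edges)

def quantile_diffs(hists_a, hists_b, n_grid=1000):
    """Quantile differences, one row per variable, on a midpoint grid in (0, 1)."""
    t = (np.arange(n_grid) + 0.5) / n_grid
    return np.array([quantiles(*ha, t) - quantiles(*hb, t)
                     for ha, hb in zip(hists_a, hists_b)])

def wasserstein2_sq(hist_a, hist_b, n_grid=1000):
    """Squared L2 Wasserstein distance between two one-dimensional histograms."""
    d = quantile_diffs([hist_a], [hist_b], n_grid)[0]
    return float(np.mean(d ** 2))

def mahalanobis_wasserstein_sq(hists_a, hists_b, inv_cov, n_grid=1000):
    """Hypothetical Mahalanobis-type combination (an assumption, not from the paper).

    Weights the integrated cross-products of quantile differences by inv_cov.
    """
    d = quantile_diffs(hists_a, hists_b, n_grid)   # shape (p, n_grid)
    cross = d @ d.T / n_grid                       # shape (p, p)
    return float(np.sum(np.asarray(inv_cov, dtype=float) * cross))

# Two histograms with the same shape, shifted by 3 units.
a = ([0.0, 1.0, 2.0], [0.5, 0.5])
b = ([3.0, 4.0, 5.0], [0.5, 0.5])
d_single = wasserstein2_sq(a, b)
d_multi = mahalanobis_wasserstein_sq([a, a], [b, b], np.eye(2))
```

For two histograms that differ only by a shift, the quantile difference is constant, so the squared distance equals the squared shift. With an identity matrix in place of `inv_cov`, the combined distance reduces to the sum of the per-variable squared Wasserstein distances, which is the uncorrelated special case.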

Author information

Corresponding author

Correspondence to Rosanna Verde.

Copyright information

© 2008 Physica-Verlag Heidelberg

About this paper

Cite this paper

Verde, R., Irpino, A. (2008). Comparing Histogram Data Using a Mahalanobis–Wasserstein Distance. In: Brito, P. (ed.) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_7
