Clustering Linear Models Using Wasserstein Distance

  • Antonio Irpino
  • Rosanna Verde
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


This paper deals with the clustering of complex data. The elements to be clustered are linear models estimated on samples drawn from several sub-populations (typologies of individuals). We review the main approaches to computing metrics between linear models and propose, for the first time in this field, a Wasserstein-based metric. We present the properties of the proposed metric and an application to real data using a dynamic clustering algorithm.
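The abstract does not define the proposed metric, so the following is only a minimal sketch of the standard L2 Wasserstein distance between two univariate distributions, written in terms of their quantile functions. It is not necessarily the inter-model metric developed in the paper. The function names `wasserstein2_gaussian` and `wasserstein2_quantile` are hypothetical, and the Gaussian inputs are an assumption chosen because the distance then has a known closed form against which the numerical value can be checked.

```python
from math import sqrt
from statistics import NormalDist


def wasserstein2_gaussian(m1, s1, m2, s2):
    """Closed-form L2 Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def wasserstein2_quantile(q1, q2, n=10_000):
    """Approximate the L2 Wasserstein distance from two quantile functions.

    Applies the midpoint rule on (0, 1) to the integral of (q1(t) - q2(t))^2.
    """
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n  # midpoint of the i-th cell, never exactly 0 or 1
        total += (q1(t) - q2(t)) ** 2
    return sqrt(total / n)


# Illustrative inputs (assumed, not taken from the paper).
a = NormalDist(mu=0.0, sigma=1.0)
b = NormalDist(mu=3.0, sigma=5.0)

exact = wasserstein2_gaussian(a.mean, a.stdev, b.mean, b.stdev)
approx = wasserstein2_quantile(a.inv_cdf, b.inv_cdf)
print(exact, approx)
```

The closed form separates a location term (difference of means) from a scale term (difference of standard deviations). The quantile-based version needs only the two inverse distribution functions, so under the same assumptions it would also apply to non-Gaussian inputs.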


Keywords (machine-generated, not supplied by the authors): Quantile Function; Functional Data Analysis; Allocation Function; Single Person Household; Pivotal Quantity



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Dipartimento di Studi Europei e Mediterranei, Second University of Naples, Caserta, Italy
