Factor PD-Clustering

  • Conference paper
Algorithms from and for Nature and Life

Abstract

Probabilistic Distance (PD) Clustering is a nonparametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD-Clustering is an iterative algorithm that looks for a set of K group centers maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD-Clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD-Clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD-Clustering on the transformed data. Integrating PD-Clustering with the Tucker3 factorial step makes the clustering more stable, makes it possible to handle datasets with large J, and allows the method to be used when clusters do not have an elliptical form.
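
As a rough illustration of the clustering step only, the Python sketch below implements a basic PD-clustering iteration. It is not the authors' implementation, and the Tucker3 projection step is deliberately left out. It assumes Euclidean distances, membership probabilities inversely proportional to the distance from each center, and centers updated as weighted means with weights p²/d. The function name `pd_clustering`, its parameters, and the jittered random initialisation are hypothetical choices made for this example.

```python
import numpy as np

def pd_clustering(X, K, init=None, n_iter=100, tol=1e-6, seed=0):
    """Minimal PD-clustering sketch (illustrative; no factorial step).

    X    : (n, J) data matrix, n units and J variables
    K    : number of clusters
    init : optional (K, J) array of starting centers (assumed parameter)
    Returns (centers, p), where p[i, k] is the probability that unit i
    belongs to cluster k.
    """
    X = np.asarray(X, dtype=float)
    n, J = X.shape
    rng = np.random.default_rng(seed)
    if init is None:
        # Arbitrary choice for this sketch: K random units plus small jitter,
        # so that no center starts exactly on a data point.
        idx = rng.choice(n, size=K, replace=False)
        centers = X[idx] + 0.01 * X.std(axis=0) * rng.standard_normal((K, J))
    else:
        centers = np.array(init, dtype=float)

    eps = 1e-12  # guards against division by zero

    def memberships(c):
        # d[i, k]: Euclidean distance from unit i to center k
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2) + eps
        inv = 1.0 / d
        # p[i, k] * d[i, k] is constant over k for each unit i
        return d, inv / inv.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        d, p = memberships(centers)
        u = p ** 2 / d                                # center-update weights
        new_centers = (u.T @ X) / u.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break

    _, p = memberships(centers)
    return centers, p
```

A hard partition can be read off with `p.argmax(axis=1)`. In the two-step procedure described above, this routine would be applied to the linearly transformed data rather than to the original variables.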

Notes

  1. http://archive.ics.uci.edu/ml/index.html

Corresponding author

Correspondence to Cristina Tortora.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Tortora, C., Summa, M.G., Palumbo, F. (2013). Factor PD-Clustering. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_11
