Factor PD-Clustering

Tortora, Cristina; Summa, Mireille Gettler; Palumbo, Francesco

doi:10.1007/978-3-319-00035-0_11

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Probabilistic Distance (PD) clustering is a nonparametric probabilistic method for finding homogeneous groups in multivariate datasets with J variables and n units. PD-clustering runs an iterative algorithm that looks for a set of K group centers maximising the empirical probabilities that the n statistical units belong to a cluster. As J becomes large, the solution tends to become unstable. This paper extends PD-clustering to the context of factorial clustering methods and shows that the Tucker3 decomposition is a consistent transformation for projecting the original data onto a subspace defined according to the same PD-clustering criterion. The method consists of a two-step iterative procedure: a linear transformation of the initial data, followed by PD-clustering on the transformed data. Integrating PD-clustering with the Tucker3 factorial step makes the clustering more stable, allows datasets with large J to be handled, and allows the method to be used when clusters are not elliptical.
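To make the iterative step concrete, here is a minimal Python sketch of plain PD-clustering only. It does not include the Tucker3 projection step. It assumes the formulation usually attributed to Ben-Israel and Iyigun, which the abstract itself does not spell out: for each unit the product of membership probability and Euclidean distance is the same across clusters, and each center is updated as a weighted mean with weights p²/d. The function name, parameters, and initialisation scheme are illustrative choices, not the authors' implementation.

```python
import numpy as np

def pd_clustering(X, K, init=None, n_iter=100, tol=1e-6, seed=0):
    """Sketch of PD-clustering (assumed formulation, not the authors' code).

    X    : (n, J) data matrix
    K    : number of clusters
    init : optional (K, J) array of starting centers (hypothetical parameter)
    Returns (centers, p), where p[i, k] is the membership probability of
    unit i in cluster k.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    if init is None:
        # Assumption: start from random points inside the bounding box of
        # the data, so that no center coincides exactly with a data point.
        lo, hi = X.min(axis=0), X.max(axis=0)
        centers = rng.uniform(lo, hi, size=(K, X.shape[1]))
    else:
        centers = np.asarray(init, dtype=float).copy()

    def memberships(c):
        # Euclidean distances of every unit from every center, shape (n, K).
        d = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        inv = 1.0 / d
        # p_ik * d_ik constant in k implies p_ik proportional to 1 / d_ik.
        return d, inv / inv.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        d, p = memberships(centers)
        u = p ** 2 / d  # weights of the center update
        new_centers = (u.T @ X) / u.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break

    _, p = memberships(centers)
    return centers, p
```

A hard partition, if needed, can be read off as `p.argmax(axis=1)`. The starting centers are deliberately kept away from the data points: a center that sits exactly on a unit receives an enormous weight from that unit and stops moving.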

Author information

Authors and Affiliations

Università degli Studi di Napoli Federico II, Naples, Italy
Cristina Tortora & Francesco Palumbo
CEREMADE, CNRS, Université Paris Dauphine, Paris, France
Cristina Tortora & Mireille Gettler Summa

Corresponding author

Correspondence to Cristina Tortora.

Editor information

Editors and Affiliations

University of Essex Department of Mathematical Sciences, Colchester, United Kingdom
Berthold Lausen
Ghent University Department of Marketing, Ghent, Belgium
Dirk Van den Poel
University of Marburg Databionics, FB 12, Marburg, Germany
Alfred Ultsch

About this paper

Cite this paper

Tortora, C., Summa, M.G., Palumbo, F. (2013). Factor PD-Clustering. In: Lausen, B., Van den Poel, D., Ultsch, A. (eds) Algorithms from and for Nature and Life. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-00035-0_11

DOI: https://doi.org/10.1007/978-3-319-00035-0_11
Published: 16 July 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-00034-3
Online ISBN: 978-3-319-00035-0
eBook Packages: Mathematics and Statistics (R0)
