Optimizing the Cauchy-Schwarz PDF Distance for Information Theoretic, Non-parametric Clustering

  • Robert Jenssen
  • Deniz Erdogmus
  • Kenneth E. Hild
  • Jose C. Principe
  • Torbjørn Eltoft
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3757)


This paper addresses the problem of efficient information theoretic, non-parametric data clustering. We develop a procedure for adapting the cluster memberships of the data patterns in order to maximize the recently proposed Cauchy-Schwarz (CS) probability density function (pdf) distance measure. Each pdf corresponds to a cluster. The CS distance is estimated analytically and non-parametrically by means of the Parzen window technique for density estimation. The resulting form of the cost function makes it possible to develop an efficient adaptation procedure based on constrained gradient descent, using stochastic approximation of the gradients. The computational complexity of the algorithm is O(MN), M ≪ N, where N is the total number of data patterns and M is the number of data patterns used in the stochastic approximation. We show that the new algorithm is capable of performing well on several odd-shaped and irregular data sets.
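As an illustration of the Parzen-window estimate mentioned in the abstract, the following is a minimal sketch of the CS divergence between two clusters, assuming hard cluster assignments, Gaussian kernels, and one shared kernel size. The function names (`gaussian_gram`, `cs_divergence`) and the parameter `sigma` are hypothetical and chosen for this example. The sketch does not reproduce the paper's membership-vector formulation or its constrained gradient descent.

```python
import numpy as np

def gaussian_gram(a, b, sigma):
    """Pairwise Gaussian kernel values between the rows of a and the rows of b.

    Convolving two Gaussian Parzen windows of size sigma gives a Gaussian of
    size sigma * sqrt(2), hence the 4 * sigma**2 in the exponent. The
    normalizing constant is omitted because it cancels in the CS ratio.
    """
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (4.0 * sigma ** 2))

def cs_divergence(x1, x2, sigma):
    """Parzen estimate of -log( int(p*q) / sqrt(int(p^2) * int(q^2)) )."""
    cross = gaussian_gram(x1, x2, sigma).mean()   # estimates int p(x) q(x) dx
    self1 = gaussian_gram(x1, x1, sigma).mean()   # estimates int p(x)^2 dx
    self2 = gaussian_gram(x2, x2, sigma).mean()   # estimates int q(x)^2 dx
    return -np.log(cross / np.sqrt(self1 * self2))

# Usage: two synthetic 2-D clusters (made-up data, for illustration only).
rng = np.random.default_rng(0)
near = rng.normal(0.0, 1.0, size=(50, 2))
far = rng.normal(5.0, 1.0, size=(40, 2))
print(cs_divergence(near, far, sigma=1.0))
```

Evaluating every pair in this way costs on the order of N² kernel evaluations for N data patterns. Restricting one side of the sums to a random subset of M patterns would reduce this to the order of MN, which is the complexity the abstract reports for its stochastic approximation.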


Keywords (added by machine, not by the authors): Cost Function; Stochastic Approximation; Data Pattern; Kernel Size; Membership Vector





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Robert Jenssen (1)
  • Deniz Erdogmus (2)
  • Kenneth E. Hild (3)
  • Jose C. Principe (4)
  • Torbjørn Eltoft (1)

  1. Department of Physics, University of Tromsø, Tromsø, Norway
  2. Department of Computer Science and Engineering, Oregon Graduate Institute, OHSU, Portland, USA
  3. Department of Radiology, University of California, San Francisco, USA
  4. Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA
