
Experiments with a Non-convex Variance-Based Clustering Criterion

  • Rodrigo F. Toso
  • Evgeny V. Bauman
  • Casimir A. Kulikowski
  • Ilya B. Muchnik
Chapter
Part of the Springer Optimization and Its Applications book series (SOIA, volume 92)

Abstract

This paper investigates the effectiveness of a variance-based clustering criterion whose construction is similar to that of the popular minimum sum-of-squares (k-means) criterion, except for two distinguishing characteristics: it can discriminate clusters by means of quadratic boundaries, and its functional form is not convex. Optimized with a recently proposed iterative local search heuristic that supports general variance-based criteria, convex or not (to our knowledge, the first heuristic to offer such broad support), the alternative criterion performs remarkably well: our experimental results show it to be better suited to the majority of the heterogeneous real-world data sets selected. In conclusion, we offer strong reasons to believe that practitioners can use this criterion as an alternative to k-means clustering.
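To make the comparison concrete, the minimum sum-of-squares (k-means) objective and a generic variance-based objective can be written side by side as below. The abstract does not reproduce the exact functional form studied in the chapter, so the second expression is only an illustrative, assumed member of the variance-based family with a placeholder weight function g; it is not the chapter's criterion.

\[
F_{\mathrm{MSSC}}(C_1,\dots,C_k)
  = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
  = n \sum_{j=1}^{k} p_j\, \sigma_j^2,
\qquad
F_{g}(C_1,\dots,C_k) = \sum_{j=1}^{k} g(p_j)\, \sigma_j^2,
\]

where μ_j is the mean of cluster C_j, p_j = |C_j|/n its fraction of the n points, and σ_j² its within-cluster variance. With g(p) = p the second form recovers the k-means objective up to the factor n; for a nonlinear g, the rule for reassigning a point weighs its squared distances to competing cluster means by cluster-dependent factors, so the induced boundaries are in general quadratic rather than hyperplanes and the objective is in general no longer convex.

The local search is likewise mentioned only at a high level, so the following is a minimal, deliberately naive sketch, not the authors' algorithm: a point-moving local search that treats the assumed criterion above as a black box and therefore never relies on convexity.

```python
import numpy as np

def variance_criterion(X, labels, k, g=np.sqrt):
    """Illustrative variance-based criterion: sum_j g(p_j) * sigma_j^2.

    X is an (n, d) array; g is a placeholder weight function (an assumption
    for illustration, not the chapter's criterion). With g(p) = p this equals
    the minimum sum-of-squares (k-means) objective divided by n.
    """
    n = len(X)
    value = 0.0
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue
        p_j = len(members) / n
        sigma2_j = np.mean(np.sum((members - members.mean(axis=0)) ** 2, axis=1))
        value += g(p_j) * sigma2_j
    return value

def local_search(X, k, max_passes=50, seed=0):
    """Naive point-moving local search over a k-way partition.

    Repeatedly relocates single points to whichever cluster lowers the
    criterion and stops at a local minimum. The criterion is evaluated as a
    black box, so the scheme applies to convex and non-convex criteria alike.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    best = variance_criterion(X, labels, k)
    for _ in range(max_passes):
        improved = False
        for i in range(len(X)):
            best_j = labels[i]
            if np.sum(labels == best_j) == 1:
                continue  # keep clusters non-empty by not moving singletons
            for j in range(k):
                if j == best_j:
                    continue
                labels[i] = j  # tentatively move point i to cluster j
                value = variance_criterion(X, labels, k)
                if value + 1e-12 < best:
                    best, best_j, improved = value, j, True
            labels[i] = best_j  # settle on the best cluster found for i
        if not improved:
            break
    return labels, best
```

Because every tentative move re-evaluates the full objective, this sketch is far slower than a practical implementation; incremental updates of the per-cluster statistics are the natural refinement, omitted here to keep the black-box character of the scheme visible.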

Keywords

Clustering · Variance-based discriminants · Iterative local search


Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Rodrigo F. Toso (1)
  • Evgeny V. Bauman (2)
  • Casimir A. Kulikowski (1)
  • Ilya B. Muchnik (3)
  1. Department of Computer Science, Rutgers University, Piscataway, USA
  2. Markov Processes International, Summit, USA
  3. DIMACS, Rutgers University, Piscataway, USA
