An Iterated Local Search Approach for Minimum Sum-of-Squares Clustering

  • Conference paper
Advances in Intelligent Data Analysis V (IDA 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2810)

Abstract

Since minimum sum-of-squares clustering (MSSC) is an NP-hard combinatorial optimization problem, applying techniques from global optimization appears promising for reliably clustering numerical data. In this paper, concepts of combinatorial heuristic optimization are considered for approaching MSSC: an iterated local search (ILS) approach is proposed that is capable of finding (near-)optimum solutions very quickly. On gene expression data resulting from biological microarray experiments, it is shown that ILS outperforms multi-start k-means as well as three other clustering heuristics combined with k-means.
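
To make the idea in the abstract concrete, the following is a minimal sketch of a generic iterated local search for MSSC. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes k-means as the local search, a perturbation that relocates one randomly chosen center to a randomly chosen data point, and an accept-only-if-better rule. All function names (`ils_mssc`, `perturb`, `kmeans`, `sse`) and parameters (`iterations`, `seed`) are hypothetical and chosen for illustration only.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centers):
    """MSSC objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    """Plain k-means used as the local search (assumed choice)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                         for d in range(len(old))))
            else:
                new_centers.append(old)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged to a local optimum
        centers = new_centers
    return centers


def perturb(points, centers, rng):
    """Assumed perturbation: relocate one random center to a random data point."""
    moved = list(centers)
    moved[rng.randrange(len(moved))] = tuple(rng.choice(points))
    return moved


def ils_mssc(points, k, iterations=50, seed=0):
    """Iterated local search: perturb, re-run local search, keep improvements."""
    rng = random.Random(seed)
    best = kmeans(points, rng.sample(points, k))
    best_cost = sse(points, best)
    for _ in range(iterations):
        candidate = kmeans(points, perturb(points, best, rng))
        candidate_cost = sse(points, candidate)
        if candidate_cost < best_cost:  # accept only strict improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
    centers, cost = ils_mssc(data, k=3)
    print(centers, cost)
```

Under this acceptance rule the returned cost can never be worse than that of the single k-means run the search starts from. Unlike multi-start k-means, each restart begins near the current best solution rather than from a fresh random initialization.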

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Merz, P. (2003). An Iterated Local Search Approach for Minimum Sum-of-Squares Clustering. In: Berthold, M.R., Lenz, H.-J., Bradley, E., Kruse, R., Borgelt, C. (eds) Advances in Intelligent Data Analysis V. IDA 2003. Lecture Notes in Computer Science, vol 2810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45231-7_27

  • DOI: https://doi.org/10.1007/978-3-540-45231-7_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40813-0

  • Online ISBN: 978-3-540-45231-7

  • eBook Packages: Springer Book Archive
