Double Layered Genetic Algorithm for Document Clustering

  • Conference paper
Software Engineering, Business Continuity, and Education (ASEA 2011)

Abstract

The genetic algorithm for document clustering (GC) shows good performance. However, the genetic algorithm suffers from performance degradation caused by the premature convergence phenomenon (PCP). In this paper, we propose a double layered genetic algorithm for document clustering (DLGC) to solve this problem. The clustering algorithms, including DLGC, are tested and compared on the Reuters-21578 data collection. The results show that DLGC outperforms both the traditional clustering algorithms (K-means, Group Average Clustering) and GC in various experiments.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Choi, L.C., Lee, J.S., Park, S.C. (2011). Double Layered Genetic Algorithm for Document Clustering. In: Kim, Th., et al. Software Engineering, Business Continuity, and Education. ASEA 2011. Communications in Computer and Information Science, vol 257. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27207-3_21

  • DOI: https://doi.org/10.1007/978-3-642-27207-3_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27206-6

  • Online ISBN: 978-3-642-27207-3

  • eBook Packages: Computer Science, Computer Science (R0)
