Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm

  • Conference paper
Advanced Computing and Communication Technologies

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 562)

Abstract

Rapid advances in technology and falling storage costs allow individuals and organizations to generate and collect enormous amounts of text data. Extracting the documents a user is interested in from this volume of text is tedious, which calls for text mining methods that discover interesting information or knowledge in massive data. Document clustering is an effective text mining method that groups similar documents into the most relevant clusters. K-means is the classic clustering algorithm. However, its results depend heavily on the initial cluster centers, and it may become trapped in local optima. This paper presents a K-means document clustering algorithm whose initial cluster centers are optimized by a genetic algorithm. Experiments on two text datasets confirm that the proposed method produces more accurate clustering results than standard K-means.
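The idea of seeding K-means with centers chosen by a genetic algorithm can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, fitness function, genetic operators, or distance measure, so everything below is an assumption made for illustration: each chromosome is a set of k row indices into the data matrix, the cost is the sum of squared Euclidean distances to the nearest candidate center, and the operators are binary tournament selection, single-point crossover, and random-index mutation with one elite kept per generation. The function names (`nearest`, `kmeans`, `ga_initial_centers`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np

def nearest(X, centers):
    """Return (total squared distance to nearest center, nearest-center labels)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum(), d2.argmin(axis=1)

def kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations starting from the given centers."""
    centers = centers.copy()
    for _ in range(n_iter):
        _, labels = nearest(X, centers)
        new = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new[j] = members.mean(axis=0)
        if np.allclose(new, centers):
            break
        centers = new
    total, labels = nearest(X, centers)
    return centers, labels, total

def ga_initial_centers(X, k, pop_size=20, generations=30,
                       mutation_rate=0.1, seed=0):
    """Assumed GA: evolve sets of k row indices of X that minimize the cost."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = np.array([rng.choice(n, size=k, replace=False)
                    for _ in range(pop_size)])

    def cost(chrom):
        return nearest(X, X[chrom])[0]

    def tournament(costs):
        a, b = rng.integers(pop_size, size=2)
        return pop[a] if costs[a] < costs[b] else pop[b]

    for _ in range(generations):
        costs = np.array([cost(c) for c in pop])
        children = [pop[costs.argmin()].copy()]  # elitism: keep the best
        while len(children) < pop_size:
            p1, p2 = tournament(costs), tournament(costs)
            cut = rng.integers(1, k) if k > 1 else 0  # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            mask = rng.random(k) < mutation_rate      # random-index mutation
            child[mask] = rng.integers(n, size=mask.sum())
            children.append(child)
        pop = np.array(children)

    costs = np.array([cost(c) for c in pop])
    return X[pop[costs.argmin()]]

# Synthetic stand-in for document vectors: three Gaussian blobs in 5 dimensions.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(loc=m, scale=0.3, size=(30, 5))
               for m in (0.0, 3.0, 6.0)])

init = ga_initial_centers(X, k=3)
init_sse, _ = nearest(X, init)
centers, labels, final_sse = kmeans(X, init)
print(f"cost of GA seeds: {init_sse:.2f}, cost after K-means: {final_sse:.2f}")
```

In a document clustering setting, the rows of `X` would be document vectors such as TF-IDF weights, and a cosine-based measure could replace the Euclidean cost; that substitution is also an assumption rather than something the abstract states. The sketch only shows how a GA can supply starting centers and makes no claim about the accuracy gains reported in the paper.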

Author information

Corresponding author

Correspondence to Neha Garg.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Garg, N., Gupta, R.K. (2018). Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm. In: Choudhary, R., Mandal, J., Bhattacharyya, D. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 562. Springer, Singapore. https://doi.org/10.1007/978-981-10-4603-2_3

  • DOI: https://doi.org/10.1007/978-981-10-4603-2_3

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-4602-5

  • Online ISBN: 978-981-10-4603-2

  • eBook Packages: Engineering, Engineering (R0)
