Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm

Garg, Neha; Gupta, R. K.

doi:10.1007/978-981-10-4603-2_3


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 562)


Abstract

Rapid advances in technology and falling storage costs allow individuals and organizations to generate and gather enormous amounts of text data. Extracting the documents a user is interested in from such a collection is tedious, which motivates text mining methods that discover interesting information or knowledge in massive data. Document clustering is an effective text mining method that groups similar documents into the most relevant clusters. K-means is the classic clustering algorithm; however, its results depend heavily on the initial cluster centers, and it can become trapped in local optima. This paper presents a K-means document clustering algorithm whose initial cluster centers are optimized by a genetic algorithm. Experiments on two text datasets confirm that the proposed method produces more accurate clusterings than standard K-means.
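The general idea in the abstract, a genetic algorithm that searches for good starting centers before K-means refines them, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the chromosome encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration. Here a chromosome is a list of k document indices, fitness is the sum of squared Euclidean distances to the nearest center, and the function names (`ga_initial_centers`, `kmeans`, `sse`) are hypothetical. Real document vectors would typically be high-dimensional term weights; the toy 2-D points only keep the example readable.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance (an assumed measure; the abstract names none).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    # Index of the nearest center for each point.
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def sse(points, centers):
    # Sum of squared distances from each point to its nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans(points, centers, max_iter=100):
    # Standard Lloyd iterations starting from the supplied centers.
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # keep the old center if a cluster is empty
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)

def ga_initial_centers(points, k, pop_size=20, generations=30,
                       mutation_rate=0.1, seed=0):
    # Assumed GA design: chromosome = k point indices, lower SSE = fitter,
    # tournament selection, one-point crossover, random-reset mutation, elitism.
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):
        return sse(points, [points[i] for i in chrom])

    population = [rng.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        next_gen = population[:2]  # carry the two best chromosomes forward
        while len(next_gen) < pop_size:
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n) if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
    best = min(population, key=fitness)
    return [points[i] for i in best]

# Toy usage: two well-separated groups of 2-D "document vectors".
docs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
init = ga_initial_centers(docs, k=2)
centers, labels = kmeans(docs, init)
```

Restricting candidate centers to existing data points keeps the search space small and discrete; a GA that evolves real-valued center coordinates would be an equally plausible reading of the abstract.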



Author information

Authors and Affiliations

Department of CSE & IT, Madhav Institute of Technology and Science, Gwalior, India
Neha Garg & R. K. Gupta


Corresponding author

Correspondence to Neha Garg.

Editor information

Editors and Affiliations

Asia Pacific Institute of Information Technology, Panipat, Haryana, India
Ramesh K. Choudhary
Department of Computer Science and Engineering, Faculty of Engineering, Technology and Management, Kalyani University, Kalyani, West Bengal, India
Jyotsna Kumar Mandal
Computational Science Division, Saha Institute of Nuclear Physics, Kolkata, West Bengal, India
Dhananjay Bhattacharyya


About this paper

Cite this paper

Garg, N., Gupta, R.K. (2018). Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm. In: Choudhary, R., Mandal, J., Bhattacharyya, D. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 562. Springer, Singapore. https://doi.org/10.1007/978-981-10-4603-2_3


DOI: https://doi.org/10.1007/978-981-10-4603-2_3
Published: 25 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4602-5
Online ISBN: 978-981-10-4603-2
eBook Packages: Engineering, Engineering (R0)
