Abstract
Rapid advances in technology and falling storage costs allow individuals and organizations to generate and gather enormous amounts of text data. Extracting the documents a user is interested in from this volume of text is tedious. This necessitates text mining methods for discovering interesting information or knowledge in massive data. Document clustering is an effective text mining method that groups similar documents into the most relevant clusters. K-means is a classic clustering algorithm; however, its results depend heavily on the initial cluster centers, and it can become trapped in local optima. This paper presents a K-means document clustering algorithm whose initial cluster centers are optimized by a genetic algorithm. Experiments on two text datasets confirm that the proposed method yields more accurate clustering results than standard K-means.
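The idea of seeding K-means with centers chosen by a genetic algorithm can be illustrated with a minimal Python sketch. This is not the authors' implementation: the chromosome encoding (k document indices), the fitness (sum of squared Euclidean distances), truncation selection, the union-based crossover, the single-gene mutation, and all names and parameter values (`ga_initial_centers`, `pop_size`, `mutation_rate`, and so on) are assumptions made for illustration. The toy data are 2-D points; for real documents the rows of `X` would typically be term-weight vectors.

```python
import numpy as np

def kmeans(X, centers, n_iter=20):
    """Plain Lloyd's K-means from given initial centers; returns (centers, labels, sse)."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    sse = d[np.arange(len(X)), labels].sum()
    return centers, labels, sse

def sse_of(X, idx):
    """Assumed fitness: SSE when rows X[idx] serve as centers (lower is better)."""
    d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def ga_initial_centers(X, k, pop_size=20, generations=30, mutation_rate=0.2, seed=0):
    """Hypothetical GA: each chromosome is a set of k distinct row indices of X."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda idx: sse_of(X, idx))
        survivors = pop[: pop_size // 2]  # truncation selection, keeps the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(len(survivors), size=2, replace=False)
            # Crossover: draw k distinct indices from the union of both parents.
            pool = np.union1d(survivors[a], survivors[b])
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < mutation_rate:
                # Mutation: swap one gene for a row index not already in the child.
                unused = np.setdiff1d(np.arange(n), child)
                child[rng.integers(k)] = rng.choice(unused)
            children.append(child)
        pop = survivors + children
    best = min(pop, key=lambda idx: sse_of(X, idx))
    return X[best].astype(float)

# Toy data: three well-separated 2-D blobs standing in for document vectors.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

init = ga_initial_centers(X, k=3)        # GA picks the starting centers
centers, labels, sse = kmeans(X, init)   # K-means refines them
print("final SSE:", sse)
```

Because the GA only chooses the starting points, the K-means step itself is unchanged; the intent is that a better start makes it less likely to settle in a poor local optimum.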
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Garg, N., Gupta, R.K. (2018). Performance Evaluation of New Text Mining Method Based on GA and K-Means Clustering Algorithm. In: Choudhary, R., Mandal, J., Bhattacharyya, D. (eds) Advanced Computing and Communication Technologies. Advances in Intelligent Systems and Computing, vol 562. Springer, Singapore. https://doi.org/10.1007/978-981-10-4603-2_3
DOI: https://doi.org/10.1007/978-981-10-4603-2_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-4602-5
Online ISBN: 978-981-10-4603-2
eBook Packages: Engineering, Engineering (R0)