Abstract
Text classification is the task of assigning text documents to a predefined set of categories. This paper analyzes several ways of applying nearest neighbor classification to text documents. The cosine similarity measure is used to compute the similarity between documents, applied to a term frequency-inverse document frequency (TF-IDF) vector space model representation of the preprocessed Classic data set. The documents most similar to a given document are called its nearest neighbors. In this work, nearest neighbor and k-nearest neighbor classification algorithms are used to classify the documents into the predefined classes, and classifier accuracy is measured.
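The pipeline described in the abstract (TF-IDF vectors, cosine similarity, nearest neighbor voting) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the function names, the toy documents and labels, and the IDF variant log(N/df), since the abstract does not state the exact weighting used. The token lists stand in for the output of preprocessing.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute IDF weights from tokenized documents (assumed variant: log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_vectors, train_labels, k=1):
    """Majority vote over the k training documents most similar to the query."""
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: cosine(query, train_vectors[i]),
        reverse=True,
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: already tokenized, with made-up class labels.
train_docs = [
    ["heart", "disease", "patient"],
    ["patient", "blood", "pressure"],
    ["wing", "airflow", "pressure"],
    ["wing", "lift", "drag"],
]
train_labels = ["med", "med", "aero", "aero"]

idf = fit_idf(train_docs)
train_vectors = [tfidf(doc, idf) for doc in train_docs]

query = tfidf(["patient", "heart"], idf)
print(knn_classify(query, train_vectors, train_labels, k=1))
print(knn_classify(query, train_vectors, train_labels, k=3))
```

With k=1 the classifier returns the label of the single most similar training document; larger k takes a majority vote among the k most similar documents. Classifier accuracy would then be the fraction of held-out documents whose predicted label matches the true label.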
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bhadri Raju, M.S.V.S., Vishnu Vardhan, B., Sowmya, V. (2014). Variant Nearest Neighbor Classification Algorithm for Text Document. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_27
DOI: https://doi.org/10.1007/978-3-319-03095-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03094-4
Online ISBN: 978-3-319-03095-1
eBook Packages: Engineering, Engineering (R0)