Variant Nearest Neighbor Classification Algorithm for Text Document

Bhadri Raju, M. S. V. S.; Vishnu Vardhan, B.; Sowmya, V.

doi:10.1007/978-3-319-03095-1_27


Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 249)

Abstract

Text classification is the task of assigning text documents to a predefined set of categories. This paper analyzes various ways of applying nearest neighbor classification to text documents. The cosine similarity measure is used to compute the similarity between documents, and it is applied to a term frequency-inverse document frequency (TF-IDF) vector space model representation of the preprocessed Classic data set. The documents most similar to a given document are called its nearest neighbors. In this work, nearest neighbor and k-nearest neighbor classification algorithms are used to classify the documents into the predefined classes, and classifier accuracy is measured.
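
The pipeline described in the abstract (TF-IDF vectors, cosine similarity, nearest neighbor voting, accuracy) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the weighting variant, tie-breaking, or preprocessing, so the raw-count TF with log(N/df) IDF, the majority vote, and all function names, toy documents, and labels below are assumptions made for illustration only.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute IDF = log(N / df) for every term in the tokenized training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens, idf):
    """Sparse TF-IDF vector (dict); terms unseen in training are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_vecs, labels, k=1):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical, already-preprocessed toy documents (not the Classic data set).
train_docs = [
    ["patient", "blood", "treatment"],
    ["blood", "cell", "disease"],
    ["wing", "flow", "pressure"],
    ["flow", "boundary", "layer"],
]
train_labels = ["med", "med", "aero", "aero"]
test_docs = [["blood", "disease", "patient"], ["pressure", "flow", "wing"]]
test_labels = ["med", "aero"]

idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]
predictions = [knn_classify(tfidf(doc, idf), train_vecs, train_labels, k=3)
               for doc in test_docs]
print(predictions, accuracy(predictions, test_labels))
```

Setting `k=1` gives plain nearest neighbor classification, while larger `k` gives the k-nearest neighbor variant; other variants would change only the voting step in `knn_classify`.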

Author information

Authors and Affiliations

Department of CSE, SRKR Engg. College, Bhimavaram, AP, India
M. S. V. S. Bhadri Raju
Department of IT, JNTUCE, Jagityala, AP, India
B. Vishnu Vardhan
Department of CSE, GRIET, Hyderabad, AP, India
V. Sowmya

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
College of Engineering(A), Andhra University, Vishakapatnam, India
P. S. Avadhani
University of Hyderabad, Hyderabad, India
Siba K. Udgata
CSIR-National Institute of Oceanography, Visakhapatnam, India
Sadasivuni Lakshminarayana

About this paper

Cite this paper

Bhadri Raju, M.S.V.S., Vishnu Vardhan, B., Sowmya, V. (2014). Variant Nearest Neighbor Classification Algorithm for Text Document. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_27

DOI: https://doi.org/10.1007/978-3-319-03095-1_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03094-4
Online ISBN: 978-3-319-03095-1
eBook Packages: Engineering, Engineering (R0)
