Variant Nearest Neighbor Classification Algorithm for Text Document

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 249)

Abstract

Text classification is the task of assigning text documents to a predefined set of categories. This paper analyzes several ways of applying nearest neighbor classification to text documents. The cosine similarity measure is used to compute the similarity between documents, and it is applied to a term frequency-inverse document frequency (TF-IDF) vector space model representation of the preprocessed Classic data set. The documents most similar to a given document are called its nearest neighbors. In this work, nearest neighbor and k-nearest neighbor classification algorithms are used to classify the documents into the predefined classes, and classifier accuracy is measured.
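
The pipeline named in the abstract (TF-IDF vectors, cosine similarity, nearest neighbor voting) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy corpus, the labels, the plain `log(N/df)` IDF weighting, and the majority-vote tie handling are all assumptions made for illustration, and the documents are assumed to be already tokenized and preprocessed.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Compute IDF weights log(N / df) from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def to_vector(tokens, idf):
    """Sparse TF-IDF vector (dict); terms unseen in training are ignored."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(query, train_vecs, train_labels, k=1):
    """Majority vote among the k training vectors most similar to query.

    k=1 gives plain nearest neighbor classification.
    """
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; the labels are illustrative, not from the paper.
train_docs = [
    ["heart", "disease", "patient"],
    ["patient", "blood", "pressure"],
    ["wing", "flow", "pressure"],
    ["aircraft", "wing", "boundary", "layer"],
]
train_labels = ["med", "med", "aero", "aero"]

idf = fit_idf(train_docs)
train_vecs = [to_vector(doc, idf) for doc in train_docs]

query = to_vector(["patient", "heart", "blood"], idf)
print(knn_classify(query, train_vecs, train_labels, k=1))
print(knn_classify(query, train_vecs, train_labels, k=3))
```

Classifier accuracy, as mentioned in the abstract, would then be the fraction of held-out documents whose predicted label matches the true label.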

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Bhadri Raju, M.S.V.S., Vishnu Vardhan, B., Sowmya, V. (2014). Variant Nearest Neighbor Classification Algorithm for Text Document. In: Satapathy, S., Avadhani, P., Udgata, S., Lakshminarayana, S. (eds) ICT and Critical Infrastructure: Proceedings of the 48th Annual Convention of Computer Society of India- Vol II. Advances in Intelligent Systems and Computing, vol 249. Springer, Cham. https://doi.org/10.1007/978-3-319-03095-1_27

  • DOI: https://doi.org/10.1007/978-3-319-03095-1_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-03094-4

  • Online ISBN: 978-3-319-03095-1

  • eBook Packages: Engineering, Engineering (R0)
