ICDSMLA 2019 pp 686-695

An Efficient Approach for Document Categorization Using Weighted Sum

  • Vimuktha E. Salis
  • Ranjana S. Chakrasali
  • Chowdaiah Pathanjali (Email author)
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 601)

Abstract

Document categorization, or classification, is an active research area that aims to simplify information retrieval over the enormous and growing collection of electronic documents. A large amount of information is generated by day-to-day activities through various sources. In practice, manually categorizing these text documents requires skilled effort and consumes a lot of time. Automatically organizing and classifying documents into predefined classes based on their content, using efficient approaches, is therefore a challenging task. In this paper, we adopt the weighted sum approach for the classification of documents. The weight of each word in a document is assigned based on the frequency of its appearance in that document. This approach is efficient and yields better performance than the existing methods. The results show that running time grows linearly as the data size increases across varied data sets.
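The idea of frequency-based word weights combined by a weighted sum can be illustrated with a minimal Python sketch. The abstract does not give the scoring rule, so everything below is an assumption made for illustration: the per-class keyword sets (`CLASS_KEYWORDS`), the use of relative term frequency as the weight, and the helper names `tokenize`, `term_weights` and `classify` are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical per-class keyword sets (assumption: not from the paper).
CLASS_KEYWORDS = {
    "sports": {"match", "team", "score", "player"},
    "finance": {"bank", "market", "stock", "loan"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_weights(text):
    """Weight each word by its relative frequency in the document.

    Assumption: relative term frequency stands in for the paper's weights.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def classify(text, class_keywords=CLASS_KEYWORDS):
    """Score each class by the sum of the weights of its keywords
    that occur in the document, and return the top class with all scores."""
    weights = term_weights(text)
    scores = {
        label: sum(weights.get(word, 0.0) for word in keywords)
        for label, keywords in class_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


label, scores = classify("The team won the match after the player hit a late score")
print(label, scores)
```

In this sketch each document is tokenized once and each class score is a single pass over that class's keywords, so the cost per document is proportional to its length plus the vocabulary size. That is one plausible reading of the linear growth in time the abstract reports, not a confirmed account of the authors' implementation.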

Keywords

Weighted sum, Classification, Document categorization

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  • Vimuktha E. Salis (1)
  • Ranjana S. Chakrasali (1)
  • Chowdaiah Pathanjali (1), Email author
  1. BNMIT, Bengaluru, India