Automated Document Categorization Model

Part of the book series: Studies in Computational Intelligence (SCI, volume 907)

Abstract

The aim of this work is to build a generic document clustering model that automatically groups related documents together. The model combines unsupervised and supervised learning and assumes no prior knowledge of the given domain. No manual effort is required to create the training document set; instead, the proposed model generates the training documents automatically and then uses them to categorize text documents. The process is divided into two broad steps. In step 1, an initial classification is performed in an unsupervised way: the K-means algorithm is applied to the unlabeled documents to prepare the training dataset. Each text document is represented as a feature vector whose features are the keywords extracted from it, and selected representative documents serve as the initial centroids. In step 2, a supervised classifier is trained on the categorized documents produced by step 1. The classifier is Naive Bayes, a statistical text classifier that uses word frequencies as features.
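The two steps described in the abstract can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the chapter's implementation: the toy documents, the seed indices, the crude `tokenize` stand-in for keyword extraction, and all function names are hypothetical, and add-one smoothing is an assumed choice for the Naive Bayes step.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumed stand-in for keyword extraction: lowercase words longer than two letters.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]

def to_vector(tokens, vocab):
    # Term-frequency feature vector over a fixed vocabulary.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, seed_indices, max_iter=20):
    # Step 1: K-means whose initial centroids are chosen representative documents.
    centroids = [list(vectors[i]) for i in seed_indices]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_naive_bayes(token_lists, labels, vocab):
    # Step 2: multinomial Naive Bayes on word frequencies, add-one smoothing (assumed).
    model = {}
    for c in sorted(set(labels)):
        docs = [t for t, lab in zip(token_lists, labels) if lab == c]
        counts = Counter(w for t in docs for w in t)
        total = sum(counts.values()) + len(vocab)
        log_prior = math.log(len(docs) / len(token_lists))
        log_like = {w: math.log((counts[w] + 1) / total) for w in vocab}
        model[c] = (log_prior, log_like)
    return model

def predict(model, text):
    tokens = tokenize(text)
    def score(c):
        log_prior, log_like = model[c]
        # Words outside the training vocabulary are ignored.
        return log_prior + sum(log_like[w] for w in tokens if w in log_like)
    return max(model, key=score)

# Hypothetical unlabeled corpus; documents 0 and 3 act as the representative seeds.
documents = [
    "football match goal team player",
    "team player scored goal in the match",
    "football team won the match",
    "stock market shares price investor",
    "investor bought shares as price fell",
    "stock market price rose",
]
token_lists = [tokenize(d) for d in documents]
vocab = sorted({w for t in token_lists for w in t})
vectors = [to_vector(t, vocab) for t in token_lists]

labels = kmeans(vectors, seed_indices=[0, 3])         # automatically generated labels
model = train_naive_bayes(token_lists, labels, vocab)  # classifier trained on them

print(labels)
print(predict(model, "the player scored a goal"))
print(predict(model, "shares price fell on the market"))
```

Seeding K-means with representative documents makes the cluster assignment deterministic, so the labels passed to the classifier do not depend on random initialization. Any error in the clustering step is inherited by the classifier, since the cluster labels are the only supervision it receives.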



Author information

Corresponding author

Correspondence to Rakhi Patra.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Patra, R. (2021). Automated Document Categorization Model. In: Das, S., Das, S., Dey, N., Hassanien, AE. (eds) Machine Learning Algorithms for Industrial Applications. Studies in Computational Intelligence, vol 907. Springer, Cham. https://doi.org/10.1007/978-3-030-50641-4_2
