Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach

  • Conference paper
Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 701))

Abstract

Text document classification is attracting growing interest because so much information is now available in textual, electronic form. Conventional approaches generally treat the representation of text data and the classification of text documents as independent problems. In this article, we take the view that the overall efficiency of a text classification system depends on both an effective representation of the text data and an efficient classification methodology. We therefore propose an effective compressed representation for text documents, followed by a B-Tree-based methodology adapted for classification. The proposed compressed representation and B-Tree methodology are evaluated on a publicly available large corpus to validate their effectiveness.
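The abstract does not spell out the integer encoding or the tree layout, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each term is packed into a single integer (a hypothetical base-27 encoding, `term_to_int`) and that the integers are kept in an ordered index searched by binary search. A sorted Python list with `bisect` stands in for a B-Tree here; the class `SortedTermIndex`, the helpers `train` and `classify`, and the majority-vote rule are all illustrative assumptions.

```python
from bisect import bisect_left
from collections import Counter


def term_to_int(term: str) -> int:
    """Pack a term into one integer (hypothetical encoding: base 27, a=1 .. z=26)."""
    value = 0
    for ch in term.lower():
        if "a" <= ch <= "z":  # non-alphabetic characters are skipped
            value = value * 27 + (ord(ch) - ord("a") + 1)
    return value


class SortedTermIndex:
    """Ordered integer-key index; a sorted list stands in for a B-Tree."""

    def __init__(self):
        self.keys = []    # integer term keys, kept in ascending order
        self.labels = []  # labels[i] counts the class labels seen for keys[i]

    def insert(self, key: int, label: str) -> None:
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)
            self.labels.insert(i, Counter())
        self.labels[i][label] += 1

    def lookup(self, key: int) -> Counter:
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.labels[i]
        return Counter()  # unseen term: no votes


def train(docs):
    """Build the index from (text, label) pairs."""
    index = SortedTermIndex()
    for text, label in docs:
        for term in text.split():
            index.insert(term_to_int(term), label)
    return index


def classify(index, text):
    """Assumed decision rule: majority vote over the labels of matched terms."""
    votes = Counter()
    for term in text.split():
        votes.update(index.lookup(term_to_int(term)))
    return votes.most_common(1)[0][0] if votes else None


training = [("goal match team", "sports"), ("stock market price", "finance")]
index = train(training)
print(classify(index, "team goal"))
print(classify(index, "market"))
```

Storing integers rather than strings keeps key comparisons cheap and the index compact, which is the kind of benefit a compressed representation paired with an ordered tree is meant to provide. A real B-Tree would additionally bound the cost of insertions, which a sorted list does not.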

Author information

Corresponding author

Correspondence to S. N. Bharath Bhushan.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bharath Bhushan, S.N., Danti, A., Fernandes, S.L. (2018). Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach. In: Satapathy, S., Tavares, J., Bhateja, V., Mohanty, J. (eds) Information and Decision Sciences. Advances in Intelligent Systems and Computing, vol 701. Springer, Singapore. https://doi.org/10.1007/978-981-10-7563-6_50

  • DOI: https://doi.org/10.1007/978-981-10-7563-6_50

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-7562-9

  • Online ISBN: 978-981-10-7563-6

  • eBook Packages: Engineering, Engineering (R0)
