Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach

Bharath Bhushan, S. N.; Danti, Ajit; Fernandes, Steven Lawrence

doi:10.1007/978-981-10-7563-6_50

S. N. Bharath Bhushan, Ajit Danti & Steven Lawrence Fernandes

Conference paper
First Online: 14 April 2018

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 701)

Abstract

Text document classification is attracting growing interest because of the availability of information in textual or electronic form. In conventional approaches, the representation of text data and the classification of text documents are generally treated as independent issues. In this article, we consider that the overall efficiency of a text classification system depends on both an effective representation of the text data and an efficient methodology for classifying the documents. An effective compressed representation for text documents is proposed, followed by a B-Tree-based classification methodology. The proposed compressed representation and B-Tree methodology are evaluated on a publicly available large corpus to validate their effectiveness.
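The abstract gives no details of the encoding or of the tree layout, so the following is only a minimal sketch of how an integer representation and a B-Tree lookup could fit together, not the authors' method. Everything in it is an assumption made for illustration: terms are mapped to consecutive integers through a vocabulary dictionary, each integer key in the tree stores per-class counts, a test document is labelled by majority vote over its known terms, and all names (`BTree`, `encode`, `train`, `classify`) are hypothetical.

```python
from collections import Counter


class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []       # sorted integer term ids
        self.vals = []       # vals[i]: Counter of class labels seen for keys[i]
        self.children = []   # child nodes; stays empty for a leaf
        self.leaf = leaf


class BTree:
    """Minimal B-Tree (minimum degree t) supporting search and insert only."""

    def __init__(self, t=2):
        self.t = t
        self.root = BTreeNode()

    def search(self, key):
        node = self.root
        while True:
            i = 0
            while i < len(node.keys) and key > node.keys[i]:
                i += 1
            if i < len(node.keys) and node.keys[i] == key:
                return node.vals[i]
            if node.leaf:
                return None
            node = node.children[i]

    def _split_child(self, parent, i):
        # Split the full child parent.children[i]; its median key moves up.
        t, full = self.t, parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        parent.keys.insert(i, full.keys[t - 1])
        parent.vals.insert(i, full.vals[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, right.vals = full.keys[t:], full.vals[t:]
        full.keys, full.vals = full.keys[:t - 1], full.vals[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]

    def add(self, key, label):
        hit = self.search(key)
        if hit is not None:          # key already stored: just count the label
            hit[label] += 1
            return
        if len(self.root.keys) == 2 * self.t - 1:   # grow the tree at the root
            new_root = BTreeNode(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while True:
            i = 0
            while i < len(node.keys) and key > node.keys[i]:
                i += 1
            if node.leaf:
                node.keys.insert(i, key)
                node.vals.insert(i, Counter({label: 1}))
                return
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]


def encode(text, vocab, grow=True):
    """Map whitespace-separated terms to integer ids (assumed encoding)."""
    ids = []
    for term in text.lower().split():
        if term not in vocab:
            if not grow:
                continue             # unseen term at test time: skip it
            vocab[term] = len(vocab)
        ids.append(vocab[term])
    return ids


def train(docs):
    vocab, tree = {}, BTree()
    for text, label in docs:
        for term_id in encode(text, vocab):
            tree.add(term_id, label)
    return vocab, tree


def classify(text, vocab, tree):
    votes = Counter()
    for term_id in encode(text, vocab, grow=False):
        hit = tree.search(term_id)
        if hit:
            votes.update(hit)
    return votes.most_common(1)[0][0] if votes else None


# Toy corpus, invented purely for this illustration.
docs = [
    ("goal match striker league", "sport"),
    ("election vote senate policy", "politics"),
    ("striker scores late goal", "sport"),
]
vocab, tree = train(docs)
print(classify("late goal by the striker", vocab, tree))
```

Storing integers rather than strings keeps the keys small and makes comparisons cheap, while the B-Tree keeps lookups logarithmic in the vocabulary size; whether the paper exploits these same properties cannot be determined from the abstract alone.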

Author information

Authors and Affiliations

Department of Computer Science & Engineering, Sahyadri College of Engineering & Management, Adyar, Mangalore, 575007, India
S. N. Bharath Bhushan
Department of Computer Applications, JNN College of Engineering, Shimoga, 577204, India
Ajit Danti
Department of Electronics and Communications, Sahyadri College of Engineering & Management, Adyar, Mangalore, 575007, India
Steven Lawrence Fernandes

Corresponding author

Correspondence to S. N. Bharath Bhushan.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, PVP Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
Suresh Chandra Satapathy
Departamento de Engenharia Mecânica, Universidade do Porto, Porto, Portugal
Joao Manuel R.S. Tavares
Department of Electronics and Communication Engineering, SRMGPC, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Computer Application, KIIT University, Bhubaneswar, Odisha, India
J. R. Mohanty

About this paper

Cite this paper

Bharath Bhushan, S.N., Danti, A., Fernandes, S.L. (2018). Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach. In: Satapathy, S., Tavares, J., Bhateja, V., Mohanty, J. (eds) Information and Decision Sciences. Advances in Intelligent Systems and Computing, vol 701. Springer, Singapore. https://doi.org/10.1007/978-981-10-7563-6_50

DOI: https://doi.org/10.1007/978-981-10-7563-6_50
Published: 14 April 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7562-9
Online ISBN: 978-981-10-7563-6
eBook Packages: Engineering, Engineering (R0)
