Abstract
Text document classification is attracting growing interest because so much information is now available in textual, electronic form. Conventional approaches generally treat the representation of text data and the classification of text documents as independent problems. In this article, we take the view that the overall efficiency of a text classification system depends on both an effective representation of the text data and an efficient classification methodology. We propose an effective compressed representation for text documents, followed by a B-Tree-based methodology for classifying them. The proposed compressed representation and B-Tree methodology are evaluated on a publicly available large corpus to validate their effectiveness.
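The abstract does not spell out the encoding or the tree layout, so the following is only an illustrative sketch of the general idea of pairing a compact integer representation with an ordered-key index. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names (`build_vocabulary`, `build_index`, `classify`), the sequential integer codes, the majority-vote rule, and the toy training data. A sorted list searched with `bisect` stands in for a B-Tree, since both support logarithmic lookup over ordered keys.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def build_vocabulary(documents):
    """Assign each distinct term a small integer code (hypothetical scheme)."""
    vocab = {}
    for text, _label in documents:
        for term in text.lower().split():
            vocab.setdefault(term, len(vocab))
    return vocab


def build_index(documents, vocab):
    """Return sorted term codes and, for each code, a Counter of class labels."""
    counts = defaultdict(Counter)
    for text, label in documents:
        for term in text.lower().split():
            counts[vocab[term]][label] += 1
    keys = sorted(counts)
    return keys, [counts[k] for k in keys]


def classify(text, vocab, keys, values):
    """Vote for a label via ordered-key lookups (a stand-in for B-Tree search)."""
    votes = Counter()
    for term in text.lower().split():
        code = vocab.get(term)
        if code is None:
            continue  # term never seen during training
        pos = bisect_left(keys, code)
        if pos < len(keys) and keys[pos] == code:
            votes.update(values[pos])
    return votes.most_common(1)[0][0] if votes else None


# Toy training data, invented purely for this example.
train = [
    ("stocks fell as markets closed", "finance"),
    ("the bank raised interest rates", "finance"),
    ("the team won the final match", "sport"),
    ("the striker scored two goals", "sport"),
]
vocab = build_vocabulary(train)
keys, values = build_index(train, vocab)
label = classify("markets and interest rates", vocab, keys, values)
```

In this sketch each document is reduced to integer codes before indexing, so the index stores and compares small integers rather than strings. A real B-Tree would add node paging for large, disk-resident corpora, which the sorted list here does not model.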
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bharath Bhushan, S.N., Danti, A., Fernandes, S.L. (2018). Integer Representation and B-Tree for Classification of Text Documents: An Integrated Approach. In: Satapathy, S., Tavares, J., Bhateja, V., Mohanty, J. (eds) Information and Decision Sciences. Advances in Intelligent Systems and Computing, vol 701. Springer, Singapore. https://doi.org/10.1007/978-981-10-7563-6_50
DOI: https://doi.org/10.1007/978-981-10-7563-6_50
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7562-9
Online ISBN: 978-981-10-7563-6
eBook Packages: Engineering, Engineering (R0)