Compression Based Modeling for Classification of Text Documents

  • S. N. Bharath Bhushan
  • Ajit Danti
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1037)


Classification of text data is one of the well-known and interesting research topics in computer science and knowledge engineering. This article addresses the classification of text files using the LZW text compression algorithm. LZW is a lossless compression technique which requires two passes over the input data. These two passes are treated separately as the training stage and the testing stage for classification of text data. The proposed compression-based classification technique is tested on publicly available datasets. The experimental results show the effectiveness of the proposed algorithm.
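The abstract does not spell out how the two passes are turned into a classifier, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the training pass builds one LZW phrase dictionary per class, and that the testing pass parses a new document against each frozen dictionary and picks the class whose dictionary yields the fewest output codes. The function names `build_lzw_dictionary`, `count_codes`, and `classify`, the decision rule, and the toy training texts are all hypothetical.

```python
def build_lzw_dictionary(text):
    """Training pass (assumed): grow an LZW phrase dictionary from class text."""
    # Seed with the single characters that occur in the training text.
    phrases = set(text)
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in phrases:
            # Keep extending the current phrase while it is already known.
            current = candidate
        else:
            # Standard LZW step: register the new phrase, restart from ch.
            phrases.add(candidate)
            current = ch
    return phrases


def count_codes(text, phrases):
    """Testing pass (assumed): greedy parse with a frozen dictionary.

    Returns the number of codes emitted; unseen characters cost one code each.
    """
    codes = 0
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in phrases:
            current = candidate
        else:
            if current:
                codes += 1  # emit the longest phrase matched so far
            current = ch
    if current:
        codes += 1  # emit the final pending phrase
    return codes


def classify(document, class_dictionaries):
    """Pick the class whose dictionary compresses the document best."""
    return min(
        class_dictionaries,
        key=lambda label: count_codes(document, class_dictionaries[label]),
    )


# Hypothetical toy training data, one concatenated string per class.
training = {
    "sports": "the team won the match the team scored a goal in the match ",
    "finance": "the bank raised the interest rate the bank cut the loan rate ",
}
dictionaries = {label: build_lzw_dictionary(t) for label, t in training.items()}
print(classify("the team scored in the match", dictionaries))
```

Counting emitted codes is used here as a simple proxy for compressed size; a document that shares many phrases with a class's training text is parsed into fewer, longer phrases by that class's dictionary.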


Keywords: Text classification, LZW text compression, Compressed representation



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Sahyadri College of Engineering & Management, Mangaluru, India
  2. Faculty of Engineering-CSE, Christ (Deemed to be University), Bangalore, India
