
An Abstract-Based Approach for Text Classification

  • Conference paper
Nature of Computation and Communication (ICTCC 2016)

Abstract

Text classification is a supervised learning task that assigns a text document to one or more predefined classes (topics). These topics are determined by a set of training documents. To construct a classification model, a machine learning algorithm is trained on these documents, which are usually full-text documents. The trained model is then used to predict the class of a new incoming document. In this paper, we propose a text classification approach based on automatic text summarization. The proposed approach is tested on 2000 Vietnamese text documents downloaded from vnexpress.net and vietnamnet.vn. The experimental results confirm the feasibility of the proposed model.
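One way to read "a classification approach based on automatic text summarization" is to summarize every document first and then train and predict on the summaries instead of the full texts. The sketch below is a minimal, hypothetical illustration of that idea under stated assumptions, not the authors' implementation. It assumes an extractive summarizer that ranks sentences with PageRank over a sentence graph weighted by Jaccard similarity, and it swaps in a multinomial Naive Bayes classifier for simplicity. All function names (`summarize`, `train_on_abstracts`, `classify`) are hypothetical, and tokenization is a naive regex split; real Vietnamese text would need a proper word segmenter.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase word tokens split on non-word characters.
    # Vietnamese would need a real word segmenter (multi-syllable words).
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    # Jaccard similarity |A & B| / |A | B| of two token sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pagerank(weights, d=0.85, iters=50):
    # Simplified power iteration on a weighted, undirected sentence graph.
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - d) / n
            + d * sum(weights[j][i] / out_sum[j] * scores[j]
                      for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    # Hypothetical extractive summary: keep the k highest-ranked
    # sentences, emitted in their original order.
    sents = split_sentences(text)
    if len(sents) <= k:
        return " ".join(sents)
    sets = [set(tokenize(s)) for s in sents]
    n = len(sents)
    w = [[jaccard(sets[i], sets[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    scores = pagerank(w)
    top = sorted(sorted(range(n), key=lambda i: -scores[i])[:k])
    return " ".join(sents[i] for i in top)


class NaiveBayes:
    # Multinomial Naive Bayes with add-one (Laplace) smoothing.
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.n_docs = len(docs)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        tokens = [t for t in tokenize(doc) if t in self.vocab]  # skip unseen words
        for y, cnt in self.class_counts.items():
            total = sum(self.word_counts[y].values())
            lp = math.log(cnt / self.n_docs)
            for t in tokens:
                lp += math.log((self.word_counts[y][t] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = y, lp
        return best


def train_on_abstracts(docs, labels, k=2):
    # Hypothetical pipeline: summarize each training document, fit on summaries.
    return NaiveBayes().fit([summarize(d, k) for d in docs], labels)


def classify(model, doc, k=2):
    # Summarize the incoming document the same way before predicting.
    return model.predict(summarize(doc, k))


if __name__ == "__main__":
    sport = ("The team won the football match. Fans celebrated the football "
             "victory in the stadium. The weather was mild.")
    tech = ("The new phone has a faster chip. The chip makes the phone run "
            "apps quickly. The store opened at nine.")
    model = train_on_abstracts([sport, tech], ["sport", "tech"])
    print(classify(model, "Football fans filled the stadium. The match ended late."))  # → sport
```

In this toy example the off-topic sentences ("The weather was mild.", "The store opened at nine.") share only "the" with the rest of their documents, so they receive the lowest PageRank scores and are dropped from the summaries before training. Any classifier could replace Naive Bayes here; it was chosen only to keep the sketch dependency-free.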

Author information

Corresponding author

Correspondence to Quoc Dinh Truong.

Copyright information

© 2016 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Truong, Q.D., Huynh, H.X., Nguyen, C.N. (2016). An Abstract-Based Approach for Text Classification. In: Vinh, P., Barolli, L. (eds) Nature of Computation and Communication. ICTCC 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 168. Springer, Cham. https://doi.org/10.1007/978-3-319-46909-6_22

  • DOI: https://doi.org/10.1007/978-3-319-46909-6_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46908-9

  • Online ISBN: 978-3-319-46909-6

  • eBook Packages: Computer Science, Computer Science (R0)
