An Abstract-Based Approach for Text Classification

Truong, Quoc Dinh; Huynh, Hiep Xuan; Nguyen, Cuong Ngoc

doi:10.1007/978-3-319-46909-6_22

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 168)

Included in the following conference series:

International Conference on Nature of Computation and Communication

Abstract

Text classification is a supervised learning task that assigns a text document to one or more predefined classes (topics). These topics are determined by a set of training documents. A machine learning algorithm is used to construct the classification model, and the training data is usually a set of full-text documents. The trained model is then used to predict the class of a new, incoming document. In this paper, we propose a text classification approach based on automatic text summarization. The proposed approach is tested on 2000 Vietnamese text documents downloaded from vnexpress.net and vietnamnet.vn. The experimental results confirm the feasibility of the proposed model.
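
The idea of classifying on summaries rather than full text can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the summarizer or the classifier, so the sketch assumes an extractive summarizer that ranks sentences with a PageRank-style score over Jaccard similarities, and a simple nearest-centroid classifier with cosine similarity. All function names (`summarize`, `train`, `predict`) and parameters (`k`, `damping`, `iters`) are hypothetical, and whitespace tokenization stands in for the word segmentation that Vietnamese text would require.

```python
import math
import re
from collections import Counter


def sentences(text):
    # Naive splitter on ., ! and ? (an assumption for illustration).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    # Lowercased word tokens; real Vietnamese text needs word segmentation.
    return re.findall(r"\w+", sentence.lower())


def jaccard(a, b):
    # Jaccard similarity between two sets of tokens.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(text, k=2, damping=0.85, iters=50):
    # Keep the k highest-ranked sentences, returned in original order.
    sents = sentences(text)
    n = len(sents)
    if n <= k:
        return sents
    toks = [set(words(s)) for s in sents]
    # Weighted sentence graph: edge weight = Jaccard similarity.
    w = [[jaccard(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]
    score = [1.0 / n] * n
    for _ in range(iters):
        # PageRank-style update by power iteration.
        score = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out[j] * score[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    top = sorted(sorted(range(n), key=lambda i: -score[i])[:k])
    return [sents[i] for i in top]


def bag(sents):
    # Term-frequency vector of a list of sentences.
    return Counter(t for s in sents for t in words(s))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(labeled_docs, k=2):
    # One summed term vector per class, built from summaries only.
    centroids = {}
    for text, label in labeled_docs:
        centroids.setdefault(label, Counter()).update(bag(summarize(text, k)))
    return centroids


def predict(centroids, text, k=2):
    # Summarize the new document, then pick the most similar class.
    vec = bag(summarize(text, k))
    return max(centroids, key=lambda label: cosine(centroids[label], vec))
```

A model would be built with `train([(text, label), ...])` and applied with `predict(centroids, new_text)`. Because both steps pass through `summarize`, the classifier only ever sees the selected sentences, which is the property the abstract-based approach relies on.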

Author information

Authors and Affiliations

College of Information and Communication Technology, Can Tho University, Campus 2, 3/2 Street, Ninh Kieu District, Can Tho City, Vietnam
Quoc Dinh Truong & Hiep Xuan Huynh
Department of Computer and Mathematical, The People’s Security University, Km9 Nguyen Trai Street, Ha Dong District, Ha Noi, Vietnam
Cuong Ngoc Nguyen

Corresponding author

Correspondence to Quoc Dinh Truong.

Editor information

Editors and Affiliations

Nguyen Tat Thanh University, Ho Chi Minh City, Vietnam
Phan Cong Vinh
Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli

Cite this paper

Truong, Q.D., Huynh, H.X., Nguyen, C.N. (2016). An Abstract-Based Approach for Text Classification. In: Vinh, P., Barolli, L. (eds) Nature of Computation and Communication. ICTCC 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 168. Springer, Cham. https://doi.org/10.1007/978-3-319-46909-6_22

DOI: https://doi.org/10.1007/978-3-319-46909-6_22
Published: 26 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46908-9
Online ISBN: 978-3-319-46909-6
eBook Packages: Computer Science (R0)
