Abstract
Text classification is a supervised learning task that assigns a text document to one or more predefined classes (topics). These classes are determined by a set of training documents, which are often full-text documents. A machine learning algorithm builds a classification model from the training documents, and the trained model is then used to predict the class of each newly arriving document. In this paper, we propose a text classification approach based on automatic text summarization. The proposed approach is tested on 2000 Vietnamese text documents downloaded from vnexpress.net and vietnamnet.vn. The experimental results confirm the feasibility of the proposed model.
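The abstract gives the pipeline only at a high level: summarize each document, then train a classifier and predict classes. The Python sketch below illustrates that summarize-then-classify idea under stated assumptions and is not the authors' implementation. The summarizer is assumed: it ranks sentences with a PageRank-style iteration over a Jaccard sentence-similarity graph and keeps the top `k`. The learner is a small multinomial Naive Bayes, swapped in only to keep the example dependency-free. All names (`tokenize`, `summarize`, `NaiveBayes`, the toy `train` data) are hypothetical, and the regex tokenizer would not be adequate for Vietnamese, which needs word segmentation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase regex word split. Real Vietnamese text
    # needs a word segmenter, since many words span several syllables.
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def jaccard(a, b):
    # Jaccard similarity: size of the intersection over size of the union.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(text, k=2, damping=0.85, iters=50):
    # Assumed extractive summarizer: score sentences with a weighted
    # PageRank-style iteration over a Jaccard similarity graph, keep the top k.
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return " ".join(sentences)
    sets = [set(tokenize(s)) for s in sentences]
    n = len(sentences)
    w = [[jaccard(sets[i], sets[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(w[j][i] / out[j] * scores[j]
                                  for j in range(n) if out[j] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(top))


class NaiveBayes:
    # Stand-in learner: multinomial Naive Bayes with add-one smoothing.

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {t for wc in self.word_counts.values() for t in wc}
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for tok in tokenize(doc):
                score += math.log((self.word_counts[c][tok] + 1)
                                  / (self.totals[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


# Toy, made-up training data: each document is summarized before training.
train = [
    ("The match started early. Fans loved the match. The match ended in a draw.", "sports"),
    ("Our team won the match. The goal decided the match. Players left the match happy.", "sports"),
    ("The market opened higher. Banks lifted the market. The market closed strong.", "business"),
    ("Investors watched the market. Shares moved the market. The market rallied again.", "business"),
]
model = NaiveBayes().fit([summarize(t) for t, _ in train], [y for _, y in train])
print(model.predict(summarize("The match was exciting.")))
```

Any other learner, such as an SVM, could replace `NaiveBayes` without changing the summarize-then-train structure; only the input to the learner (summaries instead of full texts) reflects the idea described in the abstract.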
Copyright information
© 2016 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Truong, Q.D., Huynh, H.X., Nguyen, C.N. (2016). An Abstract-Based Approach for Text Classification. In: Vinh, P., Barolli, L. (eds) Nature of Computation and Communication. ICTCC 2016. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 168. Springer, Cham. https://doi.org/10.1007/978-3-319-46909-6_22
DOI: https://doi.org/10.1007/978-3-319-46909-6_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46908-9
Online ISBN: 978-3-319-46909-6
eBook Packages: Computer Science, Computer Science (R0)