Abstract
Support Vector Machines (SVMs) can classify objects described by an effectively infinite-dimensional feature vector. This lets them use the counts of the different words in a document, i.e. more than 100,000 words, directly for classification. In this paper we describe the results of a large number of experiments with different preprocessing strategies for generating effective input features. It turns out that n-grams of syllables and phonemes are especially effective for classification.
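To make the notion of n-gram features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the text has already been split into units (words, syllables, or phonemes); the function name `ngram_counts` and the hand-syllabified input are hypothetical. The resulting sparse counts are the kind of high-dimensional input vector an SVM could be trained on.

```python
from collections import Counter


def ngram_counts(units, n):
    """Count contiguous n-grams in a sequence of units.

    `units` may be words, syllables, or phonemes; how they are
    obtained (e.g. by a syllabifier) is outside this sketch.
    """
    grams = (tuple(units[i:i + n]) for i in range(len(units) - n + 1))
    return Counter(grams)


# Hypothetical, hand-syllabified input used only for illustration.
syllables = ["sup", "port", "vec", "tor", "sup", "port"]

# Sparse feature vector: each distinct syllable bigram is one dimension.
features = ngram_counts(syllables, 2)
```

Because only n-grams that actually occur are stored, the representation stays compact even though the number of possible n-grams is very large.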
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paaß, G., Kindermann, J., Leopold, E. (2004). Text Classification of News Articles with Support Vector Machines. In: Sirmakessis, S. (eds) Text Mining and its Applications. Studies in Fuzziness and Soft Computing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45219-5_5
DOI: https://doi.org/10.1007/978-3-540-45219-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05780-9
Online ISBN: 978-3-540-45219-5
eBook Packages: Springer Book Archive