Text Classification of News Articles with Support Vector Machines

  • Conference paper

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 138)

Abstract

Support Vector Machines (SVMs) can classify objects described by very high-dimensional feature vectors. This lets them use the counts of the individual words in a document, often more than 100,000 distinct words, directly as classification features. In this paper we describe the results of a large number of experiments with different preprocessing strategies for generating effective input features. It turns out that n-grams of syllables and phonemes are especially effective for classification.
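
As a rough illustration of the kind of input features the abstract refers to, the following Python sketch counts n-grams over a sequence of syllables and stores them as a sparse feature map. The function name `ngram_counts`, the parameter `n_max`, and the hand-syllabified example are assumptions made for illustration only; the paper's actual preprocessing pipeline and SVM configuration are not shown on this page and are not reproduced here.

```python
from collections import Counter


def ngram_counts(units, n_max=2):
    """Count all n-grams of length 1..n_max over a sequence of units.

    `units` may be words, syllables, or phonemes; each n-gram is stored
    as a tuple so that it can serve as a sparse feature key.
    """
    counts = Counter()
    for n in range(1, n_max + 1):
        # Slide a window of width n across the sequence.
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


# Hypothetical, hand-syllabified input (not taken from the paper's corpus).
syllables = ["nach", "rich", "ten", "nach", "rich", "ten"]
features = ngram_counts(syllables, n_max=2)

for ngram, count in sorted(features.items()):
    print(ngram, count)
```

Each distinct n-gram becomes one dimension of the feature vector, so the dimensionality grows quickly with `n_max`. A linear SVM would typically consume such counts as a sparse vector, possibly after reweighting or normalisation.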

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Paaß, G., Kindermann, J., Leopold, E. (2004). Text Classification of News Articles with Support Vector Machines. In: Sirmakessis, S. (eds) Text Mining and its Applications. Studies in Fuzziness and Soft Computing, vol 138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45219-5_5

  • DOI: https://doi.org/10.1007/978-3-540-45219-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05780-9

  • Online ISBN: 978-3-540-45219-5

  • eBook Packages: Springer Book Archive
