
A Comparative Study of Blog Comments Spam Filtering with Machine Learning Techniques

  • Chapter
Soft Computing for Recognition Based on Biometrics

Part of the book series: Studies in Computational Intelligence (SCI, volume 312)

Abstract

In this paper we compare four machine learning techniques for filtering spam in blog comments: Naïve Bayes, k-nearest neighbors, neural networks, and support vector machines. We used a corpus of 1021 blog comments, 67% of which are spam, and present the filtering results obtained with 10-fold cross-validation.
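
The evaluation protocol named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the four techniques (a multinomial Naïve Bayes with add-one smoothing, which is an assumption about the variant), uses whitespace tokenization, and runs on a tiny invented corpus rather than the 1021-comment corpus used in the chapter. The function names `train_nb`, `predict_nb`, and `k_fold_accuracy` are hypothetical.

```python
import math
from collections import Counter

# Invented toy comments for illustration only; NOT the corpus from the chapter.
SPAM = [
    "buy cheap pills online", "free money click here", "cheap watches buy now",
    "click here free pills", "win money online now", "buy cheap watches here",
    "free pills online now", "click here win money", "cheap pills buy now",
    "win free watches online",
]
HAM = [
    "great post thanks much", "i disagree with this", "thanks for the tutorial",
    "interesting point about python", "great tutorial very helpful",
    "this post helped me", "i like this point", "very interesting post thanks",
    "helpful tutorial about python", "thanks this helped much",
]
# Interleave the two classes so that every training fold contains both labels.
DOCS = [text for pair in zip(SPAM, HAM) for text in pair]
LABELS = ["spam", "ham"] * len(SPAM)


def train_nb(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def k_fold_accuracy(docs, labels, k=10):
    """Mean accuracy over k folds; fold i holds out items i, i + k, i + 2k, ..."""
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [y for i, y in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


if __name__ == "__main__":
    print(f"10-fold accuracy on toy data: {k_fold_accuracy(DOCS, LABELS):.2f}")
```

Each comment is held out exactly once, so the reported figure is the average over ten train/test splits. On a corpus where about two thirds of the comments are spam, as in the chapter, a classifier that always answers "spam" would already reach roughly 67% accuracy, so that baseline is the natural reference point when reading such results.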




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Romero, C., Garcia-Valdez, M., Alanis, A. (2010). A Comparative Study of Blog Comments Spam Filtering with Machine Learning Techniques. In: Melin, P., Kacprzyk, J., Pedrycz, W. (eds) Soft Computing for Recognition Based on Biometrics. Studies in Computational Intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_4


  • DOI: https://doi.org/10.1007/978-3-642-15111-8_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15110-1

  • Online ISBN: 978-3-642-15111-8

  • eBook Packages: Engineering, Engineering (R0)
