Abstract
This paper compares four machine learning techniques for filtering spam in blog comments: Naïve Bayes, k-nearest neighbors, neural networks, and support vector machines. The experiments use a corpus of 1021 blog comments, 67% of which are spam, and the filtering results are reported using 10-fold cross-validation.
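As a rough illustration of the evaluation protocol named in the abstract, the following sketch splits a corpus of the stated size into 10 stratified folds and scores a majority-class baseline. It is not the authors' code. The spam count of 684 is an assumption derived from rounding 67% of 1021, and the names `stratified_folds`, `cross_validate`, `fit_majority`, and `predict_majority` are hypothetical. Any of the four classifiers could be plugged in through the same `fit`/`predict` pair.

```python
import random
from collections import Counter

def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped,
        # so that fold sizes differ by at most one.
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds

def cross_validate(labels, fit, predict, k=10):
    """Return the mean held-out accuracy over k folds."""
    accuracies = []
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit(train)
        correct = sum(predict(model, i) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k

# Assumed label counts: 67% of 1021 is roughly 684 spam comments.
labels = ["spam"] * 684 + ["ham"] * 337

def fit_majority(train_indices):
    # The "model" is simply the most frequent label in the training folds.
    return Counter(labels[i] for i in train_indices).most_common(1)[0][0]

def predict_majority(model, index):
    # Ignores the comment entirely and always answers with the majority label.
    return model

baseline = cross_validate(labels, fit_majority, predict_majority)
print(f"majority-class baseline accuracy: {baseline:.3f}")
```

Because about two thirds of the corpus is spam, a filter that always answers "spam" already scores close to 0.67 under this protocol, so that figure is the floor against which any reported accuracy should be read.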
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Romero, C., Garcia-Valdez, M., Alanis, A. (2010). A Comparative Study of Blog Comments Spam Filtering with Machine Learning Techniques. In: Melin, P., Kacprzyk, J., Pedrycz, W. (eds) Soft Computing for Recognition Based on Biometrics. Studies in Computational Intelligence, vol 312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15111-8_4
DOI: https://doi.org/10.1007/978-3-642-15111-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15110-1
Online ISBN: 978-3-642-15111-8
eBook Packages: Engineering, Engineering (R0)