Spam Detection on Twitter Using Traditional Classifiers

McCord, M.; Chuah, M.

doi:10.1007/978-3-642-23496-5_13


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6906)

Included in the following conference series:

International Conference on Autonomic and Trusted Computing


Abstract

Social networking sites have become very popular in recent years. Users use them to find new friends and to update their existing friends on their latest thoughts and activities. Among these sites, Twitter is the fastest growing. Its popularity also attracts many spammers, who infiltrate legitimate users’ accounts with large amounts of spam messages. In this paper, we discuss some user-based and content-based features that differ between spammers and legitimate users, and we use these features to facilitate spam detection. Using the API methods provided by Twitter, we crawled active Twitter users, their followers/following information, and their most recent 100 tweets. We then evaluated our detection scheme based on the suggested user-based and content-based features. Our results show that, among the four classifiers we evaluated, the Random Forest classifier produces the best results. Our spam detector achieves 95.7% precision and 95.7% F-measure using the Random Forest classifier.
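As a rough illustration of the pipeline the abstract describes, the sketch below derives a few per-account features and computes precision, recall, and F-measure from predicted labels. It is not the authors' implementation: the abstract does not list the actual features, so `follower_following_ratio`, `urls_per_tweet`, and `hashtags_per_tweet` are hypothetical examples of a user-based and two content-based features, and the `Account` type and function names are assumptions made for this example only.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Simple URL pattern; an assumption for illustration, not taken from the paper.
URL_RE = re.compile(r"https?://\S+")


@dataclass
class Account:
    """Hypothetical container for one crawled user."""
    followers: int
    following: int
    tweets: List[str]  # e.g. the user's most recent 100 tweets


def extract_features(acct: Account) -> Dict[str, float]:
    """Compute illustrative (hypothetical) user- and content-based features."""
    n = max(len(acct.tweets), 1)  # avoid division by zero for empty timelines
    return {
        # user-based example: how many followers per account followed
        "follower_following_ratio": acct.followers / max(acct.following, 1),
        # content-based examples: average URLs and hashtags per tweet
        "urls_per_tweet": sum(len(URL_RE.findall(t)) for t in acct.tweets) / n,
        "hashtags_per_tweet": sum(t.count("#") for t in acct.tweets) / n,
    }


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall, and F-measure (F1) for the `positive` (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Feature dictionaries like these could be fed to any off-the-shelf classifier, including a Random Forest. Since the F-measure is the harmonic mean of precision and recall, a classifier that reports equal precision and F-measure must have recall equal to the same value, provided all three figures are computed on the same class and in the same way.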



Author information

Authors and Affiliations

Computer Science & Engineering Department, Lehigh University, Bethlehem, PA, 18015, USA
M. McCord & M. Chuah


Editor information

Editors and Affiliations

Cloud and Security Laboratory, Hewlett-Packard Laboratories, BS34 8QZ, Stoke Gifford, United Kingdom
Jose M. Alcaraz Calero
Department of Computer Science, St. Francis Xavier University, B2G 2W5, Antigonish, NS, Canada
Laurence T. Yang
Security Laboratory, NEC Laboratories Europe, Kurfürsten-Anlage 36, 69115, Heidelberg, Germany
Félix Gómez Mármol
Departamento de Ingeniería del Software e Inteligencia, Universidad Complutense de Madrid (UCM), C/Profesor José García Santesmases s/n, 28040, Madrid, Spain
Luis Javier García Villalba
Department of Electrical and Computer Engineering, University of Florida, 216 Larsen Hall, 32611-6200, Gainesville, FL, USA
Andy Xiaolin Li
Department of Computing, Macquarie University, E6A339, 2109, Sydney, NSW, Australia
Yan Wang

Cite this paper

McCord, M., Chuah, M. (2011). Spam Detection on Twitter Using Traditional Classifiers. In: Calero, J.M.A., Yang, L.T., Mármol, F.G., García Villalba, L.J., Li, A.X., Wang, Y. (eds) Autonomic and Trusted Computing. ATC 2011. Lecture Notes in Computer Science, vol 6906. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23496-5_13

DOI: https://doi.org/10.1007/978-3-642-23496-5_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23495-8
Online ISBN: 978-3-642-23496-5
eBook Packages: Computer Science, Computer Science (R0)
