A Systematic Analysis of Random Forest Based Social Media Spam Classification

Al-Janabi, Mohammed; Andras, Peter

doi:10.1007/978-3-319-64701-2_31

Conference paper
First Online: 26 July 2017

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10394)

Abstract

Random forest classification has recently become a popular choice for machine learning applications that aim to detect spam content in online social networks. In this paper, we report a systematic analysis of random forest classification for this purpose. We assessed the impact of key parameters, such as the number of trees, the depth of trees, and the minimum size of leaf nodes, on classification performance. Our results show that controlling the complexity of random forest classifiers applied to social media spam is important in order to avoid overfitting and optimize performance. We also conclude that, to support the reproducibility of experimental results, it is important to report the key parameters of random forest classifiers.
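The kind of parameter study described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `RandomForestClassifier`, where the three parameters are called `n_estimators` (number of trees), `max_depth` (depth of trees) and `min_samples_leaf` (minimum number of samples per leaf). The data are synthetic (`make_classification` stands in for a labelled spam feature matrix), and the grid values and the use of plain accuracy are hypothetical choices. This is not the authors' experimental setup. The train/test accuracy gap is used as a simple overfitting signal, and the full parameter set from `get_params()` is stored with each result to support the reproducibility point.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced two-class data standing in for spam/non-spam features
# (assumption: no real social media data is used here).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Hypothetical grid over the three complexity parameters named in the abstract.
grid = {
    "n_estimators": [10, 100],     # number of trees
    "max_depth": [3, None],        # None lets trees grow until leaves are pure
    "min_samples_leaf": [1, 10],   # minimum leaf size
}

results = []
for n_trees, depth, min_leaf in product(*grid.values()):
    clf = RandomForestClassifier(n_estimators=n_trees, max_depth=depth,
                                 min_samples_leaf=min_leaf, random_state=0)
    clf.fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)  # mean accuracy on training data
    test_acc = clf.score(X_test, y_test)     # mean accuracy on held-out data
    results.append({
        "n_estimators": n_trees,
        "max_depth": depth,
        "min_samples_leaf": min_leaf,
        "train_acc": train_acc,
        "test_acc": test_acc,
        # A large train/test gap is a simple symptom of overfitting.
        "gap": train_acc - test_acc,
        # Record every parameter (including random_state) for reproducibility.
        "params": clf.get_params(),
    })

for r in results:
    print(f"trees={r['n_estimators']:>3} depth={str(r['max_depth']):>4} "
          f"min_leaf={r['min_samples_leaf']:>2} train={r['train_acc']:.3f} "
          f"test={r['test_acc']:.3f} gap={r['gap']:.3f}")
```

A full study would typically replace the single split with cross-validation and use class-sensitive metrics such as precision, recall or F1. Accuracy can be misleading when spam is the minority class.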

Author information

Authors and Affiliations

School of Computing and Mathematics, Keele University, Newcastle-Under-Lyme, UK
Mohammed Al-Janabi & Peter Andras

Corresponding author

Correspondence to Mohammed Al-Janabi.

Editor information

Editors and Affiliations

Xidian University, Xi’an, China
Zheng Yan
Eurecom, Sophia Antipolis, Valbonne, France
Refik Molva
Warsaw University of Technology, Warsaw, Poland
Wojciech Mazurczyk
Aalto University, Espoo, Finland
Raimo Kantola

About this paper

Cite this paper

Al-Janabi, M., Andras, P. (2017). A Systematic Analysis of Random Forest Based Social Media Spam Classification. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds) Network and System Security. NSS 2017. Lecture Notes in Computer Science, vol 10394. Springer, Cham. https://doi.org/10.1007/978-3-319-64701-2_31

DOI: https://doi.org/10.1007/978-3-319-64701-2_31
Published: 26 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64700-5
Online ISBN: 978-3-319-64701-2
eBook Packages: Computer Science, Computer Science (R0)