A Systematic Analysis of Random Forest Based Social Media Spam Classification

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10394)

Abstract

Random forest classification has recently become a popular choice for machine learning applications that aim to detect spam content in online social networks. In this paper, we report a systematic analysis of random forest classification for this purpose. We assessed the impact of key parameters, such as the number of trees, the depth of trees, and the minimum size of leaf nodes, on classification performance. Our results show that controlling the complexity of random forest classifiers applied to social media spam is important in order to avoid overfitting and optimize performance. We also conclude that, to support reproducibility of experimental results, it is important to report the key parameters of random forest classifiers.
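
The kind of parameter study described in the abstract can be illustrated with a small sketch. The following is a toy, pure-Python random forest run on synthetic data, not the authors' implementation, data set, or results. All function names (`make_data`, `fit_forest`, `sweep`), the grid values, and the noise level are assumptions chosen for illustration. It varies the number of trees, the maximum depth, and the minimum leaf size, and records training and test accuracy together with the parameters that produced them.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(rows, labels, max_depth, min_leaf, rng, depth=0):
    """Grow one tree; a leaf is an int label, an internal node is a tuple."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1 or len(rows) < 2 * min_leaf:
        return majority
    n_features = len(rows[0])
    best = None
    # Consider a random subset of about sqrt(n_features) features per split.
    for f in rng.sample(range(n_features), max(1, int(n_features ** 0.5))):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce the minimum leaf size
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          max_depth, min_leaf, rng, depth + 1)

    _, f, t, left, right = best
    return (f, t, grow(left), grow(right))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees, max_depth, min_leaf, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_depth, min_leaf, rng))
    return forest

def accuracy(forest, rows, labels):
    """Fraction of rows whose majority vote matches the label."""
    hits = 0
    for row, y in zip(rows, labels):
        votes = Counter(predict_tree(tree, row) for tree in forest)
        hits += votes.most_common(1)[0][0] == y
    return hits / len(rows)

def make_data(n, seed):
    """Synthetic stand-in data (assumption): 2 informative features, 4 noise
    features, and 10% flipped labels so that overfitting is possible."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randrange(2)
        row = [round(rng.gauss(float(y), 1.0), 1) for _ in range(2)]
        row += [round(rng.gauss(0.0, 1.0), 1) for _ in range(4)]
        rows.append(row)
        labels.append(1 - y if rng.random() < 0.1 else y)
    return rows, labels

def sweep():
    """Evaluate a small, hypothetical grid and keep parameters with scores."""
    train_x, train_y = make_data(100, seed=1)
    test_x, test_y = make_data(100, seed=2)
    results = []
    for n_trees in (5, 10):
        for max_depth in (2, 6):
            for min_leaf in (1, 10):
                forest = fit_forest(train_x, train_y, n_trees, max_depth, min_leaf)
                results.append({"n_trees": n_trees, "max_depth": max_depth,
                                "min_leaf": min_leaf,
                                "train_acc": accuracy(forest, train_x, train_y),
                                "test_acc": accuracy(forest, test_x, test_y)})
    return results

if __name__ == "__main__":
    for r in sweep():
        print(r)
```

Comparing `train_acc` with `test_acc` across rows is one simple way to look for overfitting: a configuration whose training accuracy is far above its test accuracy is a candidate for a smaller depth or a larger minimum leaf size. With scikit-learn, the same three settings correspond to the `n_estimators`, `max_depth`, and `min_samples_leaf` parameters of `RandomForestClassifier`.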

Author information

Corresponding author

Correspondence to Mohammed Al-Janabi.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Al-Janabi, M., Andras, P. (2017). A Systematic Analysis of Random Forest Based Social Media Spam Classification. In: Yan, Z., Molva, R., Mazurczyk, W., Kantola, R. (eds) Network and System Security. NSS 2017. Lecture Notes in Computer Science, vol 10394. Springer, Cham. https://doi.org/10.1007/978-3-319-64701-2_31

  • DOI: https://doi.org/10.1007/978-3-319-64701-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64700-5

  • Online ISBN: 978-3-319-64701-2

  • eBook Packages: Computer Science (R0)
