Comparative Performance of Machine Learning Algorithms for Fake News Detection

  • Arvinder Pal Singh Bali (Email author)
  • Mexson Fernandes
  • Sourabh Choubey
  • Mahima Goel
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1046)

Abstract

Automatic detection of fake news, which could negatively affect individuals and society, is an emerging research area attracting global attention. This paper approaches the problem from Natural Language Processing and Machine Learning perspectives. The evaluation is carried out on three standard datasets with a novel set of features extracted from the headlines and the contents. Seven machine learning algorithms are compared in terms of accuracy and F1 score. Gradient Boosting outperformed the other classifiers, with a mean accuracy of 88% and an F1 score of 0.91.
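
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 for fake, 0 for real); the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the paper, its datasets, or its code.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the fake-news class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Accuracy counts every correct prediction, including true negatives.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels, invented for illustration only (1 = fake, 0 = real).
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Because F1 ignores true negatives while accuracy counts them, the two measures can rank classifiers differently, which is one reason to report both.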

Keywords

Fake news · Natural Language Processing · Text classification · Machine learning algorithms · Gradient boosting

Notes

Acknowledgements

Comments on the paper by the anonymous reviewers were immensely helpful in revising the paper.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Arvinder Pal Singh Bali (1), Email author
  • Mexson Fernandes (1)
  • Sourabh Choubey (1)
  • Mahima Goel (1)
  1. Asia Pacific Institute of Information Technology SD India, Panipat, India
