Automatic Ground Truth Dataset Creation for Fake News Detection in Social Media

  • Danae Pla Karidi
  • Harry Nakos
  • Yannis Stavrakas
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11871)


Over the last few years, fake news has become one of the most crucial issues for social media platforms, users, and news organizations. Research has therefore focused on developing algorithmic methods to detect misleading content on social media. These approaches are data-driven, so the effectiveness of the resulting models depends on the quality of the training dataset. Although several ground truth datasets have been created, they suffer from serious limitations and rely heavily on human annotators. In this work, we propose a method that automates the process of dataset creation as far as possible. The resulting datasets can then be used as training and test data for machine learning classifiers that detect fake news on microblogging platforms such as Twitter.


Fake news detection · Automatic dataset creation · Social network



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Danae Pla Karidi (1), email author
  • Harry Nakos (1)
  • Yannis Stavrakas (1)
  1. IMSI Athena RC, Athens, Greece
