Combining Exogenous and Endogenous Signals with a Semi-supervised Co-attention Network for Early Detection of COVID-19 Fake Tweets

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)

Abstract

Fake tweets are observed to be ever-increasing, demanding immediate countermeasures to combat their spread. During COVID-19, tweets with misinformation should be flagged and neutralised in their early stages to mitigate the damage. Most existing methods for early detection of fake news assume that enough propagation information is available for a large set of labelled tweets, which may not be a realistic setting for cases like COVID-19, where both are largely absent. In this work, we present ENDEMIC, a novel early detection model which leverages exogenous and endogenous signals related to tweets, while learning on limited labelled data. We first develop a novel dataset, called ECTF, for early COVID-19 Twitter fake news, with additional behavioural test sets to validate early detection. We build a heterogeneous graph with follower-followee, user-tweet, and tweet-retweet connections and train a graph embedding model to aggregate propagation information. Graph embeddings and contextual features constitute the endogenous signals, while time-relative web-scraped information constitutes the exogenous signals. ENDEMIC is trained in a semi-supervised fashion, overcoming the challenge of limited labelled data. We propose a co-attention mechanism to fuse signal representations optimally. Experimental results on ECTF, PolitiFact, and GossipCop show that ENDEMIC is highly reliable in detecting early fake tweets, significantly outperforming nine state-of-the-art methods.
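
As an illustration of the heterogeneous graph the abstract describes, the following is a minimal sketch and not the authors' implementation. It only shows how the three connection types (follower-followee, user-tweet, tweet-retweet) might be stored as typed edges. The class `HeteroGraph`, the function `build_graph`, the edge-type labels, and the toy data are all hypothetical names chosen for this example. The graph embedding model trained on top of such a graph is not shown.

```python
from collections import defaultdict

# Edge types named in the abstract; the string labels are illustrative only.
FOLLOWS = "follows"        # user -> user (follower -> followee)
POSTED = "posted"          # user -> tweet
RETWEET_OF = "retweet_of"  # retweet -> original tweet


class HeteroGraph:
    """Tiny typed-edge graph (hypothetical helper); nodes are (kind, id) pairs."""

    def __init__(self):
        # adjacency[edge_type][source_node] -> set of target nodes
        self.adjacency = defaultdict(lambda: defaultdict(set))
        self.nodes = set()

    def add_edge(self, src, edge_type, dst):
        self.nodes.update((src, dst))
        self.adjacency[edge_type][src].add(dst)

    def neighbours(self, node, edge_type):
        # Sorted list of targets reachable from `node` via `edge_type`.
        return sorted(self.adjacency[edge_type].get(node, ()))


def build_graph(follows, posts, retweets):
    """Assemble the three connection types into one heterogeneous graph."""
    graph = HeteroGraph()
    for follower, followee in follows:
        graph.add_edge(("user", follower), FOLLOWS, ("user", followee))
    for user, tweet in posts:
        graph.add_edge(("user", user), POSTED, ("tweet", tweet))
    for retweet, original in retweets:
        graph.add_edge(("tweet", retweet), RETWEET_OF, ("tweet", original))
    return graph


# Toy data, invented for illustration only.
graph = build_graph(
    follows=[("alice", "bob"), ("carol", "bob")],
    posts=[("bob", "t1"), ("alice", "t2")],
    retweets=[("t2", "t1")],
)
print(len(graph.nodes))
print(graph.neighbours(("user", "alice"), FOLLOWS))
```

Keeping the node kind in each node key and the edge type in the adjacency index lets a downstream embedding step treat users, tweets, and each relation separately. This is one plausible design and is not taken from the paper.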

R. Bansal and W. S. Paka—Equal Contribution. The work was done when Rachit was an intern at IIIT-Delhi.


Notes

  1. Obtained from the textblob tool in Python.


Acknowledgements

The work was partially supported by Accenture Labs, SPARC (MHRD) and CAI, IIIT-Delhi. T. Chakraborty acknowledges the support of the Ramanujan Fellowship.

Corresponding author

Correspondence to Rachit Bansal.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bansal, R., Paka, W.S., Nidhi, Sengupta, S., Chakraborty, T. (2021). Combining Exogenous and Endogenous Signals with a Semi-supervised Co-attention Network for Early Detection of COVID-19 Fake Tweets. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_16

  • DOI: https://doi.org/10.1007/978-3-030-75762-5_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science (R0)
