Abstract
Fake tweets are increasing steadily and demand immediate countermeasures to curb their spread. During COVID-19, tweets carrying misinformation should be flagged and neutralised in their early stages to limit the damage. Most existing methods for early detection of fake news assume that sufficient propagation information and a large set of labelled tweets are available, which is not a realistic setting for cases like COVID-19, where both are largely absent. In this work, we present ENDEMIC, a novel early detection model which leverages exogenous and endogenous signals related to tweets while learning from limited labelled data. We first develop a novel dataset, called ECTF (Early COVID-19 Twitter Fake news), with additional behavioural test sets to validate early detection. We build a heterogeneous graph with follower-followee, user-tweet, and tweet-retweet connections and train a graph embedding model to aggregate propagation information. Graph embeddings and contextual features constitute the endogenous signals, while time-relative web-scraped information constitutes the exogenous signals. ENDEMIC is trained in a semi-supervised fashion, overcoming the challenge of limited labelled data. We propose a co-attention mechanism to fuse the signal representations. Experimental results on ECTF, PolitiFact, and GossipCop show that ENDEMIC is highly reliable in detecting fake tweets early, significantly outperforming nine state-of-the-art methods.
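The heterogeneous graph described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `HeteroGraph`, the function `build_graph`, the relation names, and the toy data are all hypothetical, and the sketch only shows how the three connection types (follower-followee, user-tweet, tweet-retweet) might be stored as typed edges before any graph embedding model is trained.

```python
from collections import defaultdict

# Hypothetical relation names for the three connection types in the abstract.
FOLLOWS = "follows"        # user -> user (follower -> followee)
POSTED = "posted"          # user -> tweet
RETWEET_OF = "retweet_of"  # retweet -> original tweet


class HeteroGraph:
    """Minimal typed-edge graph; nodes are (node_type, node_id) tuples."""

    def __init__(self):
        # adjacency[relation][source_node] -> set of target nodes
        self.adjacency = defaultdict(lambda: defaultdict(set))

    def add_edge(self, relation, source, target):
        self.adjacency[relation][source].add(target)

    def neighbours(self, node, relation):
        # Return a copy so callers cannot mutate the stored edge sets.
        return set(self.adjacency[relation].get(node, set()))

    def edge_count(self, relation=None):
        # Count edges of one relation, or of all relations when none is given.
        relations = [relation] if relation else list(self.adjacency)
        return sum(
            len(targets)
            for rel in relations
            for targets in self.adjacency[rel].values()
        )


def build_graph(follows, posts, retweets):
    """Assemble the graph from three lists of (source_id, target_id) pairs."""
    graph = HeteroGraph()
    for follower, followee in follows:
        graph.add_edge(FOLLOWS, ("user", follower), ("user", followee))
    for user, tweet in posts:
        graph.add_edge(POSTED, ("user", user), ("tweet", tweet))
    for retweet, original in retweets:
        graph.add_edge(RETWEET_OF, ("tweet", retweet), ("tweet", original))
    return graph


# Toy data, invented purely for illustration.
graph = build_graph(
    follows=[("alice", "bob"), ("carol", "bob")],
    posts=[("bob", "t1"), ("alice", "t2")],
    retweets=[("t2", "t1")],
)
```

Keeping node types and relation types explicit means a downstream embedding model can treat each connection type separately; how ENDEMIC actually does this is not stated in the abstract.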
R. Bansal and W. S. Paka—Equal Contribution. The work was done when Rachit was an intern at IIIT-Delhi.
Notes
1. Obtained from the textblob tool in Python.
Acknowledgements
The work was partially supported by Accenture Labs, SPARC (MHRD) and CAI, IIIT-Delhi. T. Chakraborty would like to thank the support of the Ramanujan Fellowship.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Bansal, R., Paka, W.S., Nidhi, Sengupta, S., Chakraborty, T. (2021). Combining Exogenous and Endogenous Signals with a Semi-supervised Co-attention Network for Early Detection of COVID-19 Fake Tweets. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_16
DOI: https://doi.org/10.1007/978-3-030-75762-5_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science, Computer Science (R0)