Combining Exogenous and Endogenous Signals with a Semi-supervised Co-attention Network for Early Detection of COVID-19 Fake Tweets

Bansal, Rachit; Paka, William Scott; Nidhi; Sengupta, Shubhashis; Chakraborty, Tanmoy

doi:10.1007/978-3-030-75762-5_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Fake tweets are proliferating and demand immediate countermeasures to curb their spread. During COVID-19, tweets carrying misinformation should be flagged and neutralised at an early stage to limit the damage. Most existing methods for early detection of fake news assume that sufficient propagation information is available for a large set of labelled tweets, which may not hold for cases like COVID-19, where both are largely absent. In this work, we present ENDEMIC, a novel early detection model that leverages exogenous and endogenous signals related to tweets while learning from limited labelled data. We first develop a novel dataset, ECTF (Early COVID-19 Twitter Fake news), with additional behavioural test sets to validate early detection. We build a heterogeneous graph with follower-followee, user-tweet, and tweet-retweet connections and train a graph embedding model to aggregate propagation information. Graph embeddings and contextual features constitute the endogenous signals, while time-relative web-scraped information constitutes the exogenous signals. ENDEMIC is trained in a semi-supervised fashion, overcoming the challenge of limited labelled data. We propose a co-attention mechanism to fuse the signal representations optimally. Experimental results on ECTF, PolitiFact, and GossipCop show that ENDEMIC is highly reliable in detecting fake tweets early, significantly outperforming nine state-of-the-art methods.
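The heterogeneous graph described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HeteroGraph`, the function `build_graph`, the edge-type labels, and the toy identifiers are all hypothetical, and the graph embedding model trained on top of the graph is not shown. The sketch only shows how user and tweet nodes might be linked by the three connection types the abstract names.

```python
from collections import defaultdict

# Edge types named in the abstract; the string labels are illustrative assumptions.
FOLLOWS = "follows"        # user -> user (follower-followee)
POSTED = "posted"          # user -> tweet
RETWEET_OF = "retweet_of"  # retweet -> original tweet


class HeteroGraph:
    """Minimal typed-node, typed-edge graph (hypothetical helper)."""

    def __init__(self):
        self.node_type = {}           # node id -> "user" or "tweet"
        self.adj = defaultdict(list)  # node id -> [(edge type, neighbour id)]

    def add_node(self, node_id, kind):
        self.node_type[node_id] = kind

    def add_edge(self, src, dst, edge_type):
        # Store a reverse edge too, so a walk can traverse in either direction.
        self.adj[src].append((edge_type, dst))
        self.adj[dst].append((edge_type + "_rev", src))

    def neighbours(self, node_id, edge_type=None):
        # All neighbours, or only those reached through one edge type.
        return [n for t, n in self.adj[node_id]
                if edge_type is None or t == edge_type]


def build_graph(follows, posts, retweets):
    """Build the graph from three lists of (source, target) pairs."""
    g = HeteroGraph()
    for follower, followee in follows:
        g.add_node(follower, "user")
        g.add_node(followee, "user")
        g.add_edge(follower, followee, FOLLOWS)
    for user, tweet in posts:
        g.add_node(user, "user")
        g.add_node(tweet, "tweet")
        g.add_edge(user, tweet, POSTED)
    for retweet, original in retweets:
        g.add_node(retweet, "tweet")
        g.add_node(original, "tweet")
        g.add_edge(retweet, original, RETWEET_OF)
    return g


# Toy data, invented for illustration only.
graph = build_graph(
    follows=[("u1", "u2")],
    posts=[("u2", "t1"), ("u1", "t2")],
    retweets=[("t2", "t1")],
)
```

Calling `graph.neighbours("t1")` on the toy data returns the posting user and the retweet, which is the kind of neighbourhood a graph embedding model would aggregate over.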

R. Bansal and W. S. Paka contributed equally. The work was done when Rachit was an intern at IIIT-Delhi.

Notes

1. Obtained from the textblob tool in Python.

Acknowledgements

The work was partially supported by Accenture Labs, SPARC (MHRD) and CAI, IIIT-Delhi. T. Chakraborty would like to thank the support of the Ramanujan Fellowship.

Author information

Authors and Affiliations

Delhi Technological University, Delhi, India
Rachit Bansal
IIIT-Delhi, Delhi, India
William Scott Paka & Tanmoy Chakraborty
Accenture Labs, Delhi, India
Nidhi & Shubhashis Sengupta

Corresponding author

Correspondence to Rachit Bansal.

Editor information

Editors and Affiliations

IIIT, Hyderabad, Hyderabad, India
Kamal Karlapalem
Chinese University of Hong Kong, Shatin, Hong Kong
Hong Cheng
Virginia Tech, Arlington, VA, USA
Naren Ramakrishnan
Jawaharlal Nehru University, New Delhi, India
R. K. Agrawal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
IIIT Delhi, New Delhi, India
Tanmoy Chakraborty

About this paper

Cite this paper

Bansal, R., Paka, W.S., Nidhi, Sengupta, S., Chakraborty, T. (2021). Combining Exogenous and Endogenous Signals with a Semi-supervised Co-attention Network for Early Detection of COVID-19 Fake Tweets. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_16

DOI: https://doi.org/10.1007/978-3-030-75762-5_16
Published: 09 May 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science, Computer Science (R0)
