A Semi-supervised Framework for Misinformation Detection

Liu, Yueyang; Boukouvalas, Zois; Japkowicz, Nathalie

doi:10.1007/978-3-030-88942-5_5

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12986)

Included in the following conference series:

International Conference on Discovery Science

Abstract

The spread of misinformation on social media has become a prevalent societal problem and a cause of many kinds of social unrest. Curtailing it is of great importance, and machine learning has shown significant promise for doing so. However, there are two main challenges when applying machine learning to this problem. First, although misinformation is far too prevalent in absolute terms, it represents only a small proportion of all the postings seen on social media. Second, labeling the massive amount of data necessary to train a useful classifier is impractical. Considering these challenges, we propose a simple semi-supervised learning framework to deal with extreme class imbalance. Its advantage over other approaches is that it uses actual rather than simulated data to inflate the minority class. We tested our framework on two sets of Covid-related Twitter data and obtained a significant improvement in F1-measure in extremely imbalanced scenarios, compared to simple classical and deep-learning data generation methods such as SMOTE, ADASYN, and GAN-based data generation.

Notes

1. https://www.theverge.com/2020/4/22/21231956/twitter-remove-covid-19-tweets-call-to-action-harm-5g
2. https://www.internetlivestats.com/twitter-statistics/
3. The bootstrap technique is implemented by repeating the previously described sampling and testing procedure 100 times and using the results of these experiments to estimate the true F1, precision, and recall values and their standard deviations.
4. https://github.com/Alex-NKG/Semi-Supervised-Framework-Misinfo-Detection
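
The bootstrap estimate described in note 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `precision_recall_f1` and `bootstrap_scores` are hypothetical, and resampling the test predictions with replacement is an assumption, since the sampling and testing procedure that the note refers to is not reproduced on this page. Only the 100 repetitions and the mean and standard deviation summary come from the note.

```python
import random
import statistics


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bootstrap_scores(y_true, y_pred, n_repeats=100, seed=0):
    """Repeat the evaluation n_repeats times on resampled data.

    Assumption: each repetition draws len(y_true) examples with
    replacement from the test predictions. The paper may resample
    at a different stage of its pipeline.
    """
    rng = random.Random(seed)
    n = len(y_true)
    runs = []
    for _ in range(n_repeats):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_true = [y_true[i] for i in idx]
        sample_pred = [y_pred[i] for i in idx]
        runs.append(precision_recall_f1(sample_true, sample_pred))
    # Transpose the runs into one series per metric, then summarise.
    summary = {}
    for name, values in zip(("precision", "recall", "f1"), zip(*runs)):
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    # Toy labels: 1 marks misinformation (minority), 0 marks other posts.
    truth = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
    for metric, (mean, std) in bootstrap_scores(truth, preds).items():
        print(f"{metric}: mean={mean:.3f} std={std:.3f}")
```

Fixing the seed makes the repetitions reproducible. Reporting the standard deviation next to the mean shows how much a score on a heavily imbalanced test set moves from one resample to the next.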


Author information

Authors and Affiliations

American University, Washington D.C., 20016, USA
Yueyang Liu, Zois Boukouvalas & Nathalie Japkowicz

Corresponding author

Correspondence to Yueyang Liu.

Editor information

Editors and Affiliations

Universidade do Porto and Fraunhofer Portugal AICOS, Porto, Portugal
Carlos Soares
Dalhousie University, Halifax, NS, Canada
Luis Torgo

About this paper

Cite this paper

Liu, Y., Boukouvalas, Z., Japkowicz, N. (2021). A Semi-supervised Framework for Misinformation Detection. In: Soares, C., Torgo, L. (eds) Discovery Science. DS 2021. Lecture Notes in Computer Science, vol 12986. Springer, Cham. https://doi.org/10.1007/978-3-030-88942-5_5

DOI: https://doi.org/10.1007/978-3-030-88942-5_5
Published: 09 October 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88941-8
Online ISBN: 978-3-030-88942-5
eBook Packages: Computer Science, Computer Science (R0)
