Tensor Embeddings for Content-Based Misinformation Detection with Limited Supervision

Disinformation, Misinformation, and Fake News in Social Media

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

Web-based technologies such as social media have become primary news outlets for many people in recent years. Because these outlets are highly vulnerable to misinformation and fake news, which may sway a user’s opinion on social, political, and economic issues, robust and efficient approaches to misinformation detection are needed more than ever. Most previously proposed approaches rely on manually extracted features and supervised classifiers, which require large amounts of labeled data that are often infeasible to collect in practice. To meet this challenge, we propose a novel strategy that combines tensor-based modeling of article content with semi-supervised learning on article embeddings, and that requires very few labels to achieve state-of-the-art results. We propose and experiment with three article content modeling variations, which target either the article body text or the title and yield meaningful representations of word co-occurrences that are discriminative in the downstream news categorization task. We tested our approach on real-world data. The evaluation shows that we achieve 75% accuracy using only 30% of the labeled data of a public dataset, whereas a previously published SVM-based classifier reaches 67% accuracy. Moreover, our approach achieves 71% accuracy on a large dataset using only 2% of the labels. Finally, our approach can classify articles into different fake news categories (clickbait, bias, rumor, hate, and junk science) using only the article titles, with roughly 70% accuracy and 30% of the labeled data.
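
The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes an (article × term × term) co-occurrence tensor built with a small sliding window, a plain alternating-least-squares CP decomposition whose article-mode factor serves as the embedding, and a simple averaging label propagation over a k-nearest-neighbor graph as a stand-in for the semi-supervised step. All function names, the toy headlines, the window size, the rank, and k are hypothetical choices made for illustration.

```python
import numpy as np

def cooccurrence_tensor(docs, window=2):
    """Stack one (term x term) co-occurrence matrix per article (assumed layout)."""
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    X = np.zeros((len(docs), len(vocab), len(vocab)))
    for a, doc in enumerate(docs):
        ids = [vocab[w] for w in doc]
        for i, wi in enumerate(ids):
            for wj in ids[i + 1 : i + 1 + window]:
                X[a, wi, wj] += 1
                X[a, wj, wi] += 1  # keep every slice symmetric
    return X

def khatri_rao(U, V):
    """Column-wise Kronecker product, shape (U.rows * V.rows, rank)."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])

def cp_als(X, rank, n_iter=50, seed=0):
    """Plain alternating least squares for a 3-way CP decomposition."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            unfolded = np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            kr = khatri_rao(others[0], others[1])
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors

def knn_graph(E, k=1):
    """Symmetric, unweighted k-nearest-neighbor graph on embedding rows."""
    d = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # an article is not its own neighbor
    W = np.zeros_like(d)
    for i, nbrs in enumerate(np.argsort(d, axis=1)[:, :k]):
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)

def propagate_labels(W, seeds, n_iter=100):
    """Average neighbor scores; seeds are +1 / -1 (labeled) or 0 (unlabeled)."""
    seeds = np.asarray(seeds, dtype=float)
    labeled = seeds != 0
    scores = seeds.copy()
    degree = W.sum(axis=1)  # at least k, so never zero
    for _ in range(n_iter):
        scores = W @ scores / degree
        scores[labeled] = seeds[labeled]  # clamp the known labels
    return scores

# Toy headlines (made up for illustration), already tokenized.
docs = [
    "shocking secret cure doctors hate".split(),
    "shocking miracle cure doctors hate".split(),
    "council approves budget for schools".split(),
    "council approves budget for roads".split(),
]
X = cooccurrence_tensor(docs, window=2)
embeddings = cp_als(X, rank=2)[0]           # article-mode factor
W = knn_graph(embeddings, k=1)
scores = propagate_labels(W, [1, 0, -1, 0])  # one label per class
print(np.round(scores, 2))
```

In this sketch the sign of each propagated score would be read as the predicted class of an unlabeled article. On real data the tensor would normally be stored sparsely and factorized with a dedicated tensor library.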

The authors contributed equally to this paper.

Notes

  1. We experimented with small values of that window and results were qualitatively similar.

  2. http://boilerpipe-web.appspot.com/

  3. http://newspaper.readthedocs.io/en/latest/

  4. https://www.diffbot.com/dev/docs/article/

  5. https://www.alexa.com/

Acknowledgements

This research was supported by a gift from Snap Inc., an Adobe Data Science Faculty Award, the Department of the Navy, Naval Engineering Education Consortium under award no. N00174-17-1-0005, and the National Science Foundation CDS&E Grant no. OAC-1808591. Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the funding parties. We would also like to thank Daniel Fonseca for proofreading the book chapter.

Author information

Corresponding author

Correspondence to Sara Abdali.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Abdali, S., Bastidas, G.G., Shah, N., Papalexakis, E.E. (2020). Tensor Embeddings for Content-Based Misinformation Detection with Limited Supervision. In: Shu, K., Wang, S., Lee, D., Liu, H. (eds) Disinformation, Misinformation, and Fake News in Social Media. Lecture Notes in Social Networks. Springer, Cham. https://doi.org/10.1007/978-3-030-42699-6_7

  • DOI: https://doi.org/10.1007/978-3-030-42699-6_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-42698-9

  • Online ISBN: 978-3-030-42699-6

  • eBook Packages: Computer Science (R0)
