Big Data in the News Media

  • Maria Cristina Paganoni


This chapter discusses how ‘big data’ has become a catchphrase in the technology section of the news media. Using the combined tools of Corpus-Assisted Discourse Studies (CADS), it identifies the news values and the linguistic and discursive features of global big data coverage in English and brings out the kind of rhetoric that is emerging. The big data narrative is rife with metaphors and novel lexical compounds. Keywords, concordance lines and collocations construct a mixed semantic prosody that takes a marked negative turn after recent instances of data leaks and privacy violations. Finally, the analysis focuses on the strategies deployed in the construction and dissemination of expert discourse about big data, observing the processes of reconceptualisation and recontextualisation of knowledge that are activated in its argumentation.


Big data; Corpus-Assisted Discourse Studies (CADS); Expert discourse; Knowledge dissemination; News media



Copyright information

© The Author(s) 2019

Authors and Affiliations

  1. Dipartimento di Scienze della Mediazione Linguistica e di Studi Interculturali, Università degli Studi di Milano, Milan, Italy
