Automated Information Extraction

Part of the book series: Texts in Computer Science (TCS)

Abstract

Chapter 1 identified automated information extraction (also known as computational content analysis or media mining) as the first area of Computational Social Science (CSS). This chapter takes a close look at this area, beginning with its roots in linguistics. Computational text mining has been the main application of this area of CSS, but audio, imagery, and social media data are also components of the expanding Big Data universe. Theory and research in automated information extraction are at the base of major social science discoveries, such as universal semantic spaces and the fundamental structure of human information processing. A major focus of this chapter is the methodology of automated information extraction, whose phases extend from the formulation of research questions, through the selection of sources and preprocessing, to analysis in a technical sense. Illustrative examples are provided, including some recent transformative breakthroughs in computational events data analysis and geospatial data structures. The material in this chapter has intrinsic value and is also instrumental for understanding the networks, complexity, and simulation modeling approaches covered in subsequent chapters.

Notes

  1. Much of modern science is said to have roots in the ancient Greeks. This is quite true, but others before them may have contributed earlier scientific ideas contained in media that have been lost (manuscripts, inscriptions) due to the destruction of many large ancient libraries, such as those of Alexandria, Antioch, Baghdad, Córdoba, and Damascus, just to mention some of those in the Mediterranean world. India and China also experienced the destruction of many libraries during their early history.

  2. By contrast, John von Neumann’s (1958) computer model of the human brain–mind phenomenon turned out to be wrong. Unlike von Neumann’s, the EPA-space model of the human mind is empirically validated, even if it still lacks deep theoretical explanation.

  3. The predecessor of Surveyor was called Attitude, which was also developed by David Heise (1982) as the first computer-based extractor of EPA ratings, replacing the old paper-based forms used since the work of Charles E. Osgood and his collaborators.

  4. Unfortunately, in social science the term “data mining” has quite a negative connotation, since it is understood as lacking in theoretical understanding and symptomatic of so-called “barefoot empiricism,” akin to “a fishing expedition.” CSS assigns high priority to theory (the basis of understanding) while recognizing the scientific value of inductive data mining.

  5. Beyond its scientific use in CSS research, basic vocabulary analysis also appears in the popular media, which counts the frequency of words used by politicians in inaugural addresses or similar major speeches. The value of such anecdotal uses is rather limited, and sometimes even misleading, since speechwriters and communication experts are well versed in the scientific principles of applied linguistics and human information processing, including a sophisticated understanding of semantic differentials and other affect control, marketing, and propaganda devices.

  6. The operationalization of the NRR in terms of two standard deviations from the process mean was suggested to political scientist and events data pioneer Edward E. Azar [1938–1991] by the mathematician Anatol Rapoport [1911–2007]. It was first applied to international relations events data series to study protracted conflicts in the Middle East. Azar was founder and director of the Conflict and Peace Data Bank (COPDAB), founded at the University of North Carolina at Chapel Hill in the 1970s and moved to the Centre for International Development and Conflict Management (CIDCM) of the University of Maryland at College Park in the 1980s.

Author information

Corresponding author

Correspondence to Claudio Cioffi-Revilla.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Cioffi-Revilla, C. (2017). Automated Information Extraction. In: Introduction to Computational Social Science. Texts in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-50131-4_3

  • DOI: https://doi.org/10.1007/978-3-319-50131-4_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50130-7

  • Online ISBN: 978-3-319-50131-4

  • eBook Packages: Computer Science (R0)