Automated Information Extraction

Cioffi-Revilla, Claudio

doi:10.1007/978-1-4471-5661-1_3

Part of the book series: Texts in Computer Science (TCS)

Abstract

Chapter 1 identified automated information extraction (also known as computational content analysis or media-mining) as the first area of Computational Social Science (CSS). Chapter 3 takes a close look at this area, beginning with its roots in linguistics. Computational text mining has been the main application of this area of CSS, but audio, imagery, and social media data are also components of the expanding Big Data universe. Theory and research in automated information extraction are at the base of major social science discoveries, such as universal semantic spaces and the fundamental structure of human information processing. A major focus of this chapter is the methodology of automated information extraction, whose phases extend from the formulation of research questions, through the selection of sources and preprocessing, to analysis in a technical sense. Illustrative examples are provided, including some recent transformative breakthroughs in computational events data analysis and geospatial data structures. The material in this chapter has intrinsic value and is also instrumental for understanding the networks, complexity, and simulation modeling approaches covered in subsequent chapters.

Notes

1. Much of modern science is said to have roots in the ancient Greeks. This is quite true, but others before them may have contributed earlier scientific ideas, recorded in media (manuscripts, inscriptions) that were lost when many large ancient libraries were destroyed, such as those of Alexandria, Antioch, Baghdad, Córdoba, and Damascus, to mention only some of those in the Mediterranean world. India and China also experienced the destruction of many libraries during their early history.
2. By contrast, John von Neumann's (1958) computer model of the human brain-mind phenomenon turned out to be wrong. Unlike von Neumann's model, the EPA-space model of the human mind is empirically validated, even if it still lacks a deep theoretical explanation.
3. The predecessor of Surveyor was Attitude, also developed by David Heise (1982), which was the first computer-based extractor of EPA ratings. It replaced the paper-based forms that had been in use since the work of Charles E. Osgood and his collaborators.
4. Unfortunately, in social science the term “data mining” has quite a negative connotation, since it is seen as lacking theoretical grounding and as symptomatic of so-called “barefoot empiricism,” akin to “a fishing expedition.” CSS assigns high priority to theory, the basis of understanding, while recognizing the scientific value of inductive data mining.
5. Besides its scientific value in CSS research, vocabulary analysis is also used in basic forms by the popular media, for example to count the frequency of words used by politicians in inaugural addresses or similar major speeches. The value of such anecdotal uses is rather limited, and sometimes even misleading, since speechwriters and communication experts are well versed in the scientific principles of applied linguistics and human information processing, including a sophisticated understanding of semantic differentials and other affect control, marketing, and propaganda devices.
6. The operationalization of the NRR in terms of two standard deviations from the process mean was suggested to political scientist and events data pioneer Edward E. Azar [1938–1991] by the mathematician Anatol Rapoport [1911–2007]. It was first applied to international relations events data series to study protracted conflicts in the Middle East. Azar founded and directed the Conflict and Peace Data Bank (COPDAB), which was established at the University of North Carolina at Chapel Hill in the 1970s and moved to the Centre for International Development and Conflict Management (CIDCM) of the University of Maryland at College Park in the 1980s.
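
The operationalization in note 6 (a band of two standard deviations around the process mean) can be illustrated with a minimal sketch. The function names, the choice of the population standard deviation, and the event scores below are all assumptions made for illustration; they are not taken from the chapter or from COPDAB.

```python
from statistics import mean, pstdev


def nrr_bounds(series, k=2.0):
    """Return (lower, upper): the process mean minus/plus k standard deviations.

    Assumption: the population standard deviation is used; the source does
    not say whether a sample or population estimate is intended.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return mu - k * sigma, mu + k * sigma


def outside_nrr(series, k=2.0):
    """Return (time index, value) pairs that fall outside the band."""
    lower, upper = nrr_bounds(series, k)
    return [(t, x) for t, x in enumerate(series) if x < lower or x > upper]


# Hypothetical scaled event intensities for one pair of actors (made-up data).
scores = [5, 6, 5, 4, 6, 5, 5, 6, 4, 5, 14, 5]

lower, upper = nrr_bounds(scores)
flagged = outside_nrr(scores)
print(f"band: [{lower:.2f}, {upper:.2f}]")
print("observations outside the band:", flagged)
```

In this toy series only the spike at time index 10 lies outside the band. A single extreme value also inflates the standard deviation and therefore widens the band, which is one reason the choice of estimation window matters in practice.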

Recommended Readings

N. Agarwal, H. Liu, Modeling and Data Mining in Blogosphere (Morgan & Claypool, New York, 2009). Available free online
E.E. Azar, S. Lerner, The use of semantic dimensions in the scaling of international events. Int. Interact. 7(4), 361–378 (1981)
R. Feldman, J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (Cambridge University Press, Cambridge, 2007)
M.D. Fischer, S.M. Lyon, D. Sosna, Harmonizing diversity: tuning anthropological research to complexity. Soc. Sci. Comput. Rev. 31(1), 3–15 (2013)
D.J. Gerner, P.A. Schrodt, Ö. Yilmaz, R. Abu-Jabr, The Creation of CAMEO (Conflict and Mediation Event Observations): An Event Data Framework for a Post Cold War World. Paper presented at the annual meeting of the American Political Science Association, San Francisco (2002)
M. Gorman, Simulating Science (Indiana University Press, Bloomington, 1992)
L.A. Grenoble, L.J. Whaley, Endangered Languages: Current Issues and Future Prospects (Cambridge University Press, Cambridge, 1998)
D.R. Heise, Project Magellan: collecting cross-cultural affective meanings via the internet. Electron. J. Sociol. 5(3) (2001). Available online: http://www.indiana.edu/~socpsy/papers/magellan/Magellan.htm
T. Hermann, H. Ritter, Listen to your data: model-based sonification for data analysis, in Proceedings of the ISIMADE'99, Baden-Baden, Germany (1999)
O.R. Holsti, Content Analysis for the Social Sciences and Humanities (Addison-Wesley, Reading, 1969)
D.J. Hopkins, G. King, A method for automated nonparametric content analysis for social science. Am. J. Polit. Sci. 54(1), 229–247 (2010)
W. Hsu, M.L. Lee, J. Wang, Temporal and Spatio-Temporal Data Mining (IGI Publishing, New York, 2008)
G. King, W. Lowe, An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int. Organ. 57, 617–642 (2003)
K. Krippendorff, Content Analysis: An Introduction to Its Methodology (Sage, Thousand Oaks, 2004)
K. Krippendorff, M.A. Bock (eds.), The Content Analysis Reader (Sage, Thousand Oaks, 2008)
P. Langley, Data-driven discovery of physical laws. Cogn. Sci. 5(1), 31–54 (1981)
P. Langley, Heuristics for scientific discovery: the legacy of Herbert Simon, in Models of a Man: Essays in Memory of Herbert A. Simon, ed. by M. Augier, J.G. March (MIT Press, Cambridge, 2004), pp. 461–471
D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabasi, D. Brewer, M. Van Alstyne, Computational social science. Science 323(5915), 721–723 (2009)
K. Leetaru, Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content (Routledge, London, 2011)
B.L. Monroe, P.A. Schrodt (eds.), Special Issue: The Statistical Analysis of Political Text. Polit. Anal. 16(4) (Autumn 2008)
I.-C. Moon, K.M. Carley, Modeling and simulation of terrorist networks in social and geospatial dimensions. IEEE Intell. Syst. 22(5), 40–49 (2007). Special Issue on Social Computing
C.E. Osgood, W.H. May, M.S. Miron, Cross-Cultural Universals of Affective Meaning (University of Illinois Press, Urbana, 1975)
R. Popping, Computer-Assisted Text Analysis (Sage, Thousand Oaks, 2000)
P.A. Schrodt, Short term prediction of international events using a Holland classifier. Math. Comput. Model. 12, 589–600 (1989)
P.A. Schrodt, Pattern recognition of international crises using hidden Markov models, in Political Complexity, ed. by D. Richards (University of Michigan Press, Ann Arbor, 2000)
H.A. Simon, Autobiography, in Nobel Lectures, Economics 1969–1980, ed. by A. Lindbeck (World Scientific, Singapore, 1992)
P.J. Stone, R.F. Bales, J.Z. Namenwirth, D.M. Ogilvie, The general inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Behav. Sci. 7(4), 484–498 (1962)
L. Tang, H. Liu, Community Detection and Mining in Social Media (Morgan & Claypool, New York, 2010). Available free online
J.J. Thomas, K.A. Cook (eds.), Illuminating the Path (IEEE Comput. Soc., Los Alamitos, 2005)
C. Williford, C. Henry, A. Friedlander (eds.), One Culture: Computationally Intensive Research in the Humanities and Social Sciences—A Report on the Experiences of First Respondents to the Digging into Data Challenge (Council on Library and Information Resources, Washington, 2012)
T. Zhang, C.-C.J. Kuo, Audio content analysis for online audiovisual data segmentation and classification. IEEE Trans. Speech Audio Process. 9(4), 441–457 (2001)

Author information

Authors and Affiliations

George Mason University, Fairfax, VA, USA
Claudio Cioffi-Revilla

About this chapter

Cite this chapter

Cioffi-Revilla, C. (2014). Automated Information Extraction. In: Introduction to Computational Social Science. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-5661-1_3

DOI: https://doi.org/10.1007/978-1-4471-5661-1_3
Published: 01 January 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5660-4
Online ISBN: 978-1-4471-5661-1
eBook Packages: Computer Science (R0)
