Abstract
Chapter 1 identified automated information extraction (also known as computational content analysis or media-mining) as the first area of Computational Social Science (CSS). Chapter 3 takes a close look at this area, beginning with its roots in linguistics. Computational text mining has been the main application of this area of CSS, but audio, imagery, and social media data are also components of the expanding Big Data universe. Theory and research in automated information extraction are at the base of major social science discoveries, such as universal semantic spaces and the fundamental structure of human information processing. A major focus of this chapter is the methodology of automated information extraction, whose phases extend from the formulation of research questions through the selection of sources and preprocessing to analysis in a technical sense. Illustrative examples are provided, including some recent transformative breakthroughs in computational events data analysis and geospatial data structures. The material in this chapter has intrinsic value and is also instrumental for understanding the network, complexity, and simulation modeling approaches of subsequent chapters.
Notes
1. Much of modern science is said to have roots in the ancient Greeks. This is quite true, but others before them may have contributed earlier scientific ideas contained in media that have been lost (manuscripts, inscriptions) due to the destruction of many large ancient libraries, such as those of Alexandria, Antioch, Baghdad, Córdoba, and Damascus, just to mention some of those in the Mediterranean world. India and China also experienced the destruction of many libraries during their early history.
2. By contrast, John von Neumann's (1958) computer model of the human brain-mind phenomenon turned out to be wrong. Unlike von Neumann's, the EPA-space model of the human mind is empirically validated, even if it still lacks deep theoretical explanation.
3. The predecessor of Surveyor was called Attitude, which was also developed by David Heise (1982) as the first computer-based extractor of EPA ratings, replacing the old paper-based forms used since the work of Charles E. Osgood and his collaborators.
4. Unfortunately, in social science the term “data mining” has quite a negative connotation, since it is seen as lacking theoretical grounding and as symptomatic of so-called “barefoot empiricism,” akin to “a fishing expedition.” CSS assigns high priority to theory, the basis of understanding, while recognizing the scientific value of inductive data mining.
5. Beyond its scientific value in CSS research, basic vocabulary analysis is also used by the popular media, for example when counting the frequency of words used by politicians in inaugural addresses or similar major speeches. The value of such anecdotal uses is rather limited, and sometimes even misleading, since speechwriters and communication experts are well versed in the scientific principles of applied linguistics and human information processing, including a sophisticated understanding of semantic differentials and other affect control, marketing, and propaganda devices.
6. The operationalization of the NRR in terms of two standard deviations from the process mean was suggested to political scientist and events data pioneer Edward E. Azar [1938–1991] by the mathematician Anatol Rapoport [1911–2007]. It was first applied to international relations events data series to study protracted conflicts in the Middle East. Azar was the founder and director of the Conflict and Peace Data Bank (COPDAB), which was established at the University of North Carolina at Chapel Hill in the 1970s and moved to the Centre for International Development and Conflict Management (CIDCM) of the University of Maryland at College Park in the 1980s.
Recommended Readings
N. Agarwal, H. Liu, Modeling and Data Mining in Blogosphere (Morgan & Claypool, New York, 2009). Available free online
E.E. Azar, S. Lerner, The use of semantic dimensions in the scaling of international events. Int. Interact. 7(4), 361–378 (1981)
R. Feldman, J. Sanger, The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data (Cambridge University Press, Cambridge, 2007)
M.D. Fischer, S.M. Lyon, D. Sosna, Harmonizing diversity: tuning anthropological research to complexity. Soc. Sci. Comput. Rev. 31(1), 3–15 (2013)
D.J. Gerner, P.A. Schrodt, Ö. Yilmaz, R. Abu-Jabr, The Creation of CAMEO (Conflict and Mediation Event Observations): An Event Data Framework for a Post Cold War World. Paper presented at the annual meeting of the American Political Science Association, San Francisco (2002)
M. Gorman, Simulating Science (Indiana University Press, Bloomington, 1992)
L.A. Grenoble, L.J. Whaley, Endangered Languages: Current Issues and Future Prospects (Cambridge University Press, Cambridge, 1998)
D.R. Heise, Project Magellan: collecting cross-cultural affective meanings via the internet. Electron. J. Sociol. 5(3) (2001). Available online: http://www.indiana.edu/~socpsy/papers/magellan/Magellan.htm
T. Hermann, H. Ritter, Listen to your data: model-based sonification for data analysis, in Proceedings of the ISIMADE'99, Baden-Baden, Germany (1999)
O.R. Holsti, Content Analysis for the Social Sciences and Humanities (Addison-Wesley, Reading, 1969)
D.J. Hopkins, G. King, A method for automated nonparametric content analysis for social science. Am. J. Polit. Sci. 54(1), 229–247 (2010)
W. Hsu, M.L. Lee, J. Wang, Temporal and Spatio-Temporal Data Mining (IGI Publishing, New York, 2008)
G. King, W. Lowe, An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int. Organ. 57, 617–642 (2003)
K. Krippendorff, Content Analysis: An Introduction to Its Methodology (Sage, Thousand Oaks, 2004)
K. Krippendorff, M.A. Bock (eds.), The Content Analysis Reader (Sage, Thousand Oaks, 2008)
P. Langley, Data-driven discovery of physical laws. Cogn. Sci. 5(1), 31–54 (1981)
P. Langley, Heuristics for scientific discovery: the legacy of Herbert Simon, in Models of a Man: Essays in Memory of Herbert A. Simon, ed. by M. Augier, J.G. March (MIT Press, Cambridge, 2004), pp. 461–471
D. Lazer, A. Pentland, L. Adamic, S. Aral, A.-L. Barabasi, D. Brewer, M. Van Alstyne, Computational social science. Science 323(5915), 721–723 (2009)
K. Leetaru, Data Mining Methods for the Content Analyst: An Introduction to the Computational Analysis of Content (Routledge, London, 2011)
B.L. Monroe, P.A. Schrodt (eds.), Special Issue: The Statistical Analysis of Political Text. Polit. Anal. 16(4), Autumn (2008)
I.-C. Moon, K.M. Carley, Modeling and simulation of terrorist networks in social and geospatial dimensions. IEEE Intell. Syst. 22(5), 40–49 (2007). Special Issue on Social Computing
C.E. Osgood, W.H. May, M.S. Miron, Cross-Cultural Universals of Affective Meaning (University of Illinois Press, Urbana, 1975)
R. Popping, Computer-Assisted Text Analysis (Sage, Thousand Oaks, 2000)
P.A. Schrodt, Short term prediction of international events using a Holland classifier. Math. Comput. Model. 12, 589–600 (1989)
P.A. Schrodt, Pattern recognition of international crises using hidden Markov models, in Political Complexity, ed. by D. Richards (University of Michigan Press, Ann Arbor, 2000)
H.A. Simon, Autobiography, in Nobel Lectures, Economics 1969–1980, ed. by A. Lindbeck (World Scientific, Singapore, 1992)
P.J. Stone, R.F. Bales, J.Z. Namenwirth, D.M. Ogilvie, The general inquirer: a computer system for content analysis and retrieval based on the sentence as a unit of information. Behav. Sci. 7(4), 484–498 (1962)
L. Tang, H. Liu, Community Detection and Mining in Social Media (Morgan & Claypool, New York, 2010). Available free online
J.J. Thomas, K.A. Cook (eds.), Illuminating the Path (IEEE Comput. Soc., Los Alamitos, 2005)
C. Williford, C. Henry, A. Friedlander (eds.), One Culture: Computationally Intensive Research in the Humanities and Social Sciences—A Report on the Experiences of First Respondents to the Digging into Data Challenge (Council on Library and Information Resources, Washington, 2012)
T. Zhang, C.-C.J. Kuo, Audio content analysis for online audiovisual data segmentation and classification. IEEE Trans. Speech Audio Process. 9(4), 441–457 (2001)
Copyright information
© 2014 Springer-Verlag London
About this chapter
Cite this chapter
Cioffi-Revilla, C. (2014). Automated Information Extraction. In: Introduction to Computational Social Science. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-5661-1_3
DOI: https://doi.org/10.1007/978-1-4471-5661-1_3
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5660-4
Online ISBN: 978-1-4471-5661-1
eBook Packages: Computer Science, Computer Science (R0)