Automated Coding of Political Event Data

Schrodt, Philip A.; Van Brackle, David

doi:10.1007/978-1-4614-5311-6_2

Abstract

Political event data have long been used in the quantitative study of international politics, dating back to the early efforts of Edward Azar’s COPDAB [1] and Charles McClelland’s WEIS [18] as well as a variety of more specialized efforts such as Leng’s BCOW [16]. By the late 1980s, the NSF-funded Data Development in International Relations project [20] had identified event data as the second most common form of data used in quantitative studies, behind the various Correlates of War data sets. The 1990s saw the development of two practical automated event data coding systems, the NSF-funded KEDS (http://eventdata.psu.edu; [9, 31, 33]) and the proprietary VRA-Reader (http://vranet.com; [15, 27]), and the 2000s saw the development of two new political event coding ontologies, CAMEO [34] and IDEA [4, 27], designed for implementation in automated coding systems. A summary of the current status of political event projects, as well as detailed discussions of some of these, can be found in [10, 32].

Notes

1.
Individual coders, particularly working for short periods of time, can of course reliably code much faster than this. But for the overall labor requirements—that is, the total time invested in the enterprise divided by the resulting useable events—the six events per hour is a pretty good rule of thumb and—like the labor requirements of a string quartet—has changed little over time.
2.
The count of “stories” has varied continually as we’ve updated the downloads, modified the filters and so forth, and so an exact count is both unavailable and irrelevant. But it starts at around eight to nine million.
3.
We’ve actually identified about 75 distinct sources in the stories, presumably the result of quirks in the LN search engine. However, these additional sources generate only a small number of stories, and by far the bulk of the stories come from the sources we had deliberately identified.
4.
This will not, however, catch spelling corrections in the first 48 characters. In the Reuters-based filtering for the KEDS project, we did a count of the frequency of letters in the lead sentence, and identified a duplicate if the absolute distance between that vector for two stories fell below a threshold, $\sum_i |x_i - y_i| < \eta$, where $\eta$ was usually around 10. This catches spelling and date corrections, the most common source of duplicates in Reuters, but fails on AFP, which tends to expand the details in a sentence as more information becomes available.
5.
Notably to traders—carbon-based and silicon-based—in the financial sector, which drives much if not most of the international reporting. The likelihood of an event being reported is very much proportional to the possibility that someone can make or lose money on it.
6.
The phrase “cue category” refers to the broad two-digit codes, as opposed to the more specific three and four digit subcategories.
7.
To date, all of the successful automated event data coding systems are dictionary- and rule-based rather than statistical: see [36]. While statistical methods would certainly be attractive, and seem to work on highly simplified “toy problems” such as those in [6], numerous efforts to scale initially promising statistical methods have failed.
8.
Including, at the request of the sponsor, some bugs in TABARI, though after the equivalence of the two systems was demonstrated, these were corrected in both systems.
9.
In principle these enhancements could also be applied to Jabari-NLP, though it is running in secure military systems rather than open environments and to date has made less use of cluster processing.
10.
Though we’ve not been able to locate this on the web, which is itself interesting.
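
The letter-frequency duplicate check described in note 4 can be sketched as follows. This is a minimal illustration in Python, not the KEDS project's actual filter: the function names, the restriction to ASCII letters, and the case-folding are assumptions. Only the sum of absolute differences between letter-count vectors and the threshold of about 10 come from the note. The sketch treats two lead sentences as duplicates when that distance is small (below the threshold). The 48-character prefix comparison is an assumed reading of the prefix-based check that the note contrasts it with.

```python
from collections import Counter
import string


def letter_counts(sentence: str) -> Counter:
    """Count each ASCII letter in the sentence, ignoring case (assumption)."""
    return Counter(ch for ch in sentence.lower() if ch in string.ascii_lowercase)


def l1_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute differences between two letter-count vectors."""
    # Counter returns 0 for letters that never occurred.
    return sum(abs(a[ch] - b[ch]) for ch in string.ascii_lowercase)


def is_duplicate(lead_a: str, lead_b: str, eta: int = 10) -> bool:
    """Flag two lead sentences as duplicates when their distance is below eta."""
    return l1_distance(letter_counts(lead_a), letter_counts(lead_b)) < eta


def same_prefix(lead_a: str, lead_b: str, n: int = 48) -> bool:
    """Hypothetical prefix check: exact match on the first n characters."""
    return lead_a[:n] == lead_b[:n]
```

With these definitions, a one-letter spelling correction early in a sentence defeats the prefix check but leaves a letter-count distance of only 2, so it is still flagged as a duplicate. A lead sentence that has been expanded with a new clause, as the note describes for AFP, moves well past the threshold and is not flagged.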

References

1. Azar EE (1980) The conflict and peace data bank (COPDAB) project. J Confl Resolut 24:143–152
2. Azar EE (1982) The codebook of the conflict and peace data bank (COPDAB). Center for International Development, University of Maryland, College Park
3. Azar EE, Sloan T (1975) Dimensions of interaction. University Center for International Studies, University of Pittsburgh, Pittsburgh
4. Bond D, Bond J, Oh C, Jenkins JC, Taylor CL (2003) Integrated data for events analysis (IDEA): an event typology for automated events data development. J Peace Res 40(6):733–745
5. Bond D, Jenkins JC, Taylor CLT, Schock K (1997) Mapping mass political conflict and civil society: issues and prospects for the automated development of event data. J Confl Resolut 41(4):553–579
6. Boschee E, Natarajan P, Weischedel R (2012) Automatic extraction of events from open source text for predictive forecasting. In: Subrahmanian V (ed) Handbook on computational approaches to counterterrorism. Springer, New York
7. Chenoweth E, Dugan L (2012) Rethinking counterterrorism: evidence from Israel. Working Paper, Wesleyan University, Middletown, CT
8. Dugan L, Chenoweth E (2012) Moving beyond deterrence: the effectiveness of raising the expected utility of abstaining from terrorism in Israel. Working Paper, University of Maryland, College Park, MD
9. Gerner DJ, Schrodt PA, Francisco RA, Weddle JL (1994) The machine coding of events from regional and international sources. Int Stud Q 38:91–119
10. Gleditsch NP (2012) Special issue: event data in the study of conflict. Int Interact 38(4):375–569
11. Goldstein JS (1992) A conflict-cooperation scale for WEIS events data. J Confl Resolut 36:369–385
12. Howell LD (1983) A comparative study of the WEIS and COPDAB data sets. Int Stud Q 27:149–159
13. Jenkins CJ, Bond D (2001) Conflict carrying capacity, political crisis, and reconstruction. J Confl Resolut 45(1):3–31
14. Kahneman D (2011) Thinking fast and slow. Farrar, Straus and Giroux, New York
15. King G, Lowe W (2004) An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int Organ 57(3):617–642
16. Leng RJ (1987) Behavioral correlates of war, 1816–1975 (ICPSR 8606). Inter-University Consortium for Political and Social Research, Ann Arbor
17. McClelland CA (1967) World-event-interaction-survey: a research project on the theory and measurement of international interaction and transaction. University of Southern California, Los Angeles, CA
18. McClelland CA (1976) World event/interaction survey codebook (ICPSR 5211). Inter-University Consortium for Political and Social Research, Ann Arbor
19. McClelland CA (1983) Let the user beware. Int Stud Q 27(2):169–177
20. Merritt RL, Muncaster RG, Zinnes DA (eds) (1993) International event data developments: DDIR phase II. University of Michigan Press, Ann Arbor
21. Mikhaylov S, Laver M, Benoit K (2012) Coder reliability and misclassification in the human coding of party manifestos. Political Anal 20(1):78–91
22. Mooney B, Simpson B (2003) Breaking news: how the wheels came off at Reuters. Capstone, Mankato
23. Nardulli P (2011) The social, political and economic event database project (SPEED). http://www.clinecenter.illinois.edu/research/speed.html
24. Nardulli PF, Leetaru KH, Hayes M (2011) Event data, civil unrest and the SPEED project. Presented at the International Studies Association Meetings, Montréal
25. O’Brien S (2012) A multi-method approach for near real time conflict and crisis early warning. In: Subrahmanian V (ed) Handbook on computational approaches to counterterrorism. Springer, New York
26. O’Brien SP (2010) Crisis early warning and decision support: contemporary approaches and thoughts on future research. Int Stud Rev 12(1):87–104
27. Petroff V, Bond J, Bond D (2012) Using hidden Markov models to predict terror before it hits (again). In: Subrahmanian V (ed) Handbook on computational approaches to counterterrorism. Springer, New York
28. Ruggeri A, Gizelis TI, Dorussen H (2011) Events data as Bismarck’s sausages? Intercoder reliability, coders’ selection, and data quality. Int Interact 37(1):340–361
29. Russett BM, Singer JD, Small M (1968) National political units in the twentieth century: a standardized list. Am Political Sci Rev 62(3):932–951
30. Schrodt PA (1994) Statistical characteristics of events data. Int Interact 20(1–2):35–53
31. Schrodt PA (2006) Twenty years of the Kansas event data system project. Political Methodol 14(1):2–8
32. Schrodt PA (2012) Precedents, progress and prospects in political event data. Int Interact 38(5):546–569
33. Schrodt PA, Gerner DJ (1994) Validity assessment of a machine-coded event data set for the Middle East, 1982–1992. Am J Political Sci 38:825–854
34. Schrodt PA, Gerner DJ, Yilmaz Ö (2009) Conflict and mediation event observations (CAMEO): an event data framework for a post Cold War world. In: Bercovitch J, Gartner S (eds) International conflict mediation: new approaches and findings. Routledge, New York
35. Schrodt PA, Palmer G, Hatipoglu ME (2008) Automated detection of reports of militarized interstate disputes using the SVM document classification algorithm. Paper presented at American Political Science Association, Chicago, IL
36. Shilliday A, Lautenschlager J (2012) Data for a global ICEWS and ongoing research. In: 2nd international conference on cross-cultural decision making: focus 2012, San Francisco, CA
37. Tetlock PE (2005) Expert political judgment: how good is it? How can we know? Princeton University Press, Princeton
38. Van Brackle D, Wedgwood J (2011) Event coding for HSCB modeling: challenges and approaches. In: Human social culture behavior modeling focus 2011, Chantilly, VA

Acknowledgements

This research was supported in part by contracts from the Defense Advanced Research Projects Agency under the Integrated Crisis Early Warning System (ICEWS) program (Prime Contract #FA8650-07-C-7749: Lockheed Martin Advanced Technology Laboratories) as well as grants from the National Science Foundation (SES-0096086, SES-0455158, SES-0527564, SES-1004414) and by a Fulbright-Hays Research Fellowship for work by Schrodt at the Peace Research Institute, Oslo (http://www.prio.no). The results and findings in no way represent the views of Lockheed Martin, the Department of Defense, DARPA, or NSF. The work has benefitted from extended discussions and experimentation within the ICEWS team and the KEDS research group at the University of Kansas; we would note in particular contributions from Steve Shellman, Hans Leonard, Brandon Stewart, Jennifer Lautenschlager, Andrew Shilliday, Will Lowe, Steve Purpura, Vladimir Petroff, Baris Kesgin and Matthias Heilke.

Author information

Authors and Affiliations

Political Science, Pennsylvania State University, University Park, PA, 16801, USA
Philip A. Schrodt
Lockheed Martin Advanced Technology Laboratories, 3550 George Busbee Parkway, Kennesaw, GA, 30144, USA
David Van Brackle

Corresponding author

Correspondence to Philip A. Schrodt.

Editor information

Editors and Affiliations

Computer Science Department, University of Maryland, AV Williams Building, College Park, 20854, Maryland, USA
V.S. Subrahmanian

About this chapter

Cite this chapter

Schrodt, P.A., Van Brackle, D. (2013). Automated Coding of Political Event Data. In: Subrahmanian, V. (ed) Handbook of Computational Approaches to Counterterrorism. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5311-6_2

DOI: https://doi.org/10.1007/978-1-4614-5311-6_2
Published: 08 November 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5310-9
Online ISBN: 978-1-4614-5311-6
eBook Packages: Computer Science, Computer Science (R0)
