Automated Coding of Political Event Data

Handbook of Computational Approaches to Counterterrorism

Abstract

Political event data have long been used in the quantitative study of international politics, dating back to the early efforts of Edward Azar’s COPDAB [1] and Charles McClelland’s WEIS [18], as well as a variety of more specialized efforts such as Leng’s BCOW [16]. By the late 1980s, the NSF-funded Data Development in International Relations project [20] had identified event data as the second most common form of data used in quantitative studies, behind the various Correlates of War data sets. The 1990s saw the development of two practical automated event data coding systems, the NSF-funded KEDS (http://eventdata.psu.edu; [9, 31, 33]) and the proprietary VRA-Reader (http://vranet.com; [15, 27]), and the 2000s saw the development of two new political event coding ontologies, CAMEO [34] and IDEA [4, 27], designed for implementation in automated coding systems. A summary of the current status of political event projects, as well as detailed discussions of some of these, can be found in [10, 32].

Notes

  1. Individual coders, particularly working for short periods of time, can of course reliably code much faster than this. But for the overall labor requirements (that is, the total time invested in the enterprise divided by the resulting usable events), the figure of six events per hour is a pretty good rule of thumb and, like the labor requirements of a string quartet, has changed little over time.

  2. The count of “stories” has varied continually as we’ve updated the downloads, modified the filters and so forth, so an exact count is both unavailable and irrelevant, but it starts at around eight to nine million.

  3. We’ve actually identified about 75 distinct sources in the stories, presumably the result of quirks in the LN search engine. However, these additional sources generate only a small number of stories, and by far the bulk of the stories come from the sources we had deliberately identified.

  4. This will not, however, catch spelling corrections in the first 48 characters. In the Reuters-based filtering for the KEDS project, we counted the frequency of each letter in the lead sentence and identified a duplicate if the absolute distance between the letter-count vectors $x$ and $y$ of two stories satisfied $\sum_i |x_i - y_i| < \eta$, where the threshold $\eta$ was usually around 10. This caught spelling and date corrections, the most common source of duplicates in Reuters, but failed on AFP, which tends to expand the details in a sentence as more information becomes available.

  5. Notably to traders, both carbon-based and silicon-based, in the financial sector, which drives much if not most of the international reporting. The likelihood of an event being reported is very much proportional to the possibility that someone can make or lose money on it.

  6. The phrase “cue category” refers to the broad two-digit codes, as opposed to the more specific three- and four-digit subcategories.

  7. To date, all of the successful automated event data coding systems are dictionary- and rule-based, rather than using statistical methods: see [36]. While statistical methods would certainly be attractive, and seem to work on highly simplified “toy problems” such as those in [6], all of the successfully deployed systems to date are dictionary-based, and numerous efforts to scale initially promising statistical methods have failed.

  8. Including, at the request of the sponsor, some bugs in TABARI, though after the equivalence of the two systems was demonstrated, these were corrected in both systems.

  9. In principle these enhancements could also be applied to Jabari-NLP, though it is running in secure military systems rather than open environments and to date has made less use of cluster processing.

  10. Though we’ve not been able to locate this on the web, which is itself interesting.
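
The letter-frequency duplicate check described in note 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the KEDS implementation: the function names, the example sentences, and the decision to count just the 26 ASCII letters case-insensitively are all hypothetical, and a pair is treated as a duplicate when the distance falls below the threshold η (10 here, following the note's "usually around 10"). A simple prefix comparison is included only for contrast with the 48-character check the note refers to.

```python
from collections import Counter
import string

ETA = 10  # assumed threshold; the note says it was "usually around 10"


def letter_counts(sentence: str) -> Counter:
    """Count each ASCII letter in the sentence, ignoring case and non-letters."""
    return Counter(ch for ch in sentence.lower() if ch in string.ascii_lowercase)


def l1_distance(a: str, b: str) -> int:
    """Sum of absolute differences between the two letter-count vectors."""
    ca, cb = letter_counts(a), letter_counts(b)
    return sum(abs(ca[ch] - cb[ch]) for ch in string.ascii_lowercase)


def is_duplicate(a: str, b: str, eta: int = ETA) -> bool:
    """Flag two lead sentences as duplicates when their distance is below eta."""
    return l1_distance(a, b) < eta


def same_prefix(a: str, b: str, n: int = 48) -> bool:
    """Contrast case: exact match on the first n characters."""
    return a[:n] == b[:n]


# Made-up lead sentences, for illustration only.
ORIGINAL = "Israeli troops clashed with Palestinian protesters in Hebron on Tuesday."
TYPO = "Israli troops clashed with Palestinian protesters in Hebron on Tuesday."
EXPANDED = (
    "Israeli troops clashed with Palestinian protesters in Hebron on Tuesday, "
    "wounding at least twelve people, hospital officials said."
)

if __name__ == "__main__":
    for label, other in (("typo", TYPO), ("expanded", EXPANDED)):
        print(
            label,
            "prefix match:", same_prefix(ORIGINAL, other),
            "distance:", l1_distance(ORIGINAL, other),
            "duplicate:", is_duplicate(ORIGINAL, other),
        )
```

In this sketch the misspelled version fails the prefix comparison but is still flagged by the letter-count distance, while the expanded version shares the prefix yet lands well above the threshold, which mirrors the AFP behavior the note describes.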

References

  1. Azar EE (1980) The conflict and peace data bank (COPDAB) project. J Confl Resolut 24:143–152

  2. Azar EE (1982) The codebook of the conflict and peace data bank (COPDAB). Center for International Development, University of Maryland, College Park

  3. Azar EE, Sloan T (1975) Dimensions of interaction. University Center for International Studies, University of Pittsburgh, Pittsburgh

  4. Bond D, Bond J, Oh C, Jenkins JC, Taylor CL (2003) Integrated data for events analysis (IDEA): an event typology for automated events data development. J Peace Res 40(6):733–745

  5. Bond D, Jenkins JC, Taylor CL, Schock K (1997) Mapping mass political conflict and civil society: issues and prospects for the automated development of event data. J Confl Resolut 41(4):553–579

  6. Boschee E, Natarajan P, Weischedel R (2012) Automatic extraction of events from open source text for predictive forecasting. In: Subrahmanian V (ed) Handbook of computational approaches to counterterrorism. Springer, New York

  7. Chenoweth E, Dugan L (2012) Rethinking counterterrorism: evidence from Israel. Working Paper, Wesleyan University, Middletown, CT

  8. Dugan L, Chenoweth E (2012) Moving beyond deterrence: the effectiveness of raising the expected utility of abstaining from terrorism in Israel. Working Paper, University of Maryland, College Park, MD

  9. Gerner DJ, Schrodt PA, Francisco RA, Weddle JL (1994) The machine coding of events from regional and international sources. Int Stud Q 38:91–119

  10. Gleditsch NP (2012) Special issue: event data in the study of conflict. Int Interact 38(4):375–569

  11. Goldstein JS (1992) A conflict-cooperation scale for WEIS events data. J Confl Resolut 36:369–385

  12. Howell LD (1983) A comparative study of the WEIS and COPDAB data sets. Int Stud Q 27:149–159

  13. Jenkins JC, Bond D (2001) Conflict carrying capacity, political crisis, and reconstruction. J Confl Resolut 45(1):3–31

  14. Kahneman D (2011) Thinking, fast and slow. Farrar, Straus and Giroux, New York

  15. King G, Lowe W (2004) An automated information extraction tool for international conflict data with performance as good as human coders: a rare events evaluation design. Int Organ 57(3):617–642

  16. Leng RJ (1987) Behavioral correlates of war, 1816–1975 (ICPSR 8606). Inter-University Consortium for Political and Social Research, Ann Arbor

  17. McClelland CA (1967) World-event-interaction-survey: a research project on the theory and measurement of international interaction and transaction. University of Southern California, Los Angeles, CA

  18. McClelland CA (1976) World event/interaction survey codebook (ICPSR 5211). Inter-University Consortium for Political and Social Research, Ann Arbor

  19. McClelland CA (1983) Let the user beware. Int Stud Q 27(2):169–177

  20. Merritt RL, Muncaster RG, Zinnes DA (eds) (1993) International event data developments: DDIR phase II. University of Michigan Press, Ann Arbor

  21. Mikhaylov S, Laver M, Benoit K (2012) Coder reliability and misclassification in the human coding of party manifestos. Political Anal 20(1):78–91

  22. Mooney B, Simpson B (2003) Breaking news: how the wheels came off at Reuters. Capstone, Mankato

  23. Nardulli P (2011) The social, political and economic event database project (SPEED). http://www.clinecenter.illinois.edu/research/speed.html

  24. Nardulli PF, Leetaru KH, Hayes M (2011) Event data, civil unrest and the SPEED project. Presented at the International Studies Association Meetings, Montréal

  25. O’Brien S (2012) A multi-method approach for near real time conflict and crisis early warning. In: Subrahmanian V (ed) Handbook of computational approaches to counterterrorism. Springer, New York

  26. O’Brien SP (2010) Crisis early warning and decision support: contemporary approaches and thoughts on future research. Int Stud Rev 12(1):87–104

  27. Petroff V, Bond J, Bond D (2012) Using hidden Markov models to predict terror before it hits (again). In: Subrahmanian V (ed) Handbook of computational approaches to counterterrorism. Springer, New York

  28. Ruggeri A, Gizelis TI, Dorussen H (2011) Events data as Bismarck’s sausages? Intercoder reliability, coders’ selection, and data quality. Int Interact 37(1):340–361

  29. Russett BM, Singer JD, Small M (1968) National political units in the twentieth century: a standardized list. Am Political Sci Rev 62(3):932–951

  30. Schrodt PA (1994) Statistical characteristics of events data. Int Interact 20(1–2):35–53

  31. Schrodt PA (2006) Twenty years of the Kansas event data system project. Political Methodol 14(1):2–8

  32. Schrodt PA (2012) Precedents, progress and prospects in political event data. Int Interact 38(5):546–569

  33. Schrodt PA, Gerner DJ (1994) Validity assessment of a machine-coded event data set for the Middle East, 1982–1992. Am J Political Sci 38:825–854

  34. Schrodt PA, Gerner DJ, Yilmaz Ö (2009) Conflict and mediation event observations (CAMEO): an event data framework for a post Cold War world. In: Bercovitch J, Gartner S (eds) International conflict mediation: new approaches and findings. Routledge, New York

  35. Schrodt PA, Palmer G, Hatipoglu ME (2008) Automated detection of reports of militarized interstate disputes using the SVM document classification algorithm. Paper presented at American Political Science Association, Chicago, IL

  36. Shilliday A, Lautenschlager J (2012) Data for a global ICEWS and ongoing research. In: 2nd international conference on cross-cultural decision making: focus 2012, San Francisco, CA

  37. Tetlock PE (2005) Expert political judgment: how good is it? How can we know? Princeton University Press, Princeton

  38. Van Brackle D, Wedgwood J (2011) Event coding for HSCB modeling: challenges and approaches. In: Human social culture behavior modeling focus 2011, Chantilly, VA

Acknowledgements

This research was supported in part by contracts from the Defense Advanced Research Projects Agency under the Integrated Crisis Early Warning System (ICEWS) program (Prime Contract #FA8650-07-C-7749: Lockheed-Martin Advanced Technology Laboratories) as well as grants from the National Science Foundation (SES-0096086, SES-0455158, SES-0527564, SES-1004414) and by a Fulbright-Hays Research Fellowship for work by Schrodt at the Peace Research Institute, Oslo (http://www.prio.no). The results and findings in no way represent the views of Lockheed-Martin, the Department of Defense, DARPA, or NSF. The work has benefitted from extended discussions and experimentation within the ICEWS team and the KEDS research group at the University of Kansas; we would note in particular contributions from Steve Shellman, Hans Leonard, Brandon Stewart, Jennifer Lautenschlager, Andrew Shilliday, Will Lowe, Steve Purpura, Vladimir Petroff, Baris Kesgin and Matthias Heilke.

Author information

Corresponding author

Correspondence to Philip A. Schrodt.

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

Schrodt, P.A., Van Brackle, D. (2013). Automated Coding of Political Event Data. In: Subrahmanian, V. (ed) Handbook of Computational Approaches to Counterterrorism. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5311-6_2

  • DOI: https://doi.org/10.1007/978-1-4614-5311-6_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-5310-9

  • Online ISBN: 978-1-4614-5311-6

  • eBook Packages: Computer Science, Computer Science (R0)
