
A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11438)

Abstract

We propose a coherent set of tasks for protest information collection in the context of generalizable natural language processing (NLP). The tasks are news article classification, event sentence detection, and event extraction. Tools that collect event information from data produced in multiple countries enable comparative studies in sociology and politics. We have annotated English-language news articles from a source country and a target country so that tools developed on data from one country can be evaluated on data from a different country. Our preliminary experiments show that the performance of tools developed on English texts from India drops to an unusable level when they are applied to English texts from China. We believe our setting addresses the challenge of building generalizable NLP tools that perform well regardless of the source of the text, and that it will accelerate progress toward generalizable NLP systems.
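The central design of this setting is to develop a tool on data from one country (the source, India) and measure it on data from another country (the target, China). A minimal sketch of that comparison follows. It assumes binary labels (1 = protest-related article or sentence) and uses F1 as the metric; the abstract does not specify a metric. The function names `f1_score` and `cross_country_report`, as well as the toy labels, are hypothetical and do not reflect the paper's data or results.

```python
from typing import Dict, Sequence


def f1_score(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (e.g. 1 = protest-related)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_country_report(
    source_gold: Sequence[int],
    source_pred: Sequence[int],
    target_gold: Sequence[int],
    target_pred: Sequence[int],
) -> Dict[str, float]:
    """Compare one tool's F1 on held-out source-country data vs. target-country data."""
    source_f1 = f1_score(source_gold, source_pred)
    target_f1 = f1_score(target_gold, target_pred)
    drop = source_f1 - target_f1
    relative = drop / source_f1 if source_f1 > 0 else 0.0
    return {
        "source_f1": source_f1,
        "target_f1": target_f1,
        "absolute_drop": drop,
        "relative_drop": relative,
    }


if __name__ == "__main__":
    # Hypothetical toy labels, not the paper's annotations or predictions.
    report = cross_country_report(
        source_gold=[1, 1, 0, 0, 1, 0], source_pred=[1, 1, 0, 0, 1, 1],
        target_gold=[1, 1, 0, 0, 1, 0], target_pred=[1, 0, 0, 1, 0, 0],
    )
    for name, value in report.items():
        print(f"{name}: {value:.3f}")
```

Reporting the relative drop, and not only the two raw scores, shows how much of a tool's source-country performance survives the transfer. This is the quantity the source/target annotation is meant to expose.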

Notes

  1. These events are collectively referred to as “repertoires of contention” [7, 15]. For brevity and simplicity, we use “protest events” from here on.

  2. http://www.clef-initiative.eu, accessed January 19, 2019.

  3. http://clef2019.clef-initiative.eu, accessed January 19, 2019.

  4. https://emw.ku.edu.tr/clef-protestnews-2019, accessed January 19, 2019.

  5. Using existing corpora that may already be distributed freely is not an option in our setting, because we require a representative sample from both the source and the target country. Moreover, the dataset must contain data produced in more than one country to be useful in our setting.

  6. The overlap ratio is 100%.

  7. We mainly annotate the event trigger, place, time, participant, organizer, and target of the protest.

  8. Differences between our annotation manual and those of these projects may also affect precision and recall.

  9. https://github.com/emerging-welfare/ie-tools-test-on-India-b1, accessed January 19.
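Note 7 lists the attributes annotated for each protest event. The sketch below shows one hypothetical way to represent a single event produced by the event extraction task. The `ProtestEvent` class and its `missing_fields` helper are illustrative assumptions, not the annotation format used for the dataset. Each attribute holds the text span as a plain string, and `None` marks an attribute that was not annotated.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProtestEvent:
    """Hypothetical record for one protest event, with the attributes from note 7."""

    trigger: str                       # word or phrase that signals the event
    place: Optional[str] = None        # where the protest took place
    time: Optional[str] = None         # when it took place
    participant: Optional[str] = None  # who took part
    organizer: Optional[str] = None    # who organized it
    target: Optional[str] = None       # whom or what the protest was directed at

    def missing_fields(self) -> List[str]:
        """Names of attributes left empty, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    # Hypothetical example, not taken from the annotated corpus.
    event = ProtestEvent(trigger="rally", place="the capital", participant="farmers")
    print(event.missing_fields())
```

A per-event list of missing attributes is a convenient starting point for attribute-level scoring, because an extractor can be credited separately for each attribute it recovers.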

References

  1. Akdemir, A., Hürriyetoğlu, A., Yörük, E., Gürel, B., Yoltar, C., Yüret, D.: Towards generalizable place name recognition systems: analysis and enhancement of NER systems on English News from India. In: Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018, pp. 8:1–8:10. ACM, New York (2018). https://doi.org/10.1145/3281354.3281363

  2. Boschee, E., Natarajan, P., Weischedel, R.: Automatic extraction of events from open source text for predictive forecasting. In: Subrahmanian, V. (ed.) Handbook of Computational Approaches to Counterterrorism, pp. 51–67. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-5311-6_3

  3. Büyüköz, B., Hürriyetoğlu, A., Yörük, E., Yüret, D.: Examining existing information extraction tools on manually-annotated protest events in Indian news. In: Proceedings of Computational Linguistics in Netherlands (CLIN), CLIN29 (2019)

  4. Chenoweth, E., Lewis, O.A.: Unpacking nonviolent campaigns: introducing the NAVCO 2.0 dataset. J. Peace Res. 50(3), 415–423 (2013). https://doi.org/10.1177/0022343312471551

  5. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  6. Ettinger, A., Rao, S., Daumé III, H., Bender, E.M.: Towards linguistically generalizable NLP systems: a workshop and shared task. In: Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems, pp. 1–10. Association for Computational Linguistics (2017). http://aclweb.org/anthology/W17-5401

  7. Giugni, M.G.: Was it worth the effort? The outcomes and consequences of social movements. Ann. Rev. Sociol. 24, 371–393 (1998). http://www.jstor.org/stable/223486

  8. Hammond, J., Weidmann, N.B.: Using machine-coded event data for the micro-level study of political violence. Res. Polit. 1(2) (2014). https://doi.org/10.1177/2053168014539924

  9. Leetaru, K., Schrodt, P.A.: GDELT: global data on events, location, and tone, 1979–2012. In: ISA Annual Convention, vol. 2, pp. 1–49. Citeseer (2013)

  10. Lorenzini, J., Makarov, P., Kriesi, H., Wueest, B.: Towards a dataset of automatically coded protest events from English-language Newswire documents. In: Paper Presented at the Amsterdam Text Analysis Conference (2016)

  11. Nardulli, P.F., Althaus, S.L., Hayes, M.: A progressive supervised-learning approach to generating rich civil strife data. Sociol. Methodol. 45(1), 148–183 (2015). https://doi.org/10.1177/0081175015581378

  12. Schrodt, P.A., Beieler, J., Idris, M.: Three’s a charm? Open event data coding with EL:DIABLO, PETRARCH, and the Open Event Data Alliance. In: ISA Annual Convention (2014)

  13. Soboroff, I., Ferro, N., Fuhr, N.: Report on GLARE 2018: 1st workshop on generalization in information retrieval: can we predict performance in new domains? SIGIR Forum 52(2), 132–137 (2018). http://sigir.org/wp-content/uploads/2019/01/p132.pdf

  14. Sönmez, Ç., Özgür, A., Yörük, E.: Towards building a political protest database to explain changes in the welfare state. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 106–110. Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/W16-2113, http://www.aclweb.org/anthology/W16-2113

  15. Tarrow, S.: Power in Movement: Social Movements, Collective Action and Politics. Cambridge Studies in Comparative Politics, Cambridge University Press (1994). https://books.google.com.tr/books?id=hN5nQgAACAAJ

  16. Wang, W.: Event detection and extraction from news articles. Ph.D. thesis, Virginia Tech (2018)

  17. Wang, W., Kennedy, R., Lazer, D., Ramakrishnan, N.: Growing pains for global monitoring of societal events. Science 353(6307), 1502–1503 (2016). https://doi.org/10.1126/science.aaf6758, http://science.sciencemag.org/content/353/6307/1502

  18. Weidmann, N.B., Rød, E.G.: The Internet and Political Protest in Autocracies, Chap. Coding Protest Events in Autocracies. Oxford University Press, Oxford (2019)

  19. Yoruk, E.: The politics of the Turkish welfare system transformation in the neoliberal era: welfare as mobilization and containment. The Johns Hopkins University (2012)

Acknowledgments

This work is funded by the European Research Council (ERC) Starting Grant 714868 awarded to Dr. Erdem Yörük for his project Emerging Welfare (https://emw.ku.edu.tr, accessed January 19). We are grateful to the members of the steering committee of the CLEF 2019 lab: Sophia Ananiadou, Antal van den Bosch, Kemal Oflazer, Arzucan Özgür, Aline Villavicencio, and Hristo Tanev. Finally, we thank Theresa Gessler and Peter Makarov for their contributions to organizing the CLEF lab, by reviewing the annotation manuals and by sharing their work with us, respectively.

Author information

Correspondence to Ali Hürriyetoğlu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hürriyetoğlu, A. et al. (2019). A Task Set Proposal for Automatic Protest Information Collection Across Multiple Countries. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_42

  • DOI: https://doi.org/10.1007/978-3-030-15719-7_42

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15718-0

  • Online ISBN: 978-3-030-15719-7

  • eBook Packages: Computer Science, Computer Science (R0)