Abstract
One of the more novel approaches to collaboratively creating language resources in recent years is to use online games to collect and validate data. The most significant challenges collaborative systems face are how to train users with the necessary expertise and how to encourage participation at the scale required to produce high-quality data comparable with data produced by “traditional” experts. In this chapter we provide a brief overview of collaborative creation and the different approaches that have been used to create language resources, before analysing games used for this purpose. We discuss some key issues in using a gaming approach, including task design, player motivation and data quality, and compare the costs of each approach in terms of development, distribution and ongoing administration. In conclusion, we summarise the benefits and limitations of using a gaming approach to resource creation and suggest key considerations for evaluating its utility in different research scenarios.
Notes
- 21. It is possible for an interpretation to have more annotations and validations than required if a player enters an existing interpretation after disagreeing, or if several players are working on the same markables simultaneously.
- 28. This figure was obtained by informally asking several experienced researchers involved in funding applications for annotation projects.
- 31. From personal communication with K. Cohen.
- 32.
Acknowledgements
We would like to thank Jean Heutte (CREF-CNRS) for his help with the concepts of game flow, and the reviewers of this chapter for their comments. The contribution of Karën Fort to this work was realized as part of the Quæro Programme (see note 32), funded by OSEO, the French State agency for innovation. The original Phrase Detectives game was funded as part of the EPSRC AnaWiki project, EP/F00575X/1.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Chamberlain, J., Fort, K., Kruschwitz, U., Lafourcade, M., Poesio, M. (2013). Using Games to Create Language Resources: Successes and Limitations of the Approach. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6