Abstract
One of the more novel approaches to collaboratively creating language resources in recent years is to use online games to collect and validate data. The most significant challenges collaborative systems face are how to train users with the necessary expertise and how to encourage participation at the scale required to produce high-quality data comparable with data produced by “traditional” experts. In this chapter we provide a brief overview of collaborative creation and the different approaches that have been used to create language resources, before analysing games used for this purpose. We discuss some key issues in using a gaming approach, including task design, player motivation and data quality, and compare the costs of each approach in terms of development, distribution and ongoing administration. In conclusion, we summarise the benefits and limitations of using a gaming approach to resource creation and suggest key considerations for evaluating its utility in different research scenarios.
Notes
- 21. It is possible for an interpretation to have more annotations and validations than required if a player enters an existing interpretation after disagreeing, or if several players are working on the same markables simultaneously.
- 28. This figure was obtained by informally asking several experienced researchers involved in funding applications for annotation projects.
- 31. From personal communication with K. Cohen.
- 32.
Acknowledgements
We would like to thank Jean Heutte (CREF-CNRS) for his help with the concepts of game flow, and the reviewers of this chapter for their comments. The contribution of Karën Fort to this work was realized as part of the Quæro Programme (see note 32), funded by OSEO, the French State agency for innovation. The original Phrase Detectives game was funded as part of the EPSRC AnaWiki project, EP/F00575X/1.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Chamberlain, J., Fort, K., Kruschwitz, U., Lafourcade, M., Poesio, M. (2013). Using Games to Create Language Resources: Successes and Limitations of the Approach. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6