Using Games to Create Language Resources: Successes and Limitations of the Approach

Chapter in The People’s Web Meets NLP

Abstract

One of the more novel approaches to collaboratively creating language resources in recent years is to use online games to collect and validate data. The most significant challenges collaborative systems face are how to equip users with the necessary expertise and how to encourage participation on the scale required to produce high-quality data comparable with data produced by “traditional” experts. In this chapter we provide a brief overview of collaborative creation and the different approaches that have been used to create language resources, before analysing games used for this purpose. We discuss some key issues in using a gaming approach, including task design, player motivation and data quality, and compare the costs of each approach in terms of development, distribution and ongoing administration. In conclusion, we summarise the benefits and limitations of using a gaming approach to resource creation and suggest key considerations for evaluating its utility in different research scenarios.


Notes

  1. http://www.wikipedia.org
  2. http://www.gwap.com/gwap
  3. http://www.phrasedetectives.com
  4. http://www.jeuxdemots.org
  5. http://scripts.mit.edu/~cci/HCI
  6. http://www.galaxyzoo.org
  7. http://www.google.com/recaptcha
  8. http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2011T03
  9. http://www.coli.uni-saarland.de/projects/salsa
  10. http://www.mturk.com
  11. http://samasource.org
  12. http://en.wikipedia.org/wiki/Wikipedia:Featured_articles
  13. http://en.wikipedia.org/wiki/Wikipedia:Unusual_articles
  14. http://www.gutenberg.org
  15. http://www.facebook.com
  16. http://developers.facebook.com/docs/reference/php
  17. http://www.usability.gov/guidelines
  18. https://www.zooniverse.org
  19. http://www.infosolutionsgroup.com/2010_PopCap_Social_Gaming_Research_Results.pdf
  20. http://www.lightspeedresearch.com/press-releases/it’s-game-on-for-facebook-users
  21. It is possible for an interpretation to have more annotations and validations than required if a player enters an existing interpretation after disagreeing, or if several players are working on the same markables simultaneously.
  22. http://en.wikipedia.org/wiki/Wikipedia:Wikipedians
  23. http://groups.csail.mit.edu/uid/deneme/?p=502
  24. http://en.wikipedia.org/wiki/Iron_Man
  25. http://en.wikipedia.org/wiki/Welsh_poetry
  26. http://www.jeuxdemots.org/AKI.php
  27. http://www.payscale.com/research/UK/Job=Research_Scientist/Salary
  28. This figure was obtained by informally asking several experienced researchers involved in funding applications for annotation projects.
  29. http://www.ldc.upenn.edu/Catalog/catalogEntry.jsp?catalogId=LDC2003T11
  30. http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2009T24
  31. From personal communication with K. Cohen.
  32. http://quaero.org/


Acknowledgements

We would like to thank Jean Heutte (CREF-CNRS) for his help with the concepts of game flow, and the reviewers of this chapter for their comments. The contribution of Karën Fort to this work was carried out as part of the Quæro Programme (see note 32), funded by OSEO, the French state agency for innovation. The original Phrase Detectives game was funded as part of the EPSRC AnaWiki project, EP/F00575X/1.

Author information

Correspondence to Jon Chamberlain.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chamberlain, J., Fort, K., Kruschwitz, U., Lafourcade, M., Poesio, M. (2013). Using Games to Create Language Resources: Successes and Limitations of the Approach. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_1


  • DOI: https://doi.org/10.1007/978-3-642-35085-6_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35084-9

  • Online ISBN: 978-3-642-35085-6

  • eBook Packages: Computer Science, Computer Science (R0)
