
An Open Source Tool for Crowd-Sourcing the Manual Annotation of Texts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8775)

Abstract

Manually annotated data is the basis for a large number of tasks in natural language processing, serving as either evaluation or training data. Annotating large amounts of data with dedicated full-time annotators can be expensive and may be beyond the budgets of many research projects. An alternative is crowd-sourcing, where the annotation work is split among many part-time annotators. This paper presents a freely available, open-source platform for crowd-sourcing manual annotation tasks and describes its application to annotating causative relations.
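The abstract does not describe how the platform combines the labels produced by its part-time annotators. The sketch below is only an illustration of the general crowd-sourcing workflow it refers to, assuming majority-vote aggregation and pairwise Cohen's kappa for agreement; the function names and the toy causative-relation labels are hypothetical and not taken from the paper.

# Illustrative sketch only: aggregate crowd-sourced labels by majority vote
# and measure pairwise inter-annotator agreement with Cohen's kappa.
from collections import Counter
from itertools import combinations

def majority_label(labels):
    """Return the most frequent label for one item (ties broken arbitrarily)."""
    return Counter(labels).most_common(1)[0][0]

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences of equal length."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[l] * freq_b[l] for l in set(a) | set(b)) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Hypothetical annotations: three annotators mark whether each sentence
# expresses a causative relation ("causal") or not ("none").
annotations = {
    "ann1": ["causal", "none", "causal", "causal"],
    "ann2": ["causal", "none", "none", "causal"],
    "ann3": ["causal", "causal", "causal", "causal"],
}

# Aggregate each item's labels by majority vote.
items = list(zip(*annotations.values()))
gold = [majority_label(labels) for labels in items]
print("aggregated labels:", gold)

# Report pairwise agreement between annotators.
for (n1, l1), (n2, l2) in combinations(annotations.items(), 2):
    print(f"kappa({n1}, {n2}) = {cohens_kappa(l1, l2):.2f}")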





Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Drury, B., Cardoso, P.C.F., Valverde-Rebaza, J., Valejo, A., Pereira, F., de Andrade Lopes, A. (2014). An Open Source Tool for Crowd-Sourcing the Manual Annotation of Texts. In: Baptista, J., Mamede, N., Candeias, S., Paraboni, I., Pardo, T.A.S., Volpe Nunes, M.d.G. (eds) Computational Processing of the Portuguese Language. PROPOR 2014. Lecture Notes in Computer Science, vol 8775. Springer, Cham. https://doi.org/10.1007/978-3-319-09761-9_31


  • DOI: https://doi.org/10.1007/978-3-319-09761-9_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09760-2

  • Online ISBN: 978-3-319-09761-9

  • eBook Packages: Computer Science, Computer Science (R0)
