Collaborative Web-Based Tools for Multi-layer Text Annotation

  • Chapter
Abstract

Effectively managing the collaboration of many annotators is a crucial ingredient for the success of larger annotation projects. For collaboration, web-based tools offer a low-barrier way of gathering annotations from distributed contributors. While the management structure of annotation tools is more or less stable across projects, the kinds of annotations vary widely between projects. The challenge for web-based tools for multi-layer text annotation is to combine ease of use and availability through the web with maximal flexibility regarding the types and layers of annotations. In this chapter, we outline requirements for web-based annotation tools in detail and review a variety of tools with respect to these requirements. Further, we discuss two web-based multi-layer annotation tools in detail: GATE Teamware and WebAnno. While differing in some aspects, both tools largely fulfill the requirements for today’s web-based annotation tools. Finally, we point out further directions, such as increased schema flexibility and tighter integration of automation for annotation suggestions.

Notes

  1. The corresponding code still is present in ELAN 4.6.1, but is disabled and appears not to have been touched for several years.

  2. http://www.w3.org/TR/SVG/.

  3. https://www.webkit.org/.

  4. Source code and documentation are available from http://gate.ac.uk/teamware/.

  5. Available to use and trial at http://gatecloud.net.

  6. http://www.jboss.com/products/jbpm/.

  7. Available for download at: http://webanno.googlecode.com/.

  8. Formats: plain text, CoNLL [36], TCF [25], UIMA XMI [21].

  9. www.crowdflower.com.

  10. http://www.ucomp.eu.

  11. http://www.clarin.eu/.

References

  1. Bauer, C., King, G.: Java Persistence with Hibernate. Manning Publications Co, Bruce Park Avenue Typesetters, Greenwich, CT, USA (2007)

  2. Benikova, D., Biemann, C., Reznicek, M.: NoSta-D Named Entity Annotation for German: Guidelines and Dataset. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pp. 2524–2531. European Language Resources Association (ELRA), Reykjavik, Iceland (2014)

  3. Bollmann, M., Dipper, S., Krasselt, J., Petran, F.: Manual and semi-automatic normalization of historical spelling – case studies from early new high German. In: Proceedings of the First International Workshop on Language Technology for Historical Text(s) (LThist2012), KONVENS, Vienna, Austria (2012)

  4. Bontcheva, K., Cunningham, H., Roberts, I., Roberts, A., Tablan, V., Aswani, N., Gorrell, G.: GATE Teamware: a web-based, collaborative text annotation framework. Lang. Resour. Eval. 47(4), 1007–1029 (2013). doi:10.1007/s10579-013-9215-6

  5. Brants, T., Plaehn, O.: Interactive corpus annotation. In: Calzolari, N., Carayannis, G., Choukri, K., Höge, H., Maegaard, B., Mariani, J., Zampolli, A. (eds.) Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC’00), pp. 453–459. European Language Resources Association (ELRA), Athens, Greece (2000)

  6. Brugman, H., Russel, A.: Annotating Multi-media / Multi-modal resources with ELAN. In: Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R., Pereira, C., Carvalho, F., Lopes, M., Catarino, M., Barros, S. (eds.) Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC’04), pp. 2065–2068. European Language Resources Association (ELRA), Lisbon, Portugal (2004)

  7. Brugman, H., Crasborn, O., Russel, A.: Collaborative annotation of sign language data with peer-to-peer technology. In: Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R., Pereira, C., Carvalho, F., Lopes, M., Catarino, M., Barros, S. (eds.) Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC’04). European Language Resources Association (ELRA), Lisbon, Portugal (2004)

  8. Burchardt, A., Erk, K., Frank, A., Kowalski, A., Pado, S.: SALTO: a versatile multi-level annotation tool. In: Calzolari, N., Choukri, K., Gangemi, A., Maegaard, B., Mariani, J., Odijk, J., Tapias, D. (eds.) Proceedings of the 5th international conference on language resources and evaluation (LREC’06), pp. 517–520. European Language Resources Association (ELRA), Genoa, Italy (2006)

  9. Carletta, J.: Assessing agreement on classification tasks: the kappa statistic. Comput. Linguist. 22(2), 249–254 (1996)

  10. Carletta, J., Evert, S., Heid, U., Kilgour, J.: The NITE XML Toolkit: data model and query language. Lang. Resour. Eval. 39(4), 313–334 (2005). doi:10.1007/s10579-006-9001-9

  11. Chen, W.T., Styler, W.: Anafora: a web-based general purpose annotation tool. In: Proceedings of the 2013 NAACL HLT Demonstration Session. Association for Computational Linguistics, Atlanta, Georgia, pp. 14–19. http://www.aclweb.org/anthology/N13-3004 (2013)

  12. Crammer, K., Singer, Y.: Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res. 3, 951–991 (2003). doi:10.1162/jmlr.2003.3.4-5.951

  13. Cunningham, H., Maynard, D., Bontcheva, K., Tablan, V.: GATE: an architecture for development of robust HLT applications. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL’02), pp. 168–175. Association for Computational Linguistics, Philadelphia, Pennsylvania, USA (2002). doi:10.3115/1073083.1073112

  14. Cunningham, H., Tablan, V., Roberts, A., Bontcheva, K.: Getting more out of biomedical documents with GATE’s full lifecycle open source text analytics. PLoS Comput. Biol. 9(2), e1002854 (2013). doi:10.1371/journal.pcbi.1002854

  15. Dashorst, M., Hillenius, E.: Wicket in Action. Manning Publications Co, Sound View Court 3B, Greenwich (2009)

  16. Day, D., Aberdeen, J., Hirschman, L., Kozierok, R., Robinson, P., Vilain, M.: Mixed-initiative development of language processing systems. In: Proceedings of the Fifth Conference on Applied Natural Language Processing (ANLC ’97), pp. 348–355. Association for Computational Linguistics, Washington, DC (1997). doi:10.3115/974557.974608

  17. Day, D., McHenry, C., Kozierok, R., Riek, L.: Callisto: a configurable annotation workbench. In: Lino, M.T., Xavier, M.F., Ferreira, F., Costa, R., Silva, R., Pereira, C., Carvalho, F., Lopes, M., Catarino, M., Barros, S. (eds.) Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04), pp. 2073–2076. European Language Resources Association (ELRA), Lisbon, Portugal (2004)

  18. Dipper, S., Götze, M., Stede, M.: Simple annotation tools for complex annotation tasks: an evaluation. In: Proceedings of the LREC Workshop on XML-based Richly Annotated Corpora, Lisbon, Portugal, pp. 54–62 (2004)

  19. Dipper, S., Lüdeling, A., Reznicek, M.: NoSta-D: A corpus of German non-standard varieties. In: Zampieri, M., Diwersy, S. (eds.) Non-standard Data Sources in Corpus-based Research, Shaker, pp. 69–76 (2013)

  20. Eckart de Castilho, R., Gurevych, I.: DKPro-UGD: a flexible data-cleansing approach to processing user-generated discourse. In: Online-proceedings of the First French-speaking meeting around the framework Apache UIMA, LINA CNRS UMR 6241 - University of Nantes, France (2009)

  21. Ferrucci, D., Lally, A.: UIMA: an architectural approach to unstructured information processing in the corporate research environment. Nat. Lang. Eng. 10(3–4), 327–348 (2004). doi:10.1017/S1351324904003523

  22. Francis, W.N., Kucera, H.: Brown corpus manual. Technical report, Department of Linguistics, Brown University, Providence, Rhode Island, USA. http://icame.uib.no/brown/bcm.html (1979). (Last accessed: 2015-02-11)

  23. Garrett, J.J.: Ajax: A New Approach to Web Applications. http://www.adaptivepath.com/ideas/ajax-new-approach-web-applications/ (2005). (Last accessed: 2015-02-11)

  24. Gerdes, K.: Arborator - a tool for collaborative dependency annotation. https://launchpad.net/arborator (2013). (Last accessed: 2015-02-08)

  25. Heid, U., Schmid, H., Eckart, K., Hinrichs, E.: A corpus representation format for linguistic web services: the d-spin text corpus format and its relationship with ISO standards. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M., Tapias, D. (eds.) Proceedings of the 7th International Conference on Language Resources and Evaluation (LREC’10), pp. 494–499. European Language Resources Association (ELRA), Valletta, Malta (2010)

  26. Hovy, E.: Annotation. In: Tutorial Abstracts of ACL 2010. Association for Computational Linguistics, Uppsala, Sweden, p. 4. http://www.aclweb.org/anthology/P10-5004 (2010)

  27. Ide, N., Romary, L.: Towards international standards for language resources. In: Dybkjær, L., Hemsen, H., Minker, W. (eds.) Evaluation of Text and Speech Systems, chap 9, vol. 37, pp. 263–284. Springer, Netherlands (2007)

  28. Ide, N., Bonhomme, P., Romary, L.: XCES: an XML-based encoding standard for linguistic corpora. In: Calzolari, N., Carayannis, G., Choukri, K., Höge, H., Maegaard, B., Mariani, J., Zampolli, A. (eds.) Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC’00), pp. 825–830. European Language Resources Association (ELRA), Athens, Greece (2000)

  29. Kaplan, D., Iida, R., Nishina, K., Tokunaga, T.: Slate - a tool for creating and maintaining annotated corpora. J. Lang. Technol. Comput. Linguist. 26(2), 89–101 (2011)

  30. Lin, B., Chen, Y., Chen, X., Yu, Y.: Comparison between JSON and XML in Applications Based on AJAX. In: Guerrero, J.E. (ed.) Proceedings of the International Conference on Computer Science & Service System (CSSS’12). IEEE Computer Society, Nanjing, China, pp. 1174–1177 (2012). doi:10.1109/CSSS.2012.297

  31. Maeda, K., Lee, H., Medero, S., Medero, J., Parker, R., Strassel, S.: Annotation Tool Development for Large-Scale Corpus Creation Projects at the Linguistic Data Consortium. In: Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Tapias, D. (eds.) Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC’08), pp. 3052–3056. European Language Resources Association (ELRA), Marrakech, Morocco (2008)

  32. Meurs, M.J., Murphy, C., Naderi, N., Morgenstern, I., Cantu, C., Semarjit, S., Butler, G., Powlowski, J., Tsang, A., Witte, R.: Towards evaluating the impact of semantic support for curating the fungus scientific literature. In: Baker, C.J.O., Chen, H., Bagheri, E., Du, W. (eds.) Proceedings of the 3rd Canadian Semantic Web Symposium (CSWS’11), pp. 34–39. Vancouver, British Columbia, Canada (2011)

  33. Morton, T., LaCivita, J.: WordFreak: an open tool for linguistic annotation. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Demonstrations - Volume 4 (NAACL-Demonstrations ’03), pp. 17–18. Association for Computational Linguistics, Stroudsburg, PA, USA (2003). doi:10.3115/1073427.1073436

  34. Müller, C., Strube, M.: Multi-level annotation of linguistic data with MMAX2. In: Braun, S., Kohn, K., Mukherjee, J. (eds.) Corpus Technology and Language Pedagogy: New Resources, New Tools, New Methods, Peter Lang, Frankfurt a.M., Germany, pp. 197–214 (2006)

  35. Nakov, P., Schwartz, A., Wolf, B., Hearst, M.: Supporting annotation layers for natural language processing. In: Proceedings of the ACL 2005 on Interactive Poster and Demonstration Sessions, pp. 65–68. Association for Computational Linguistics, Ann Arbor, Michigan (2005). doi:10.3115/1225753.1225770

  36. Nivre, J., Hall, J., Kübler, S., McDonald, R., Nilsson, J., Riedel, S., Yuret, D.: The CoNLL 2007 shared task on dependency parsing. In: Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pp. 915–932. Association for Computational Linguistics, Prague, Czech Republic (2007)

  37. Ogren, P.V.: Knowtator: A protégé plug-in for annotated corpus construction. In: Proceedings of the 2006 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: Companion Volume: Demonstrations, pp. 273–275. Association for Computational Linguistics, Stroudsburg, PA, USA, NAACL-Demonstrations ’06. doi:10.3115/1225785.1225791 (2006)

  38. Pajas, P., Štěpánek, J.: Recent advances in a feature-rich framework for treebank annotation. In: Proceedings of the 22nd International Conference on Computational Linguistics (COLING’08), Manchester, UK, pp. 673–680. http://www.aclweb.org/anthology/C08-1085 (2008)

  39. Rak, R., Rowley, A., Black, W., Ananiadou, S.: Argo: an integrative, interactive, text mining-based workbench supporting curation. Database 2012 (2012). doi:10.1093/database/bas010

  40. Stenetorp, P., Pyysalo, S., Topić, G., Ohta, T., Ananiadou, S., Tsujii, J.: Brat: a web-based tool for NLP-assisted text annotation. In: Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 102–107. Association for Computational Linguistics, Avignon, France. http://www.aclweb.org/anthology/E12-2021 (2012)

  41. Stührenberg, M., Goecke, D., Diewald, N., Mehler, A., Cramer, I.: Web-based annotation of anaphoric relations and lexical chains. In: Proceedings of the Linguistic Annotation Workshop (LAW’07), pp. 140–147. Association for Computational Linguistics, Prague, Czech Republic (2007)

  42. Tablan, V., Roberts, I., Cunningham, H., Bontcheva, K.: GATECloud.net: a platform for large-scale, open-source text processing on the cloud. Philos. Trans. R. Soc. Lond. A: Math. Phys. Eng. Sci. 371(1983) (2012). doi:10.1098/rsta.2012.0071

  43. Walls, C.: Spring in Action, 3rd edn. Manning Publications Co, Sound View Court 3B, Greenwich, CT, USA (2011)

  44. Wang, A., Hoang, C.D.V., Kan, M.Y.: Perspectives on crowdsourcing annotations for natural language processing. Lang. Resour. Eval. 47(1), 9–31 (2013). doi:10.1007/s10579-012-9176-1

  45. Yimam, S.M., Gurevych, I., Eckart de Castilho, R., Biemann, C.: WebAnno: A flexible, web-based and visually supported system for distributed annotations. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 1–6. Association for Computational Linguistics, Sofia, Bulgaria. http://www.aclweb.org/anthology/P13-4001 (2013)

  46. Yimam, S.M., Biemann, C., Eckart de Castilho, R., Gurevych, I.: Automatic annotation suggestions and custom annotation layers in WebAnno. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 91–96. Association for Computational Linguistics, Baltimore, Maryland. http://aclweb.org/anthology/P14-5016 (2014)

Author information

Corresponding author

Correspondence to Chris Biemann.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Biemann, C., Bontcheva, K., Eckart de Castilho, R., Gurevych, I., Yimam, S.M. (2017). Collaborative Web-Based Tools for Multi-layer Text Annotation. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_8

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_8

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
