Language Resources and Evaluation, Volume 52, Issue 1, pp 341–364

Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation

  • Evgeny A. Stepanov
  • Shammur Absar Chowdhury
  • Ali Orkan Bayer
  • Arindam Ghosh
  • Ioannis Klasinas
  • Marcos Calvo
  • Emilio Sanchis
  • Giuseppe Riccardi
Project Notes


Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated for each new language. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, such as cross-language semantic annotation transfer, may yield low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language, which makes both crowd quality control and the evaluation of the collected annotations difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer, which delegates to crowds the complex task of segmenting and labeling concepts taken from a domain ontology, together with evaluation using the source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer, we consider a close and a distant language pair: Italian–Spanish and Italian–Greek, respectively. The corpora annotated via crowdsourcing are evaluated against source and target language expert annotations. We demonstrate that the two evaluation references (source and target) correlate highly with each other, which drastically reduces the need for target language reference annotations.


Keywords: Crowdsourcing, Evaluation, Semantic annotation, Cross-language transfer



Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. Signals and Interactive Systems Lab, Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
  2. Department of Electronics and Computer Engineering, Technical University of Crete, Chania, Greece
  3. Google Switzerland, Zurich, Switzerland
  4. Departamento de Sistemas Informáticos y Computación, Universitat Politècnica de València, Valencia, Spain
