
Reexamination on Voting for Crowd Sourcing MT Evaluation

  • Yiming Wang
  • Muyun Yang
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 493)

Abstract

We describe a model based on Ranking Support Vector Machines (SVM) for handling crowdsourcing data. Our model focuses on how to obtain high-quality ranked data from low-quality crowdsourced judgments. The data sets used for model training and testing contain missing data. We find that our model achieves better results than the voting model in all cases in our experiments, including the ranking of both two and four translations.
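The paper itself provides no code, but the aggregation idea can be illustrated with a minimal sketch: crowdsourced pairwise preferences are turned into difference vectors via the standard pairwise transform and fed to a linear SVM, so translations can be sorted by the learned score even when some comparisons are missing. The toy features, the vote list, and the use of scikit-learn's LinearSVC are illustrative assumptions, not the authors' actual setup.

```python
# Minimal sketch (not the authors' implementation): Ranking SVM via the
# standard pairwise transform, applied to noisy crowdsourced preferences.
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(features, preferences):
    """Turn crowd votes ("translation a beats translation b") into
    difference vectors labelled +1/-1 for a linear Ranking SVM."""
    X, y = [], []
    for a, b in preferences:                 # each vote: a preferred over b
        X.append(features[a] - features[b])
        y.append(1)
        X.append(features[b] - features[a])  # mirrored pair keeps classes balanced
        y.append(-1)
    return np.array(X), np.array(y)

# Toy data: 4 candidate translations described by 3 arbitrary features.
features = np.random.rand(4, 3)
# Noisy crowd votes; a missing comparison (e.g. 2 vs. 3) is simply absent,
# whereas a plain voting model can only aggregate comparisons it has seen.
votes = [(0, 1), (0, 2), (1, 3), (0, 1), (2, 1)]

X, y = pairwise_transform(features, votes)
ranker = LinearSVC(C=1.0).fit(X, y)

# Score each translation; sorting by score yields the aggregated ranking.
scores = features @ ranker.coef_.ravel()
print(np.argsort(-scores))                   # indices from best to worst
```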

Keywords

crowdsourcing · automatic evaluation of machine translation · voting model · SVM model · data missing

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Yiming Wang (1)
  • Muyun Yang (2)
  1. Harbin Institute of Technology, Harbin, China
  2. Institute of Technology, Harbin, China
