Abstract
Supervised data-driven methods for NLP tasks such as part-of-speech tagging, dependency parsing, and machine translation require large-scale annotated data. Since statistical methods have replaced rule-based/heuristic methods over the years, text annotation has become an essential part of NLP research. Annotation refers to the task of manually labeling text, images, or other data with comments, explanations, tags, or markup; in NLP, it is often carried out by linguists to label raw text. While the outcome of the annotation process, i.e., the labeled data, is valuable, capturing user activities may help in understanding the cognitive subprocesses underlying text annotation.
Declaration: Consent of the subjects participating in the eye-tracking experiments for collecting data used for the work reported in this chapter has been obtained.
Notes
- 1.
- 2.
- 3.
20% of the translation sessions were discarded because it was difficult to rectify the gaze logs for these sessions.
- 4.
Anything beyond the upper bound is hard to translate and can be assigned the maximum score.
- 5.
- 6.
- 7.
- 8.
- 9.
The MSE values are absolute, as opposed to the percentage values presented in the paper. Also, the results reported here differ slightly from the paper because an updated version of the TPR dataset was used for this experimentation.
- 10.
The online version that was active in 2013.
- 11.
BLEU, another popular metric, was not used, as techniques to measure sentence-wise BLEU scores were non-existent at the time of this experimentation. Moreover, BLEU may not be the most appropriate metric for English–Indian language translation evaluation, as shown by Ananthakrishnan et al. (2007).
- 12.
The fixation duration per word is calculated for each sentence, and an average is taken.
- 13.
The complete eye-tracking data (with recorded values of fixations, saccades, eye regression patterns, pupil dilation, and gaze-to-word mapping) are available for academic use at http://www.cfilt.iitb.ac.in/~cognitive-nlp.
- 14.
- 15.
In the case of SVM, the probability of the predicted class is computed as described in Platt (1999).
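As an illustration of the probability computation mentioned in note 15, the following is a minimal sketch of Platt-style sigmoid calibration, which maps an SVM decision value f to P(y=1 | f) = 1 / (1 + exp(A*f + B)). The function names `fit_platt` and `platt_probability`, the toy scores and labels, and the use of plain gradient descent (in place of the optimizer described by Platt (1999)) are all assumptions made for this example, not details taken from the chapter.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit A, B so that P(y=1 | f) = 1 / (1 + exp(A*f + B)).

    Hypothetical helper: uses plain gradient descent on the
    negative log-likelihood, not Platt's original optimizer.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    # Smoothed targets, as suggested by Platt (1999), to limit overfitting.
    t_pos = (n_pos + 1.0) / (n_pos + 2.0)
    t_neg = 1.0 / (n_neg + 2.0)
    targets = [t_pos if y == 1 else t_neg for y in labels]

    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, targets):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # Gradient of the negative log-likelihood w.r.t. (A*f + B) is (t - p).
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

def platt_probability(score, a, b):
    """Calibrated probability of the positive class for one decision value."""
    return 1.0 / (1.0 + math.exp(a * score + b))

# Toy SVM decision values and gold labels (made up for illustration).
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]

a, b = fit_platt(scores, labels)
print(platt_probability(2.0, a, b))   # probability of the positive class
print(platt_probability(-2.0, a, b))  # lower for a negative decision value
```

The probability of the predicted class would then be `p` when the classifier predicts the positive class and `1 - p` otherwise.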
References
Ananthakrishnan, R., Bhattacharyya, P., Sasikumar, M., & Shah, R. M. (2007). Some issues in automatic evaluation of English-Hindi MT: More blues for BLEU. In ICON.
Balahur, A., Hermida, J. M., & Montoyo, A. (2011). Detecting implicit expressions of sentiment in text based on commonsense knowledge. In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (pp. 53–60). Association for Computational Linguistics.
Bird, S. (2006). NLTK: The natural language toolkit. In Proceedings of the COLING/ACL on Interactive Presentation Sessions (pp. 69–72). Association for Computational Linguistics.
Campbell, S., & Hale, S. (1999). What makes a text difficult to translate? In Refereed Proceedings of the 23rd Annual ALAA Congress.
Carl, M. (2012a). The CRITT TPR-DB 1.0: A database for empirical human translation process research. In AMTA 2012 Workshop on Post-Editing Technology and Practice (WPTP-2012).
Carl, M. (2012b). Translog-II: A program for recording user activity data for empirical reading and writing research. In LREC (pp. 4108–4112).
Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Cambridge: Brookline Books.
Denkowski, M., & Lavie, A. (2011). Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In Proceedings of the Sixth Workshop on Statistical Machine Translation (pp. 85–91). Association for Computational Linguistics.
Dragsted, B. (2010). Coordination of reading and writing processes in translation. Translation and Cognition, 15, 41.
Esuli, A., & Sebastiani, F. (2006). SentiWordNet: A publicly available lexical resource for opinion mining. In Proceedings of LREC (Vol. 6, pp. 417–422).
Fellbaum, C. (1998). WordNet. Wiley Online Library.
Fort, K., Nazarenko, A., & Rosset, S. (2012). Modeling the complexity of manual annotation tasks: a grid of analysis. In International Conference on Computational Linguistics (pp. 895–910).
Ganapathibhotla, G., & Liu, B. (2008). Identifying preferred entities in comparative sentences. In Proceedings of the International Conference on Computational Linguistics, COLING.
Gunning, R. (1969). The fog index after twenty years. Journal of Business Communication, 6(2), 3–13.
Hornof, A. J., & Halverson, T. (2002). Cleaning up systematic error in eye-tracking data by using required fixation locations. Behavior Research Methods, Instruments, & Computers, 34(4), 592–604.
Joachims, T. (2006). Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 217–226). ACM.
Joshi, S., Kanojia, D., & Bhattacharyya, P. (2013). More than meets the eye: Study of human cognition in sense annotation. In NAACL HLT 2013. Atlanta, USA.
Kincaid, J. P., Fishburne, R. P. Jr., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. Technical report, DTIC Document.
Lin, D. (1996). On the structural complexity of natural language sentences. In Proceedings of the 16th Conference on Computational Linguistics (Vol. 2, pp. 729–733). Association for Computational Linguistics.
Martínez-Gómez, P., & Aizawa, A. (2013). Diagnosing causes of reading difficulty using Bayesian networks. In IJCNLP.
McAuley, J. J., & Leskovec, J. (2013). From amateurs to connoisseurs: Modeling the evolution of user expertise through online reviews. In Proceedings of the 22nd International Conference on World Wide Web (pp. 897–908). International World Wide Web Conferences Steering Committee.
Mishra, A., Bhattacharyya, P., & Carl, M. (2013). Automatically predicting sentence translation difficulty. In ACL (Vol. 2, pp. 346–351).
Mishra, A., Carl, M., & Bhattacharyya, P. (2012). A heuristic-based approach for systematic error correction of gaze data for reading. In Proceedings of the First Workshop on Eyetracking and Natural Language Processing. Mumbai, India.
Pang, B., & Lee, L. (2005). Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp. 115–124). Association for Computational Linguistics.
Platt, J. C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in large margin classifiers. Citeseer.
Ramteke, A., Malu, A., Bhattacharyya, P., & Nath, J. S. (2013). Detecting turnarounds in sentiment analysis: Thwarting. In ACL (Vol. 2, pp. 860–865).
Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., & Huang, R. (2013). Sarcasm as contrast between a positive sentiment and negative situation. In Proceedings of Empirical Methods in Natural Language Processing (pp. 704–714).
Scott, G. G., O’Donnell, P. J., & Sereno, S. C. (2012). Emotion words affect eye fixations during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(3), 783.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (Vol. 200).
Von der Malsburg, T., & Vasishth, S. (2011). What is the scanpath signature of syntactic reanalysis? Journal of Memory and Language, 65(2), 109–127.
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Mishra, A., Bhattacharyya, P. (2018). Estimating Annotation Complexities of Text Using Gaze and Textual Information. In: Cognitively Inspired Natural Language Processing. Cognitive Intelligence and Robotics. Springer, Singapore. https://doi.org/10.1007/978-981-13-1516-9_3
DOI: https://doi.org/10.1007/978-981-13-1516-9_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1515-2
Online ISBN: 978-981-13-1516-9
eBook Packages: Computer Science, Computer Science (R0)