Abstract
Mean length of utterance (MLU) is an important indicator of complexity in child language. A commonly used method for calculating MLU is the CLAN toolkit, which includes modules that measure utterance length in morphemes. However, these modules rely on rules that are available for only a few languages, and Hungarian is not among them. Therefore, adequate methods need to be developed in order to analyze and measure Hungarian transcripts automatically. In this paper we describe a new toolkit that estimates MLU counts (in morphemes) and also provides morphosyntactic tagging. Its components are based on existing resources; however, many of them were adapted to the language of the transcripts. The tool-chain performs the annotation task with high precision, and its MLU estimates correlate with those of human experts.
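As an illustration of the quantity being estimated, MLU in morphemes is the total number of morphemes divided by the number of utterances. The following is a minimal sketch of that arithmetic only, not the toolkit described in the paper. It assumes the transcript has already been segmented into morphemes; the function name `mean_length_of_utterance` and the hand-segmented toy data are hypothetical.

```python
from typing import List

# Assumed input shape: a transcript is a list of utterances, an utterance is
# a list of words, and a word is a list of its morphemes (already segmented).
Utterance = List[List[str]]


def mean_length_of_utterance(utterances: List[Utterance]) -> float:
    """Return MLU in morphemes: total morphemes / number of utterances."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    total_morphemes = sum(len(word) for utt in utterances for word in utt)
    return total_morphemes / len(utterances)


# Hypothetical, hand-segmented toy transcript (illustrative only).
sample = [
    [["kutya"]],                             # 1 morpheme
    [["lát", "om"], ["a"], ["kutyá", "t"]],  # 2 + 1 + 2 = 5 morphemes
]

mlu = mean_length_of_utterance(sample)
print(mlu)  # 6 morphemes over 2 utterances
```

In practice the difficult part is the morpheme segmentation itself, which this sketch takes as given.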
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Orosz, G., Mátyus, K. (2014). An MLU Estimation Method for Hungarian Transcripts. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_22
DOI: https://doi.org/10.1007/978-3-319-10816-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science (R0)