An MLU Estimation Method for Hungarian Transcripts

Orosz, György; Mátyus, Kinga

doi:10.1007/978-3-319-10816-2_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Mean length of utterance (MLU) is an important indicator of complexity in child language. MLU is commonly calculated with the CLAN toolkit, which includes modules that measure utterance length in morphemes. However, these methods rely on rules that are available for only a few languages, and Hungarian is not among them. Adequate methods therefore need to be developed to analyze and measure Hungarian transcripts automatically. In this paper we describe a new toolkit that estimates MLU counts (in morphemes) and also provides morphosyntactic tagging. Its components are based on existing resources; however, many of them were adapted to the language of the transcripts. The tool-chain performs the annotation task with high precision, and its MLU estimates correlate with those of human experts.
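As an illustration of the quantity being estimated, MLU in morphemes is the total number of morphemes divided by the number of utterances. The following is a minimal sketch of that definition only, not the toolkit described in the paper. It assumes the utterances have already been segmented into morphemes; the function name, the input layout, and the hand-segmented sample are all hypothetical.

```python
from typing import Iterable, List


def mean_length_of_utterance(utterances: Iterable[List[List[str]]]) -> float:
    """Return MLU in morphemes: total morpheme count / number of utterances.

    Assumed input layout (hypothetical): each utterance is a list of words,
    and each word is a list of its morphemes.
    """
    total_morphemes = 0
    n_utterances = 0
    for utterance in utterances:
        total_morphemes += sum(len(word) for word in utterance)
        n_utterances += 1
    if n_utterances == 0:
        raise ValueError("need at least one utterance")
    return total_morphemes / n_utterances


# Hand-segmented sample for illustration; not taken from the paper's corpus.
sample = [
    [["ház", "ak", "ban"]],                   # "házakban": 3 morphemes
    [["lát", "om"], ["a"], ["kutyá", "t"]],   # "látom a kutyát": 2 + 1 + 2 = 5
]
print(mean_length_of_utterance(sample))  # (3 + 5) / 2 = 4.0
```

In practice the segmentation step is the hard part for a morphologically rich language such as Hungarian, which is why the abstract describes a tagging tool-chain rather than a simple counter.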

Author information

Authors and Affiliations

Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, 50/a Práter street, 1083, Budapest, Hungary
György Orosz
MTA-PPKE Hungarian Language Technology Research Group, 50/a Práter street, 1083, Budapest, Hungary
György Orosz
MTA Research Institute for Linguistics, 33. Benczúr street, 1068, Budapest, Hungary
Kinga Mátyus

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 6a, 60200, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Orosz, G., Mátyus, K. (2014). An MLU Estimation Method for Hungarian Transcripts. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_22

DOI: https://doi.org/10.1007/978-3-319-10816-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)