An MLU Estimation Method for Hungarian Transcripts

  • Conference paper
Text, Speech and Dialogue (TSD 2014)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

Mean length of utterance (MLU) is an important indicator of complexity in child language. MLU is commonly calculated with the CLAN toolkit, whose modules measure utterance length in morphemes. However, these modules rely on rules that are available for only a few languages, and Hungarian is not among them. Automatically analyzing and measuring Hungarian transcripts therefore requires new methods. In this paper we describe a new toolkit that estimates MLU counts (in morphemes) and also provides morphosyntactic tagging. Its components are based on existing resources, many of which were adapted to the language of the transcripts. The tool-chain performs the annotation task with high precision, and its MLU estimates correlate with those of human experts.
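To make the quantity concrete: MLU in morphemes is the total number of morphemes in a sample divided by the number of utterances. The sketch below illustrates only that arithmetic. It assumes the morpheme segmentation is already given, here written by hand. The function name `mean_length_of_utterance`, the nested-list input format, and the two example utterances are hypothetical and are not taken from the paper's toolkit, which derives its counts from morphosyntactic tagging. Which morphemes count toward MLU is a convention of the analysis, and this sketch simply counts every listed segment.

```python
from typing import List

# An utterance is a list of words; a word is a list of morpheme strings.
Utterance = List[List[str]]


def mean_length_of_utterance(utterances: List[Utterance]) -> float:
    """Return total morphemes divided by number of utterances."""
    if not utterances:
        raise ValueError("at least one utterance is required")
    total_morphemes = sum(len(word) for utt in utterances for word in utt)
    return total_morphemes / len(utterances)


# Hand-segmented illustrative input (an assumption, not toolkit output).
sample: List[Utterance] = [
    [["ház", "ak", "ban"]],                   # "házakban": 3 morphemes
    [["lát", "om"], ["a"], ["kutyá", "t"]],   # "látom a kutyát": 5 morphemes
]

mlu = mean_length_of_utterance(sample)  # (3 + 5) / 2 utterances
print(mlu)
```

An automatic pipeline would replace the hand-written segmentation with the output of a morphological analyzer and disambiguator, leaving the averaging step unchanged.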

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Orosz, G., Mátyus, K. (2014). An MLU Estimation Method for Hungarian Transcripts. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_22

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
