Designing a Hungarian Multimodal Database – Speech Recording and Annotation

  • Kinga Papay
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6456)


The Hungarian spontaneous speech recording and annotation subproject is being carried out by our Computational Linguistics research group, and as part of my PhD work, at the University of Debrecen. It belongs to a comprehensive multimodal human-machine interaction development project and multimodal (audio and video) database collection. The efficiency of speech recognition systems can be increased by proper acoustic preprocessing and by investigating the suprasegmental characteristics of spontaneous speech. The research aims to contribute to a more exact knowledge of prosody through the examination of spontaneous speech, with special regard to syntactic embeddings, insertions, iterations, hesitations and restarts, various kinds of emotions, and discourse markers. For Hungarian, the lack of a prosodically labelled, representative spontaneous speech database makes this development more difficult. The spontaneous multimodal database is being recorded via guided formal and informal conversations. During each conversation, several points are to be discussed in order to provoke longer monologues that include the phenomena of spontaneous speech to be examined within our research. Designing a continuous spontaneous speech recognition system that is speaker-independent and able to contribute to our theoretical assumptions requires the construction of a speech database, for which we need to take several personnel and technical aspects into account. The visual channel also needs to be annotated, which will enable us to examine and implement multimodal features as well.


Keywords: database planning, spontaneous speech, prosody research, multimodality





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Kinga Papay (1)
  1. University of Debrecen, Debrecen, Hungary
