Analysis of Verbal and Nonverbal Acoustic Signals with the Dresden UASR System

  • Conference paper
Verbal and Nonverbal Communication Behaviours

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4775)

Abstract

Over the last few years, a framework for developing speech analysis and synthesis algorithms has been implemented. The algorithms are connected to common databases at the different levels of a hierarchical structure. This paper describes the framework, called UASR (Unified Approach for Speech Synthesis and Recognition), together with some related experiments and applications. Special attention is given to the suitability of the system for processing nonverbal signals, which relates to the analysis methods now being addressed in the COST 2102 initiative. A potential field of application in interaction research is discussed.

Editor information

Anna Esposito, Marcos Faundez-Zanuy, Eric Keller, Maria Marinaro

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hoffmann, R., Eichner, M., Wolff, M. (2007). Analysis of Verbal and Nonverbal Acoustic Signals with the Dresden UASR System. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science, vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_18

  • DOI: https://doi.org/10.1007/978-3-540-76442-7_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76441-0

  • Online ISBN: 978-3-540-76442-7

  • eBook Packages: Computer Science, Computer Science (R0)
