Analysis of Verbal and Nonverbal Acoustic Signals with the Dresden UASR System

Hoffmann, Rüdiger; Eichner, Matthias; Wolff, Matthias

doi:10.1007/978-3-540-76442-7_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4775)

Abstract

Over the last few years, a framework for developing algorithms for speech analysis and synthesis has been implemented. The algorithms are connected to common databases on the different levels of a hierarchical structure. This framework, called UASR (Unified Approach for Speech Synthesis and Recognition), is described together with some related experiments and applications. Special focus is placed on the suitability of the system for processing nonverbal signals. This part relates to the analysis methods currently addressed in the COST 2102 initiative. A potential application field in interaction research is discussed.

Author information

Authors and Affiliations

Technische Universität Dresden, Institut für Akustik und Sprachkommunikation, 01062 Dresden, Germany
Rüdiger Hoffmann, Matthias Eichner & Matthias Wolff

Editor information

Anna Esposito, Marcos Faundez-Zanuy, Eric Keller, Maria Marinaro

Cite this paper

Hoffmann, R., Eichner, M., Wolff, M. (2007). Analysis of Verbal and Nonverbal Acoustic Signals with the Dresden UASR System. In: Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M. (eds) Verbal and Nonverbal Communication Behaviours. Lecture Notes in Computer Science, vol 4775. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76442-7_18

DOI: https://doi.org/10.1007/978-3-540-76442-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76441-0
Online ISBN: 978-3-540-76442-7
eBook Packages: Computer Science, Computer Science (R0)
