The Organization of a Neurocomputational Control Model for Articulatory Speech Synthesis

  • Conference paper
Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5042)

Abstract

The organization of a computational control model of articulatory speech synthesis is outlined in this paper. The model is based on general principles of neurophysiology and cognitive psychology. Thus it is based on such neural control circuits, neural maps and mappings as are hypothesized to exist in the human brain, and the model is based on learning or training mechanisms similar to those occurring during the human process of speech acquisition. The task of the control module is to generate articulatory data for controlling an articulatory-acoustic speech synthesizer. Thus a complete “BIONIC” (i.e. BIOlogically motivated and techNICally realized) speech synthesizer is described, capable of generating linguistic, sensory, and motor neural representations of sounds, syllables, and words, capable of generating articulatory speech movements from neuromuscular activation, and subsequently capable of generating acoustic speech signals by controlling an articulatory-acoustic vocal tract model. The module developed thus far is capable of producing single sounds (vowels and consonants), simple CV- and VC-syllables, and first sample words. In addition, processes of human-human interaction occurring during speech acquisition (mother-child or carer-child interactions) are briefly discussed in this paper.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kröger, B.J., Lowit, A., Schnitker, R. (2008). The Organization of a Neurocomputational Control Model for Articulatory Speech Synthesis. In: Esposito, A., Bourbakis, N.G., Avouris, N., Hatzilygeroudis, I. (eds) Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction. Lecture Notes in Computer Science, vol 5042. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70872-8_9

  • DOI: https://doi.org/10.1007/978-3-540-70872-8_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-70871-1

  • Online ISBN: 978-3-540-70872-8

  • eBook Packages: Computer Science (R0)
