A Mixed Inventory Structure for German Concatenative Synthesis

  • Thomas Portele
  • Florian Höfer
  • Wolfgang J. Hess


In speech synthesis by unit concatenation, a central design decision is the definition of the unit inventory. Diphone and demisyllable inventories are widely used, but both unit types have drawbacks. This chapter describes a mixed inventory structure that is syllable-oriented but does not require a definite decision about the position of a syllable boundary. The inventory definition drew on the results of a comprehensive investigation of coarticulatory phenomena at syllable boundaries and on a machine-readable pronunciation dictionary. An evaluation comparing the mixed inventory with a demisyllable inventory and a diphone inventory confirms that speech generated with the mixed inventory is superior in general acceptance. A segmental intelligibility test shows the high intelligibility of the synthetic speech.


Keywords: Speech Synthesis, Inventory Structure, Natural Speech, Synthetic Speech, Human Voice





Copyright information

© Springer Science+Business Media New York 1997
