A Structured Way of Looking at the Performance of Text-to-Speech Systems

  • Louis C. W. Pols
  • Ute Jekosch


In 1993 an extensive questionnaire was distributed via the COCOSDA Bulletin Board, and 16 responses were received. Almost all of them were detailed and contained a wealth of ideas and suggestions. In this chapter we combine suggestions from the respondents with the knowledge and experience gathered in various projects (SPIN, SAM(-A), Eagles, and (Euro)COCOSDA) into a new, structured approach to evaluating the performance of text-to-speech systems. The basic idea is to define a set of key words or descriptors that specify the system under study, with special emphasis on its application. The available and to-be-developed tests should be characterized in a similar way. System and application can then be linked to the suite of tests in a matrix, so that a proper selection can be made, or it becomes apparent that additional specific tests are still required.
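The matrix idea can be illustrated with a minimal Python sketch. It is not the authors' method: the descriptor names, the test names, and the rule that a test is selected when it shares at least one descriptor with the system are all assumptions made for illustration.

```python
# Hypothetical descriptors for one system and its application.
# None of these names are taken from the chapter.
system_descriptors = {"telephone_channel", "proper_names", "naive_listeners"}

# Hypothetical tests, each characterized by the descriptors it addresses.
tests = {
    "segmental_intelligibility": {"telephone_channel", "naive_listeners"},
    "overall_quality_rating": {"naive_listeners"},
    "prosody_judgement": {"long_texts"},
}


def build_matrix(system, tests):
    """Map each test to the descriptors it shares with the system."""
    return {name: system & covered for name, covered in tests.items()}


def select_tests(system, tests):
    """Return (tests that match the system, descriptors no test covers)."""
    matrix = build_matrix(system, tests)
    # Assumed rule: a test is relevant if it shares any descriptor.
    selected = sorted(name for name, shared in matrix.items() if shared)
    covered = set().union(*matrix.values())
    # Uncovered descriptors hint that an additional test may be needed.
    uncovered = sorted(system - covered)
    return selected, uncovered


selected, uncovered = select_tests(system_descriptors, tests)
print("Selected tests:", selected)
print("Descriptors without a test:", uncovered)
```

In this toy data one descriptor is left without a matching test, which corresponds to the case where additional specific tests would still be required.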







Copyright information

© Springer Science+Business Media New York 1997
