Can Phonetic Knowledge be Used to Improve the Performance of Speech Recognisers and Synthesisers?

  • Chapter
The Integration of Phonetic Knowledge in Speech Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 25)

Abstract

A chronological survey of the development of machine recognition of speech is contrasted with the beginnings of speech synthesis, and the advantages and disadvantages of the different systems and approaches, as well as their changing degrees of dependency on phonetic knowledge, are sketched. The unsolved fundamental problem of concatenation quality in present-day synthesis is discussed, and a knowledge-based solution is mooted which can be projected onto recognition: a mathematical model of the relationship between temporally overlapping underlying articulatory gestures and the resulting surface acoustic signal.

Copyright information

© 2005 Springer

About this chapter

Cite this chapter

Ainsworth, W.A. (2005). Can Phonetic Knowledge be Used to Improve the Performance of Speech Recognisers and Synthesisers? In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_2
