From Here to Utility

Melding Phonetic Insight with Speech Technology

The Integration of Phonetic Knowledge in Speech Technology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 25)

Abstract

Technology and science are often perceived as polar extremes with respect to spoken language. Speech applications rarely incorporate scientific insight, and conversely, basic research is often viewed as oblivious to the practical concerns of the real world. Melding phonetic insight with speech technology can, however, yield extremely productive results for both applications and basic science if performed within the appropriate theoretical framework. Such an approach is illustrated here by the relation between prosodic (stress-accent) and phonetic properties of conversational telephone dialogues in American English, drawn from the Switchboard corpus. Phonetic properties, such as vocalic identity and duration, are shown to reflect prosodic phenomena, and thus could be used to improve automatic speech recognition performance, as well as to provide detailed insight into the nature of spoken language.

© 2005 Springer

About this chapter

Cite this chapter

Greenberg, S. (2005). From Here to Utility. In: Barry, W.J., van Dommelen, W.A. (eds) The Integration of Phonetic Knowledge in Speech Technology. Text, Speech and Language Technology, vol 25. Springer, Dordrecht. https://doi.org/10.1007/1-4020-2637-4_7
