Prosodic Parameters of French in a Speech Recognition System

  • Chapter
Intonation

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 15))

Abstract

The importance of integrating prosody into speech processing systems is now widely acknowledged. Prosodic parameter modelling was first carried out for text-to-speech synthesis, since that technique cannot work without prosody (Emerard, 1977; Klatt, 1979). Speech researchers subsequently recognised that prosody would also be helpful in speech recognition (Carbonell, Haton, Lonchamp & Pierrel, 1982; Waibel, 1987; Ljolje & Fallside, 1987; Wang & Hirschberg, 1992). However, using prosodic parameters in a speech recognition system is less straightforward than in text-to-speech synthesis. A good predictive model for speech recognition has to account for a variety of speaking styles, whereas in speech synthesis a single correct speaking-style prediction, appropriate to the given application, is sufficient.

References

  • André-Obrecht, R. 1990. Reconnaissance automatique de parole à partir de segments acoustiques et de modèles de Markov cachés. 18ième Journées d’Etudes sur la Parole (Montreal, Canada), 212–216.

  • Bartkova, K., P. Haffner and D. Larreur. 1993. Intensity prediction for speech synthesis in French. Proc. ESCA Workshop on Prosody (Lund, Sweden), 280–283.

  • Bartkova, K. and D. Jouvet. 1995. Using segmental duration prediction for rescoring the N-best solution in speech recognition. Proc. 13th ICPhS (Stockholm, Sweden), vol. 4, 248–251.

  • Bartkova, K. and D. Jouvet. 1997. Usefulness of phonetic parameters in a rejection procedure of an HMM based speech recognition system. Proc. EUROSPEECH '97 (Rhodes, Greece), 267–270.

  • Bartkova, K. and C. Sorin. 1987. A model of segmental duration for speech synthesis in French. Speech Communication 6, 245–260.

  • Batliner, A., C. Weiand, A. Kiesling and E. Nöth. 1993. Why sentence modality in spontaneous speech is more difficult to classify and why this fact is not too bad for prosody. Proc. ESCA Workshop on Prosody (Lund, Sweden), 112–115.

  • Boeffard, O., B. Cherbonnel, F. Emerard and S. White. 1993. Automatic segmentation and quality evaluation of speech unit inventories for concatenation-based, multilingual PSOLA text-to-speech systems. Proc. EUROSPEECH '93 (Berlin, Germany), 1449–1452.

  • Botte, M.C., G. Canévet, L. Demany and C. Sorin. 1989. Psychoacoustique et Perception Auditive. INSERM/SFA/CNET.

  • Bush, M.A. and G.E. Kopec. 1987. Network-based connected digit recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP 35, 1401–1413.

  • Campbell, W.N. and S.D. Isard. 1991. Segment duration in a syllable frame. Journal of Phonetics 19, 37–47.

  • Carbonell, N., J.P. Haton, F. Lonchamp and J.M. Pierrel. 1982. Indices prosodiques pour l’analyse syntaxico-sémantique dans Myrtille II. Actes du Séminaire Prosodie et Reconnaissance (Aix-en-Provence, France), 59–61.

  • Carbonell, N. and Y. Laprie. 1993. Automatic detection of prosodic cues for segmenting continuous speech into supralexical units. Proc. ESCA Workshop on Prosody (Lund, Sweden), 184–187.

  • Chung, G. and S. Seneff. 1997. Hierarchical duration modelling for speech recognition using the Angie framework. Proc. EUROSPEECH '97 (Rhodes, Greece), 1476–1478.

  • Cosi, P., D. Falavigna and M. Omologo. 1991. A preliminary statistical evaluation of manual and automatic segmentation discrepancies. Proc. EUROSPEECH '91 (Genova, Italy), 693–696.

  • Daly, N. and V. Zue. 1995. Acoustic, perceptual and linguistic analyses of intonation contours in human/machine dialogues. Proc. ICSLP '95, 497–500.

  • Delattre, P. 1966. Studies in French and Comparative Phonetics. London: Mouton.

  • Di Cristo, A. 1978. De la Microprosodie à l’Intonosyntaxe. Thèse d’Etat, Université de Provence.

  • Emerard, F. 1977. Les diphones et le traitement de la prosodie dans la synthèse de la parole. Bulletin de l’Institut Phonétique de Grenoble, vol. VI, 103–147.

  • Fletcher, J. 1991. Rhythm and final lengthening in French. Journal of Phonetics 19, 193–212.

  • Fónagy, I. 1980. L’accent français, accent probabilitaire (dynamique d’un changement prosodique). Studia Phonetica 15, Montreal.

  • Fónagy, I. and K. Magdics. 1960. Speed of utterance in phrases of different lengths. Language and Speech 3, 179–192.

  • Gong, Y. and C.W. Treurniet. 1993. Duration of phones as function of utterance length and its use in automatic speech recognition. Proc. EUROSPEECH '93 (Berlin, Germany), 315–318.

  • Gupta, V., M. Lennig and P. Mermelstein. 1992. Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition. Computer Speech and Language 6, 331–344.

  • Jouvet, D., K. Bartkova and J. Monné. 1991. On the modelization of allophones in an HMM based speech recognition system. Proc. EUROSPEECH '91 (Genova, Italy), 923–926.

  • Klatt, D.H. 1979. Synthesis by rule of segmental durations in English sentences. In Lindblom and Öhman (eds.), 287–300.

  • Langlais, P. 1995. Traitement de la Prosodie en Reconnaissance Automatique de la Parole. Thèse de l’Université d’Avignon.

  • Lehiste, I. 1970. Suprasegmentals. Cambridge, Mass.: MIT Press.

  • Lindblom, B. and S.E.G. Öhman (eds.). 1979. Frontiers of Speech Communication Research. New York: Academic Press.

  • Lindblom, B. and K. Rapp. 1973. Some temporal regularities of spoken Swedish. Symposium on Auditory Analysis and Perception of Speech (Leningrad, USSR), 21–23.

  • Ljolje, A. and F. Fallside. 1987. Modelling of speech using primarily prosodic parameters. Computer Speech and Language 2, 185–204.

  • Lokbani, M.N., D. Jouvet and J. Monné. 1993. Segmental post-processing of the N-best solutions in a speech recognition system. Proc. EUROSPEECH '93 (Berlin, Germany), 811–814.

  • MacNeilage, P. and J.L. DeClerk. 1968. Cinefluorographic study of speaking rate. Proc. 76th Meeting of the Acoustical Society of America (Cleveland, USA), 19–22.

  • Mercier, G., D. Bigorgne, L. Miclet, L. Le Guennec and M. Querré. 1990. Recognition of speaker-dependent continuous speech with KEAL. In Waibel and Lee (eds.), 225–234.

  • Morin, D. 1991. Influence of field data in HMM training for a vocal server. Proc. EUROSPEECH '91 (Genova, Italy), 735–738.

  • Nespor, M. and I. Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.

  • O’Shaughnessy, D. 1984. A multispeaker analysis of durations in read French paragraphs. J. Acoust. Soc. Am. 76, 1664–1672.

  • Ostendorf, M.F. and K. Ross. 1997. A multi-level model for intonation labels. In Sagisaka et al. (eds.), 291–308.

  • Ostendorf, M.F., P.J. Price, J. Bear and C.W. Wightman. 1990. The use of relative duration in syntactic disambiguation. Proc. DARPA Speech and Natural Language Workshop (Hidden Valley, USA).

  • Ostendorf, M.F., C.W. Wightman and N.M. Veilleux. 1993. Parse scoring with prosodic information: an analysis/synthesis approach. Computer Speech and Language 7, 193–210.

  • Pols, L.C.W., X. Wang and L.F.M. ten Bosch. 1996. Modelling of phone duration (using the TIMIT database) and its potential benefit for ASR. Speech Communication 19, 161–176.

  • Rose, R.C. and D.B. Paul. 1990. A Hidden Markov model based keyword recognition system. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Albuquerque, USA), 129–132.

  • Rouat, J., Ch.Y. Liu and D. Morissette. 1997. A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication 21, 191–207.

  • Sagisaka, Y., N. Campbell and N. Higuchi (eds.). 1997. Computing Prosody. New York: Springer-Verlag.

  • Selkirk, E.O. 1986. On derived domains in sentence prosody. Phonology Yearbook 3, 371–405.

  • Soong, F.K. 1989. A phonetically labelled acoustic segment (PLAS) approach to speech analysis-synthesis. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Glasgow, UK), 584–587.

  • Sorin, C. 1989. Perception de la parole continue. In Botte et al. (eds.), 123–139.

  • Sorin, C., D. Jouvet, C. Gagnoulet, D. Dubois, D. Sadek and M. Toularhoat. 1995. Operational and experimental French telecommunication services using CNET speech recognition and text-to-speech synthesis. Speech Communication 17, 273–286.

  • Suaudeau, N. and R. André-Obrecht. 1993. Sound duration modelling and time-variable speaking rate in a speech recognition system. Proc. EUROSPEECH '93 (Berlin, Germany), 307–310.

  • Vaissière, J. 1977. Premiers essais d’utilisation de la durée pour la segmentation en mots dans un système de reconnaissance. 8ème Journée d’Etudes sur la Parole (Aix-en-Provence, France), 345–352.

  • Waibel, A. 1987. Prosody and Speech Recognition. London: Pitman.

  • Waibel, A. and K.F. Lee (eds.). 1990. Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann Publishers.

  • Wang, M.Q. and J. Hirschberg. 1992. Automatic classification of intonation phrase boundaries. Computer Speech and Language 6, 175–196.

  • Wang, X. 1997. Incorporating knowledge on segmental duration in HMM-based continuous speech recognition. IFOTT, Amsterdam, The Netherlands.

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Bartkova, K. (2000). Prosodic Parameters of French in a Speech Recognition System. In: Botinis, A. (eds) Intonation. Text, Speech and Language Technology, vol 15. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-4317-2_15

  • DOI: https://doi.org/10.1007/978-94-011-4317-2_15

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-0-7923-6723-9

  • Online ISBN: 978-94-011-4317-2

  • eBook Packages: Springer Book Archive
