Prosodic Parameters of French in a Speech Recognition System

Bartkova, Katarina

doi:10.1007/978-94-011-4317-2_15

Part of the book series: Text, Speech and Language Technology (TLTB, volume 15)

Abstract

The importance of integrating prosody into speech processing systems is now widely acknowledged. Prosodic parameter modelling was first carried out for text-to-speech synthesis, since this technique cannot work without prosody (Emerard, 1977; Klatt, 1979). Speech researchers then recognised that prosody would also be helpful in speech recognition (Carbonell, Haton, Lonchamp & Pierrel, 1982; Waibel, 1987; Ljolje & Fallside, 1987; Wang & Hirschberg, 1992). However, using prosodic parameters in a speech recognition system is less straightforward than in text-to-speech synthesis: a good predictive model for recognition has to cover a variety of speaking styles, whereas in synthesis a single correct speaking-style prediction, appropriate to a given application, is sufficient.

References

André-Obrecht, R. 1990. Reconnaissance automatique de parole à partir de segments acoustiques et de modèles de Markov cachés. 18ième Journées d’Etudes sur la Parole (Montreal, Canada), 212–216.
Bartkova, K., P. Haffner and D. Larreur. 1993. Intensity prediction for speech synthesis in French. Proc. ESCA Workshop on Prosody (Lund, Sweden), 280–283.
Bartkova, K. and D. Jouvet. 1995. Using segmental duration prediction for rescoring the N-best solution in speech recognition. Proc. 13th ICPhS (Stockholm, Sweden), vol. 4, 248–251.
Bartkova, K. and D. Jouvet. 1997. Usefulness of phonetic parameters in a rejection procedure of an HMM based speech recognition system. Proc. EUROSPEECH’ 97 (Rhodes, Greece), 267–270.
Bartkova, K. and C. Sorin. 1987. A model of segmental duration for speech synthesis in French. Speech Communication 6, 245–260.
Batliner, A., C. Weiand, A. Kiesling and E. Nöth. 1993. Why sentence modality in spontaneous speech is more difficult to classify and why this fact is not too bad for prosody. Proc. ESCA Workshop on Prosody (Lund, Sweden), 112–115.
Boeffard, O., B. Cherbonnel, F. Emerard and S. White. 1993. Automatic segmentation and quality evaluation of speech unit inventories for concatenation-based, multilingual PSOLA text-to-speech systems. Proc. EUROSPEECH’ 93 (Berlin, Germany), 1449–1452.
Botte, M.C., G. Canévet, L. Demany and C. Sorin. 1989. Psychoacoustique et Perception Auditive. INSERM/SFA/CNET.
Bush, M.A. and G.E. Kopec. 1987. Network-based connected digit recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP 35, 1401–1413.
Campbell, W.N. and S.D. Isard. 1991. Segment duration in a syllable frame. Journal of Phonetics 19, 37–47.
Carbonell, N., J.P. Haton, F. Lonchamp and J.M. Pierrel. 1982. Indices prosodiques pour l’analyse syntaxico-sémantique dans Myrtille II. Actes du Séminaire Prosodie et Reconnaissance (Aix-en-Provence, France), 59–61.
Carbonell, N. and Y. Laprie. 1993. Automatic detection of prosodic cues for segmenting continuous speech into supralexical units. Proc. ESCA Workshop on Prosody (Lund, Sweden), 184–187.
Chung, G. and S. Seneff. 1997. Hierarchical duration modelling for speech recognition using the Angie framework. Proc. EUROSPEECH’ 97 (Rhodes, Greece), 1476–1478.
Cosi, P., D. Falavigna and M. Omologo. 1991. A preliminary statistical evaluation of manual and automatic segmentation discrepancies. Proc. EUROSPEECH’ 91 (Genova, Italy), 693–696.
Daly, N. and V. Zue. 1995. Acoustic, perceptual and linguistic analyses of intonation contours in human/machine dialogues. Proc. ICSLP’ 95, 497–500.
Delattre, P. 1966. Studies in French and Comparative Phonetics. London: Mouton.
Di Cristo, A. 1978. De la Microprosodie à l’Intonosyntaxe. Thèse d’Etat, Université de Provence.
Emerard, F. 1977. Les diphones et le traitement de la prosodie dans la synthèse de la parole. Bulletin de l’Institut Phonétique de Grenoble, vol. VI, 103–147.
Fletcher, J. 1991. Rhythm and final lengthening in French. Journal of Phonetics 19, 193–212.
Fónagy, I. 1980. L’accent français, accent probabilitaire (dynamique d’un changement prosodique). Studia Phonetica 15, Montreal.
Fónagy, I. and K. Magdics. 1960. Speed of utterance in phrases of different lengths. Language and Speech 3, 179–192.
Gong, Y. and C.W. Treurniet. 1993. Duration of phones as function of utterance length and its use in automatic speech recognition. Proc. EUROSPEECH’ 93 (Berlin, Germany), 315–318.
Gupta, V., M. Lennig and P. Mermelstein. 1992. Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition. Computer Speech and Language 6, 331–344.
Jouvet, D., K. Bartkova and J. Monné. 1991. On the modelization of allophones in an HMM based speech recognition system. Proc. EUROSPEECH’ 91 (Genova, Italy), 923–926.
Klatt, D.H. 1979. Synthesis by rule of segmental durations in English sentences. In Lindblom and Öhman (eds.), 287–300.
Langlais, P. 1995. Traitement de la Prosodie en Reconnaissance Automatique de la Parole. Thèse de l’Université d’Avignon.
Lehiste, I. 1970. Suprasegmentals. Cambridge, Mass.: MIT Press.
Lindblom, B. and S.E.G. Öhman (eds.). 1979. Frontiers of Speech Communication Research. New York: Academic Press.
Lindblom, B. and K. Rapp. 1973. Some temporal regularities of spoken Swedish. Symposium on Auditory Analysis and Perception of Speech (Leningrad, USSR), 21–23.
Ljolje, A. and F. Fallside. 1987. Modelling of speech using primarily prosodic parameters. Computer Speech and Language 2, 185–204.
Lokbani, M.N., D. Jouvet and J. Monné. 1993. Segmental post-processing of the N-best solutions in a speech recognition system. Proc. EUROSPEECH’ 93 (Berlin, Germany), 811–814.
MacNeilage, P. and J.L. DeClerk. 1968. Cinefluorographic study of speaking rate. Proc. 76th Meeting of the Acoustical Society of America (Cleveland, USA), 19–22.
Mercier, G., D. Bigorgne, L. Miclet, L. Le Guennec and M. Querré. 1990. Recognition of speaker-dependent continuous speech with KEAL. In Waibel and Lee (eds.), 225–234.
Morin, D. 1991. Influence of field data in HMM training for a vocal server. Proc. EUROSPEECH’ 91 (Genova, Italy), 735–738.
Nespor, M. and I. Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.
O’Shaughnessy, D. 1984. A multispeaker analysis of durations in read French paragraphs. J. Acoust. Soc. Am. 76, 1664–1672.
Ostendorf, M.F. and K. Ross. 1997. A multi-level model for intonation labels. In Sagisaka et al. (eds.), 291–308.
Ostendorf, M.F., P.J. Price, J. Bear and C.W. Wightman. 1990. The use of relative duration in syntactic disambiguation. Proc. DARPA Speech and Natural Language Workshop (Hidden Valley, USA).
Ostendorf, M.F., C.W. Wightman and N.M. Veilleux. 1993. Parse scoring with prosodic information: an analysis/synthesis approach. Computer Speech and Language 7, 193–210.
Pols, L.C.W., X. Wang and L.F.M. ten Bosch. 1996. Modelling of phone duration (using the TIMIT database) and its potential benefit for ASR. Speech Communication 19, 161–176.
Rose, R.C. and D.B. Paul. 1990. A Hidden Markov model based keyword recognition system. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Albuquerque, USA), 129–132.
Rouat, J., Ch.Y. Liu and D. Morissette. 1997. A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication 2, 191–207.
Sagisaka, Y., N. Campbell and N. Higuchi (eds.). 1997. Computing Prosody. New York: Springer-Verlag.
Selkirk, E.O. 1986. On derived domains in sentence prosody. Phonology Yearbook 3, 371–405.
Soong, F.K. 1998. A phonetically labelled acoustic segment (PLAS) approach to speech analysis-synthesis. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Glasgow, UK), 584–587.
Sorin, C. 1989. Perception de la parole continue. In Botte et al. (eds.), 123–139.
Sorin, C., D. Jouvet, C. Gagnoulet, D. Dubois, D. Sadek and M. Toularhoat. 1995. Operational and experimental French telecommunication services using CNET speech recognition and text-to-speech synthesis. Speech Communication 17, 273–286.
Suaudeau, N. and R. André-Obrecht. 1993. Sound duration modelling and time-variable speaking rate in a speech recognition system. Proc. EUROSPEECH’ 93 (Berlin, Germany), 307–310.
Vaissière, J. 1977. Premiers essais d’utilisation de la durée pour la segmentation en mots dans un système de reconnaissance. 8ème Journée d’Etudes sur la Parole (Aix-en-Provence, France), 345–352.
Waibel, A. 1987. Prosody and Speech Recognition. London: Pitman.
Waibel, A. and K.F. Lee (eds.). 1990. Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann Publishers.
Wang, M.Q. and J. Hirschberg. 1992. Automatic classification of intonation phrase boundaries. Computer Speech and Language 6, 175–196.
Wang, X. 1997. Incorporating knowledge on segmental duration in HMM-based continuous speech recognition. EFOTT, Amsterdam, The Netherlands.

Authors

Katarina Bartkova

Editor information

Editors and Affiliations

University of Skövde, Sweden
Antonis Botinis
University of Athens, Greece
Antonis Botinis

About this chapter

Cite this chapter

Bartkova, K. (2000). Prosodic Parameters of French in a Speech Recognition System. In: Botinis, A. (eds) Intonation. Text, Speech and Language Technology, vol 15. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-4317-2_15

DOI: https://doi.org/10.1007/978-94-011-4317-2_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-0-7923-6723-9
Online ISBN: 978-94-011-4317-2
eBook Packages: Springer Book Archive