Abstract
The importance of integrating prosody into speech processing systems is now widely acknowledged. Prosodic parameter modelling was first carried out for text-to-speech synthesis, since this technique could not work without prosody (Emerard, 1977; Klatt, 1979). Speech researchers then recognised that prosody would also be helpful in speech recognition (Carbonell, Haton, Lonchamp & Pierrel, 1982; Waibel, 1987; Ljolje & Fallside, 1987; Wang & Hirschberg, 1992). However, using prosodic parameters in a speech recognition system is less straightforward than in text-to-speech synthesis. A good predictive model for speech recognition has to account for a variety of speaking styles, whereas in speech synthesis the correct prediction of a single speaking style, appropriate to a given application, is sufficient.
References
André-Obrecht, R. 1990. Reconnaissance automatique de parole à partir de segments acoustiques et de modèles de Markov cachés. 18ième Journées d’Etudes sur la Parole (Montreal, Canada), 212–216.
Bartkova, K., P. Haffner and D. Larreur. 1993. Intensity prediction for speech synthesis in French. Proc. ESCA Workshop on Prosody (Lund, Sweden), 280–283.
Bartkova, K. and D. Jouvet. 1995. Using segmental duration prediction for rescoring the N-best solution in speech recognition. Proc. 13th ICPhS (Stockholm, Sweden), vol. 4, 248–251.
Bartkova, K. and D. Jouvet. 1997. Usefulness of phonetic parameters in a rejection procedure of an HMM based speech recognition system. Proc. EUROSPEECH’ 97 (Rhodes, Greece), 267–270.
Bartkova, K. and C. Sorin. 1987. A model of segmental duration for speech synthesis in French. Speech Communication 6, 245–260.
Batliner, A., C. Weiand, A. Kiesling and E. Nöth. 1993. Why sentence modality in spontaneous speech is more difficult to classify and why this fact is not too bad for prosody. Proc. ESCA Workshop on Prosody (Lund, Sweden), 112–115.
Boeffard, O., B. Cherbonnel, F. Emerard and S. White. 1993. Automatic segmentation and quality evaluation of speech unit inventories for concatenation-based, multilingual PSOLA text-to-speech systems. Proc. EUROSPEECH’ 93 (Berlin, Germany), 1449–1452.
Botte, M.C., G. Canévet, L. Demany and C. Sorin. 1989. Psychoacoustique et Perception Auditive. INSERM/SFA/CNET.
Bush, M.A. and G.E. Kopec. 1987. Network-based connected digit recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP 35, 1401–1413.
Campbell, W.N. and S.D. Isard. 1991. Segment duration in a syllable frame. Journal of Phonetics 19, 37–47.
Carbonell, N., J.P. Haton, F. Lonchamp and J.M. Pierrel. 1982. Indices prosodiques pour l’analyse syntaxico-sémantique dans Myrtille II. Actes du Séminaire Prosodie et Reconnaissance (Aix-en-Provence, France), 59–61.
Carbonell, N. and Y. Laprie. 1993. Automatic detection of prosodic cues for segmenting continuous speech into supralexical units. Proc. ESCA Workshop on Prosody (Lund, Sweden), 184–187.
Chung, G. and S. Seneff. 1997. Hierarchical duration modelling for speech recognition using the Angie framework. Proc. EUROSPEECH’ 97 (Rhodes, Greece), 1476–1478.
Cosi, P., D. Falavigna and M. Omologo. 1991. A preliminary statistical evaluation of manual and automatic segmentation discrepancies. Proc. EUROSPEECH’ 91 (Genova, Italy), 693–696.
Daly, N. and V. Zue. 1995. Acoustic, perceptual and linguistic analyses of intonation contours in human/machine dialogues. Proc. ICSLP’ 95, 497–500.
Delattre, P. 1966. Studies in French and Comparative Phonetics. London: Mouton.
Di Cristo, A. 1978. De la Microprosodie à l’Intonosyntaxe. Thèse d’Etat, Université de Provence.
Emerard, F. 1977. Les diphones et le traitement de la prosodie dans la synthèse de la parole. Bulletin de l’Institut Phonétique de Grenoble, vol. VI, 103–147.
Fletcher, J. 1991. Rhythm and final lengthening in French. Journal of Phonetics 19, 193–212.
Fónagy, I. 1980. L’accent français, accent probabilitaire (dynamique d’un changement prosodique). Studia Phonetica 15, Montreal.
Fónagy, I. and K. Magdics. 1960. Speed of utterance in phrases of different lengths. Language and Speech 3, 179–192.
Gong, Y. and C.W. Treurnier. 1993. Duration of phones as function of utterance length and its use in automatic speech recognition. Proc. EUROSPEECH’ 93 (Berlin, Germany), 315–318.
Gupta, V., M. Lenning and P. Mermelstein. 1992. Use of minimum duration and energy contour for phonemes to improve large vocabulary isolated-word recognition. Computer Speech and Language 6, 331–344.
Jouvet, D., K. Bartkova and J. Monné. 1991. On the modélisation of allophones in an HMM based speech recognition system. Proc. EUROSPEECH’ 91 (Genova, Italy), 923–926.
Klatt, D.H. 1979. Synthesis by rule of segmental durations in English sentences. In Lindblom and Öhman (eds.), 287–300.
Langlais, P. 1995. Traitement de la Prosodie en Reconnaissance Automatique de la Parole. Thèse de l’Université d’Avignon.
Lehiste, I. 1970. Suprasegmentals. Cambridge, Mass. MIT Press.
Lindblom, B. and S.E.G. Öhman (eds.). 1979. Frontiers of Speech Communication Research. New York: Academic Press.
Lindblom, B. and K. Rapp. 1973. Some temporal regularities of spoken Swedish. Symposium on Auditory Analysis and Perception of Speech (Leningrad, USSR), 21–23.
Ljolje, A. and F. Fallside. 1987. Modelling of speech using primarily prosodic parameters. Computer Speech and Language 2, 185–204.
Lokbani, M.N., D. Jouvet and J. Monné. 1993. Segmental post-processing of the N-best solutions in a speech recognition system. Proc. EUROSPEECH’ 93 (Berlin, Germany), 811–814.
McNeilage, P. and J.L. De Clerk. 1968. Cinefluorographic study of speaking rate. Proc. 76th Meeting of the Acoustical Society of America (Cleveland, USA), 19–22.
Mercier, G., D. Bigorgne, L. Miclet, L. Le Guennec and M. Querré. 1990. Recognition of speaker-dependent continuous speech with KEAL. In Waibel and Lee (eds.), 225–234.
Morin, D. 1991. Influence of field data in HMM training for a vocal server. Proc. EUROSPEECH’ 91 (Genova, Italy), 735–738.
Nespor, M. and I. Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.
O’Shaughnessy, D. 1984. A multispeaker analysis of durations in read French paragraphs. J. Acoust. Soc. Am. 76, 1664–1672.
Ostendorf, M.F. and K. Ross. 1997. A multi-level model for intonation labels. In Sagisaka et al. (eds.), 291–308.
Ostendorf, M.F., P.J. Price, J. Bear and C.W. Wightman. 1990. The use of relative duration in syntactic disambiguation. Proc. DARPA Speech and Natural Language Workshop (Hidden Valley, USA).
Ostendorf, M.F., C.W. Wightman and N.M. Veilleux. 1993. Parse scoring with prosodic information: an analysis/synthesis approach. Computer Speech and Language 7, 193–210.
Pols, L.C.W., X. Wang and L.F.M. ten Bosch. 1996. Modelling of phone duration (using the TIMIT database) and its potential benefit for ASR. Speech Communication 19, 161–176.
Rose, R.C. and D.B. Paul. 1990. A hidden Markov model based keyword recognition system. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Albuquerque, USA), 129–132.
Rouat, J., Ch.Y. Liu and D. Morissette. 1997. A pitch determination and voiced/unvoiced decision algorithm for noisy speech. Speech Communication 2, 191–207.
Sagisaka, Y., N. Campbell and N. Higuchi (eds.). 1997. Computing Prosody. New York: Springer-Verlag.
Selkirk, E.O. 1986. On derived domains in sentence prosody. Phonology Yearbook 3, 371–405.
Soong, F.K. 1998. A phonetically labelled acoustic segment (PLAS) approach to speech analysis-synthesis. Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (Glasgow, UK), 584–587.
Sorin, C. 1989. Perception de la parole continue. In Botte et al. (eds.), 123–139.
Sorin, C., D. Jouvet, C. Gagnoulet, D. Dubois, D. Sadek and M. Toularhoat. 1995. Operational and experimental French telecommunication services using CNET speech recognition and text-to-speech synthesis. Speech Communication 17, 273–286.
Suaudeau, N. and R. André-Obrecht. 1993. Sound duration modelling and time-variable speaking rate in a speech recognition system. Proc. EUROSPEECH’ 93 (Berlin, Germany), 307–310.
Vaissière, J. 1977. Premiers essais d’utilisation de la durée pour la segmentation en mots dans un système de reconnaissance. 8ème Journée d’Etudes sur la Parole (Aix-en-Provence, France), 345–352.
Waibel, A. 1987. Prosody and Speech Recognition. London: Pitman.
Waibel, A. and K.F. Lee (eds.). 1990. Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann Publishers.
Wang, M.Q. and J. Hirschberg. 1992. Automatic classification of intonation phrase boundaries. Computer Speech and Language 6, 175–196.
Wang, X. 1997. Incorporating knowledge on segmental duration in HMM-based continuous speech recognition. EFOTT, Amsterdam, The Netherlands.
Copyright information
© 2000 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Bartkova, K. (2000). Prosodic Parameters of French in a Speech Recognition System. In: Botinis, A. (eds) Intonation. Text, Speech and Language Technology, vol 15. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-4317-2_15
DOI: https://doi.org/10.1007/978-94-011-4317-2_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-0-7923-6723-9
Online ISBN: 978-94-011-4317-2
eBook Packages: Springer Book Archive