Abstract
Automatic Speech Processing (speech recognition, coding, synthesis, language identification, speaker verification, interpreting telephony, etc.) has progressed to the point where it can be integrated into Interactive Voice Servers (IVS). A description of a personal telephone attendant ('Majordome') highlights some of the issues that arise in developing IVS. In particular, users should be able to converse with automatic systems over the telephone in their native language. To achieve this goal, we propose an approach called ALISP (Automatic Language Independent Speech Processing). We justify the need for ALISP and describe some of the corresponding tools. Applications to very low bit-rate coding, automatic speech recognition and speaker verification illustrate the proposal.
Copyright information
© 1999 Springer-Verlag London Limited
About this paper
Cite this paper
Chollet, G., Černocký, J., Gravier, G., Hennebert, J., Petrovska-Delacrétaz, D., Yvon, F. (1999). Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_17
DOI: https://doi.org/10.1007/978-1-4471-0845-0_17
Publisher Name: Springer, London
Print ISBN: 978-1-85233-094-1
Online ISBN: 978-1-4471-0845-0
eBook Packages: Springer Book Archive