Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers

Chollet, Gérard; Černocký, Jan; Gravier, Guillaume; Hennebert, Jean; Petrovska-Delacrétaz, Dijana; Yvon, François

doi:10.1007/978-1-4471-0845-0_17

Conference paper

Abstract

Automatic Speech Processing (Speech Recognition, Coding, Synthesis, Language Identification, Speaker Verification, Interpreting Telephony, etc.) has progressed to a level that allows its integration into Interactive Voice Servers (IVS). The description of a personal telephone attendant (‘Majordome’) highlights some of the issues in the development of IVS. In particular, users should be able to converse with automatic systems over the telephone in their native language. To achieve this goal, we propose an approach called ALISP (Automatic Language Independent Speech Processing). The need for ALISP is justified and some of the corresponding tools are described. Applications to very low bit-rate coders, automatic speech recognition and speaker verification illustrate our proposal.

Author information

Authors and Affiliations

ENST — CNRS URA 820, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet, Guillaume Gravier & François Yvon
Institute of Radioelectronics, Technical University Brno, Czech Republic
Jan Černocký
Swiss Federal Institute of Technology, Circuits and Systems Group, 1015, Lausanne, Switzerland
Jean Hennebert & Dijana Petrovska-Delacrétaz
Ubilab, UBS IT Innovation Laboratory, Bahnhofstrasse 45, CH-8098, Zurich, Switzerland
Jean Hennebert

Editor information

Editors and Affiliations

ENST-CNRS URA 820, 46 rue Barrault, 75634, Paris Cedex 13, France
Gérard Chollet PhD
INFOCOM Department, Rome University “La Sapienza”, via Eudossiana 18, I-00184, Rome, Italy
Maria Gabriella Di Benedetto PhD
IIASS, via G Pellegrino 19, I-84019, Vietri sul Mare (SA), Italy
Anna Esposito PhD & Maria Marinaro PhD

Cite this paper

Chollet, G., Černocký, J., Gravier, G., Hennebert, J., Petrovska-Delacrétaz, D., Yvon, F. (1999). Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_17

DOI: https://doi.org/10.1007/978-1-4471-0845-0_17
Publisher Name: Springer, London
Print ISBN: 978-1-85233-094-1
Online ISBN: 978-1-4471-0845-0
eBook Packages: Springer Book Archive
