Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers

  • Conference paper

Abstract

Automatic Speech Processing (Speech Recognition, Coding, Synthesis, Language Identification, Speaker Verification, Interpreting Telephony, etc.) has progressed to a level that allows its integration into Interactive Voice Servers (IVS). The description of a personal telephone attendant (‘Majordome’) highlights some of the issues in the development of IVS. In particular, users should be able to converse with automatic systems over the telephone in their native language. To achieve this goal, we propose an approach called ALISP (Automatic Language Independent Speech Processing). The need for ALISP is justified and some of the corresponding tools are described. Applications to very low bit-rate coders, automatic speech recognition and speaker verification illustrate our proposal.

Copyright information

© 1999 Springer-Verlag London Limited

About this paper

Cite this paper

Chollet, G., Černocký, J., Gravier, G., Hennebert, J., Petrovska-Delacrétaz, D., Yvon, F. (1999). Towards Fully Automatic Speech Processing Techniques for Interactive Voice Servers. In: Chollet, G., Di Benedetto, M.G., Esposito, A., Marinaro, M. (eds) Speech Processing, Recognition and Artificial Neural Networks. Springer, London. https://doi.org/10.1007/978-1-4471-0845-0_17

  • DOI: https://doi.org/10.1007/978-1-4471-0845-0_17

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-85233-094-1

  • Online ISBN: 978-1-4471-0845-0

  • eBook Packages: Springer Book Archive
