A statistical model for an automatic procedure to compress a word transcription dictionary

  • Fériel Mouria-Beji
Poster Papers
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1451)

Abstract

Various experiments have shown that context-dependent phonemic models yield superior continuous speech recognition performance. However, we have observed that an explicit context-dependent phonemic model can produce many transcriptions for a single lexicon entry. In this work, we study the compression of word transcription dictionaries (WTDs) into a more compact form that balances flexibility against reliability. Based on a likelihood measure, we develop a statistical model for an automatic WTD compression procedure. The compressed dictionary is then used for sentence recognition in a continuous speech recognition system. Experimental results show a substantial improvement in recognition rate after compression.
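The abstract does not specify the likelihood measure or the compression criterion; the following sketch illustrates one plausible reading, in which each word's transcription variants are scored by their relative frequency in training alignments and low-likelihood variants are pruned. The function name, the threshold, and the frequency-based likelihood are illustrative assumptions, not the paper's actual model.

```python
def compress_wtd(wtd, min_rel_likelihood=0.1):
    """Prune a word transcription dictionary (WTD).

    `wtd` maps each word to {transcription: count}, where counts come
    from training alignments (an assumption for this sketch). Variants
    whose relative likelihood falls below the threshold are dropped,
    yielding a more compact dictionary.
    """
    compressed = {}
    for word, variants in wtd.items():
        total = sum(variants.values())
        kept = {t: c / total for t, c in variants.items()
                if c / total >= min_rel_likelihood}
        # Guarantee at least one transcription per lexicon entry.
        if not kept:
            best = max(variants, key=variants.get)
            kept = {best: variants[best] / total}
        compressed[word] = kept
    return compressed

# Toy example: "data" has three context-dependent variants; the rare
# one (5% relative likelihood) is pruned.
wtd = {
    "data": {"d ey t ax": 60, "d ae t ax": 35, "d ah t ax": 5},
    "the":  {"dh ax": 90, "dh iy": 10},
}
print(compress_wtd(wtd))
```

In a real system, the counts would be replaced by the acoustic likelihoods assigned to each variant during forced alignment, and the threshold would be tuned on held-out data to trade dictionary size against coverage.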

Keywords

Continuous Speech · Speaking Rate · Continuous Speech Recognition · Phonetic Transcription · Sentence Recognition


Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Fériel Mouria-Beji
  1. ENSI/LIA, Artificial Intelligence Group, Tunis, Tunisia
