A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems

  • Conference paper
Machine Learning for Multimodal Interaction (MLMI 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4892)

Abstract

In this paper we present a study of automatic speech recognition (ASR) systems that use context-dependent phonemes and graphemes as sub-word units, in both a conventional HMM/GMM system and a tandem system. Experiments on three different continuous speech recognition tasks show that systems using only context-dependent graphemes can yield competitive performance on small to medium vocabulary tasks when compared with a context-dependent phoneme-based ASR system. In particular, we demonstrate the utility of tandem features, which use an MLP trained to estimate phoneme posterior probabilities, in improving the performance of grapheme-based recognition systems: the features implicitly incorporate phonemic knowledge into the system without requiring a phonetically transcribed lexicon.
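
As a rough illustration of the tandem idea mentioned in the abstract, the sketch below turns per-frame phoneme posteriors (as an MLP might produce) into log-domain feature vectors. It is a minimal sketch and not the authors' pipeline: the function name, the probability floor, the per-utterance mean normalisation, and the toy numbers are all assumptions made for illustration. Tandem systems as commonly described also decorrelate the log posteriors (for example with a PCA-style transform) before passing them to the HMM/GMM system; that step is omitted here.

```python
import math

def log_posterior_features(posteriors, floor=1e-10):
    """Convert per-frame phoneme posteriors into log-domain features.

    posteriors: list of frames, each a list of class probabilities
                (stands in for the output of a hypothetical MLP).
    Returns mean-normalised log posteriors, one vector per frame.
    """
    # Take logs: raw posteriors are heavily skewed towards 0 and 1,
    # which Gaussian mixtures model poorly. The floor avoids log(0).
    logs = [[math.log(max(p, floor)) for p in frame] for frame in posteriors]
    n_frames = len(logs)
    n_dims = len(logs[0])
    # Illustrative assumption: subtract the per-dimension mean over the utterance.
    means = [sum(frame[d] for frame in logs) / n_frames for d in range(n_dims)]
    return [[frame[d] - means[d] for d in range(n_dims)] for frame in logs]

# Toy example: 3 frames, 3 phoneme classes (made-up numbers).
toy_posteriors = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
]
features = log_posterior_features(toy_posteriors)
```

The point of the sketch is that the acoustic model sees features derived from phoneme posteriors regardless of whether its own sub-word units are phonemes or graphemes, which is how phonemic knowledge can enter a grapheme-based system without a phonetic lexicon.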

Editor information

Andrei Popescu-Belis, Steve Renals, Hervé Bourlard

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dines, J., Magimai Doss, M. (2008). A Study of Phoneme and Grapheme Based Context-Dependent ASR Systems. In: Popescu-Belis, A., Renals, S., Bourlard, H. (eds) Machine Learning for Multimodal Interaction. MLMI 2007. Lecture Notes in Computer Science, vol 4892. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78155-4_19

  • DOI: https://doi.org/10.1007/978-3-540-78155-4_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78154-7

  • Online ISBN: 978-3-540-78155-4

  • eBook Packages: Computer Science (R0)
