Using Syllables as Acoustic Units for Spontaneous Speech Recognition

Hejtmánek, Jan

doi:10.1007/978-3-642-15760-8_38


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6231)

Included in the following conference series:

International Conference on Text, Speech and Dialogue


Abstract

In this work, we deal with advanced context-dependent automatic speech recognition (ASR) of Czech spontaneous speech using hidden Markov models (HMMs). Context-dependent units (e.g. triphones, diphones) provide a significant improvement in ASR systems over simple context-independent units. However, spontaneous speech recognition posed several very challenging problems. First, the number of distinct syllables is large compared to the size of the spontaneous speech corpus, which makes the use of context-dependent units very difficult. The main part of this article describes the problems encountered and the procedures used to effectively build and use a syllable-based ASR system with LASER (an ASR system developed at the Department of Computer Science and Engineering, Faculty of Applied Sciences). The procedures are usable with virtually any modern ASR system.
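The data-sparsity problem described in the abstract can be made concrete with a small sketch. The Python code below is an illustration, not the LASER implementation. It assumes utterances are already segmented into units (phones or syllables), expands each unit into an HTK-style `left-centre+right` context-dependent label, and reports the fraction of distinct labels seen only once. That fraction is a rough proxy for how many context-dependent models would have almost no training data. The function names `context_dependent_labels` and `singleton_ratio` and the toy syllabified data are hypothetical.

```python
# Illustrative sketch only: hypothetical helpers, not code from LASER.
from collections import Counter
from typing import Iterable, List, Sequence


def context_dependent_labels(units: Sequence[str]) -> List[str]:
    """Expand a unit sequence into HTK-style 'left-centre+right' labels.

    Units at utterance edges keep only the context that exists:
    'centre+right' at the start, 'left-centre' at the end.
    """
    labels = []
    for i, centre in enumerate(units):
        label = centre
        if i > 0:
            label = f"{units[i - 1]}-{label}"   # attach left context
        if i < len(units) - 1:
            label = f"{label}+{units[i + 1]}"   # attach right context
        labels.append(label)
    return labels


def singleton_ratio(utterances: Iterable[Sequence[str]]) -> float:
    """Fraction of distinct context-dependent labels that occur exactly once."""
    counts: Counter = Counter()
    for units in utterances:
        counts.update(context_dependent_labels(units))
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


if __name__ == "__main__":
    # Hypothetical toy utterance: Czech "dobrý den" split into syllables.
    print(context_dependent_labels(["do", "brý", "den"]))
    # -> ['do+brý', 'do-brý+den', 'brý-den']
    toy_corpus = [["a", "no"], ["a", "no"], ["ne"]]
    print(singleton_ratio(toy_corpus))
```

Running the same count over a corpus segmented once into phones and once into syllables is one way to check how quickly the syllable-based inventory outgrows the available data. The one-occurrence threshold is an arbitrary choice made for illustration.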

Author information

Authors and Affiliations

Laboratory of Intelligent Communication Systems, Dept. of Computer Science and Engineering, University of West Bohemia in Pilsen, Czech Republic
Jan Hejtmánek

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Aleš Horák
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Hejtmánek, J. (2010). Using Syllables as Acoustic Units for Spontaneous Speech Recognition. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_38

DOI: https://doi.org/10.1007/978-3-642-15760-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15759-2
Online ISBN: 978-3-642-15760-8
eBook Packages: Computer Science, Computer Science (R0)