Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings

Staš, Ján; Hládek, Daniel; Juhár, Jozef

doi:10.1007/978-3-319-23132-7_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9319)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Language model and acoustic model adaptation play an important role in improving the performance and robustness of automatic speech recognition, especially when developing domain-specific, gender-dependent, or user-adapted systems. This paper focuses on language model speaker adaptation for transcribing Slovak parliament proceedings for individual speakers. Building on current research, we have developed a framework that combines multiple speech recognition outputs with acoustic and language model adaptation at different stages. Preliminary results show a significant relative reduction in model perplexity, from 45 % to 74 %, and in speech recognition word error rate, from 29 % to 43 %, for male and female speakers respectively.
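The relative figures quoted in the abstract compare an adapted system against a baseline. As a minimal sketch of how such numbers are typically computed (this is not the authors' evaluation code; the function names and the sample sentences are hypothetical), word error rate can be obtained from a word-level edit distance, and a relative reduction from the baseline and adapted scores:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, adapted: float) -> float:
    """Fraction by which a metric (WER or perplexity) drops after adaptation."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - adapted) / baseline


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the functions.
    wer = word_error_rate("a b c d", "a x c")
    print(f"WER: {wer:.2f}")
    # Hypothetical scores, not results from the paper.
    print(f"relative reduction: {relative_reduction(20.0, 14.0):.2f}")
```

The same `relative_reduction` helper applies to perplexity, since both metrics are "lower is better" and the reduction is expressed as a fraction of the baseline value.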

Notes

1. www.nrsr.sk/dl/

Acknowledgments

The research presented in this paper was supported by the Ministry of Education, Science, Research and Sport of the Slovak Republic under the project VEGA 1/0075/15 (50 %) and the Research and Development Operational Programme funded by the ERDF under the project ITMS: 26220220182 (50 %).

Author information

Authors and Affiliations

Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Košice, Park Komenského 13, 042 10, Košice, Slovak Republic
Ján Staš, Daniel Hládek & Jozef Juhár

Corresponding author

Correspondence to Ján Staš.

Editor information

Editors and Affiliations

SPIIRAS, Saint-Petersburg, Russia
Andrey Ronzhin
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova
University of Patras, Patras, Greece
Nikos Fakotakis

About this paper

Cite this paper

Staš, J., Hládek, D., Juhár, J. (2015). Language Model Speaker Adaptation for Transcription of Slovak Parliament Proceedings. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_32

DOI: https://doi.org/10.1007/978-3-319-23132-7_32
Published: 04 September 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)
