Building an ASR Corpus Based on Bulgarian Parliament Speeches

Geneva, Diana; Shopov, Georgi; Mihov, Stoyan

doi:10.1007/978-3-030-31372-2_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11816)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

This paper presents the methodology we applied to build a new corpus of Bulgarian speech suitable for training and evaluating modern speech recognition systems. The Bulgarian Parliament ASR (BG-PARLAMA) corpus is derived from recordings of the plenary sessions of the Bulgarian Parliament. The manually transcribed texts and the audio data of the speeches are processed automatically to build an aligned and segmented corpus. NLP tools and resources for Bulgarian are used for the language-specific tasks. The resulting corpus consists of 249 hours of speech from 572 speakers and is freely available for academic use. First experiments with an ASR system trained on the BG-PARLAMA corpus show a word error rate of around 7% on parliament speeches from unseen speakers, using a time-delay deep neural network (TD-DNN) architecture. To our knowledge, the BG-PARLAMA corpus is the largest speech corpus currently available for Bulgarian.
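
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring procedure used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative sketch only: tokens are split on whitespace, and no text
    normalization (casing, punctuation, numerals) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a reference of four words recognized with one substitution and one deletion gives a rate of 2/4 = 0.5. A rate of around 7% corresponds to roughly seven word errors per hundred reference words.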

Notes

1. ISLRN:250-105-856-478-2 http://www.islrn.org/resources/250-105-856-478-2/
2. ISLRN:755-406-235-455-4 http://www.islrn.org/resources/755-406-235-455-4/
3. https://www.parliament.bg
4. http://lml.bas.bg/BG-PARLAMA
5. http://www.ffmpeg.org
6. In our setup, limiting the training audio for individual speakers to 60 min did not improve the ASR accuracy.

Acknowledgments

The research presented in this paper is partially funded by the Bulgarian Ministry of Education and Science via grant DO1-200/2018 "Electronic healthcare in Bulgaria" (e-Zdrave). We acknowledge the access provided to the e-infrastructure of the Centre for Advanced Computing and Data Processing, with financial support from Grant No. BG05M2OP001-1.001-0003, financed by the Science and Education for Smart Growth Operational Program (2014–2020) and co-financed by the European Union through the European Structural and Investment Funds.

Author information

Authors and Affiliations

IICT - BAS, 2, Acad. G. Bonchev Street, 1113, Sofia, Bulgaria
Diana Geneva, Georgi Shopov & Stoyan Mihov

Corresponding author

Correspondence to Stoyan Mihov.

Editor information

Editors and Affiliations

Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
Queen Mary University of London, London, UK
Matthew Purver
Jožef Stefan Institute, Ljubljana, Slovenia
Senja Pollak

About this paper

Cite this paper

Geneva, D., Shopov, G., Mihov, S. (2019). Building an ASR Corpus Based on Bulgarian Parliament Speeches. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_16

DOI: https://doi.org/10.1007/978-3-030-31372-2_16
Published: 27 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31371-5
Online ISBN: 978-3-030-31372-2
eBook Packages: Computer Science, Computer Science (R0)
