
Building an ASR Corpus Based on Bulgarian Parliament Speeches

  • Conference paper
Statistical Language and Speech Processing (SLSP 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11816)

Abstract

This paper presents the methodology we applied to build a new corpus of Bulgarian speech suitable for training and evaluating modern speech recognition systems. The Bulgarian Parliament ASR (BG-PARLAMA) corpus is derived from recordings of the plenary sessions of the Bulgarian Parliament. The manually transcribed texts and the audio data of the speeches are processed automatically to build an aligned and segmented corpus. NLP tools and resources for Bulgarian are used for the language-specific tasks. The resulting corpus consists of 249 hours of speech from 572 speakers and is freely available for academic use. First experiments with an ASR system trained on the BG-PARLAMA corpus, using a time-delay deep neural network (TD-DNN) architecture, show a word error rate of around 7% on parliament speeches from unseen speakers. To our knowledge, the BG-PARLAMA corpus is the largest speech corpus currently available for Bulgarian.
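The abstract reports accuracy as word error rate (WER). As a reminder of how such a figure is usually computed, the minimal Python sketch below counts the word-level substitutions, deletions, and insertions needed to turn a reference transcript into an ASR hypothesis (a Levenshtein edit distance computed with the Wagner–Fischer dynamic program) and divides that count by the number of reference words. It assumes plain whitespace tokenization and no text normalization. It is not the scoring script used in the experiments, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between reference and hypothesis, divided by
    the number of reference words (assumes whitespace tokenization)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Before processing reference word i, prev[j] holds the edit distance
    # between the first i-1 reference words and the first j hypothesis words.
    # Initially the reference prefix is empty, so j insertions are needed.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    # prev[len(hyp)] is now the edit distance between the full sequences.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("народното събрание прие закона",
                          "народно събрание прие закона"))  # 0.25
```

Here one substituted word out of four reference words gives a WER of 0.25. Because insertions are counted too, the WER can exceed 1.0 when the hypothesis contains many extra words. Production scoring tools typically also normalize case and punctuation before comparing.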

Notes

  1. ISLRN: 250-105-856-478-2, http://www.islrn.org/resources/250-105-856-478-2/.
  2. ISLRN: 755-406-235-455-4, http://www.islrn.org/resources/755-406-235-455-4/.
  3. https://www.parliament.bg.
  4. http://lml.bas.bg/BG-PARLAMA.
  5. http://www.ffmpeg.org.
  6. In our setup, limiting the training audio of each individual speaker to at most 60 min did not improve the ASR accuracy.
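Note 6 refers to capping the amount of audio each speaker contributes to training. The notes do not say how such a subset would be selected, so the Python sketch below shows one hypothetical way to apply a per-speaker cap: segments are visited in their given order, and a segment is kept only if it still fits within that speaker's 60-minute budget. The `Segment` record, its fields, and the greedy order-based selection are assumptions made for illustration, not the authors' procedure.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical record for one aligned corpus segment."""
    utt_id: str
    speaker: str
    duration_sec: float


def cap_audio_per_speaker(segments: Iterable[Segment],
                          max_seconds: float = 60 * 60) -> List[Segment]:
    """Greedily keep segments, in input order, while each speaker's total
    kept duration stays within max_seconds (60 min by default)."""
    used = defaultdict(float)  # seconds already kept, per speaker
    kept: List[Segment] = []
    for seg in segments:
        if used[seg.speaker] + seg.duration_sec <= max_seconds:
            kept.append(seg)
            used[seg.speaker] += seg.duration_sec
        # Otherwise the segment is skipped; a later, shorter segment from the
        # same speaker may still fit into the remaining budget.
    return kept


if __name__ == "__main__":
    segs = [Segment("u1", "A", 2000.0), Segment("u2", "A", 2000.0),
            Segment("u3", "A", 1000.0), Segment("u4", "B", 4000.0),
            Segment("u5", "B", 600.0)]
    print([s.utt_id for s in cap_audio_per_speaker(segs)])  # ['u1', 'u3', 'u5']
```

Because the selection depends on input order, shuffling the segments first (with a fixed seed) would avoid systematically favoring, for example, the earliest sessions of each speaker.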

Acknowledgments

The research presented in this paper is partially funded by the Bulgarian Ministry of Education and Science via grant DO1-200/2018 "Electronic healthcare in Bulgaria" (e-Zdrave). We acknowledge the access provided to the e-infrastructure of the Centre for Advanced Computing and Data Processing, with the financial support of Grant No. BG05M2OP001-1.001-0003, financed by the Science and Education for Smart Growth Operational Program (2014–2020) and co-financed by the European Union through the European Structural and Investment Funds.

Author information

Corresponding author

Correspondence to Stoyan Mihov.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Geneva, D., Shopov, G., Mihov, S. (2019). Building an ASR Corpus Based on Bulgarian Parliament Speeches. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_16

  • DOI: https://doi.org/10.1007/978-3-030-31372-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31371-5

  • Online ISBN: 978-3-030-31372-2

  • eBook Packages: Computer Science, Computer Science (R0)
