A Speech Test Set of Practice Business Presentations with Additional Relevant Texts

  • Conference paper
Statistical Language and Speech Processing (SLSP 2019)

Abstract

We present a test corpus of audio recordings and transcriptions of presentations of students’ enterprises, together with their slides and web pages. The corpus is intended for the evaluation of automatic speech recognition (ASR) systems, especially in conditions where prior availability of in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 s long. The speakers are high school students from European countries with English as their second language. We benchmark three baseline ASR systems on the corpus and show that their output is still imperfect.

Notes

  1. http://filters.matecat.com/.

  2. https://cloud.google.com/speech-to-text/.

Acknowledgements

This research was supported in part by the grants H2020-ICT-2018-2-825460 (ELITR) of the European Union and 19-26934X (NEUREM3) of the Czech Science Foundation.

We are grateful to the organizing team of the fictional student firms fair, who allowed us to conduct the competition during the event. We are also grateful to the students who presented their firms and transcribed their audio recordings. Last but not least, we are thankful to the team at the Karlsruhe Institute of Technology and to the PerVoice team, who helped us overcome the technical difficulties we encountered.

Author information

Corresponding author

Correspondence to Dominik Macháček.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Macháček, D., Kratochvíl, J., Vojtěchová, T., Bojar, O. (2019). A Speech Test Set of Practice Business Presentations with Additional Relevant Texts. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_13

  • DOI: https://doi.org/10.1007/978-3-030-31372-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31371-5

  • Online ISBN: 978-3-030-31372-2

  • eBook Packages: Computer Science, Computer Science (R0)
