Abstract
We present a test corpus of audio recordings and transcriptions of presentations of students’ enterprises, together with their slides and web pages. The corpus is intended for the evaluation of automatic speech recognition (ASR) systems, especially in conditions where prior availability of in-domain vocabulary and named entities is beneficial. The corpus consists of 39 presentations in English, each up to 90 s long. The speakers are high school students from European countries with English as their second language. We benchmark three baseline ASR systems on the corpus and show that they remain imperfect.
Acknowledgements
This research was supported in part by the grants H2020-ICT-2018-2-825460 (ELITR) of the European Union and 19-26934X (NEUREM3) of the Czech Science Foundation.
We are grateful to the organizing team of the fictional student firms fair, who allowed us to conduct the competition during the event. We are also grateful to the students who presented their firms and transcribed their audio recordings. Last but not least, we thank the team at the Karlsruhe Institute of Technology and the PerVoice team, who helped us overcome the technical difficulties we encountered.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Macháček, D., Kratochvíl, J., Vojtěchová, T., Bojar, O. (2019). A Speech Test Set of Practice Business Presentations with Additional Relevant Texts. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science, vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_13
DOI: https://doi.org/10.1007/978-3-030-31372-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31371-5
Online ISBN: 978-3-030-31372-2
eBook Packages: Computer Science, Computer Science (R0)