The Influence of Errors in Phonetic Annotations on Performance of Speech Recognition System
This paper deals with errors in acoustic training data and their influence on speech recognition performance. The training data can be prepared manually, automatically, or by a combination of the two. In all cases, some mislabeled phonemes can appear in the phonetic annotations. We conducted a series of experiments that simulate some common errors. The experiments introduce varying amounts of changes into the phonetic annotations: different types of changes in the voicing of obstruents, random substitution of consonants or vowels, and random deletion of phonemes. All experiments were done for the Czech language using the GlobalPhone speech data set, and both Gaussian mixture models and deep neural networks were used for acoustic modeling. The results show that a moderate amount of such errors in the training data does not influence speech recognition accuracy. The accuracy is significantly influenced only by a large amount of errors (more than 50%).
Keywords: Speech recognition, Gaussian mixture models, Deep neural networks, Phonetic annotations, Phoneme substitution
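The three kinds of simulated annotation errors named in the abstract (voicing changes of obstruents, random substitution within consonants or vowels, and random deletion) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny phoneme inventory, the function name `corrupt`, and the reading of the error amount as an independent per-phoneme probability are hypothetical and are not taken from the experimental setup itself.

```python
import random

# Hypothetical, heavily simplified inventory (illustration only, not the
# phoneme set used in the experiments).
VOICING_PAIRS = [("p", "b"), ("t", "d"), ("k", "g"), ("f", "v"), ("s", "z")]
VOICING_SWAP = {}
for voiceless, voiced in VOICING_PAIRS:
    VOICING_SWAP[voiceless] = voiced
    VOICING_SWAP[voiced] = voiceless

VOWELS = ["a", "e", "i", "o", "u"]
CONSONANTS = sorted(VOICING_SWAP) + ["m", "n", "l", "r", "j"]
MODES = ("voicing", "substitute", "delete")


def corrupt(phonemes, rate, mode, rng):
    """Return a copy of `phonemes` with simulated annotation errors.

    Each phoneme is selected for corruption independently with
    probability `rate` (an assumed interpretation of "amount of errors").
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for ph in phonemes:
        if rng.random() >= rate:
            out.append(ph)  # phoneme left untouched
        elif mode == "voicing":
            # Swap voiced/voiceless obstruents; other phonemes stay as they are.
            out.append(VOICING_SWAP.get(ph, ph))
        elif mode == "substitute":
            # Replace a vowel by another vowel, a consonant by another consonant.
            if ph in VOWELS:
                pool = VOWELS
            elif ph in CONSONANTS:
                pool = CONSONANTS
            else:
                out.append(ph)  # symbol outside the toy inventory: keep it
                continue
            out.append(rng.choice([p for p in pool if p != ph]))
        # mode == "delete": the selected phoneme is simply dropped
    return out


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a corrupted training set is reproducible
    print(corrupt(["p", "e", "s", "k", "o", "t", "a"], 0.3, "substitute", rng))
```

In the voicing mode only paired obstruents can actually change, so the effective share of altered phonemes is lower than `rate`; whether the reported percentages are counted over all phonemes or only over eligible ones is not stated here and is left open in the sketch.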
This work was supported by the Technology Agency of the Czech Republic in project no. TH03010018 and by the Student Grant Scheme 2018 of the Technical University in Liberec.