
The Influence of Errors in Phonetic Annotations on Performance of Speech Recognition System

  • Radek Šafařík
  • Lukáš Matějů
  • Lenka Weingartová
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11107)

Abstract

This paper deals with errors in acoustic training data and their influence on speech recognition performance. Training data can be prepared manually, automatically, or by a combination of the two; in all cases, mislabeled phonemes can appear in the phonetic annotations. We conducted a series of experiments that simulate common errors of this kind. The experiments cover varying amounts of changes to the phonetic annotations, such as different types of changes in the voicing of obstruents, random substitution of consonants or vowels, and random deletion of phonemes. All experiments were carried out for the Czech language using the GlobalPhone speech data set, and both Gaussian mixture models and deep neural networks were used for acoustic modeling. The results show that a moderate amount of such errors in the training data does not affect speech recognition accuracy; accuracy is significantly degraded only by a large amount of errors (more than 50%).
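
For illustration, the kind of controlled corruption described above can be simulated directly on the phoneme strings of a training annotation. The following Python sketch is not the authors' code; the phoneme sets, function name, and corruption modes are illustrative assumptions, and the paper's actual Czech phoneme inventory is not reproduced here.

```python
import random

# Illustrative (not the paper's) phoneme sets for Czech-like annotations.
VOICED_TO_UNVOICED = {"b": "p", "d": "t", "g": "k", "z": "s", "v": "f"}
CONSONANTS = list(VOICED_TO_UNVOICED) + list(VOICED_TO_UNVOICED.values()) + ["m", "n", "l", "r", "j"]
VOWELS = ["a", "e", "i", "o", "u"]

def corrupt_annotation(phones, rate, mode, rng=random):
    """Return a copy of a phonetic annotation in which a fraction `rate` of the
    eligible phonemes is altered according to `mode`:
    'devoice', 'substitute_consonant', 'substitute_vowel', or 'delete'."""
    out = []
    for p in phones:
        if rng.random() >= rate:
            out.append(p)                              # leave this phoneme untouched
            continue
        if mode == "devoice" and p in VOICED_TO_UNVOICED:
            out.append(VOICED_TO_UNVOICED[p])          # change voicing of an obstruent
        elif mode == "substitute_consonant" and p in CONSONANTS:
            out.append(rng.choice([c for c in CONSONANTS if c != p]))
        elif mode == "substitute_vowel" and p in VOWELS:
            out.append(rng.choice([v for v in VOWELS if v != p]))
        elif mode == "delete":
            pass                                       # drop the phoneme entirely
        else:
            out.append(p)                              # phoneme not eligible for this mode
    return out

# Example: corrupt 20% of the phonemes in one annotation by consonant substitution.
annotation = ["d", "o", "b", "r", "i", "d", "e", "n"]
print(corrupt_annotation(annotation, rate=0.2, mode="substitute_consonant"))
```

In the experiments reported in the paper, annotations corrupted at different rates would then be used to retrain the acoustic models (GMM and DNN) and the resulting recognition accuracy compared against the clean baseline.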

Keywords

Speech recognition · Gaussian mixture models · Deep neural networks · Phonetic annotations · Phoneme substitution

Acknowledgements

This work was supported by the Technology Agency of the Czech Republic in project no. TH03010018 and by the Student Grant Scheme 2018 of the Technical University in Liberec.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Radek Šafařík (1)
  • Lukáš Matějů (1)
  • Lenka Weingartová (2)
  1. Institute of Information Technology and Electronics, Technical University of Liberec, Liberec, Czech Republic
  2. NEWTON Technologies, Prague, Czech Republic
