The Impact of Inaccurate Phonetic Annotations on Speech Recognition Performance

Safarik, Radek; Mateju, Lukas

doi:10.1007/978-3-319-64206-2_45

Radek Safarik & Lukas Mateju

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10415)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

This paper focuses on the impact of phonetic inaccuracies in acoustic training data on the performance of an automatic speech recognition (ASR) system. This is especially important when the training data is created in an automated way, because such data often contains errors in the form of wrong phonetic transcriptions. A series of experiments simulating various common errors in phonetic transcriptions is conducted on parts of the GlobalPhone data set (for Croatian, Czech and Russian). These experiments show the influence of various errors on different languages and acoustic models (Gaussian mixture models and deep neural networks). The impact of errors is also shown for real data obtained by our automated ASR creation process for Belarusian. The results show that the best performance is achieved with the most accurate data; however, a certain amount of errors (up to 5%) has a relatively small impact on speech recognition accuracy.
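The abstract describes simulating common errors in phonetic transcriptions at controlled rates (for example, up to 5%). The exact corruption procedure is not described on this page, so the following is only a minimal Python sketch under our own assumptions: each phone is independently corrupted with probability `rate` by a substitution, deletion or insertion chosen uniformly at random, and the resulting phone error rate is measured with a Levenshtein alignment against the clean transcription. The phone inventory `PHONES` and the functions `corrupt_transcription`, `edit_distance` and `phone_error_rate` are hypothetical names introduced for illustration, not the authors' tooling.

```python
import random
from typing import List, Sequence

# Hypothetical toy phone inventory, for illustration only; the phone sets
# actually used for Croatian, Czech and Russian are not reproduced here.
PHONES = ["a", "e", "i", "o", "u", "p", "b", "t", "d", "k",
          "g", "s", "z", "m", "n", "r", "l", "j"]


def corrupt_transcription(phones: Sequence[str], rate: float,
                          rng: random.Random,
                          inventory: Sequence[str] = PHONES) -> List[str]:
    """Corrupt each phone independently with probability `rate`.

    A corrupted phone undergoes one of three operations chosen uniformly
    at random (an assumption of this sketch): substitution by another
    phone, deletion, or insertion of a spurious phone after it.
    """
    out: List[str] = []
    for ph in phones:
        if rng.random() >= rate:
            out.append(ph)  # phone left unchanged
            continue
        op = rng.choice(("sub", "del", "ins"))
        if op == "sub":
            # Replace with a different phone from the inventory.
            out.append(rng.choice([p for p in inventory if p != ph]))
        elif op == "ins":
            # Keep the phone and insert a random extra phone after it.
            out.append(ph)
            out.append(rng.choice(inventory))
        # op == "del": the phone is simply dropped.
    return out


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute or match
        prev = cur
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by the length of the reference."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [rng.choice(PHONES) for _ in range(1000)]
    for rate in (0.0, 0.05, 0.2):
        noisy = corrupt_transcription(clean, rate, rng)
        print(f"target rate {rate:.2f} -> measured PER "
              f"{phone_error_rate(clean, noisy):.3f}")
```

Because each corrupted phone contributes at most one edit, the measured phone error rate tracks the nominal `rate` only approximately: the number of corrupted phones varies randomly, and the optimal alignment can occasionally merge neighbouring edits. A corrupted transcription produced this way would then replace the clean one in the training lexicon or alignments before acoustic model training.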

Acknowledgements

This work was supported by the Technology Agency of the Czech Republic (Project No. TA04010199) and by the Student Grant Scheme 2017 of the Technical University of Liberec.

Author information

Authors and Affiliations

Institute of Information Technology and Electronics, Technical University of Liberec, Studentska 2, 461 17, Liberec, Czech Republic
Radek Safarik & Lukas Mateju

Authors

Radek Safarik
Lukas Mateju

Corresponding author

Correspondence to Radek Safarik.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Safarik, R., Mateju, L. (2017). The Impact of Inaccurate Phonetic Annotations on Speech Recognition Performance. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science, vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_45

DOI: https://doi.org/10.1007/978-3-319-64206-2_45
Published: 29 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64205-5
Online ISBN: 978-3-319-64206-2
eBook Packages: Computer Science, Computer Science (R0)
