Automatic Classification of Types of Artefacts Arising During the Unit Selection Speech Synthesis
The paper describes an experiment in automatic classification of the basic types of artefacts in synthetic speech produced by the Czech text-to-speech system that uses the unit selection synthesis method. The classifier, based on Gaussian mixture models (GMM), is ultimately formulated as an open-set classification task because the database of speech artefacts, which result from incorrectly chosen or exchanged speech units during synthesis, is limited. The experiments show that the accuracy with which the artefact section is located has a principal impact on the final precision of the artefact type classification. Auxiliary investigations indicate that the number of mixture components and the type of covariance matrix have a relatively large influence on the artefact classification error rate as well as on the computational complexity.
Keywords: Quality of synthetic speech; Text-to-speech system; GMM classification; Statistical analysis
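
The open-set GMM decision mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes already-trained diagonal-covariance mixtures, and the class names, two-dimensional model parameters, and rejection threshold are all hypothetical values chosen for illustration. Each class is scored by its average per-frame log-likelihood, and a segment is labelled "unknown" when even the best class scores below the threshold.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of x under a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def classify_open_set(frames, models, threshold):
    """Return (label, scores); label is 'unknown' if the best score is too low."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames) / len(frames)
        for name, gmm in models.items()
    }
    best = max(scores, key=scores.get)
    label = best if scores[best] >= threshold else "unknown"
    return label, scores


# Hypothetical two-component models over 2-D features (illustrative values only).
MODELS = {
    "artefact_type_1": [
        (0.6, (0.0, 0.0), (1.0, 1.0)),
        (0.4, (1.0, 1.0), (0.5, 0.5)),
    ],
    "artefact_type_2": [
        (0.5, (5.0, 5.0), (1.0, 1.0)),
        (0.5, (6.0, 4.0), (0.5, 0.5)),
    ],
}
THRESHOLD = -10.0  # assumed rejection threshold on the mean log-likelihood

if __name__ == "__main__":
    segment = [(0.1, -0.2), (0.8, 1.1)]
    label, scores = classify_open_set(segment, MODELS, THRESHOLD)
    print(label, scores)
```

In this sketch the number of mixture components is the length of each model list, and the diagonal covariance keeps the per-frame cost linear in the feature dimension. A full covariance matrix would need a matrix inverse and determinant per component, which is one reason the choice affects computational complexity.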