Automatic Evaluation of Synthetic Speech Quality by a System Based on Statistical Analysis

  • Conference paper
Text, Speech, and Dialogue (TSD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11107)

Abstract

The paper describes a system for automatic evaluation of speech quality based on statistical analysis of differences in spectral properties, prosodic parameters, and time structuring within the speech signal. The proposed system was successfully tested on sentences from male and female voices produced by a speech synthesizer using the unit selection method with two different approaches to prosody manipulation. The experiments show that all three types of speech features are needed to obtain correct, sharp, and stable results. A detailed analysis shows that the number of statistical parameters strongly influences the correctness and precision of the evaluation results. A larger amount of processed speech material has a positive impact on the stability of the evaluation process. A final comparison shows a basic correlation with the results of a standard listening test.
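The general idea of comparing statistical parameters of a speech feature between a reference and two synthesis variants can be illustrated with a minimal sketch. This is not the authors' system: the parameter set (mean, median, standard deviation, range), the relative-difference distance, the function names, and the toy F0 values are all assumptions made for illustration only.

```python
from statistics import mean, median, pstdev


def stat_params(values):
    """Summary statistics of one feature track (hypothetical parameter set)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),
        "range": max(values) - min(values),
    }


def distance(reference, candidate):
    """Mean relative absolute difference between two parameter sets."""
    ref = stat_params(reference)
    cand = stat_params(candidate)
    diffs = []
    for name, ref_value in ref.items():
        scale = abs(ref_value) or 1.0  # fall back to 1.0 to avoid dividing by zero
        diffs.append(abs(cand[name] - ref_value) / scale)
    return mean(diffs)


def closer_variant(reference, variants):
    """Return the name of the variant nearest to the reference, plus all distances."""
    distances = {name: distance(reference, track) for name, track in variants.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Toy F0 values in Hz, invented for illustration (not data from the paper)
natural = [110.0, 120.0, 130.0, 125.0, 115.0]
synth_a = [111.0, 119.0, 131.0, 124.0, 116.0]
synth_b = [100.0, 140.0, 150.0, 95.0, 160.0]

best, distances = closer_variant(natural, {"A": synth_a, "B": synth_b})
print(best, distances)
```

In this toy setup, variant A is reported as closer to the reference because its summary statistics deviate less. A real system would presumably combine many spectral, prosodic, and timing features rather than a single track.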

The work was supported by the Czech Science Foundation GA16-04420S (J. Matoušek, J. Přibil), by the Grant Agency of the Slovak Academy of Sciences 2/0001/17 (J. Přibil), and by the Ministry of Education, Science, Research, and Sports of the Slovak Republic VEGA 1/0905/17 (A. Přibilová).

Corresponding author

Correspondence to Jiří Přibil.

Copyright information

© 2018 Springer Nature Switzerland AG

Cite this paper

Přibil, J., Přibilová, A., Matoušek, J. (2018). Automatic Evaluation of Synthetic Speech Quality by a System Based on Statistical Analysis. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2018. Lecture Notes in Computer Science, vol 11107. Springer, Cham. https://doi.org/10.1007/978-3-030-00794-2_34

  • DOI: https://doi.org/10.1007/978-3-030-00794-2_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00793-5

  • Online ISBN: 978-3-030-00794-2

  • eBook Packages: Computer Science, Computer Science (R0)
