Skip to main content

Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis

  • Conference paper
  • First Online:
Text, Speech, and Dialogue (TSD 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10415))

Included in the following conference series:

Abstract

To bring more expressiveness into text-to-speech systems, this paper presents a new pronunciation variant generation method which works by adapting standard, i.e., dictionary-based, pronunciations to a spontaneous style. Its strength and originality lie in exploiting a wide range of linguistic, articulatory and prosodic features, and in using a probabilistic machine learning framework, namely conditional random fields and phoneme-based n-gram models. Extensive experiments on the Buckeye corpus of English conversational speech demonstrate the effectiveness of the approach through objective and perceptual evaluations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    CRFs allow dependencies between predicted phonemes but it appeared in preliminary work that using a separate phonological model is better to avoid overfitting the training data.

  2. 2.

    Binomial test with \(\alpha =0.1\) and votes for “No preference” equally spread over A and B, following the methodology proposed in [23].

References

  1. Tajchman, G., Foster, E., Jurafsky, D.: Building multiple pronunciation models for novel words using exploratory computational phonology. In: Proceedings of Eurospeech (1995)

    Google Scholar 

  2. Giachin, E., Rosenberg, A., Lee, C.H.: Word juncture modeling using phonological rules for HMM-based continuous speech recognition. In: Proceedings of ICASSP (1990)

    Google Scholar 

  3. Oshika, B.T., Zue, V.W., Weeks, R.V., Neu, H., Aurbach, J.: The role of phonological rules in speech understanding research. IEEE Trans. Acous. Speech Signal Process. 23, 104–112 (1975)

    Article  Google Scholar 

  4. Goronzy, S., Rapp, S., Kompe, R.: Generating non-native pronunciation variants for lexicon adaptation. Speech Commun. 42(1), 109–123 (2004)

    Article  Google Scholar 

  5. Vazirnezhad, B., Almasganj, F., Ahadi, S.M.: Hybrid statistical pronunciation models designed to be trained by a medium-size corpus. Comput. Speech Lang. 23, 1–24 (2009)

    Article  Google Scholar 

  6. Dilts, P.C.: Modelling phonetic reduction in a corpus of spoken English using random forests and mixed-effects regression. Ph.D. thesis, University of Alberta (2013)

    Google Scholar 

  7. Chen, K., Hasegawa-Johnson, M.: Modeling pronunciation variation using artificial neural networks for English spontaneous speech. In: Proceedings of Interspeech (2004)

    Google Scholar 

  8. Karanasou, P., Yvon, F., Lavergne, T., Lamel, L.: Discriminative training of a phoneme confusion model for a dynamic lexicon in ASR. In: Proceedings of Interspeech (2013)

    Google Scholar 

  9. Prahallad, K., Black, A.W., Mosur, R.: Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis. In: Proceedings of ICASSP (2006)

    Google Scholar 

  10. Qader, R., Lecorvé, G., Lolive, D., Sébillot, P.: Probabilistic speaker pronunciation adaptation for spontaneous speech synthesis using linguistic features. In: Dediu, A.-H., Martín-Vide, C., Vicsi, K. (eds.) SLSP 2015. LNCS (LNAI), vol. 9449, pp. 229–241. Springer, Cham (2015). doi:10.1007/978-3-319-25789-1_22

    Chapter  Google Scholar 

  11. Tahon, M., Qader, R., Lecorvé, G., Lolive, D.: Improving TTS with corpus-specific pronunciation adaptation. In: Proceedings of Interspeech (2016)

    Google Scholar 

  12. Bell, A., Brenier, J.M., Gregory, M., Girand, C., Jurafsky, D.: Predictability effects on durations of content and function words in conversational English. J. Mem. Lang. 60, 92–111 (2009)

    Article  Google Scholar 

  13. Bates, R., Ostendorf, M.: Modeling pronunciation variation in conversational speech using prosody. In: Proceedings of ISCA Tutorial and Research Workshop (ITRW) on Pronunciation Modeling and Lexicon Adaptation for Spoken Language Technology (2002)

    Google Scholar 

  14. Livescu, K., Jyothi, P., Fosler-Lussier, E.: Articulatory feature-based pronunciation modeling. Comput. Speech Lang. 36, 165–172 (2016)

    Article  Google Scholar 

  15. Rasipuram, R., Doss, M.M.: Articulatory feature based continuous speech recognition using probabilistic lexical modeling. Comput. Speech Lang. 36, 165–172 (2016)

    Article  Google Scholar 

  16. Pitt, M.A., Johnson, K., Hume, E., Kiesling, S., Raymond, W.: The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability. Speech Commun. 45, 89–95 (2005)

    Article  Google Scholar 

  17. Jiampojamarn, S., Kondrak, G., Sherif, T.: Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion. In: Proceedings of NAACL-HLT (2007)

    Google Scholar 

  18. Rosti, A.V.I., Matsoukas, S.: Combining outputs from multiple machine translation systems. In: Proceedings of NAACL-HLT (2007)

    Google Scholar 

  19. Huet, S., Gravier, G., Sébillot, P.: Morpho-syntactic post-processing of N-best lists for improved French automatic speech recognition. Comput. Speech Lang. 24(4), 663–684 (2010)

    Article  Google Scholar 

  20. Stolcke, A., Zheng, J., Wang, W., Abrash, V.: SRILM at sixteen: update and outlook. In: Proceedings of IEEE ASRU Workshop (2011)

    Google Scholar 

  21. Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A.W., Tokuda, K.: The HMM-based speech synthesis system (HTS) version 2.0. In: Proceedings of SSW (2007)

    Google Scholar 

  22. King, S., Karaiskos, V.: The Blizzard challenge 2012. In: Proceedings of Blizzard Challenge 2012 Workshop (2012)

    Google Scholar 

  23. Karhila, R., Remes, U., Kurimo, M.: Noise in HMM-based speech synthesis adaptation: analysis, evaluation methods and experiments. IEEE J. Sel. Top. Signal Process. 8(2), 285–295 (2014)

    Article  Google Scholar 

Download references

Acknowledgments

This study has been realized under the ANR (French National Research Agency) project SynPaFlex ANR-15-CE23-0015.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Gwénolé Lecorvé .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Qader, R., Lecorvé, G., Lolive, D., Tahon, M., Sébillot, P. (2017). Statistical Pronunciation Adaptation for Spontaneous Speech Synthesis. In: Ekštein, K., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2017. Lecture Notes in Computer Science(), vol 10415. Springer, Cham. https://doi.org/10.1007/978-3-319-64206-2_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-64206-2_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64205-5

  • Online ISBN: 978-3-319-64206-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics