Adaptive Prosody Modelling for Improved Synthetic Speech Quality

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

Neural networks and fuzzy logic have each proven efficient when applied individually to a variety of domain-specific problems, but their precision improves when the two are hybridized. This contribution presents a combined framework for improving the accuracy of prosodic models. It adopts the Adaptive Neuro-fuzzy Inference System (ANFIS) to offer self-tuned cognitive-learning capabilities suited to predicting the imprecise nature of speech prosody. After the Fuzzy Inference System (FIS) structure was initialized, the model was trained on an Ibibio (ISO 639-3: nic; Ethnologue: IBB) speech dataset using gradient descent and a non-negative least squares estimator (LSE) to demonstrate the feasibility of the proposed model. The model was then validated using a synthesized speech corpus dataset of fundamental frequency (F0) values of Ibibio tones, captured at various contour positions (initial, mid, final) within the corpus. The results showed an insignificant difference between the predicted output and the check dataset, with a checking error of 0.0412, which supports our claim that the proposed model is satisfactory and suitable for improving prosody prediction of synthetic speech.
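To make the ANFIS idea in the abstract more concrete, the following is a minimal sketch of the forward pass of a single-input Sugeno-type fuzzy inference system, the kind of structure that ANFIS training tunes. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the Gaussian membership functions, the use of a normalised contour position (0 = initial, 0.5 = mid, 1 = final) as the only input, the three rules, and the F0 values in Hz. The training step (gradient descent on the premise parameters and least squares on the consequent parameters) is omitted.

```python
import math


def gaussian_mf(x, center, sigma):
    """Membership degree of x in a Gaussian fuzzy set (ANFIS layer 1)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(x, premises, consequents):
    """Forward pass of a single-input, first-order Sugeno FIS.

    premises:    one (center, sigma) pair per rule
    consequents: one (p, r) pair per rule; rule output is p * x + r
    """
    # Layers 1-2: with a single input, the firing strength of each rule
    # is simply the membership degree of x in that rule's fuzzy set.
    firing = [gaussian_mf(x, c, s) for c, s in premises]
    # Layer 3: normalise the firing strengths so they sum to 1.
    total = sum(firing)
    normalised = [w / total for w in firing]
    # Layer 4: evaluate each rule's linear consequent.
    rule_out = [p * x + r for p, r in consequents]
    # Layer 5: the prediction is the weighted sum of the rule outputs.
    return sum(wn * f for wn, f in zip(normalised, rule_out))


# Hypothetical parameters, NOT taken from the paper: three rules centred on
# the initial, mid and final contour positions, with constant F0 outputs (Hz).
PREMISES = [(0.0, 0.2), (0.5, 0.2), (1.0, 0.2)]
CONSEQUENTS = [(0.0, 180.0), (0.0, 160.0), (0.0, 140.0)]

f0_initial = anfis_forward(0.0, PREMISES, CONSEQUENTS)
f0_mid = anfis_forward(0.5, PREMISES, CONSEQUENTS)
f0_final = anfis_forward(1.0, PREMISES, CONSEQUENTS)
```

In this sketch the prediction at each position is pulled towards the output of the nearest rule, and intermediate positions are blended smoothly. In a trained system the parameters in `PREMISES` and `CONSEQUENTS` would be fitted to data rather than set by hand.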

Corresponding author

Correspondence to Moses E. Ekpenyong.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ekpenyong, M.E., Inyang, U.G., Udoh, E.O. (2016). Adaptive Prosody Modelling for Improved Synthetic Speech Quality. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_2

  • DOI: https://doi.org/10.1007/978-3-319-43808-5_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-43807-8

  • Online ISBN: 978-3-319-43808-5

  • eBook Packages: Computer Science, Computer Science (R0)