Adaptive Prosody Modelling for Improved Synthetic Speech Quality

  • Moses E. Ekpenyong
  • Udoinyang G. Inyang
  • EmemObong O. Udoh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9561)


Neural networks and fuzzy logic have proven efficient when applied individually to a variety of domain-specific problems, but their precision improves when they are hybridized. This contribution presents a combined framework for improving the accuracy of prosodic models. It adopts the Adaptive Neuro-fuzzy Inference System (ANFIS) to offer self-tuned cognitive-learning capabilities suitable for predicting the imprecise nature of speech prosody. After the Fuzzy Inference System (FIS) structure was initialized, the model was trained on an Ibibio (ISO 639-3: nic; Ethnologue: IBB) speech dataset using gradient descent and a non-negative least squares estimator (LSE) to demonstrate the feasibility of the proposed model. The model was then validated using a synthesized speech corpus dataset of fundamental frequency (F0) values of Ibibio tones, captured at various contour positions (initial, mid, final) within the corpus. The results showed an insignificant difference between the predicted output and the check dataset, with a checking error of 0.0412. This supports our claim that the proposed model is satisfactory and suitable for improving prosody prediction of synthetic speech.
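The hybrid learning idea summarized above (least squares for the rule consequents, gradient descent for the membership functions, then a checking error on held-out data) can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a zero-order Sugeno system with three Gaussian membership functions over a normalized contour position, uses ordinary rather than non-negative least squares, and trains on a made-up toy curve standing in for scaled F0 values. All function names, parameters, and data are hypothetical.

```python
import math

def gauss(x, c, s):
    """Gaussian membership function with centre c and width s."""
    return math.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def normalized_strengths(x, centers, sigmas):
    """Rule firing strengths, normalized so that they sum to 1."""
    mu = [gauss(x, c, s) for c, s in zip(centers, sigmas)]
    total = sum(mu)
    return [m / total for m in mu]

def predict(x, centers, sigmas, consequents):
    """Zero-order Sugeno output: weighted average of constant consequents."""
    w = normalized_strengths(x, centers, sigmas)
    return sum(wi * pi for wi, pi in zip(w, consequents))

def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def lse_consequents(xs, ys, centers, sigmas):
    """Forward pass: ordinary least squares via the normal equations."""
    rows = [normalized_strengths(x, centers, sigmas) for x in xs]
    n = len(centers)
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(a, b)

def rmse(xs, ys, centers, sigmas, consequents):
    sq = [(predict(x, centers, sigmas, consequents) - y) ** 2
          for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))

def train(xs, ys, centers, sigmas, epochs=20, lr=0.05):
    """Alternate LSE (consequents) with a gradient step (MF centres)."""
    centers = list(centers)
    for _ in range(epochs):
        p = lse_consequents(xs, ys, centers, sigmas)
        grads = [0.0] * len(centers)
        for x, y in zip(xs, ys):
            w = normalized_strengths(x, centers, sigmas)
            out = sum(wi * pi for wi, pi in zip(w, p))
            for i, (c, s) in enumerate(zip(centers, sigmas)):
                # d(0.5 * (out - y)^2) / d(c_i) for Gaussian MFs
                grads[i] += (out - y) * (p[i] - out) * w[i] * (x - c) / s ** 2
        centers = [c - lr * g / len(xs) for c, g in zip(centers, grads)]
    # Refit the consequents for the final centres.
    return centers, lse_consequents(xs, ys, centers, sigmas)

def toy_f0(x):
    """Hypothetical rise-fall contour, scaled to [0, 1]; not real data."""
    return 0.5 + 0.3 * math.sin(math.pi * x)

train_x = [i / 20 for i in range(21)]
check_x = [(i + 0.5) / 20 for i in range(20)]
train_y = [toy_f0(x) for x in train_x]
check_y = [toy_f0(x) for x in check_x]

sigmas = [0.25, 0.25, 0.25]
centers, consequents = train(train_x, train_y, [0.0, 0.5, 1.0], sigmas)
train_rmse = rmse(train_x, train_y, centers, sigmas, consequents)
check_rmse = rmse(check_x, check_y, centers, sigmas, consequents)
print("training RMSE:", train_rmse)
print("checking RMSE:", check_rmse)
```

The three initial centres are loosely meant to mirror the initial, mid, and final contour positions. Comparing the checking RMSE with the training RMSE is the usual way to see whether the fitted system generalizes beyond the training samples. The numbers printed here come from the toy curve and say nothing about the result reported in the paper.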


Keywords: ANFIS, Prosody, Speech synthesis, Under-resourced language



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Moses E. Ekpenyong (1, corresponding author)
  • Udoinyang G. Inyang (1)
  • EmemObong O. Udoh (2)
  1. Department of Computer Science, University of Uyo, Uyo, Nigeria
  2. Department of Linguistics and Nigerian Languages, University of Uyo, Uyo, Nigeria
