Adaptive Prosody Modelling for Improved Synthetic Speech Quality

Ekpenyong, Moses E.; Inyang, Udoinyang G.; Udoh, EmemObong O.

doi:10.1007/978-3-319-43808-5_2

Conference paper
First Online: 30 July 2016

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9561)

Abstract

Neural networks and fuzzy logic have each proven efficient when applied individually to a variety of domain-specific problems, but their precision is enhanced when they are hybridized. This contribution presents a combined framework for improving the accuracy of prosodic models. It adopts the Adaptive Neuro-fuzzy Inference System (ANFIS) to offer self-tuned cognitive-learning capabilities suited to predicting the imprecise nature of speech prosody. After initializing the Fuzzy Inference System (FIS) structure, the model was trained on an Ibibio (ISO 639-3: nic; Ethnologue: IBB) speech dataset using gradient descent and a non-negative least squares estimator (LSE) to demonstrate the feasibility of the proposed model. The model was then validated using a synthesized speech corpus dataset of fundamental frequency (F0) values of Ibibio tones, captured at various contour positions (initial, mid, final) within the corpus. The results showed an insignificant difference between the predicted output and the check dataset, with a checking error of 0.0412, which supports our claim that the proposed model is satisfactory and suitable for improving prosody prediction of synthetic speech.
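
The least-squares half of the hybrid training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single input (a normalized contour position), three Gaussian membership functions loosely named after the initial, mid and final positions, and a zero-order Sugeno system with constant rule outputs. It uses an ordinary least-squares solve instead of the non-negative estimator named in the abstract, omits the gradient descent pass on the premise parameters, and measures the checking error as a root-mean-square error, which is itself an assumption. All names and F0 values below are hypothetical toy data, not figures from the paper.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership degree of x in one fuzzy set."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def normalized_strengths(x, premises):
    """Fuzzify x and normalize the rule firing strengths so they sum to 1."""
    w = [gaussian(x, c, s) for c, s in premises]
    total = sum(w)
    return [wi / total for wi in w]

def predict(x, premises, consequents):
    """Zero-order Sugeno output: weighted sum of constant rule outputs."""
    weights = normalized_strengths(x, premises)
    return sum(wn * k for wn, k in zip(weights, consequents))

def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z

def fit_consequents(xs, ys, premises):
    """Least-squares estimate of the rule constants, premise parameters held fixed."""
    rows = [normalized_strengths(x, premises) for x in xs]
    n = len(premises)
    # Normal equations (A^T A) k = A^T y, where A holds the normalized strengths.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve(ata, aty)

def rmse(xs, ys, premises, consequents):
    """Root-mean-square error between model predictions and targets."""
    sq = [(predict(x, premises, consequents) - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical premise parameters (center, sigma) for initial, mid and final positions.
PREMISES = [(0.0, 0.2), (0.5, 0.2), (1.0, 0.2)]
# Hypothetical rule outputs in Hz, used only to generate toy training targets.
TRUE_CONSEQUENTS = [120.0, 150.0, 110.0]

train_x = [i / 10 for i in range(11)]
train_y = [predict(x, PREMISES, TRUE_CONSEQUENTS) for x in train_x]
check_x = [0.05, 0.35, 0.65, 0.95]
check_y = [predict(x, PREMISES, TRUE_CONSEQUENTS) for x in check_x]

fitted = fit_consequents(train_x, train_y, PREMISES)
checking_error = rmse(check_x, check_y, PREMISES, fitted)
print("fitted consequents:", fitted)
print("checking error:", checking_error)
```

Because the toy targets are generated by the same model family, the fit should recover the generating constants and the checking error should be close to zero. On real F0 data the error would be nonzero, and a full ANFIS run would alternate this solve with gradient updates to the membership function centers and widths.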

Author information

Authors and Affiliations

Department of Computer Science, University of Uyo, Uyo, Nigeria
Moses E. Ekpenyong & Udoinyang G. Inyang
Department of Linguistics and Nigerian Languages, University of Uyo, Uyo, Nigeria
EmemObong O. Udoh

Corresponding author

Correspondence to Moses E. Ekpenyong.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI GmbH), Saarbrücken, Saarland, Germany
Hans Uszkoreit
Adam Mickiewicz University, Poznań, Poland
Marek Kubis

About this paper

Cite this paper

Ekpenyong, M.E., Inyang, U.G., Udoh, E.O. (2016). Adaptive Prosody Modelling for Improved Synthetic Speech Quality. In: Vetulani, Z., Uszkoreit, H., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2013. Lecture Notes in Computer Science, vol 9561. Springer, Cham. https://doi.org/10.1007/978-3-319-43808-5_2

DOI: https://doi.org/10.1007/978-3-319-43808-5_2
Published: 30 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43807-8
Online ISBN: 978-3-319-43808-5
eBook Packages: Computer Science, Computer Science (R0)