Enhancing the Quality of Nepali Text-to-Speech Systems

Ghimire, Rupak Raj; Bal, Bal Krishna

doi:10.1007/978-3-319-65551-2_14


Part of the book series: Communications in Computer and Information Science (CCIS, volume 754)

Included in the following conference series:

Conference on Creativity in Intelligent Technologies and Data Science


Abstract

Text-to-speech (TTS) systems are widely studied applications in computer science. TTS research is more active for languages with rich resources, such as English, and has not been pursued as rigorously for under-resourced languages such as Nepali. Nevertheless, TTS has a wide scope of application in areas including telephony, e-learning and telecommunication.

Under-resourced languages face difficulties in developing natural-sounding TTS systems, primarily because of the linguistic resources such systems require. Preparing these resources is costly and time-consuming and requires the involvement of linguists and other experts. The general trend in this research domain is to develop natural-sounding TTS from the limited resources available. Nepali, being an under-resourced language, has very few linguistic resources available for developing a TTS system.

In this work, we modified the existing TTS system [9] by adding computational units that process its input and output, which we call the pre-processing and post-processing modules. We also pruned and added phonetic and normalization rules, and made the system available to the public through a desktop application and a Firefox plugin.
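A pre-processing module is typically where normalization rules rewrite non-standard tokens such as numerals into speakable words before synthesis. The abstract does not list the actual rules, so the following is only a minimal sketch under stated assumptions: the rule table, the function names `preprocess` and `expand_digits`, and the digit-by-digit reading strategy are all hypothetical. A real module would also handle full cardinal numbers, dates, abbreviations and symbols, the kinds of non-standard words surveyed by Sproat et al.

```python
import re

# Hypothetical normalization rule table: Devanagari digits -> Nepali number words.
# Reading a number digit by digit is a simplifying assumption of this sketch.
DIGIT_WORDS = {
    "०": "शून्य", "१": "एक", "२": "दुई", "३": "तीन", "४": "चार",
    "५": "पाँच", "६": "छ", "७": "सात", "८": "आठ", "९": "नौ",
}

# Map ASCII digits onto Devanagari digits so both are handled by one rule.
ASCII_TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")

# Devanagari digits occupy the contiguous range U+0966..U+096F.
DIGIT_RUN = re.compile(r"[०-९]+")


def expand_digits(match: "re.Match") -> str:
    """Replace one run of digits with space-separated Nepali digit words."""
    return " ".join(DIGIT_WORDS[d] for d in match.group(0))


def preprocess(text: str) -> str:
    """Hypothetical pre-processing step: normalize digits, then collapse whitespace."""
    text = text.translate(ASCII_TO_DEVANAGARI)
    text = DIGIT_RUN.sub(expand_digits, text)
    return " ".join(text.split())
```

For example, `preprocess("कोठा 12")` yields `"कोठा एक दुई"`, which the synthesizer can then pass to its letter-to-sound rules like any other words.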

We evaluated the existing and modified TTS systems using qualitative evaluation techniques in which 30 users rated the systems on two parameters: intelligibility and naturalness. Our results show an overall improvement of 6% in naturalness and intelligibility, while the comprehension test and diagnostic rhyme test results improved by 12% and 10%, respectively.
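The abstract reports these gains without stating how the scores were computed or whether the percentages are absolute or relative. As a hedged sketch only, assuming the conventional Diagnostic Rhyme Test scoring (a two-alternative forced choice corrected for chance guessing, 100 × (R − W) / T) and reporting both readings of "improvement", the comparison could be computed as below. The function names are hypothetical and the example numbers are illustrative, not the paper's data.

```python
from statistics import mean
from typing import Sequence, Tuple


def drt_score(right: int, wrong: int) -> float:
    """Diagnostic Rhyme Test score corrected for guessing: 100 * (R - W) / T.

    In a two-alternative forced choice a guessing listener is right about half
    the time, so subtracting wrong answers removes the chance component.
    """
    total = right + wrong
    if total == 0:
        raise ValueError("no DRT responses")
    return 100.0 * (right - wrong) / total


def improvement(old_scores: Sequence[float], new_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (absolute difference in points, relative change in %) of the mean scores."""
    old_mean, new_mean = mean(old_scores), mean(new_scores)
    return new_mean - old_mean, 100.0 * (new_mean - old_mean) / old_mean
```

With illustrative inputs, `drt_score(45, 15)` gives `50.0`, and `improvement([60.0, 70.0], [66.0, 76.0])` gives a 6-point absolute gain, which is about a 9.2% relative gain; the two readings can differ noticeably, which is why the scoring convention matters when comparing systems.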


Notes

1. REST stands for Representational State Transfer (it is sometimes spelled “ReST”). It relies on a stateless, client-server, cacheable communications protocol; in virtually all cases, HTTP is used. REST is an architectural style for designing networked applications.
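The note does not describe how the TTS service is exposed, so the following is purely an illustration of the three properties it names, not the authors' interface. In this minimal sketch the resource path `/tts`, the `text` query parameter, the function `handle_tts_request` and the echo placeholder standing in for the synthesizer are all hypothetical. The request URL carries everything the server needs (stateless), the client only sends requests and the server only answers them (client-server), and identical URLs produce identical responses that are marked as cacheable.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict, Tuple
from urllib.parse import parse_qs, urlparse


def handle_tts_request(path: str) -> Tuple[int, Dict[str, str], bytes]:
    """Build a response from the request URL alone (no server-side session state)."""
    url = urlparse(path)
    if url.path != "/tts":  # hypothetical resource path
        return 404, {"Content-Type": "text/plain"}, b"not found"
    text = parse_qs(url.query).get("text", [""])[0]
    # Hypothetical placeholder for the synthesizer: echo what would be spoken.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        # The same URL always yields the same result, so the response may be cached.
        "Cache-Control": "public, max-age=3600",
    }
    return 200, headers, body


class TTSHandler(BaseHTTPRequestHandler):
    """Thin HTTP adapter: the client sends a GET, the server answers and keeps no state."""

    def do_GET(self) -> None:
        status, headers, body = handle_tts_request(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Keeping the request logic in a plain function makes it testable without a socket; to serve it, one would run `http.server.HTTPServer(("127.0.0.1", 8000), TTSHandler).serve_forever()`, after which a client such as a browser plugin issues `GET /tts?text=<URL-encoded text>`.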

References

Cha, J.S., Lim, D.K., Shin, Y.N.: Design and implementation of a voice based navigation for visually impaired persons. Int. J. Bio-Sci. Bio-Technol. 5(3) (2013)
Dutoit, T.: An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht (1997). pp. 13, 14, 63, 72, 179, 196
Dutoit, T.: A Short Introduction to Text-to-Speech Synthesis. TTS Research Team, TCTS Lab (1999)
FestivalEngine: http://www.cstr.ed.ac.uk/projects/festival/ (2014). Accessed 18 May 2014
Festvox: http://festvox.org/ (2014). Accessed 10 June 2014
Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-1996) (1996)
ITU-T: Series P: telephone transmission quality - methods for objective and subjective assessment of quality (1996)
Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Pearson Education, London (2009)
Nepali-TTS: Full manual of Nepali TTS (2008)
Nepali-TTS: http://bhashasanchar.org/textspeech_intro.php (2008). Accessed 18 Feb 2014
Taylor, P., Black, A.W., Caley, R.: The architecture of the festival speech synthesis system. Centre for Speech Technology Research (1998)
Sproat, R., Black, A.W., Chen, S., Kumar, S., Ostendorf, M., Richards, C.: Normalization of non-standard words. Comput. Speech Lang. 15(3), 287–333 (2001). http://dx.doi.org/10.1006/csla.2001.0169
Wang, W.Y., Georgila, K.: Automatic detection of unnatural word-level segments in unit-selection speech synthesis. In: IEEE ASRU (2011)

Author information

Authors and Affiliations

Information and Language Processing Research Lab, Department of Computer Science and Engineering, Kathmandu University, Dhulikhel, Kavre, Nepal
Rupak Raj Ghimire & Bal Krishna Bal

Corresponding author

Correspondence to Bal Krishna Bal.

Editor information

Editors and Affiliations

Volgograd State Technical University, Volgograd, Russia
Alla Kravets
Volgograd State Technical University, Volgograd, Russia
Maxim Shcherbakov
Volgograd State Technical University, Volgograd, Russia
Marina Kultsova
University of Patras, Patras, Greece
Peter Groumpos

About this paper

Cite this paper

Ghimire, R.R., Bal, B.K. (2017). Enhancing the Quality of Nepali Text-to-Speech Systems. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2017. Communications in Computer and Information Science, vol 754. Springer, Cham. https://doi.org/10.1007/978-3-319-65551-2_14

DOI: https://doi.org/10.1007/978-3-319-65551-2_14
Published: 17 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65550-5
Online ISBN: 978-3-319-65551-2
eBook Packages: Computer Science, Computer Science (R0)

Publish with us

Policies and ethics