Enhancing the Quality of Nepali Text-to-Speech Systems

  • Conference paper
Creativity in Intelligent Technologies and Data Science (CIT&DS 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 754))

Abstract

Text-to-speech (TTS) systems are widely studied applications in Computer Science. TTS is more popular for languages with a rich set of resources, such as English, and has not been taken up as rigorously for under-resourced languages such as Nepali. Nevertheless, it has a wide scope of application in areas including telephony, e-learning and telecommunication.

Developing a natural-sounding TTS system is difficult for under-resourced languages, primarily because of the linguistic resources the system requires. Preparing such resources is costly and time-consuming, and requires the involvement of linguists and other experts. The general trend in this research domain is to develop natural-sounding TTS from the limited resources available. Nepali, being an under-resourced language, has very few linguistic resources available for developing a TTS system.

In this work, we modified the existing TTS system [9] by adding computational units that process the input and the output; we call them pre-processing and post-processing modules. We also pruned and added phonetic rules and normalization rules, and made the system available to the public through a desktop application and a Firefox plugin.
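As a rough illustration of what a normalization rule in a pre-processing module might look like, the sketch below spells out digits one by one before the text reaches the synthesizer. It is not the rule set used in the paper: the function name `normalize`, the rule table `DIGIT_WORDS`, and the digit-by-digit reading are all assumptions made for this example.

```python
import re

# Hypothetical rule table (an assumption, not the paper's rules):
# each digit is mapped to its Nepali number word.
DIGIT_WORDS = {
    "0": "शून्य", "1": "एक", "2": "दुई", "3": "तीन", "4": "चार",
    "5": "पाँच", "6": "छ", "7": "सात", "8": "आठ", "9": "नौ",
}

# Devanagari digits are first folded to ASCII so one rule table covers both.
DEVANAGARI_TO_ASCII = str.maketrans("०१२३४५६७८९", "0123456789")


def normalize(text: str) -> str:
    """Spell out every digit as a word and collapse repeated whitespace."""
    text = text.translate(DEVANAGARI_TO_ASCII)
    # Replace each ASCII digit with its word, padded so words never fuse.
    text = re.sub(r"[0-9]", lambda m: " " + DIGIT_WORDS[m.group()] + " ", text)
    # Collapse the padding and any other whitespace runs to single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("कोठा  १२"))
```

Reading digits individually is the simplest possible choice; a realistic module would also need rules for full numbers, dates, abbreviations and similar non-standard words, as discussed in [12].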

We evaluated the existing and modified TTS systems using qualitative evaluation techniques, in which 30 users rated each system on two parameters: intelligibility and naturalness. The results show an overall improvement of 6% in naturalness and intelligibility, while scores on the comprehension test and the diagnostic rhyme test increased by 12% and 10%, respectively.
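The arithmetic behind a comparison of this kind can be sketched as follows, assuming listeners rate each system on a 1 to 5 opinion scale and the scores are averaged. The scale, the function names and the choice of relative (rather than absolute) change are assumptions for illustration; the abstract does not state how its percentages were computed, and the sample ratings are invented.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings, assuming a 1-5 opinion scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    return mean(ratings)


def relative_improvement(old_score, new_score):
    """Percentage change of new_score relative to old_score."""
    return (new_score - old_score) / old_score * 100.0


# Invented ratings for illustration only; these are not the paper's data.
existing = mean_opinion_score([3, 3, 4, 2, 3])
modified = mean_opinion_score([3, 4, 4, 3, 3])
print(f"change: {relative_improvement(existing, modified):.1f}%")
```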

Notes

  1. REST stands for Representational State Transfer (sometimes spelled "ReST"). It is an architectural style for designing networked applications. It relies on a stateless, client-server, cacheable communications protocol; in virtually all cases, that protocol is HTTP.

References

  1. Cha, J.S., Lim, D.K., Shin, Y.N.: Design and implementation of a voice based navigation for visually impaired persons. Int. J. Bio-Sci. Bio-Technol. 5(3) (2013)

  2. Dutoit, T.: An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht (1997). pp. 13, 14, 63, 72, 179, 196

  3. Dutoit, T.: A Short Introduction to Text-to-Speech Synthesis. TTS Research Team, TCTS Lab (1999)

  4. FestivalEngine (2014). http://www.cstr.ed.ac.uk/projects/festival/. Accessed 18 May 2014

  5. Festvox (2014). http://festvox.org/. Accessed 10 June 2014

  6. Hunt, A.J., Black, A.W.: Unit selection in a concatenative speech synthesis system using a large speech database. In: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-1996) (1996)

  7. ITU-T: Series P: telephone transmission quality - methods for objective and subjective assessment of quality (1996)

  8. Jurafsky, D., Martin, J.H.: Speech and Language Processing, 2nd edn. Pearson Education, London (2009)

  9. Nepali-TTS: Full manual of Nepali TTS (2008)

  10. Nepali-TTS: http://bhashasanchar.org/textspeech_intro.php (2008). Accessed 18 Feb 2014

  11. Taylor, P., Black, A.W., Caley, R.: The architecture of the festival speech synthesis system. Centre for Speech Technology Research (1998)

  12. Sproat, R., Black, A.W., Chen, S., Kumar, S., Ostendorf, M., Richards, C.: Normalization of non-standard words. Comput. Speech Lang. 15(3), 287–333 (2001). http://dx.doi.org/10.1006/csla.2001.0169

  13. Wang, W.Y., Georgila, K.: Automatic detection of unnatural word-level segments in unit-selection speech synthesis. In: IEEE ASRU (2011)


Author information

Corresponding author

Correspondence to Bal Krishna Bal.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ghimire, R.R., Bal, B.K. (2017). Enhancing the Quality of Nepali Text-to-Speech Systems. In: Kravets, A., Shcherbakov, M., Kultsova, M., Groumpos, P. (eds) Creativity in Intelligent Technologies and Data Science. CIT&DS 2017. Communications in Computer and Information Science, vol 754. Springer, Cham. https://doi.org/10.1007/978-3-319-65551-2_14

  • DOI: https://doi.org/10.1007/978-3-319-65551-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65550-5

  • Online ISBN: 978-3-319-65551-2

  • eBook Packages: Computer Science, Computer Science (R0)
