Implementation of Three Text to Speech Systems for Kurdish Language

  • Anvar Bahrampour
  • Wafa Barkhoda
  • Bahram Zahir Azami
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5856)


Most modern TTS systems use the concatenative method to produce artificial speech. The main challenge in this method is choosing an appropriate unit for building the database. The unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for all-purpose systems. In this paper, we implement three synthesis systems for the Kurdish language, based on syllables, allophones, and diphones, and compare their quality using subjective testing.
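To make the idea of unit concatenation concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a toy database in which each unit label maps to a list of samples; the labels, the sine-tone placeholder "recordings", the sample rate, and the linear cross-fade at each joint are all illustrative assumptions.

```python
import math


def tone(freq_hz, n_samples, rate=8000):
    """Placeholder 'recording': a sine tone standing in for a real speech unit."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n_samples)]


def concatenate(units, database, overlap=16):
    """Join unit waveforms in order, cross-fading `overlap` samples at each joint."""
    output = []
    for name in units:
        wave = database[name]  # raises KeyError if the unit is not in the database
        if not output:
            output = list(wave)
            continue
        n = min(overlap, len(output), len(wave))
        start = len(output) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising across the joint
            output[start + i] = (1 - w) * output[start + i] + w * wave[i]
        output.extend(wave[n:])
    return output


# Hypothetical toy database keyed by diphone-like labels (not real Kurdish units).
database = {
    "s-a": tone(200, 400),
    "a-r": tone(250, 400),
    "r-_": tone(300, 400),
}

speech = concatenate(["s-a", "a-r", "r-_"], database)
```

The same function works for syllable or allophone units, since only the keys and the recorded waveforms in the database change. What differs between unit types is the size of the database and the number of joints per utterance, which is the trade-off the abstract describes.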


Speech Synthesis, Concatenative Method, Kurdish TTS System, Allophone, Syllable, Diphone



Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Anvar Bahrampour (1)
  • Wafa Barkhoda (2)
  • Bahram Zahir Azami (2)
  1. Department of Information Technology, Islamic Azad University, Sanandaj Branch, Sanandaj, Iran
  2. Department of Computer, University of Kurdistan, Sanandaj, Iran
