An Implementation of isiXhosa Text-to-Speech Modules to Support e-Services in Marginalized Rural Areas

  • Okuthe P. Kogeda
  • Siphe Mhlana
  • Thinyane Mamello
  • Thomas Olwal
Part of the Public Administration and Information Technology book series (PAIT, volume 9)


Information and communication technology (ICT) projects are being initiated and deployed in marginalized areas to help improve the standard of living of community members. This has given rise to a field concerned with information processing and knowledge development in rural areas, called Information and Communication Technology for Development (ICT4D). A number of ICT4D projects have been implemented in marginalized areas all over the world. Dwesa is one such rural area, situated on the Wild Coast of the former Transkei homeland in the Eastern Cape Province of South Africa. E-service projects, namely e-commerce, e-health, and e-government, have been deployed to complement the existing ICT infrastructure in Dwesa. However, community members face language and literacy barriers in using these e-services, because the services are developed for, and typically accessed through, English textual interfaces. This is a challenge because the community's language of communication is isiXhosa and the majority of its members are illiterate. Existing tools, e.g., Google, can convert English text into isiXhosa text, but translated text alone does not help users who cannot read it. This chapter therefore designs, develops, and implements a text-to-speech system that converts isiXhosa text into natural-sounding isiXhosa speech. The system was implemented using the Festival speech synthesis system and a MySQL database. The developed text-to-speech system was tested to determine whether it can improve the usability of e-services. The results show acceptable levels of usability, with the system producing audio utterances from isiXhosa text for use in marginalized areas. We trained the system with isiXhosa words and sentences, achieving an 85 % success rate.
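The abstract states only that the system combines the Festival speech synthesis system with a MySQL database. The following is a minimal sketch of how such a pipeline could be wired, not the authors' implementation. Assumptions: Python's built-in `sqlite3` stands in for MySQL so the example is self-contained; the table `prompts` and the voice name `voice_xhosa_hypothetical` are hypothetical; `text2wave` is the command-line script shipped with Festival. The `success_rate` helper shows one plausible reading of a figure such as the reported 85 %, namely the fraction of test words and sentences judged correctly synthesized.

```python
import sqlite3
from pathlib import Path

# Hypothetical Festival voice name; the chapter does not name its isiXhosa voice.
XHOSA_VOICE = "voice_xhosa_hypothetical"


def init_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table of isiXhosa prompts (sqlite3 stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prompts ("
        "id INTEGER PRIMARY KEY, text_xh TEXT NOT NULL)"
    )


def add_prompt(conn: sqlite3.Connection, text_xh: str) -> int:
    """Insert one isiXhosa text string and return its row id."""
    cur = conn.execute("INSERT INTO prompts (text_xh) VALUES (?)", (text_xh,))
    conn.commit()
    return cur.lastrowid


def fetch_prompt(conn: sqlite3.Connection, prompt_id: int) -> str:
    """Look up the isiXhosa text stored under a prompt id."""
    row = conn.execute(
        "SELECT text_xh FROM prompts WHERE id = ?", (prompt_id,)
    ).fetchone()
    if row is None:
        raise KeyError(prompt_id)
    return row[0]


def text2wave_command(text_file: Path, wav_file: Path,
                      voice: str = XHOSA_VOICE) -> list[str]:
    """Build a Festival text2wave call that selects `voice` and writes a WAV file."""
    return ["text2wave", str(text_file), "-o", str(wav_file), "-eval", f"({voice})"]


def success_rate(judgements: list[bool]) -> float:
    """Fraction of test items (words or sentences) judged correctly synthesized."""
    if not judgements:
        raise ValueError("no judgements supplied")
    return sum(judgements) / len(judgements)
```

To synthesize audio, the text returned by `fetch_prompt` would be written to a file and the command list passed to `subprocess.run(cmd, check=True)` on a machine with Festival and a suitable isiXhosa voice installed. With a real MySQL server, a driver such as `mysql-connector-python` would replace `sqlite3`; note that it uses `%s` placeholders rather than `?`.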


Keywords: Speech Synthesis; Synthetic Speech; Eastern Cape Province; Synthetic Voice; Concatenative Synthesis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Okuthe P. Kogeda (1), corresponding author
  • Siphe Mhlana (2)
  • Thinyane Mamello (2)
  • Thomas Olwal (3)

  1. Department of Computer Science, Faculty of Information Communication Technology, Tshwane University of Technology, Pretoria, South Africa
  2. Department of Computer Science, University of Fort Hare, Alice, South Africa
  3. Council for Scientific and Industrial Research (CSIR), Pretoria, South Africa
