
Part of the book series: Studies in Computational Intelligence ((SCI,volume 83))

Speech recognition technology has been applied successfully to many Western and Asian languages, but work on African languages remains very limited. Here we describe a study of automatic recognition of Standard Yorùbá (SY) tones. The models use the fundamental frequency (F0) profile of SY syllables to characterize and discriminate the three Yorùbá tones (High, Mid, and Low). Tonal parameters were selected carefully, based on linguistic knowledge of tones and on observation of acoustic data. We experimented with Multi-Layer Perceptron (MLP) and Recurrent Neural Network (RNN) models, training them to classify feature parameters corresponding to tonal patterns. Both tone recognition models performed well, although the RNN achieved higher accuracy rates than the MLP. For example, in outside tests on the H tone, the MLP and RNN models achieved recognition accuracies of 71.00% and 76.00%, respectively. In conclusion, this study demonstrates a basic approach to tone recognition for Yorùbá using Artificial Neural Networks (ANNs), and the proposed model can be easily extended to other African tone languages.
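The general idea of the tonal parameters described above — sampling a syllable's F0 contour and adding shape cues before feeding a neural classifier — can be sketched as follows. This is a minimal illustration, not the chapter's actual parameter set: the function name, the five-point sampling, the mean normalization, and the slope feature are all assumptions made for the example.

```python
def f0_features(f0, n_points=5):
    """Illustrative tonal feature vector from a syllable's F0 contour.

    f0: list of voiced F0 values (Hz) across the syllable; assumed len >= 2.
    Returns n_points resampled values plus an overall-slope term.
    """
    # Normalize by the contour mean to reduce speaker pitch-range effects.
    mean = sum(f0) / len(f0)
    norm = [v / mean for v in f0]
    # Pick n_points roughly equally spaced samples along the contour.
    idx = [round(i * (len(norm) - 1) / (n_points - 1)) for i in range(n_points)]
    samples = [norm[i] for i in idx]
    # A gross slope term helps separate level from rising/falling contours.
    slope = (norm[-1] - norm[0]) / (len(norm) - 1)
    return samples + [slope]
```

A vector like this would be the input layer of the MLP (or, frame by frame, the input sequence of the RNN), with three output units corresponding to the H, M, and L tone classes.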




Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ọdẹ́jọbí, Ọ.À. (2008). Recognition of Tones in Yorùbá Speech: Experiments With Artificial Neural Networks. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_2


  • DOI: https://doi.org/10.1007/978-3-540-75398-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75397-1

  • Online ISBN: 978-3-540-75398-8

  • eBook Packages: Engineering (R0)
