Speech recognition technology has been applied successfully to many Western and Asian languages, but work on African languages remains very limited. This chapter describes a study of the automatic recognition of Standard Yorùbá (SY) tones. The models use the fundamental frequency (F0) profile of SY syllables to characterise and discriminate the three Yorùbá tones (high, mid, and low). Tonal parameters were selected carefully, based on linguistic knowledge of tones and observation of acoustic data. We experimented with Multi-Layer Perceptron (MLP) and Recurrent Neural Network (RNN) models, training them to classify feature parameters corresponding to tonal patterns. Both tone recognition models performed well, although the RNN achieved higher accuracy than the MLP: for example, on the outside tests for the H tone, the MLP and RNN models produced recognition accuracies of 71.00% and 76.00%, respectively. In conclusion, this study demonstrates a basic approach to tone recognition for Yorùbá using Artificial Neural Networks (ANN). The proposed model can easily be extended to other African tone languages.
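To make the approach concrete, the sketch below shows the general idea of classifying tones from syllable F0 contours with a small MLP. It is not the chapter's actual model: the synthetic contours (rising for H, level for M, falling for L), the contour length, the network size, and the training settings are all illustrative assumptions, standing in for the carefully selected tonal parameters and real acoustic data described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_contours(n_per_class=100, length=10, noise=0.1):
    """Synthetic, normalised F0 contours: H rises, M stays level, L falls.
    These are illustrative stand-ins for real Yoruba syllable F0 profiles."""
    shapes = [np.linspace(0.0, 1.0, length),    # H tone: rising contour
              np.zeros(length),                 # M tone: level contour
              np.linspace(0.0, -1.0, length)]   # L tone: falling contour
    X, y = [], []
    for label, shape in enumerate(shapes):
        X.append(shape + noise * rng.standard_normal((n_per_class, length)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

def train_mlp(X, y, hidden=8, lr=0.5, epochs=300):
    """Single-hidden-layer MLP with softmax output, batch gradient descent."""
    n, d = X.shape
    k = y.max() + 1
    W1 = 0.1 * rng.standard_normal((d, hidden)); b1 = np.zeros(hidden)
    W2 = 0.1 * rng.standard_normal((hidden, k)); b2 = np.zeros(k)
    Y = np.eye(k)[y]                       # one-hot tone targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden-layer activations
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)  # softmax tone posteriors
        G = (P - Y) / n                    # cross-entropy output gradient
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(axis=0)
        Gh = (G @ W2.T) * (1 - H ** 2)     # backprop through tanh
        W1 -= lr * X.T @ Gh; b1 -= lr * Gh.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

X_train, y_train = make_contours()
X_test, y_test = make_contours(n_per_class=30)
params = train_mlp(X_train, y_train)
accuracy = (predict(params, X_test) == y_test).mean()
```

With contours this cleanly separated the toy classifier reaches near-perfect accuracy; real continuous speech, with co-articulation and downdrift, is what makes the task reported in the chapter substantially harder.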
© 2008 Springer-Verlag Berlin Heidelberg
Ọdẹ́jọbí, Ọ.À. (2008). Recognition of Tones in Yorùbá Speech: Experiments with Artificial Neural Networks. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_2
DOI: https://doi.org/10.1007/978-3-540-75398-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75397-1
Online ISBN: 978-3-540-75398-8
eBook Packages: Engineering (R0)