Speech recognition technology has been applied successfully to many Western and Asian languages, but work on African languages remains very limited. This chapter describes a study of the automatic recognition of Standard Yorùbá (SY) tones. The models use the fundamental frequency (F0) profile of SY syllables to characterise and discriminate the three Yorùbá tones (high, mid, and low). Tonal parameters were selected carefully, based on linguistic knowledge of tones and observation of acoustic data. We experimented with Multi-Layer Perceptron (MLP) and Recurrent Neural Network (RNN) models, training them to classify feature parameters corresponding to tonal patterns. Both tone recognition models performed well, although the RNN achieved higher accuracy than the MLP: for example, on the outside tests for the H tone, the MLP and RNN models produced recognition accuracies of 71.00% and 76.00%, respectively. In conclusion, this study demonstrates a basic approach to tone recognition for Yorùbá using Artificial Neural Networks (ANN). The proposed model can easily be extended to other African tone languages.
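To make the approach concrete, the sketch below shows the general idea of classifying tones from syllable F0 contours with a small MLP. It is not the chapter's actual model: the synthetic contours (rising for H, level for M, falling for L), the contour length, the network size, and the training settings are all illustrative assumptions, standing in for the carefully selected tonal parameters and real acoustic data described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_contours(n_per_class=100, length=10, noise=0.1):
    """Synthetic, normalised F0 contours: H rises, M stays level, L falls.
    These are illustrative stand-ins for real Yoruba syllable F0 profiles."""
    shapes = [np.linspace(0.0, 1.0, length),    # H tone: rising contour
              np.zeros(length),                 # M tone: level contour
              np.linspace(0.0, -1.0, length)]   # L tone: falling contour
    X, y = [], []
    for label, shape in enumerate(shapes):
        X.append(shape + noise * rng.standard_normal((n_per_class, length)))
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

def train_mlp(X, y, hidden=8, lr=0.5, epochs=300):
    """Single-hidden-layer MLP with softmax output, batch gradient descent."""
    n, d = X.shape
    k = y.max() + 1
    W1 = 0.1 * rng.standard_normal((d, hidden)); b1 = np.zeros(hidden)
    W2 = 0.1 * rng.standard_normal((hidden, k)); b2 = np.zeros(k)
    Y = np.eye(k)[y]                       # one-hot tone targets
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # hidden-layer activations
        Z = H @ W2 + b2
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)  # softmax tone posteriors
        G = (P - Y) / n                    # cross-entropy output gradient
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(axis=0)
        Gh = (G @ W2.T) * (1 - H ** 2)     # backprop through tanh
        W1 -= lr * X.T @ Gh; b1 -= lr * Gh.sum(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

X_train, y_train = make_contours()
X_test, y_test = make_contours(n_per_class=30)
params = train_mlp(X_train, y_train)
accuracy = (predict(params, X_test) == y_test).mean()
```

With contours this cleanly separated the toy classifier reaches near-perfect accuracy; real continuous speech, with co-articulation and downdrift, is what makes the task reported in the chapter substantially harder.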
© 2008 Springer-Verlag Berlin Heidelberg
Ọdẹ́jọbí, Ọ.À. (2008). Recognition of Tones in Yorùbá Speech: Experiments with Artificial Neural Networks. In: Prasad, B., Prasanna, S.R.M. (eds) Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Studies in Computational Intelligence, vol 83. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75398-8_2
DOI: https://doi.org/10.1007/978-3-540-75398-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75397-1
Online ISBN: 978-3-540-75398-8
eBook Packages: Engineering (R0)