Abstract
A modified 2-D Kohonen self-organizing map (MSOM) neural network is used to recognize isolated Farsi words. The network is a 10 × 15 grid of cells with a hexagonal topology, trained on 300 Farsi words. The input vectors for learning are the speech spectrum and the signal energy. The weight vectors of the cells are then fine-tuned with the supervised learning vector quantization 3 (LVQ3) technique. The cells are labeled with 28 of the 29 Farsi phonemes. At the word recognition stage, the quasi-phonemes are obtained first, and the phonemes are then determined from them. Using the phonetic rules of Farsi words and the connection rules of Farsi characters, the recognized word is displayed on the monitor. To correct errors, a 2500-word dictionary is used: the determined phoneme sequence is passed to the dictionary, and the word closest to that sequence is shown on the monitor. The proposed recognizer recognizes all vowels with 100 percent accuracy and correctly recognizes 55 of 100 isolated words.
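The dictionary correction step described above can be illustrated with a minimal sketch. The abstract does not state how "closest" is measured, so the Levenshtein edit distance over phoneme symbols used here is an assumption, as are the function names `edit_distance` and `closest_word` and the three toy lexicon entries, which are illustrative transliterations and not taken from the paper's 2500-word dictionary.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (assumed metric)."""
    # prev[j] holds the distance between the first i-1 symbols of a
    # and the first j symbols of b.
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_word(phonemes: Sequence[str],
                 lexicon: Dict[str, List[str]]) -> Tuple[str, int]:
    """Return the lexicon word whose phoneme sequence is nearest, with its distance."""
    return min(((word, edit_distance(phonemes, ref))
                for word, ref in lexicon.items()),
               key=lambda pair: pair[1])


# Hypothetical toy lexicon; the real system uses a 2500-word dictionary.
LEXICON: Dict[str, List[str]] = {
    "ketab": ["k", "e", "t", "a", "b"],
    "kabab": ["k", "a", "b", "a", "b"],
    "nan": ["n", "a", "n"],
}

# A recognized sequence with one wrong phoneme ("d" instead of "t").
word, distance = closest_word(["k", "e", "d", "a", "b"], LEXICON)
```

In this sketch a single misrecognized phoneme still maps to the intended entry, which is the kind of error the dictionary lookup is meant to absorb. A weighted distance that penalizes confusions between acoustically similar phonemes less would be a natural refinement, but nothing in the abstract confirms that the authors used one.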
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shirazi, J., Menhaj, M.B. (2005). A SOM Based 2500 – Isolated – Farsi – Word Speech Recognizer. In: Duch, W., Kacprzyk, J., Oja, E., Zadrożny, S. (eds) Artificial Neural Networks: Biological Inspirations – ICANN 2005. ICANN 2005. Lecture Notes in Computer Science, vol 3696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11550822_92
DOI: https://doi.org/10.1007/11550822_92
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28752-0
Online ISBN: 978-3-540-28754-4
eBook Packages: Computer Science (R0)