Hybrid Method for Digits Recognition using Fixed-Frame Scores and Derived Pitch

Sudirman, Rubita; Salleh, Sh-Hussain; Salleh, Shaharuddin

doi:10.1007/978-3-540-68017-8_19

Part of the book series: IFMBE Proceedings (IFMBE, volume 15)

Abstract

This paper presents a frame normalization procedure based on traditional dynamic time warping (DTW) applied to linear predictive coding (LPC) coefficients. The modified method, called DTW frame-fixing (DTW-FF), normalizes the word frames of the input against the reference frames. The study is motivated by a limitation of neural networks, which require a fixed number of input nodes when processing multiple inputs in parallel. The research therefore aims to reduce the computation and complexity of a neural network by reducing the number of inputs to the network. A dynamic warping process is used in which the local distance scores along the warping path are fixed and collected so that every utterance yields scores for an equal number of frames. The paper also considers pitch as a contributing feature for speech recognition. Results showed good performance, with an improvement when pitch is used along with the DTW-FF feature. The convergence rate of steepest gradient descent is also compared with that of the conjugate gradient method; convergence improves when the conjugate gradient method is introduced in the backpropagation algorithm.
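The frame-fixing idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact DTW-FF rule, so the step pattern, the Euclidean local distance, and the choice to average the local distances that map to each reference frame are all assumptions, and the function names (`euclidean`, `dtw_path`, `fixed_frame_scores`) are hypothetical. The sketch only shows how inputs of different lengths can each be reduced to one score per reference frame.

```python
import math


def euclidean(a, b):
    """Local distance between two feature vectors (e.g. LPC coefficients)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_path(inp, ref):
    """Classic DTW with unit steps; returns the warping path as (i, j) pairs."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(inp[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first along cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return path


def fixed_frame_scores(inp, ref):
    """One local-distance score per reference frame (assumed: the mean of the
    local distances of all input frames aligned to that reference frame)."""
    sums = [0.0] * len(ref)
    counts = [0] * len(ref)
    for i, j in dtw_path(inp, ref):
        sums[j] += euclidean(inp[i], ref[j])
        counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]


# Toy one-dimensional "frames"; real inputs would be LPC vectors per frame.
ref = [[0.0], [1.0], [2.0]]
short = [[0.0], [2.0]]
long_ = [[0.0], [0.0], [1.0], [1.0], [2.0]]
print(len(fixed_frame_scores(short, ref)), len(fixed_frame_scores(long_, ref)))
```

Both toy inputs yield a score vector with as many entries as the reference has frames, which is the property a fixed-size network input layer needs.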

Author information

Authors and Affiliations

Center for Biomedical Engineering, Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, UTM Skudai, Johore, Malaysia
Rubita Sudirman & Sh-Hussain Salleh
Mathematics Department, Faculty of Science, Universiti Teknologi Malaysia, 81310, UTM Skudai, Johore, Malaysia
Shaharuddin Salleh

Editor information

Editors and Affiliations

Department of Biomedical Engineering Faculty of Engineering, University of Malaya, 50603, Kuala Lumpur, Malaysia
Fatimah Ibrahim, Noor Azuan Abu Osman, Juliana Usman & Nahrizul Adib Kadri

About this paper

Cite this paper

Sudirman, R., Salleh, SH., Salleh, S. (2007). Hybrid Method for Digits Recognition using Fixed-Frame Scores and Derived Pitch. In: Ibrahim, F., Osman, N.A.A., Usman, J., Kadri, N.A. (eds) 3rd Kuala Lumpur International Conference on Biomedical Engineering 2006. IFMBE Proceedings, vol 15. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68017-8_19

DOI: https://doi.org/10.1007/978-3-540-68017-8_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68016-1
Online ISBN: 978-3-540-68017-8
eBook Packages: Engineering, Engineering (R0)
