Recognition of Speech Isolated Words Based on Pyramid Phonetic Bag of Words Model Display and Kernel-Based Support Vector Machine Classifier Model

Rekavandi, Sodabeh Salehi; Ghaffary, Hamidreza; Davodpour, Maryam

doi:10.1007/978-981-10-8672-4_2

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 480)

Abstract

This study aims to improve the classification of isolated words, specifically the spoken numbers from one to twenty. It proposes a powerful model that yields a unified representation of the speech signal. The model is based on a phonetic bag-of-words (BOW) representation of speech, extended into a pyramid form so that it can capture temporal relationships. A known weakness of the Support Vector Machine (SVM) for word classification is that, unlike hidden Markov models (HMMs), it cannot model temporal relationships. Extracting a pyramid BOW representation that retains the temporal information of the speech signal gives the SVM the ability to take the temporal relationships between speech frames into account. One of the main advantages of the SVM is that it has fewer parameters than the HMM. The experimental results show that it achieves much higher accuracy than the HMM in applications such as isolated-word recognition, where the data set is small. With the pyramid BOW representation, the accuracy of the SVM-based method can be increased by 20% compared with previous methods.
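The pyramid idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chapter body is not reproduced here, so the codebook, the number of pyramid levels, the L1 normalisation, the histogram intersection kernel, and every function name below are assumptions chosen for illustration. Each frame is mapped to its nearest codeword, and histograms are then computed over the whole utterance and over progressively finer temporal segments and concatenated.

```python
from collections import Counter


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword (squared Euclidean)."""
    def nearest(frame):
        return min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(frame, codebook[k])),
        )
    return [nearest(f) for f in frames]


def histogram(words, vocab_size):
    """L1-normalised codeword counts; all zeros for an empty segment."""
    counts = Counter(words)
    total = len(words)
    return [counts[k] / total if total else 0.0 for k in range(vocab_size)]


def pyramid_bow(words, vocab_size, levels=2):
    """Concatenate histograms over 1, 2, 4, ... equal temporal segments."""
    features = []
    n = len(words)
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start, end = p * n // parts, (p + 1) * n // parts
            features.extend(histogram(words[start:end], vocab_size))
    return features


def intersection_kernel(x, y):
    """Histogram intersection, one common kernel choice for histogram features."""
    return sum(min(a, b) for a, b in zip(x, y))


# Toy data (hypothetical): two codewords and four 2-D frames.
codebook = [(0.0, 0.0), (1.0, 1.0)]
forward = quantize([(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.1, 0.8)], codebook)
backward = list(reversed(forward))  # same frames, opposite temporal order

# A single-level (plain BOW) histogram cannot tell the two orders apart.
flat_f = pyramid_bow(forward, 2, levels=1)
flat_b = pyramid_bow(backward, 2, levels=1)

# Adding a second level splits the utterance in half and exposes the order.
pyr_f = pyramid_bow(forward, 2, levels=2)
pyr_b = pyramid_bow(backward, 2, levels=2)

print(flat_f == flat_b)                    # True
print(pyr_f == pyr_b)                      # False
print(intersection_kernel(pyr_f, pyr_f))   # 3.0
print(intersection_kernel(pyr_f, pyr_b))   # 1.0
```

In this toy case the plain histogram is identical for both utterances, while the two-level pyramid gives a lower kernel value for the reversed sequence than for the sequence compared with itself. That is the sense in which a pyramid representation lets a kernel classifier see temporal order.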

Notes

1. http://www.mathworks.com/matlabcentral/fileexchange/28826-silence-removal-in-speech-signals/

Author information

Authors and Affiliations

Department of Computer Engineering, Islamic Azad University, Ferdows, Iran
Sodabeh Salehi Rekavandi, Hamidreza Ghaffary & Maryam Davodpour

Corresponding author

Correspondence to Sodabeh Salehi Rekavandi.

Editor information

Editors and Affiliations

Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
Shahram Montaser Kouhsari

About this paper

Cite this paper

Rekavandi, S.S., Ghaffary, H., Davodpour, M. (2019). Recognition of Speech Isolated Words Based on Pyramid Phonetic Bag of Words Model Display and Kernel-Based Support Vector Machine Classifier Model. In: Montaser Kouhsari, S. (eds) Fundamental Research in Electrical Engineering. Lecture Notes in Electrical Engineering, vol 480. Springer, Singapore. https://doi.org/10.1007/978-981-10-8672-4_2

DOI: https://doi.org/10.1007/978-981-10-8672-4_2
Published: 26 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8671-7
Online ISBN: 978-981-10-8672-4
eBook Packages: Engineering, Engineering (R0)
