Recognition of Speech Isolated Words Based on Pyramid Phonetic Bag of Words Model Display and Kernel-Based Support Vector Machine Classifier Model

  • Conference paper
Fundamental Research in Electrical Engineering

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 480))

Abstract

This study aims to improve the classification of isolated words, specifically the spoken numbers from one to twenty. It proposes a robust model that yields a unified representation of the speech signal. The model builds on the phonetic bag-of-words (BOW) idea for speech and extends it into a pyramid form, which can capture temporal relationships. A known weakness of the Support Vector Machine (SVM) as a word classifier is that, unlike hidden Markov models (HMMs), it cannot model temporal relationships. Extracting a pyramid BOW representation that carries the temporal information of the speech signal gives the SVM the ability to take the temporal relationships between speech frames into account. A main advantage of the SVM is that it has fewer parameters than the HMM. The experimental results show that the SVM is considerably more accurate than the HMM in applications such as isolated word recognition, where the data set is small. With the pyramid BOW idea, the accuracy of the SVM-based method increases by 20% compared to previous methods.
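The pyramid BOW idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `quantize` and `pyramid_bow`, the nearest-codeword assignment, the halving of the utterance at each pyramid level, and the per-segment normalization are all assumptions made for illustration. The sketch assumes each utterance is already a sequence of frame feature vectors and that a codebook of "phonetic words" is available, for example from clustering.

```python
import numpy as np


def quantize(frames, codebook):
    """Assign each frame to the index of its nearest codeword (assumed Euclidean)."""
    # Squared distances between every frame and every codeword: shape (T, K).
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def pyramid_bow(frames, codebook, levels=2):
    """Concatenate codeword histograms over 1, 2, 4, ... temporal segments.

    Level 0 is the plain bag of words over the whole utterance; deeper
    levels split the utterance in time, so the order of frames matters.
    """
    words = quantize(frames, codebook)
    k = codebook.shape[0]
    parts = []
    for level in range(levels + 1):
        for segment in np.array_split(words, 2 ** level):
            hist = np.bincount(segment, minlength=k).astype(float)
            # Normalize by segment length; guard against empty segments.
            parts.append(hist / max(len(segment), 1))
    return np.concatenate(parts)


# Toy example: two codewords, four frames, first "word 0" twice then "word 1" twice.
codebook = np.eye(2)
frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

forward = pyramid_bow(frames, codebook, levels=1)
backward = pyramid_bow(frames[::-1], codebook, levels=1)
```

In this toy case the level-0 histograms of `forward` and `backward` are identical, while the level-1 parts differ, which is the temporal information a plain bag of words would lose. The resulting fixed-length vector (here of length K × (2^(levels+1) − 1)) could then be passed to a kernel SVM of one's choice.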

Notes

  1. http://www.mathworks.com/matlabcentral/fileexchange/28826-silence-removal-in-speech-signals/

Author information

Corresponding author

Correspondence to Sodabeh Salehi Rekavandi.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rekavandi, S.S., Ghaffary, H., Davodpour, M. (2019). Recognition of Speech Isolated Words Based on Pyramid Phonetic Bag of Words Model Display and Kernel-Based Support Vector Machine Classifier Model. In: Montaser Kouhsari, S. (eds) Fundamental Research in Electrical Engineering. Lecture Notes in Electrical Engineering, vol 480. Springer, Singapore. https://doi.org/10.1007/978-981-10-8672-4_2

  • DOI: https://doi.org/10.1007/978-981-10-8672-4_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8671-7

  • Online ISBN: 978-981-10-8672-4

  • eBook Packages: Engineering, Engineering (R0)
