
Feature Extraction

  • Chapter
Audio Processing and Speech Recognition

Part of the book series: SpringerBriefs in Applied Sciences and Technology (BRIEFSINTELL)

Abstract

Feature extraction is a prerequisite for classifying any audio or speech signal. The analog speech signal s(t) is sampled a fixed number of times per second (the sampling rate) so that it can be stored on a recording device or simply on a computer.
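As a minimal sketch of the sampling step described in the abstract (the 8 kHz sampling rate, the 440 Hz test tone, and the use of NumPy are assumptions made for illustration, not taken from the chapter), the snippet below evaluates a continuous tone s(t) at the discrete instants t = n/fs to produce the stored samples s[n]:

```python
import numpy as np

# Assumed example parameters (not from the chapter): an 8 kHz sampling
# rate and a 440 Hz test tone standing in for the analog signal s(t).
fs = 8000          # sampling rate in samples per second
f0 = 440.0         # tone frequency in Hz
duration = 1.0     # length of the recording in seconds

# Sampling: evaluate s(t) = sin(2*pi*f0*t) at t = n / fs for
# n = 0, 1, ..., fs*duration - 1, which is what the analog-to-digital
# converter does before the samples are written to disk.
n = np.arange(int(fs * duration))
s = np.sin(2 * np.pi * f0 * n / fs)

print(s.shape)     # (8000,) discrete samples, ready for feature extraction
```

Any sampling rate of at least twice the highest frequency of interest (the Nyquist rate) is sufficient to represent the signal; 8 kHz is a common choice for narrowband speech.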




Copyright information

© 2019 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Sen, S., Dutta, A., Dey, N. (2019). Feature Extraction. In: Audio Processing and Speech Recognition. SpringerBriefs in Applied Sciences and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-13-6098-5_3


  • DOI: https://doi.org/10.1007/978-981-13-6098-5_3


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-6097-8

  • Online ISBN: 978-981-13-6098-5

