The effect of different acoustic noise on speech signal formant frequency location

  • Mohsen Sadeghi
  • Hossein Marvi
  • Maaruf Ali

Abstract

Noise is one of the major challenges for speech recognition systems. In particular, different kinds of noise (pink, white and leopard) can degrade a speech signal in different ways and to different degrees. This study measures how resistant a speech signal's formants are to several conventional noise types, that is, how far the formant frequencies are displaced when the signal is subjected to noise. The methodology was to add each noise to the original voice signal and then measure the resulting displacement of the formant locations. The paper introduces the mean square movement (MSM) parameter, which represents the deviation and displacement of the formant frequencies caused by the applied noise. All investigations were conducted under three signal-to-noise ratio (SNR) conditions (5, 10 and 15 dB), which allowed the influence of the SNR on the MSM parameter and on the extent of formant displacement to be assessed. The results indicate that, at all three SNR levels, the formant frequencies were resistant to machine gun noise, whereas white noise caused the largest measurable displacement of the formant frequencies.
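
The procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not give the MSM formula, so `mean_square_movement` below assumes it is the mean of the squared differences between clean and noisy formant frequencies. The function names, the LPC order, and the use of the autocorrelation method for LPC are likewise assumptions made for the example.

```python
import numpy as np


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / noise power matches `snr_db`, then mix."""
    noise = noise[: len(signal)]
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_signal / 10^(snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + gain * noise


def lpc_formants(frame, fs, order=8):
    """Estimate candidate formant frequencies (Hz) from LPC pole angles.

    Assumption: autocorrelation-method LPC on a Hamming-windowed frame.
    A practical tracker would also discard poles with very wide bandwidths.
    """
    frame = frame * np.hamming(len(frame))
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]  # lags 0..order
    # Normal equations R a = r[1:], with R the Toeplitz autocorrelation matrix.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])
    # Poles are the roots of z^p - a1 z^(p-1) - ... - ap.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]  # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * fs / (2 * np.pi))


def mean_square_movement(clean_formants, noisy_formants):
    """Assumed MSM: mean squared displacement (Hz^2) of matching formants."""
    clean = np.asarray(clean_formants, dtype=float)
    noisy = np.asarray(noisy_formants, dtype=float)
    n = min(len(clean), len(noisy))  # compare only formants present in both
    return float(np.mean((noisy[:n] - clean[:n]) ** 2))
```

Under these assumptions, one would call `add_noise_at_snr` once per noise type and SNR level (5, 10 and 15 dB), estimate the formants of the clean and noisy frames with `lpc_formants`, and compare the resulting `mean_square_movement` values across noise types.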

Keywords

  • Formant
  • Automatic speech recognition (ASR)
  • Acoustic
  • Linear predictive coding (LPC)
  • Hidden Markov model
  • Auto regressive (AR)
  • Mean square movement (MSM)
  • Vocal tract

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Electrical and Computer Engineering, Shahrood University of Technology, Shahrood, Iran
  2. International Association of Educators and Researchers (IAER), London, UK
