A comparative study of several approaches to short-term frequency analysis of a speech signal

  • Control in Social Economic Systems, Medicine, and Biology
Automation and Remote Control

Abstract

We study how the time-frequency representation of a speech signal depends on the chosen method of frequency analysis. We consider dynamic spectrograms obtained with sets of band-pass filters that differ in their parameters and in their arrangement along the frequency axis. We show that when the filter parameters are close to those of the filters of the auditory analyzer, information on vowels and consonants in the speech signal is distributed more uniformly along the frequency axis, and the spectral maxima corresponding to the first and second formants of a vowel are more pronounced, which is important for speech recognition.
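The kind of analysis described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter parameters, so the sketch assumes one band-pass filter per critical band, with center frequencies spaced uniformly on the Bark scale and bandwidths taken from the Zwicker and Terhardt analytical approximations. The second-order (biquad) filter, the 20 ms frame, the 10 ms hop, and all function names (`hz_to_bark`, `critical_bandwidth`, `bark_to_hz`, `bandpass`, `bark_spectrogram`) are illustrative choices.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def critical_bandwidth(f_hz):
    """Critical bandwidth in Hz at frequency f_hz (Zwicker-Terhardt approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69

def bark_to_hz(z, lo=0.0, hi=20000.0):
    """Invert hz_to_bark by bisection; hz_to_bark is monotonically increasing."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bandpass(x, fc, bw, fs):
    """Second-order band-pass filter with unity gain at fc (assumed filter shape)."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) * bw / (2.0 * fc)      # sin(w0) / (2Q) with Q = fc / bw
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0            # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def bark_spectrogram(x, fs, n_channels=18, frame=320, hop=160):
    """Short-term mean-square energy in each channel; rows are channels, columns are frames."""
    centers = [bark_to_hz(k + 0.5) for k in range(n_channels)]
    rows = []
    for fc in centers:
        y = bandpass(x, fc, critical_bandwidth(fc), fs)
        rows.append([sum(v * v for v in y[s:s + frame]) / frame
                     for s in range(0, len(y) - frame + 1, hop)])
    return centers, rows

# Usage: a 1 kHz test tone sampled at 16 kHz (0.25 s long).
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs // 4)]
centers, spec = bark_spectrogram(tone, fs)
strongest = max(range(len(centers)), key=lambda k: sum(spec[k]))
```

With this spacing the channels are narrow and dense at low frequencies and progressively wider at high frequencies, which is one plausible reading of "parameters close to the filters of the auditory analyzer". A uniform filter bank for comparison could be obtained by replacing `centers` with equally spaced frequencies and a fixed bandwidth.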

Author information

Corresponding author

Correspondence to A. S. Kolokolov.

Additional information

Original Russian Text © A.S. Kolokolov, I.A. Lyubinskii, 2015, published in Avtomatika i Telemekhanika, 2015, No. 10, pp. 144–151.

About this article

Cite this article

Kolokolov, A.S., Lyubinskii, I.A. A comparative study of several approaches to short-term frequency analysis of a speech signal. Autom Remote Control 76, 1828–1833 (2015). https://doi.org/10.1134/S0005117915100100

  • DOI: https://doi.org/10.1134/S0005117915100100
