Abstract
We study how the time-frequency representation of a speech signal depends on the chosen method of frequency analysis. We consider dynamic spectrograms obtained with banks of band-pass filters that differ in their parameters and in how the filters are arranged along the frequency axis. We show that when the filter parameters are close to those of the filters of the auditory analyzer, information on vowels and consonants in the speech signal is distributed more uniformly along the frequency axis, and the spectral maxima corresponding to the first and second formants of a vowel are more pronounced, which is important for speech recognition.
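To make the contrast between filter arrangements concrete, the sketch below lays out two hypothetical filter banks over the same frequency range: one with centers spaced uniformly in hertz and one with centers spaced uniformly on the critical-band (Bark) scale. The abstract does not give the filter parameters actually used, so the Bark approximation (the well-known Zwicker and Terhardt formulas), the 24-filter count, the 100 to 5000 Hz range, and all function names are assumptions chosen for illustration only.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker-Terhardt approximation; an assumption here)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz at center frequency f_hz (same approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def bark_to_hz(z, lo=0.0, hi=20000.0):
    """Invert hz_to_bark by bisection; valid because the mapping is increasing."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def filter_bank_layout(n_filters, f_lo, f_hi, auditory=True):
    """Return a list of (center_hz, bandwidth_hz) pairs for a hypothetical bank.

    auditory=True  : centers equally spaced in Bark, critical bandwidths.
    auditory=False : centers equally spaced in Hz, constant bandwidth.
    """
    layout = []
    if auditory:
        z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
        step = (z_hi - z_lo) / (n_filters - 1)
        for k in range(n_filters):
            fc = bark_to_hz(z_lo + k * step)
            layout.append((fc, critical_bandwidth_hz(fc)))
    else:
        step = (f_hi - f_lo) / (n_filters - 1)
        for k in range(n_filters):
            layout.append((f_lo + k * step, step))
    return layout


# Illustrative parameters, not taken from the paper.
auditory = filter_bank_layout(24, 100.0, 5000.0, auditory=True)
uniform = filter_bank_layout(24, 100.0, 5000.0, auditory=False)

# Count the filters centered at or below 1 kHz, where first formants
# and many second formants typically lie.
below_1k_auditory = sum(1 for fc, _ in auditory if fc <= 1000.0)
below_1k_uniform = sum(1 for fc, _ in uniform if fc <= 1000.0)

if __name__ == "__main__":
    print("filters at or below 1 kHz (auditory):", below_1k_auditory)
    print("filters at or below 1 kHz (uniform): ", below_1k_uniform)
```

Under these assumptions the auditory-like layout places more narrow filters in the low-frequency region and fewer, wider filters at high frequencies, which is one plausible reading of why formant maxima might stand out more clearly in such a spectrogram. The sketch covers only the filter layout; it does not implement the filters themselves or reproduce the paper's experiments.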
Additional information
Original Russian Text © A.S. Kolokolov, I.A. Lyubinskii, 2015, published in Avtomatika i Telemekhanika, 2015, No. 10, pp. 144–151.
About this article
Cite this article
Kolokolov, A.S., Lyubinskii, I.A. A comparative study of several approaches to short-term frequency analysis of a speech signal. Autom Remote Control 76, 1828–1833 (2015). https://doi.org/10.1134/S0005117915100100
DOI: https://doi.org/10.1134/S0005117915100100