Abstract
Computational intelligence techniques have been used in speech and audio processing for several years. In speech processing, they are applied extensively to speech recognition, speaker recognition, speech enhancement, speech coding, and speech synthesis; in audio processing, applications include music classification, audio classification, and audio indexing and retrieval. In this paper, we provide an overview of recent applications of modern computational intelligence theory in the field of speech and audio processing.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hassanien, A.E., Schaefer, G., Darwish, A. (2010). Computational Intelligence in Speech and Audio Processing: Recent Advances. In: Gao, XZ., Gaspar-Cunha, A., Köppen, M., Schaefer, G., Wang, J. (eds) Soft Computing in Industrial Applications. Advances in Intelligent and Soft Computing, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11282-9_32
DOI: https://doi.org/10.1007/978-3-642-11282-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11281-2
Online ISBN: 978-3-642-11282-9
eBook Packages: Engineering, Engineering (R0)