Computational Intelligence in Speech and Audio Processing: Recent Advances

Hassanien, Aboul Ella; Schaefer, Gerald; Darwish, Ashraf

doi:10.1007/978-3-642-11282-9_32

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 75)

Abstract

Computational intelligence techniques have been used for the processing of speech and audio for several years. Applications in speech processing where computational intelligence is used extensively include speech recognition, speaker recognition, speech enhancement, speech coding and speech synthesis, while in audio processing, applications include music classification, audio classification, and audio indexing and retrieval. In this paper we provide an overview of recent applications of modern computational intelligence theory in the field of speech and audio processing.

Author information

Authors and Affiliations

Information Technology Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Department of Computer Science, Loughborough University, Loughborough, UK
Gerald Schaefer
Computer Science Department, Helwan University, Cairo, Egypt
Ashraf Darwish

Editor information

Editors and Affiliations

Helsinki University, Espoo, Finland
Xiao-Zhi Gao
University of Minho, Guimarães, Portugal
António Gaspar-Cunha
Kyushu Inst. of Technology, Fukuoka, Japan
Mario Köppen
Loughborough University, Loughborough, UK
Gerald Schaefer
The Chinese University of Hong Kong, Hong Kong
Jun Wang

Cite this paper

Hassanien, A.E., Schaefer, G., Darwish, A. (2010). Computational Intelligence in Speech and Audio Processing: Recent Advances. In: Gao, XZ., Gaspar-Cunha, A., Köppen, M., Schaefer, G., Wang, J. (eds) Soft Computing in Industrial Applications. Advances in Intelligent and Soft Computing, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11282-9_32

DOI: https://doi.org/10.1007/978-3-642-11282-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11281-2
Online ISBN: 978-3-642-11282-9
eBook Packages: Engineering, Engineering (R0)
