Computational Intelligence in Speech and Audio Processing: Recent Advances

  • Conference paper
Soft Computing in Industrial Applications

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 75)

Abstract

Computational intelligence techniques have been used in speech and audio processing for several years. In speech processing, they are used extensively in speech recognition, speaker recognition, speech enhancement, speech coding, and speech synthesis; in audio processing, applications include music classification, audio classification, and audio indexing and retrieval. This paper provides an overview of recent applications of modern computational intelligence theory in speech and audio processing.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hassanien, A.E., Schaefer, G., Darwish, A. (2010). Computational Intelligence in Speech and Audio Processing: Recent Advances. In: Gao, XZ., Gaspar-Cunha, A., Köppen, M., Schaefer, G., Wang, J. (eds) Soft Computing in Industrial Applications. Advances in Intelligent and Soft Computing, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11282-9_32

  • DOI: https://doi.org/10.1007/978-3-642-11282-9_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11281-2

  • Online ISBN: 978-3-642-11282-9

  • eBook Packages: Engineering, Engineering (R0)
