A Follow-Up Survey of Audiovisual Speech Integration Strategies

Addarrazi, Ilham; Satori, Hassan; Satori, Khalid

doi:10.1007/978-981-15-0947-6_60

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1076)

Abstract

Automatic speech recognition (ASR) systems benefit from the visual modality, which improves their performance, especially in noisy environments. Combining acoustic features with visual features yields an audiovisual speech recognition (AVSR) system. This paper reviews existing and recent techniques for AVSR. Special emphasis is placed on recent AVSR fusion techniques: the fusion stages (early, intermediate, and late integration) are discussed along with their corresponding models. The aim of this study is to discuss the different AVSR approaches and to compare existing AVSR techniques.
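
As a rough illustration of two of the fusion stages named above, the following Python sketch contrasts early (feature-level) integration with late (decision-level) integration. It is not taken from the chapter: the function names, the `audio_weight` parameter, the syllable labels, and the scores are all hypothetical, and the weighted sum of log-likelihoods is only one common way to combine stream decisions. Intermediate integration, which is generally understood to couple the two streams inside the model itself, is not shown.

```python
def early_fusion(audio_frame, visual_frame):
    """Early integration: concatenate the per-frame feature vectors
    so that a single classifier sees one joint audiovisual vector."""
    return list(audio_frame) + list(visual_frame)


def late_fusion(audio_loglik, visual_loglik, audio_weight=0.7):
    """Late integration: each stream is classified separately, then the
    per-class log-likelihoods are combined with a stream weight.

    audio_weight is a hypothetical reliability weight in [0, 1];
    lower values lean on the visual stream (e.g. when audio is noisy).
    Returns the winning class label and the fused scores.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    classes = audio_loglik.keys() & visual_loglik.keys()
    if not classes:
        raise ValueError("the two streams share no class labels")
    scores = {
        label: audio_weight * audio_loglik[label]
        + (1.0 - audio_weight) * visual_loglik[label]
        for label in classes
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up scores: the audio stream prefers "ga", the visual stream "ba".
    audio = {"ba": -2.0, "ga": -1.0}
    visual = {"ba": -0.5, "ga": -3.0}

    print(early_fusion([1.0, 2.0], [0.5]))
    print(late_fusion(audio, visual, audio_weight=0.9)[0])  # trusts audio
    print(late_fusion(audio, visual, audio_weight=0.2)[0])  # trusts video
```

In this sketch, lowering `audio_weight` shifts the decision toward the visual stream, which mirrors the motivation of relying more on lip information when the acoustic signal is degraded.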

Author information

Authors and Affiliations

Department of Mathematics and Computer Science FSDM, University of Sidi Mohamed Ben Abdellah, Fez, Morocco
Ilham Addarrazi, Hassan Satori & Khalid Satori

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges (SRMGPC), Lucknow, Uttar Pradesh, India
Vikrant Bhateja
School of Computer Engineering, Kalinga Institute of Industrial Technology (KIIT), Bhubaneswar, Odisha, India
Suresh Chandra Satapathy
Department of Computer Sciences, Faculty of Sciences Dhar Mahraz, Sidi Mohammed Ben Abdellah University, Fez, Morocco
Hassan Satori

About this paper

Cite this paper

Addarrazi, I., Satori, H., Satori, K. (2020). A Follow-Up Survey of Audiovisual Speech Integration Strategies. In: Bhateja, V., Satapathy, S., Satori, H. (eds) Embedded Systems and Artificial Intelligence. Advances in Intelligent Systems and Computing, vol 1076. Springer, Singapore. https://doi.org/10.1007/978-981-15-0947-6_60

DOI: https://doi.org/10.1007/978-981-15-0947-6_60
Published: 08 April 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0946-9
Online ISBN: 978-981-15-0947-6
eBook Packages: Intelligent Technologies and Robotics (R0)
