Unified System for Visual Speech Recognition and Speaker Identification

Rekik, Ahmed; Ben-Hamadou, Achraf; Mahdi, Walid

doi:10.1007/978-3-319-25903-1_33


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9386)

Included in the following conference series:

International Conference on Advanced Concepts for Intelligent Vision Systems


Abstract

This paper proposes a unified system for both visual speech recognition and speaker identification. The system can handle image and depth data when they are available. It consists of four consecutive steps: 3D face pose tracking, mouth region extraction, feature computation, and classification using a Support Vector Machine. The system is experimentally evaluated on three public datasets: MIRACL-VC1, OuluVS, and CUAVE. On the one hand, the visual speech recognition module achieves up to 96 % and 79.2 % in the speaker-dependent and speaker-independent settings, respectively. On the other hand, speaker identification achieves a recognition rate of up to 98.9 %. Additionally, the results demonstrate the importance of depth data in resolving the subject dependency issue.
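
The difference between the speaker-dependent and speaker-independent settings mentioned in the abstract can be illustrated with a small data-splitting sketch. The abstract does not state the exact evaluation protocol, so the code below is only one common reading: leave-one-speaker-out for the speaker-independent case, and per-speaker leave-one-sample-out for the speaker-dependent case. The function names and the toy speaker labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def speaker_independent_splits(speakers: Sequence[str]) -> Iterator[Split]:
    """Leave-one-speaker-out: the test speaker never appears in training.

    Assumed protocol for illustration; not confirmed by the source.
    """
    for held_out in sorted(set(speakers)):
        train = [i for i, s in enumerate(speakers) if s != held_out]
        test = [i for i, s in enumerate(speakers) if s == held_out]
        yield train, test


def speaker_dependent_splits(speakers: Sequence[str]) -> Iterator[Split]:
    """Per-speaker leave-one-sample-out: train and test share one speaker.

    Assumed protocol for illustration; not confirmed by the source.
    """
    by_speaker = defaultdict(list)
    for i, s in enumerate(speakers):
        by_speaker[s].append(i)
    for s in sorted(by_speaker):
        indices = by_speaker[s]
        for held_out in indices:
            train = [i for i in indices if i != held_out]
            yield train, [held_out]


# Toy data: 3 hypothetical speakers with 2 recordings each.
speakers = ["s1", "s1", "s2", "s2", "s3", "s3"]
si = list(speaker_independent_splits(speakers))
sd = list(speaker_dependent_splits(speakers))
```

In the speaker-independent splits the classifier is always tested on a person it has never seen, which is the harder setting and matches the lower figure reported in the abstract.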



Author information

Authors and Affiliations

Multimedia InfoRmation Systems and Advanced Computing Laboratory (MIRACL), Sfax University, Pôle technologique de Sfax, BP 242, route de Tunis Km 10, 3021, Sfax, Tunisia
Ahmed Rekik & Walid Mahdi
Driving Assistance Research Center, Valeo, 34 rue St-André, Z.I. des Vignes, 93012, Bobigny, France
Achraf Ben-Hamadou
Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 888, Hawiyah, Taif, 21974, Kingdom of Saudi Arabia
Walid Mahdi


Corresponding author

Correspondence to Ahmed Rekik.

Editor information

Editors and Affiliations

Dipartimento di Matematica e Informatica, Università di Catania, Catania, Catania, Italy
Sebastiano Battiato
Arcueil CX, France
Jacques Blanc-Talon
Catania, Italy
Giovanni Gallo
Gent, Belgium
Wilfried Philips
CSIRO, Sydney, New South Wales, Australia
Dan Popescu
Vision Lab., University of Antwerp, Antwerpen, Belgium
Paul Scheunders


About this paper

Cite this paper

Rekik, A., Ben-Hamadou, A., Mahdi, W. (2015). Unified System for Visual Speech Recognition and Speaker Identification. In: Battiato, S., Blanc-Talon, J., Gallo, G., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2015. Lecture Notes in Computer Science, vol 9386. Springer, Cham. https://doi.org/10.1007/978-3-319-25903-1_33


DOI: https://doi.org/10.1007/978-3-319-25903-1_33
Published: 06 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25902-4
Online ISBN: 978-3-319-25903-1
eBook Packages: Computer Science, Computer Science (R0)
