Abstract
Multimodal signal processing techniques are poised to play a key role in the implementation of natural human-computer interfaces. In particular, the development of efficient interface front ends that emulate interpersonal communication would benefit from techniques capable of processing the visual and auditory modalities jointly. This work applies audiovisual analysis and synthesis techniques based on Principal Component Analysis and Non-negative Matrix Factorization to facial audiovisual sequences. Furthermore, the applicability of the extracted audiovisual bases is assessed through several experiments that evaluate the quality of audiovisual resynthesis according to both objective and subjective criteria.
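The chapter's own audiovisual features and experimental setup are not reproduced here, but the Non-negative Matrix Factorization at the core of the approach can be sketched. The snippet below is a minimal illustration, assuming the standard multiplicative updates of Lee and Seung for the Frobenius cost; the toy matrix `V` merely stands in for a stack of non-negative audio and visual feature vectors, one column per frame.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (features x frames) into W @ H,
    W holding r non-negative basis vectors and H their activations,
    via Lee & Seung's multiplicative updates for the Frobenius norm."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy stand-in for stacked audiovisual features: 30 features x 100 frames,
# built with an exact non-negative rank of 10.
rng = np.random.default_rng(1)
V = rng.random((30, 10)) @ rng.random((10, 100))
W, H = nmf(V, r=10)

# Resynthesis quality as relative reconstruction error.
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

In an audiovisual setting, the columns of `W` play the role of joint audiovisual bases: resynthesis amounts to recombining them through the activations in `H`, which is what the objective and subjective experiments in the chapter evaluate.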
Copyright information
© 2009 Springer-Verlag London
Cite this chapter
Sevillano, X., Melenchón, J., Cobo, G., Socoró, J.C., Alías, F. (2009). Audiovisual Analysis and Synthesis for Multimodal Human-Computer Interfaces. In: Redondo, M., Bravo, C., Ortega, M. (eds) Engineering the User Interface. Springer, London. https://doi.org/10.1007/978-1-84800-136-7_13
Print ISBN: 978-1-84800-135-0
Online ISBN: 978-1-84800-136-7