Abstract
This chapter presents a literature review that places the research proposed in this book in context, building on the background presented in the previous chapters. First, the overall speech processing domain is briefly discussed. The review presents examples of listening devices using directional microphones, microphone arrays, noise reduction algorithms, and rule-based automatic decision making, demonstrating that the multimodal two-stage framework presented later in this book has established precedent in the context of real-world hearing aid devices. The other aspect vital to the research context of this work is the field of audiovisual speech filtering. This chapter reviews multimodal speech enhancement, discussing the early audiovisual speech filtering systems in the literature and the subsequent development and diversification of the field. A number of state-of-the-art speech filtering systems are examined and reviewed in depth, particularly multimodal beamforming and Wiener filtering. Finally, several audiovisual speech databases are evaluated.
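The Wiener filtering reviewed in this chapter rests on a simple spectral-domain idea: attenuate each frequency bin in proportion to its estimated speech-to-noise power ratio. The sketch below is a minimal, generic illustration of that classical gain rule, not the visually-derived variant of Almajai and Milner reviewed later; the function name, the toy spectra, and the gain floor are illustrative assumptions.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Classical Wiener gain G = S / (S + N), applied per frequency bin.

    speech_psd and noise_psd are estimated power spectral densities.
    The `floor` limits attenuation, a common guard against
    musical-noise artifacts (illustrative choice, not from the text).
    """
    gain = speech_psd / (speech_psd + noise_psd)
    return np.maximum(gain, floor)

# Toy example: one noisy spectral frame of 256 bins.
rng = np.random.default_rng(0)
clean_psd = np.abs(rng.normal(size=256)) ** 2 + 1.0  # assumed known here
noise_psd = np.full(256, 0.5)                        # flat noise estimate
noisy_magnitude = np.sqrt(clean_psd + noise_psd)

# Enhancement multiplies the noisy magnitude spectrum by the gain.
enhanced = wiener_gain(clean_psd, noise_psd) * noisy_magnitude
```

In the audiovisual systems this chapter reviews, the key difference is where `speech_psd` comes from: it is estimated from visual speech features (e.g. lip shape) rather than assumed known, while the gain rule itself stays the same.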
References
K. Chung, Challenges and recent developments in hearing aids. Part i. Speech understanding in noise, microphone technologies and noise reduction algorithms. Trends Amplif. 8(3), 83–124 (2004)
T. Ricketts, H. Mueller, Making sense of directional microphone hearing aids. Am. J. Audiol. 8(2), 117 (1999)
M. Valente, Use of microphone technology to improve user performance in noise, in Textbook of Hearing Aid Amplification (Singular/Thomson Learning, San Diego, 2000), p. 247
F. Kuk, D. Keenan, C. Lau, C. Ludvigsen, Performance of a fully adaptive directional microphone to signals presented from various azimuths. J. Am. Acad. Audiol. 16(6), 333–347 (2005)
M. Cord, R. Surr, B. Walden, L. Olson, Performance of directional microphone hearing aids in everyday life. J. Am. Acad. Audiol. 13(6), 295–307 (2002)
M. Cord, R. Surr, B. Walden, O. Dyrlund, Relationship between laboratory measures of directional advantage and everyday success with directional microphone hearing aids. J. Am. Acad. Audiol. 15(5), 353–364 (2004)
T. Ricketts, P. Henry, Evaluation of an adaptive, directional-microphone hearing aid. Int. J. Audiol. 41(2), 100–112 (2002)
R. Bentler, C. Palmer, A. Dittberner, Hearing-in-noise: comparison of listeners with normal and (aided) impaired hearing. J. Am. Acad. Audiol. 15(3), 216–225 (2004)
L. Mens, Speech understanding in noise with an eyeglass hearing aid: asymmetric fitting and the head shadow benefit of anterior microphones. Int. J. Audiol. 50(1), 27–33 (2011)
L. Christensen, D. Helmink, W. Soede, M. Killion, Complaints about hearing in noise: a new answer. Hear. Rev. 9(6), 34–36 (2002)
S. Laugesen, T. Schmidtke, Improving on the speech-in-noise problem with wireless array technology. News from Oticon (2004), pp. 3–23
S. Rosen, Temporal information in speech: acoustic, auditory and linguistic aspects. Philos. Trans.: Biol. Sci. 336, 367–373 (1992)
N. Tellier, H. Arndt, H. Luo, Speech or noise? Using signal detection and noise reduction. Hear. Rev. 10(6), 48–51 (2003)
H. Levitt, Noise reduction in hearing aids: an overview. J. Rehabil. Res. Dev. 38(1), 111–121 (2001)
M. Boymans, W. Dreschler, P. Schoneveld, H. Verschuure, Clinical evaluation of a full-digital in-the-ear hearing instrument. Int. J. Audiol. 38(2), 99–108 (1999)
J. Alcántara, B. Moore, V. Kühnel, S. Launer, Evaluation of the noise reduction system in a commercial digital hearing aid. Int. J. Audiol. 42(1), 34–42 (2003)
C. Elberling, About the voicefinder. News from Oticon (2002)
D. Schum, Noise-reduction circuitry in hearing aids: (2) goals and current strategies. Hear. J. 56(6), 32 (2003)
L. Girin, J. Schwartz, G. Feng, Audio-visual enhancement of speech in noise. J. Acoust. Soc. Am. 109, 3007 (2001)
R. Goecke, G. Potamianos, C. Neti, Noisy audio feature enhancement using audio-visual speech data, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’02), vol. 2 (IEEE, 2002), pp. 2025–2028
S. Deligne, G. Potamianos, C. Neti, Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization), in Proceedings of the Sensor Array and Multichannel Signal Processing Workshop (IEEE, 2003), pp. 68–71
A. Acero, R. Stern, Environmental robustness in automatic speech recognition, in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP-90 (IEEE, 1990), pp. 849–852
L. Deng, A. Acero, L. Jiang, J. Droppo, X. Huang, High-performance robust speech recognition using stereo training data, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’01), vol. 1 (IEEE, 2002), pp. 301–304
B. Rivet, J. Chambers, Multimodal speech separation, in Advances in Nonlinear Speech Processing, vol. 5933, Lecture Notes in Computer Science, ed. by J. Sole-Casals, V. Zaiats (Springer, Berlin, 2010), pp. 1–11
B. Rivet, L. Girin, C. Jutten, Log-Rayleigh distribution: a simple and efficient statistical representation of log-spectral coefficients. IEEE Trans. Audio Speech Lang. Process. 15(3), 796–802 (2007)
B. Rivet, L. Girin, C. Jutten, Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures. IEEE Trans. Audio Speech Lang. Process. 15(1), 96–108 (2007)
B. Rivet, L. Girin, C. Serviere, D.-T. Pham, C. Jutten, Using a visual voice activity detector to regularize the permutations in blind separation of convolutive speech mixtures, in Proceedings of the 15th International Conference on Digital Signal Processing (2007), pp. 223–226
B. Rivet, L. Girin, C. Jutten, Visual voice activity detection as a help for speech source separation from convolutive mixtures. Speech Commun. 49(7–8), 667–677 (2007)
C. Jutten, J. Herault, Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture. Signal Process. 24(1), 1–10 (1991)
J. Herault, C. Jutten, B. Ans, Détection de grandeurs primitives dans un message composite par une architecture de calcul neuromimétique en apprentissage non supervisé. Actes du Xème colloque GRETSI 2, 1017–1020 (1985)
E. Cherry, Some experiments on the recognition of speech, with one and with two ears. J. Acoust. Soc. Am. 25(5), 975–979 (1953)
L. Girin, G. Feng, J. Schwartz, Fusion of auditory and visual information for noisy speech enhancement: a preliminary study of vowel transitions, in Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2 (IEEE, 1998), pp. 1005–1008
D. Sodoyer, L. Girin, C. Jutten, J. Schwartz, Developing an audio-visual speech source separation algorithm. Speech Commun. 44(1–4), 113–125 (2004)
D. Sodoyer, J. Schwartz, L. Girin, J. Klinkisch, C. Jutten, Separation of audio-visual speech sources: a new approach exploiting the audio-visual coherence of speech stimuli. EURASIP J. Appl. Signal Process. 2002(1), 1165–1173 (2002)
S. Naqvi, M. Yu, J. Chambers, A multimodal approach to blind source separation of moving sources. IEEE J. Sel. Top. Signal Process. 4(5), 895–910 (2010)
P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1 (IEEE Computer Society, 2001), pp. 511–518
A. Hyvarinen, J. Karhunen, E. Oja, Independent Component Analysis, vol. 26 (Wiley-Interscience, New York, 2001)
E. Bingham, A. Hyvarinen, A fast fixed-point algorithm for independent component analysis of complex valued signals. Int. J. Neural Syst. 10(1), 1–8 (2000)
J. Barker, X. Shao, Audio-visual speech fragment decoding, in Proceedings of the International Conference on Auditory-Visual Speech Processing (2007), pp. 37–42
J. Barker, M. Cooke, D. Ellis, Decoding speech in the presence of other sources. Speech Commun. 45(1), 5–25 (2005)
A. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound (The MIT Press, Cambridge, 1990)
A. Bregman, Auditory Scene Analysis: Hearing in Complex Environments (Oxford University Press, Oxford, 1993)
J. Barker, X. Shao, Energetic and informational masking effects in an audiovisual speech recognition system. IEEE Trans. Audio Speech Lang. Process. 17(3), 446–458 (2009)
M. Cooke, J. Barker, S. Cunningham, X. Shao, An audio-visual corpus for speech perception and automatic speech recognition. J. Acoust. Soc. Am. 120(5 Pt 1), 2421–2424 (2006)
I. Almajai, B. Milner, Enhancing audio speech using visual speech features, in Proceedings of Interspeech (Brighton, 2009)
N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications (The MIT Press, Cambridge, 1949)
I. Almajai, B. Milner, Maximising audio-visual speech correlation, in Proceedings of the AVSP (2007)
I. Almajai, B. Milner, J. Darch, S. Vaseghi, Visually-derived Wiener filters for speech enhancement, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, vol. 4 (2007), pp. 585–588
I. Almajai, B. Milner, Effective visually-derived Wiener filtering for audio-visual speech processing, in Proceedings of Interspeech (Brighton, UK, 2009)
B. Milner, I. Almajai, Noisy audio speech enhancement using Wiener filters derived from visual speech, in Proceedings of the International Workshop on Auditory-Visual Speech Processing (AVSP)
B. Lee, M. Hasegawa-Johnson, C. Goudeseune, S. Kamdar, S. Borys, M. Liu, T. Huang, AVICAR: audio-visual speech corpus in a car environment, in Proceedings of the Conference on Spoken Language, Jeju, Korea (Citeseer, 2004), pp. 2489–2492
H. Lane, B. Tranel, The Lombard sign and the role of hearing in speech. J. Speech Hear. Res. 14(4), 677 (1971)
T. Wakasugi, M. Nishiura, K. Fukui, Robust lip contour extraction using separability of multi-dimensional distributions, in Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (IEEE, 2004), pp. 415–420
A. Liew, S. Leung, W. Lau, Lip contour extraction from color images using a deformable model. Pattern Recognit. 35(12), 2949–2962 (2002)
Q. Nguyen, M. Milgram, Semi adaptive appearance models for lip tracking, in Proceedings of the ICIP09 (2009), pp. 2437–2440
M. Kass, A. Witkin, D. Terzopoulos, Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1988)
A. Das, D. Ghoshal, Extraction of time invariant lips based on morphological operation and corner detection method. Int. J. Comput. Appl. 48(21), 7–11 (2012)
Y. Cheung, X. Liu, X. You, A local region based approach to lip tracking. Pattern Recognit. 45, 3336–3347 (2012)
X. Zhang, R. Mersereau, Lip feature extraction towards an automatic speechreading system, in Proceedings of the 2000 International Conference on Image Processing, vol. 3 (IEEE, 2000), pp. 226–229
N. Eveno, A. Caplier, P. Coulon, New color transformation for lips segmentation, in IEEE Fourth Workshop on Multimedia Signal Processing (IEEE, 2001), pp. 3–8
N. Eveno, A. Caplier, P. Coulon, Key points based segmentation of lips, in Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, ICME’02, vol. 2, (IEEE, 2002), pp. 125–128
D. Freedman, M. Brandstein, Contour tracking in clutter: a subset approach. Int. J. Comput. Vis. 38(2), 173–186 (2000)
Z. Ji, Y. Su, J. Wang, R. Hua, Robust sea-sky-line detection based on horizontal projection and hough transformation, in 2nd International Congress on Image and Signal Processing, CISP’09 (IEEE, 2009), pp. 1–4
C. Harris, M. Stephens, A combined corner and edge detector, in Alvey Vision Conference, vol. 15 (Manchester, 1988), p. 50
J. Luettin, N. Thacker, S. Beet, Visual speech recognition using active shape models and hidden Markov models, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-96, vol. 2 (IEEE, 1996), pp. 817–820
Q. Nguyen, M. Milgram, T. Nguyen, Multi features models for robust lip tracking, in 10th International Conference on Control, Automation, Robotics and Vision, 2008. ICARCV 2008, (IEEE, 2008), pp. 1333–1337
T. Cootes, G. Edwards, C. Taylor, Active appearance models, in Computer Vision, ECCV'98 (1998), pp. 484–498
A. Yuille, P. Hallinan, D. Cohen, Feature extraction from faces using deformable templates. Int. J. Comput. Vis. 8(2), 99–111 (1992)
G. Chiou, J. Hwang, Lipreading from color video. IEEE Trans. Image Process. 6(8), 1192–1195 (1997)
M. Yang, D. Kriegman, N. Ahuja, Detecting faces in images: a survey. IEEE Trans. Pattern Anal. Mach. Intell. 24(1), 34–58 (2002)
S. Wang, A. Abdel-Dayem, Improved viola-jones face detector, in Proceedings of the 1st Taibah University International Conference on Computing and Information Technology, ICCIT’12 (2012), pp. 321–328
C. Kotropoulos, I. Pitas, Rule-based face detection in frontal views, in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-97, vol. 4, (IEEE, 1997), pp. 2537–2540
G. Yang, T. Huang, Human face detection in a complex background. Pattern Recognit. 27(1), 53–63 (1994)
R. Kjeldsen, J. Kender, Finding skin in color images, in Proceedings of the Second International Conference on Automatic Face and Gesture Recognition (IEEE, 1996), pp. 312–317
K. Yow, R. Cipolla, A probabilistic framework for perceptual grouping of features for human face detection, in Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, (IEEE, 1996), pp. 16–21
T. Kohonen, Self-Organization and Associative Memory (Springer, Berlin, 1989)
K. Sung, Learning and example selection for object and pattern detection (1996)
T. Agui, Y. Kokubo, H. Nagahashi, T. Nagao, Extraction of face regions from monochromatic photographs using neural networks, in Proceedings of the International Conference on Robotics (1992)
F. Crow, Summed-area tables for texture mapping. Comput. Graph. 18(3), 207–212 (1984)
G. Bradski, The OpenCV Library. Dr. Dobb’s J. Softw. Tools 25(11), 120–126 (2000)
C. Zhang, Z. Zhang, A survey of recent advances in face detection. Microsoft Research, June 2010
R. Meir, G. Rätsch, An introduction to boosting and leveraging, Advanced Lectures on Machine Learning (Springer, New York, 2003), pp. 118–183
Y. Freund, R. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, Computational Learning Theory (Springer, Berlin, 1995), pp. 23–37
J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Ann. Stat. 28(2), 337–407 (2000)
S. Brubaker, J. Wu, J. Sun, M. Mullin, J. Rehg, On the design of cascades of boosted ensembles for face detection. Int. J. Comput. Vis. 77(1), 65–86 (2008)
S. Li, L. Zhu, Z. Zhang, A. Blake, H. Zhang, H. Shum, Statistical learning of multi-view face detection, in Computer Vision, ECCV 2002 (2006), pp. 117–121
C. Bishop, P. Viola, Learning and vision: discriminative methods. ICCV Course Lear. Vis. 2(7), 11 (2003)
R. Schapire, Y. Singer, Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)
X. Huang, S. Li, Y. Wang, Jensen-Shannon boosting learning for object recognition, in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR, vol. 2 (IEEE, 2005), pp. 144–149
E. Patterson, S. Gurbuz, Z. Tufekci, J. Gowdy, CUAVE: a new audio-visual database for multimodal human-computer interface research, in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP'02, vol. 2 (IEEE, 2002), p. II
E. Bailly-Bailliere, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée et al., The BANCA database and evaluation protocol, Audio- and Video-Based Biometric Person Authentication (Springer, 2003), p. 1057
K. Messer, J. Matas, J. Kittler, J. Luettin, G. Maitre, XM2VTSDB: the extended M2VTS database, in Second International Conference on Audio and Video-based Biometric Person Authentication, vol. 964 (Citeseer, 1999), pp. 965–966
C. Sanderson, K. Paliwal, Polynomial features for robust face authentication, in Proceedings of the International Conference on Image Processing, vol. 3 (IEEE, 2002), pp. 997–1000
C. Sanderson, Biometric Person Recognition: Face, Speech and Fusion (VDM Verlag Dr. Müller, 2008)
Copyright information
© 2015 The Author(s)
About this chapter
Cite this chapter
Abel, A., Hussain, A. (2015). The Research Context. In: Cognitively Inspired Audiovisual Speech Filtering. SpringerBriefs in Cognitive Computation, vol 5. Springer, Cham. https://doi.org/10.1007/978-3-319-13509-0_3
DOI: https://doi.org/10.1007/978-3-319-13509-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13508-3
Online ISBN: 978-3-319-13509-0
eBook Packages: Biomedical and Life Sciences (R0)