An Adaptive Non Reference Anchor Array Framework for Distant Speech Recognition

Shukla, Arpit; Nathwani, Karan; Hegde, Rajesh M.

doi:10.1007/978-3-642-34778-8_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7674)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

Distant speech recognition over microphone arrays is challenging, especially in multi-source environments. In this paper, a non reference anchor array (NRA) framework for distant speech recognition is proposed. The NRA framework uses a non reference anchor array to capture the interfering speech sources, in addition to the primary array that captures the speech source of interest. The framework uses a linearly constrained minimum variance (LCMV) beamformer such that the signal coming from the look direction is preserved while correlated interferences coming from other directions are rejected. The performance of the proposed method is evaluated through experiments on clean speech acquisition from distant microphones and on distant speech recognition using the TIMIT and MONC databases. Experimental results indicate a reasonable improvement over correlation, subspace, and standard minimum variance beamforming methods.
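
To make the beamforming constraint concrete, the following is a minimal narrowband sketch of textbook LCMV weights, not the authors' implementation. It assumes a uniform linear array, far-field sources, a single frequency bin, and an interferer direction that is already known; the abstract does not describe how the NRA supplies that information. The names `steering_vector` and `lcmv_weights`, the array geometry, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def steering_vector(n_mics, spacing, angle_deg, freq, c=343.0):
    """Far-field steering vector for a uniform linear array (assumed geometry)."""
    positions = np.arange(n_mics) * spacing
    delays = positions * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq * delays)

def lcmv_weights(R, C, f):
    """Closed-form LCMV solution w = R^{-1} C (C^H R^{-1} C)^{-1} f.

    Minimises the output power w^H R w subject to C^H w = f.
    """
    Rinv_C = np.linalg.solve(R, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, f)

# Hypothetical setup: 8 microphones, 4 cm spacing, one 1 kHz frequency bin.
n_mics, spacing, freq = 8, 0.04, 1000.0
a_look = steering_vector(n_mics, spacing, 0.0, freq)   # source of interest
a_intf = steering_vector(n_mics, spacing, 40.0, freq)  # assumed interferer

# Synthetic covariance: one strong interferer plus weak white sensor noise.
R = 10.0 * np.outer(a_intf, a_intf.conj()) + 0.01 * np.eye(n_mics)

# Constraints: unit gain in the look direction, a null on the interferer.
C = np.column_stack([a_look, a_intf])
f = np.array([1.0, 0.0], dtype=complex)
w = lcmv_weights(R, C, f)

look_gain = abs(w.conj() @ a_look)  # magnitude response toward the source
intf_gain = abs(w.conj() @ a_intf)  # magnitude response toward the interferer
print(f"look-direction gain: {look_gain:.6f}")
print(f"interferer gain:     {intf_gain:.2e}")
```

With only the look-direction column in `C`, the same formula reduces to the standard minimum variance (MVDR) beamformer that the abstract names as a baseline.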

Author information

Authors and Affiliations

Department of Electrical Engineering, Indian Institute of Technology Kanpur, India
Arpit Shukla, Karan Nathwani & Rajesh M. Hegde

Editor information

Editors and Affiliations

School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, 639798, Singapore
Weisi Lin, Dong Xu, Jianxin Wu, Ying He & Jianfei Cai
Department of Computing, University of Surrey, GU2 7XH, Guildford, UK
Anthony Ho
Department of Computer Science, School of Computing, National University of Singapore, Building AS6, Room #05-06, 117417, Singapore
Mohan Kankanhalli
Department of Electrical Engineering, University of Washington, M418 EE/CSE, Box 352500, 98195, Seattle, WA, USA
Ming-Ting Sun

About this paper

Cite this paper

Shukla, A., Nathwani, K., Hegde, R.M. (2012). An Adaptive Non Reference Anchor Array Framework for Distant Speech Recognition. In: Lin, W., et al. Advances in Multimedia Information Processing – PCM 2012. PCM 2012. Lecture Notes in Computer Science, vol 7674. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34778-8_20

DOI: https://doi.org/10.1007/978-3-642-34778-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34777-1
Online ISBN: 978-3-642-34778-8
eBook Packages: Computer Science (R0)
