Gaussian filter for TDOA based sound source localization in multimedia surveillance

Zhu, Mengyao; Yao, Huan; Wu, Xiukun; Lu, Zhihua; Zhu, Xiaoqiang; Huang, Qinghua

doi:10.1007/s11042-017-5129-4

Published: 02 September 2017

Multimedia Tools and Applications, volume 77, pages 3369–3385 (2018)

Mengyao Zhu¹ (ORCID: orcid.org/0000-0002-1069-4427), Huan Yao¹, Xiukun Wu¹, Zhihua Lu², Xiaoqiang Zhu¹ & Qinghua Huang¹


Abstract

Although multimedia surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance systems based on video cameras often lack the robustness and reliability that real applications require. To overcome this drawback, a considerable amount of research has taken audio sensory devices into account. For example, Sound Source Localization (SSL) may indicate a potential security risk and could point the camera in that direction. In this paper, reliable sound source localization based on Time-Difference-Of-Arrival (TDOA) is explored. The novel aspect of our approach is a TDOA-based Gaussian filter that improves the accuracy and stability of sound source localization. The advantage of the proposed algorithm is that it can be integrated with various TDOA-based methods and with all kinds of microphone arrays. Experimental comparison shows significant improvement over the state-of-the-art TDOA-based algorithm.
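To make the TDOA quantity in the abstract concrete, the following is a minimal sketch of how a time difference of arrival between two microphones might be estimated. It assumes plain time-domain cross-correlation, a 16 kHz sample rate, and a synthetic white-noise source; the helper name `estimate_tdoa` and all parameters are hypothetical. It does not reproduce the Gaussian filter proposed in the paper, whose details are not given in the abstract.

```python
import numpy as np


def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate how many seconds sig_b lags sig_a (hypothetical helper).

    Uses plain time-domain cross-correlation; no weighting or filtering.
    """
    sig_a = np.asarray(sig_a, dtype=float)
    sig_b = np.asarray(sig_b, dtype=float)
    # corr[i] corresponds to lag lags[i]; a positive lag means sig_b arrives later.
    corr = np.correlate(sig_b, sig_a, mode="full")
    lags = np.arange(-(len(sig_a) - 1), len(sig_b))
    return lags[np.argmax(corr)] / fs


# Synthetic check: white noise reaching microphone B 12 samples after microphone A.
rng = np.random.default_rng(0)
fs = 16000  # assumed sample rate in Hz
source = rng.standard_normal(2048)
delay_samples = 12
mic_a = source
mic_b = np.concatenate([np.zeros(delay_samples), source[:-delay_samples]])

tdoa = estimate_tdoa(mic_a, mic_b, fs)
print(f"estimated TDOA: {tdoa * 1e3:.3f} ms")
```

Multiplying the estimated delay by the speed of sound gives the difference in distance from the source to the two microphones, which is the measurement that TDOA-based localization methods combine across microphone pairs.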

Acknowledgements

This work was supported by the Key Support Projects of the Shanghai Science and Technology Committee (16010500100), the National Natural Science Foundation of China (61402277, 61571279), and the Innovation Program of the Shanghai Municipal Education Commission (15ZZ044).

Author information

Authors and Affiliations

School of Communication and Information Engineering, Shanghai University, Shanghai, 200444, China
Mengyao Zhu, Huan Yao, Xiukun Wu, Xiaoqiang Zhu & Qinghua Huang
College of Information Science and Engineering, Ningbo University, Ningbo, 315211, China
Zhihua Lu

Corresponding author

Correspondence to Mengyao Zhu.

About this article

Cite this article

Zhu, M., Yao, H., Wu, X. et al. Gaussian filter for TDOA based sound source localization in multimedia surveillance. Multimed Tools Appl 77, 3369–3385 (2018). https://doi.org/10.1007/s11042-017-5129-4

Received: 29 April 2017
Revised: 16 August 2017
Accepted: 17 August 2017
Published: 02 September 2017
Issue Date: February 2018
DOI: https://doi.org/10.1007/s11042-017-5129-4
