Abstract
Although multimedia surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance systems based on video cameras often lack robustness and reliability in real applications. To overcome this drawback, a considerable amount of research has incorporated audio sensors. For example, Sound Source Localization (SSL) can indicate a potential security risk and point the camera in its direction. In this paper, a reliable sound source localization method based on Time-Difference-Of-Arrival (TDOA) is explored. The novel aspect of our approach is a TDOA-based Gaussian filter that improves the accuracy and stability of sound source localization. An advantage of the proposed algorithm is that it can be integrated with various TDOA-based methods and with all kinds of microphone arrays. The experimental comparison shows significant improvement over the state-of-the-art TDOA-based algorithm.
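To make the idea of filtering TDOA estimates concrete, the following is a minimal illustrative sketch and not the algorithm proposed in this paper, whose details are not given in the abstract. It assumes a single microphone pair, estimates the TDOA with the widely used GCC-PHAT weighting, and then smooths a sequence of per-frame TDOA estimates with a normalized Gaussian kernel. The function names `gcc_phat` and `gaussian_smooth` and the parameters `max_tau`, `sigma`, and `radius` are hypothetical choices made for this example.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds, using GCC-PHAT.

    Illustrative only: one microphone pair, one frame.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs


def gaussian_smooth(tdoas, sigma=2.0, radius=None):
    """Smooth a sequence of TDOA estimates with a normalized Gaussian kernel.

    This is an assumed, generic form of Gaussian filtering, not the
    specific filter described in the paper.
    """
    tdoas = np.asarray(tdoas, dtype=float)
    if radius is None:
        radius = max(1, int(3 * sigma))
    k = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (k / sigma) ** 2)
    kernel /= kernel.sum()                   # weights sum to one
    padded = np.pad(tdoas, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")
```

In such a sketch, `gcc_phat` would be called once per audio frame, and `gaussian_smooth` would be applied to the resulting sequence so that isolated outlier estimates are attenuated before the delays are converted into a source direction.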
Acknowledgements
This work was supported by the Key Support Projects of the Shanghai Science and Technology Committee (16010500100), the National Natural Science Foundation of China (61402277, 61571279), and the Innovation Program of the Shanghai Municipal Education Commission (15ZZ044).
Cite this article
Zhu, M., Yao, H., Wu, X. et al. Gaussian filter for TDOA based sound source localization in multimedia surveillance. Multimed Tools Appl 77, 3369–3385 (2018). https://doi.org/10.1007/s11042-017-5129-4
DOI: https://doi.org/10.1007/s11042-017-5129-4