Gaussian filter for TDOA based sound source localization in multimedia surveillance

Multimedia Tools and Applications

Abstract

Although multimedia surveillance systems are becoming increasingly ubiquitous in our living environment, automated surveillance systems based on video cameras often lack robustness and reliability in real applications. To overcome this drawback, a considerable amount of research has taken audio sensory devices into account. For example, Sound Source Localization (SSL) can indicate a potential security risk and point the camera in its direction. In this paper, a reliable sound source localization method based on Time-Difference-Of-Arrival (TDOA) is explored. The novel aspect of our approach is a TDOA-based Gaussian filter that improves the accuracy and stability of sound source localization. An advantage of the proposed algorithm is that it can be integrated with various TDOA-based methods and with all kinds of microphone arrays. Experimental comparison shows significant improvement over the state-of-the-art TDOA-based algorithm.
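The abstract does not spell out how the TDOA-based Gaussian filter is constructed, so the following Python sketch should be read only as an illustration of the general idea, not as the authors' algorithm. It assumes a single microphone pair, estimates one TDOA per frame with PHAT-weighted generalized cross-correlation, and then smooths the frame-wise TDOA track with a normalized Gaussian kernel. The function names `gcc_phat_tdoa` and `gaussian_smooth`, the sampling rate, and the kernel width `sigma` are hypothetical choices made for this example.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate how much y lags x, in seconds, using GCC-PHAT.

    Illustrative only: a positive result means the sound reached
    microphone x before microphone y.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)                   # cross-power spectrum
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift   # lag (in samples) of the peak
    return shift / fs


def gaussian_smooth(values, sigma=2.0, radius=None):
    """Smooth a sequence of per-frame TDOA estimates with a Gaussian kernel.

    Assumption for this sketch: a plain temporal Gaussian window; the
    paper's filter may be defined differently.
    """
    values = np.asarray(values, dtype=float)
    if radius is None:
        radius = int(3 * sigma)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()                   # weights sum to one
    padded = np.pad(values, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


if __name__ == "__main__":
    fs = 16000                               # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    source = rng.standard_normal(2048 + 5)
    mic_a = source[5:]                       # hears the source first
    mic_b = source[:2048]                    # same signal, 5 samples later
    print("estimated TDOA (s):", gcc_phat_tdoa(mic_a, mic_b, fs))

    noisy_track = 5 / fs + 1e-4 * rng.standard_normal(50)
    print("smoothed track std:", gaussian_smooth(noisy_track).std())
```

Smoothing the TDOA track before solving for the source position is one simple way to suppress frame-to-frame outliers; in a real array the same step would be applied to every microphone pair.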

Acknowledgements

This work was supported by the Key Support Projects of the Shanghai Science and Technology Committee (16010500100), the National Natural Science Foundation of China (61402277, 61571279), and the Innovation Program of the Shanghai Municipal Education Commission (15ZZ044).

Author information

Corresponding author

Correspondence to Mengyao Zhu.

About this article

Cite this article

Zhu, M., Yao, H., Wu, X. et al. Gaussian filter for TDOA based sound source localization in multimedia surveillance. Multimed Tools Appl 77, 3369–3385 (2018). https://doi.org/10.1007/s11042-017-5129-4

  • DOI: https://doi.org/10.1007/s11042-017-5129-4
