Audio splicing detection and localization using environmental signature

Multimedia Tools and Applications

Abstract

Audio splicing is one of the most common manipulation techniques in audio forensics. In this paper, the magnitudes of the acoustic channel impulse response and of the ambient noise are proposed as an environmental signature. Specifically, spliced audio segments are detected from the magnitude correlation between query frames and reference frames using a statistically optimal threshold. Detection accuracy is further refined by comparing adjacent frames. The effectiveness of the proposed method is tested on two data sets: the first is generated from the TIMIT database, and the second was recorded in four acoustic environments using commercial-grade microphones. Experimental results show that the proposed method not only detects the presence of spliced frames but also localizes the forged segments with near-perfect accuracy. Comparison results show that the proposed scheme achieves higher identification accuracy than previous schemes. A real-world meeting recording database (AMI corpus) is also used to verify the effectiveness of the proposed method in practical applications.
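The frame-level detection, refinement, and localization steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several loudly stated assumptions: the per-frame environmental signatures (magnitude vectors of the estimated channel response and ambient noise) are taken as already computed; the reference signature is the element-wise mean of all frames; the statistically optimal threshold is replaced by a fixed, hypothetical `threshold` parameter; and the adjacent-frame comparison is stood in for by a simple majority vote over neighbouring frames. The names `pearson`, `detect_spliced_frames`, and `to_segments` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # undefined for constant vectors; treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def detect_spliced_frames(
    signatures: Sequence[Sequence[float]],
    threshold: float = 0.8,  # hypothetical fixed threshold (not the paper's)
    radius: int = 1,  # neighbours on each side used for refinement
) -> List[bool]:
    """Flag frames whose signature correlates poorly with the reference."""
    n_frames, n_bins = len(signatures), len(signatures[0])
    # Reference signature: element-wise mean over all frames (assumption).
    ref = [sum(f[k] for f in signatures) / n_frames for k in range(n_bins)]
    # Raw decision: low correlation with the reference => suspected splice.
    raw = [pearson(f, ref) < threshold for f in signatures]
    # Refinement (stand-in): keep a flag only if most frames in the window agree.
    refined = []
    for i in range(n_frames):
        window = raw[max(0, i - radius): i + radius + 1]
        refined.append(sum(window) * 2 > len(window))
    return refined


def to_segments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Group consecutive flagged frames into inclusive (start, end) ranges."""
    segments, start = [], None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(flags) - 1))
    return segments


if __name__ == "__main__":
    host = [1.0, 0.8, 0.6, 0.4, 0.2]     # toy signature of the original room
    foreign = [0.2, 0.4, 0.6, 0.8, 1.0]  # toy signature of a spliced-in room
    # Frames 4-6 are spliced; frame 1 mimics an isolated false alarm.
    frames = [host, foreign, host, host, foreign, foreign, foreign, host, host, host]
    print(to_segments(detect_spliced_frames(frames, radius=0)))  # [(1, 1), (4, 6)]
    print(to_segments(detect_spliced_frames(frames)))            # [(4, 6)]
```

With `radius=0` no refinement is applied, so the isolated frame 1 survives as a false alarm; with one neighbour on each side it is voted away while the contiguous spliced segment is still localized. Using the mean of all frames as the reference assumes spliced frames are a minority of the recording; a long inserted segment would contaminate the reference, which is why the choice of reference frames and threshold matters in practice.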

Figs. 1–22

Notes

  1. In [3], the average false alarm rate is nearly 30 % at a sampling rate of 16 kHz.

  2. In our implementation, the parameters recommended in [3] did not work on our data sets, so the optimal parameters used in this paper were selected manually by trial and error.
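To make the figure in Note 1 concrete: the false alarm rate is commonly defined as the fraction of authentic frames that a detector wrongly flags as spliced, FP / (FP + TN). The sketch below assumes that frame-level definition; [3] may use a different frame or averaging convention, and the name `false_alarm_rate` is hypothetical.

```python
from typing import Sequence


def false_alarm_rate(predicted: Sequence[bool], truth: Sequence[bool]) -> float:
    """FP / (FP + TN): share of authentic frames flagged as spliced."""
    # Keep only the decisions made on frames that are truly authentic.
    on_authentic = [p for p, t in zip(predicted, truth) if not t]
    if not on_authentic:
        return 0.0  # no authentic frames, so no false alarm is possible
    return sum(on_authentic) / len(on_authentic)


if __name__ == "__main__":
    truth = [False, False, False, True, True]     # frames 3-4 are spliced
    predicted = [True, False, False, True, True]  # one authentic frame flagged
    print(false_alarm_rate(predicted, truth))     # 0.3333333333333333
```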

References

  1. Borgstrom BJ, McCree A (2012) The linear prediction inverse modulation transfer function (LP-IMTF) filter for spectral enhancement, with applications to speaker recognition. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 4065–4068

  2. Brixen E (2009) Acoustics of the crime scene as transmitted by mobile phones. In: Proceedings of audio engineering society 126th Convention. Munich

  3. Chen J, Xiang S, Liu W, Huang H (2013) Exposing digital audio forgeries in time domain by using singularity analysis with wavelets. In: Proceedings of the first ACM workshop on information hiding and multimedia security, pp 149–158

  4. Cooper AJ (2010) Detecting butt-spliced edits in forensic digital audio recordings. In: Proceedings of audio engineering society 39th conf., audio forensics: practices and challenges

  5. Dominguez-Molina JA, González-Farías G, Rodríguez-Dagnino RM, Monterrey IC (2001) A practical procedure to estimate the shape parameter in the generalized Gaussian distribution. Technical report I-01-18_eng.pdf, available through http://www.cimat.mx/reportes/enlinea/I-01-18_eng.pdf

  6. Garg R, Hajj-Ahmad A, Wu M (2013) Geo-location estimation from electrical network frequency signals. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 2862–2866

  7. Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL, Zue V (1993) TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium, Philadelphia

  8. Gaubitch ND, Brookes M, Naylor PA (2013) Blind channel magnitude response estimation in speech using spectrum classification. IEEE Trans Audio Speech Lang Process 21(10):2162–2171

  9. Grigoras C (2010) Statistical tools for multimedia forensics. In: Proceedings of audio engineering society 39th conf., audio forensics: practices and challenges, pp 27–32

  10. Hajj-Ahmad A, Garg R, Wu M (2013) Spectrum combining for ENF signal estimation. IEEE Signal Process Lett 20(9):885–888

  11. Ikram S, Malik H (2012) Microphone identification using higher-order statistics. In: Audio engineering society conference: 46th international conference: audio forensics

  12. Koenig B, Lacey D, Killion S (2007) Forensic enhancement of digital audio recordings. J Audio Eng Soc 55(5):352–371

  13. Korycki R (2013) Time and spectral analysis methods with machine learning for the authentication of digital audio recordings. Forens Sci Int 230(1–3):117–126

  14. Kotz S, Nadarajah S (2000) Extreme value distributions. World Scientific

  15. Lehmann EA, Johansson AM (2008) Prediction of energy decay in room impulse responses simulated with an image-source model. J Acoust Soc Am 124(1):269–277

  16. Lehmann EA, Johansson AM (2010) Diffuse reverberation model for efficient image-source simulation of room impulse responses. IEEE Trans Audio Speech Lang Process 18(6):1429–1439

  17. Liu Q, Sung A, Qiao M (2010) Detection of double mp3 compression. Cognit Comput Special Issue: Adv Comput Intell Appl 2(4):291–296

  18. Malik H (2012) Securing speaker verification system against replay attack. In: Proceedings of AES 46th conference on audio forensics

  19. Malik H (2013) Acoustic environment identification and its application to audio forensics. IEEE Trans Inform Forens Secur 8(11):1827–1837

  20. Malik H, Farid H (2010) Audio forensics from acoustic reverberation. In: Proceedings of the IEEE int. conference on acoustics, speech, and signal processing (ICASSP’10). Dallas, pp 1710–1713

  21. Malik H, Zhao H (2012) Recording environment identification using acoustic reverberation. In: Proceedings of the IEEE int. conference on acoustics, speech, and signal processing (ICASSP’12). Kyoto, pp 1833–1836

  22. Pan X, Zhang X, Lyu S (2012) Detecting splicing in digital audios using local noise level estimation. In: Proceedings of IEEE int. conf. on acoustics, speech, and signal processing (ICASSP’12). Kyoto, pp 1841–1844

  23. Panagakis Y, Kotropoulos C (2012) Telephone handset identification by feature selection and sparse representations. In: 2012 IEEE International workshop on information forensics and security (WIFS), pp 73–78

  24. Qiao M, Sung AH, Liu Q (2013) Improved detection of mp3 double compression using content-independent features. In: IEEE International conference on signal processing, communication and computing (ICSPCC), pp 1–4

  25. Sehr A, Maas R, Kellermann W (2010) Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition. IEEE Trans Audio Speech Lang Process 18(7):1676–1691

  26. Doclo S, Moonen M (2001) Combined frequency-domain dereverberation and noise reduction technique for multi-microphone speech enhancement. In: Proceedings of the 7th IEEE/EURASIP international workshop on acoustic echo and noise control (IWAENC 01), pp 31–34

  27. Su H, Garg R, Hajj-Ahmad A, Wu M (2013) ENF analysis on recaptured audio recordings. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 3018–3022

  28. Wolfel M (2009) Enhanced speech features by single-channel joint compensation of noise and reverberation. IEEE Trans Audio Speech Lang Process 17(2):312–323

  29. Zhao H, Malik H (2012) Audio forensics using acoustic environment traces. In: Proceedings of the IEEE statistical signal processing workshop (SSP’12). Ann Arbor, pp 373–376

  30. Zhao H, Malik H (2013) Audio recording location identification using acoustic environment signature. IEEE Trans Inform Forens Secur 8(11):1746–1759

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61402219), 2013 Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014223), the NPST program by the King Saud University under grant number 12-INF2634-02 and a grant from the National Science Foundation (CNS-1440929).

Author information

Corresponding author

Correspondence to Hong Zhao.

About this article

Cite this article

Zhao, H., Chen, Y., Wang, R. et al. Audio splicing detection and localization using environmental signature. Multimed Tools Appl 76, 13897–13927 (2017). https://doi.org/10.1007/s11042-016-3758-7

  • DOI: https://doi.org/10.1007/s11042-016-3758-7
