Audio splicing detection and localization using environmental signature

Zhao, Hong; Chen, Yifan; Wang, Rui; Malik, Hafiz

doi:10.1007/s11042-016-3758-7

Published: 26 July 2016

Multimedia Tools and Applications, volume 76, pages 13897–13927 (2017)



Abstract

Audio splicing is one of the most common manipulation techniques in audio forensics. In this paper, the magnitudes of the acoustic channel impulse response and the ambient noise are proposed as the environmental signature. Specifically, spliced audio segments are detected according to the magnitude correlation between query frames and reference frames, using a statistically optimal threshold. The detection accuracy is further refined by comparing adjacent frames. The effectiveness of the proposed method is tested on two data sets: one generated from the TIMIT database, and one recorded in four acoustic environments using commercial-grade microphones. Experimental results show that the proposed method not only detects the presence of spliced frames but also localizes the forged segments with near-perfect accuracy. Comparison results show that the identification accuracy of the proposed scheme is higher than that of previous schemes. A real-world meeting recording database (the AMI corpus) is also used to verify the effectiveness of the proposed method in practical applications.
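
The detection step described in the abstract (correlate each query frame's signature with a reference, threshold the result, then refine using adjacent frames) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that per-frame signature magnitude vectors have already been estimated, since the estimation procedure is not given in this preview. The function names, the fixed threshold of 0.5 (standing in for the statistically optimal threshold), and the majority-vote refinement over a 3-frame window are all hypothetical choices made for illustration.

```python
import numpy as np

def frame_correlations(signatures, reference):
    """Pearson correlation between each frame signature (one row each) and a reference vector."""
    sig = signatures - signatures.mean(axis=1, keepdims=True)
    ref = reference - reference.mean()
    denom = np.linalg.norm(sig, axis=1) * np.linalg.norm(ref)
    return (sig @ ref) / np.maximum(denom, 1e-12)  # guard against zero-energy frames

def detect_spliced_frames(signatures, reference, threshold=0.5, window=3):
    """Flag frames whose correlation with the reference falls below the threshold,
    then refine with a majority vote over adjacent frames (window must be odd).
    Both the threshold value and the voting rule are assumptions of this sketch."""
    rho = frame_correlations(signatures, reference)
    raw = rho < threshold
    half = window // 2
    padded = np.pad(raw.astype(int), half, mode="edge")
    votes = np.convolve(padded, np.ones(window, dtype=int), mode="valid")
    refined = votes > half
    return rho, raw, refined

# Synthetic, deterministic signatures for two hypothetical environments.
k = np.arange(64)
env_a = 1.0 + np.cos(2 * np.pi * 2 * k / 64)
env_b = 1.0 + np.sin(2 * np.pi * 5 * k / 64)

frames = np.tile(env_a, (25, 1))
frames[10:15] = env_b                  # inserted segment from the other environment
frames[3] = 0.3 * env_a + env_b        # one genuine frame corrupted by strong interference

reference = frames[:3].mean(axis=0)    # assume the first frames are trusted
rho, raw, refined = detect_spliced_frames(frames, reference)
print("raw flags:    ", np.flatnonzero(raw))
print("refined flags:", np.flatnonzero(refined))
```

On this synthetic input, thresholding alone flags the isolated frame 3 as well as frames 10 to 14, while the vote over adjacent frames keeps only the contiguous segment. A real pipeline would replace the synthetic vectors with estimated signatures and derive the threshold from the correlation statistics rather than fixing it by hand.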


Notes

¹ In [3], the average false alarm rate is near 30% at a sampling rate of 16 kHz.
² In our implementation, the parameters recommended in [3] do not work on our data sets. The parameters used in this paper were selected manually by trial and error.

References

Borgstrom BJ, McCree A (2012) The linear prediction inverse modulation transfer function (LP-IMTF) filter for spectral enhancement, with applications to speaker recognition. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 4065–4068
Brixen E (2009) Acoustics of the crime scene as transmitted by mobile phones. In: Proceedings of audio engineering society 126th convention. Munich
Chen J, Xiang S, Liu W, Huang H (2013) Exposing digital audio forgeries in time domain by using singularity analysis with wavelets. In: Proceedings of the first ACM workshop on information hiding and multimedia security, pp 149–158
Cooper AJ (2010) Detecting butt-spliced edits in forensic digital audio recordings. In: Proceedings of audio engineering society 39th conf., audio forensics: practices and challenges
Dominguez-Molina JA, González-Farías G, Rodríguez-Dagnino RM, Monterrey IC (2001) A practical procedure to estimate the shape parameter in the generalized Gaussian distribution. Technical report I-01-18_eng.pdf, available at http://www.cimat.mx/reportes/enlinea/I-01-18_eng.pdf
Garg R, Hajj-Ahmad A, Wu M (2013) Geo-location estimation from electrical network frequency signals. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 2862–2866
Garofolo JS, Lamel LF, Fisher WM, Fiscus JG, Pallett DS, Dahlgren NL, Zue V (1993) TIMIT acoustic-phonetic continuous speech corpus. Linguistic Data Consortium, Philadelphia
Gaubitch ND, Brooks M, Naylor PA (2013) Blind channel magnitude response estimation in speech using spectrum classification. IEEE Trans Acoust Speech Signal Process 21(10):2162–2171
Grigoras C (2010) Statistical tools for multimedia forensics. In: Proceedings of audio engineering society 39th conf., audio forensics: practices and challenges, pp 27–32
Hajj-Ahmad A, Garg R, Wu M (2013) Spectrum combining for ENF signal estimation. IEEE Signal Process Lett 20(9):885–888
Ikram S, Malik H (2012) Microphone identification using higher-order statistics. In: Audio engineering society conference: 46th international conference: audio forensics
Koenig B, Lacey D, Killion S (2007) Forensic enhancement of digital audio recordings. J Audio Eng Soc 55(5):352–371
Korycki R (2013) Time and spectral analysis methods with machine learning for the authentication of digital audio recordings. Forens Sci Int 230(1–3):117–126
Kotz S, Nadarajah S (2000) Extreme value distributions. World Scientific
Lehmann E, Johansson A (2008) Prediction of energy decay in room impulse responses simulated with an image-source model. J Acoust Soc Am 1(121):269–277
Lehmann EA, Johansson AM (2010) Diffuse reverberation model for efficient image-source simulation of room impulse responses. IEEE Trans Audio Speech Lang Process 18(6):1429–1439
Liu Q, Sung A, Qiao M (2010) Detection of double mp3 compression. Cognit Comput Special Issue: Adv Comput Intell Appl 2(4):291–296
Malik H (2012) Securing speaker verification system against replay attack. In: Proceedings of AES 46th conference on audio forensics
Malik H (2013) Acoustic environment identification and its application to audio forensics. IEEE Trans Inform Forens Secur 8(11):1827–1837
Malik H, Farid H (2010) Audio forensics from acoustic reverberation. In: Proceedings of the IEEE int. conference on acoustics, speech, and signal processing (ICASSP’10). Dallas, pp 1710–1713
Malik H, Zhao H (2012) Recording environment identification using acoustic reverberation. In: Proceedings of the IEEE int. conference on acoustics, speech, and signal processing (ICASSP’12). Kyoto, pp 1833–1836
Pan X, Zhang X, Lyu S (2012) Detecting splicing in digital audios using local noise level estimation. In: Proceedings of IEEE int. conf. on acoustics, speech, and signal processing (ICASSP’12). Kyoto, pp 1841–1844
Panagakis Y, Kotropoulos C (2012) Telephone handset identification by feature selection and sparse representations. In: 2012 IEEE International workshop on information forensics and security (WIFS), pp 73–78
Qiao M, Sung AH, Liu Q (2013) Improved detection of mp3 double compression using content-independent features. In: IEEE International conference on signal processing, communication and computing (ICSPCC), pp 1–4
Sehr A, Maas R, Kellermann W (2010) Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition. IEEE Trans Audio Speech Lang Process 18(7):1676–1691
Simon D, Marc M (2001) Combined frequency-domain dereverberation and noise reduction technique for multi-microphone speech enhancement. In: Proceedings of the 7th IEEE/EURASIP international workshop on acoustic echo and noise control (IWAENC 01), pp 31–34
Su H, Garg R, Hajj-Ahmad A, Wu M (2013) ENF analysis on recaptured audio recordings. In: IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 3018–3022
Wolfel M (2009) Enhanced speech features by single-channel joint compensation of noise and reverberation. IEEE Trans Audio Speech Lang Proc 17(2):312–323
Zhao H, Malik H (2012) Audio forensics using acoustic environment traces. In: Proceedings of the IEEE statistical signal processing workshop (SSP’12). Ann Arbor, pp 373–376
Zhao H, Malik H (2013) Audio recording location identification using acoustic environment signature. IEEE Trans Inform Forens Secur 8(11):1746–1759

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant No. 61402219), 2013 Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014223), the NPST program by the King Saud University under grant number 12-INF2634-02 and a grant from the National Science Foundation (CNS-1440929).

Author information

Authors and Affiliations

Department of Electrical and Electronic Engineering, South University of Science and Technology of China, Shenzhen, Guangdong, China, 518055
Hong Zhao, Yifan Chen & Rui Wang
Department of Electrical and Computer Engineering, University of Michigan – Dearborn, Dearborn, MI, 48128, USA
Hafiz Malik

Corresponding author

Correspondence to Hong Zhao.

About this article

Cite this article

Zhao, H., Chen, Y., Wang, R. et al. Audio splicing detection and localization using environmental signature. Multimed Tools Appl 76, 13897–13927 (2017). https://doi.org/10.1007/s11042-016-3758-7

Received: 02 December 2015
Revised: 02 June 2016
Accepted: 30 June 2016
Published: 26 July 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s11042-016-3758-7
