Abstract
Audio splicing is one of the most common manipulation techniques in audio forensics. In this paper, the magnitudes of the acoustic channel impulse response and the ambient noise are proposed as an environmental signature. Specifically, spliced audio segments are detected according to the magnitude correlation between query frames and reference frames, using a statistically optimal threshold. The detection accuracy is further refined by comparing adjacent frames. The effectiveness of the proposed method is tested on two data sets: one generated from the TIMIT database, and a second recorded in four acoustic environments using commercial-grade microphones. Experimental results show that the proposed method not only detects the presence of spliced frames but also localizes the forged segments with near-perfect accuracy. Comparison results show that the identification accuracy of the proposed scheme is higher than that of previous schemes. A real-world meeting recording database (the AMI corpus) is also used to verify the effectiveness of the proposed method in practical applications.
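The detection idea summarized in the abstract (correlate each query frame's signature with a reference signature, threshold the result, then refine using adjacent frames) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-frame signatures have already been estimated, uses Pearson correlation as the similarity measure, takes a hypothetical fixed threshold of 0.5 in place of the statistically optimal one, and uses a simple "flip an isolated frame" rule as a stand-in for the adjacent-frame refinement. The function names are illustrative only.

```python
import numpy as np


def frame_correlations(query, reference):
    """Pearson correlation between each query frame (row) and the reference signature."""
    q = query - query.mean(axis=1, keepdims=True)
    r = reference - reference.mean()
    denom = np.linalg.norm(q, axis=1) * np.linalg.norm(r)
    # Guard against division by zero for constant frames.
    return (q @ r) / np.maximum(denom, 1e-12)


def refine_with_neighbours(flags):
    """Flip an interior frame whose label disagrees with both adjacent frames.

    Assumed refinement rule, not the one used in the paper.
    """
    refined = flags.copy()
    for i in range(1, len(flags) - 1):
        if flags[i - 1] == flags[i + 1] != flags[i]:
            refined[i] = flags[i - 1]
    return refined


def detect_spliced(query, reference, threshold=0.5):
    """Return (correlations, raw flags, refined flags); True marks a suspected spliced frame.

    The threshold is a hypothetical placeholder value.
    """
    rho = frame_correlations(query, reference)
    raw = rho < threshold
    return rho, raw, refine_with_neighbours(raw)
```

Under these assumptions, a run of consecutive low-correlation frames is reported as a spliced segment, while a single low-correlation frame surrounded by matching frames is treated as an estimation error and relabeled.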
Acknowledgments
This work is supported by the National Natural Science Foundation of China (Grant No. 61402219), 2013 Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014223), the NPST program by the King Saud University under grant number 12-INF2634-02 and a grant from the National Science Foundation (CNS-1440929).
Cite this article
Zhao, H., Chen, Y., Wang, R. et al. Audio splicing detection and localization using environmental signature. Multimed Tools Appl 76, 13897–13927 (2017). https://doi.org/10.1007/s11042-016-3758-7
DOI: https://doi.org/10.1007/s11042-016-3758-7