Skip to main content
Log in

 WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection

  • Published:
International Journal of Computer Vision Aims and scope Submit manuscript

Abstract

Breathtaking advances in face forgery techniques produce visually untraceable deepfake videos, thus potential malicious abuse of these techniques has sparked great concerns. Existing deepfake detectors primarily focus on specific forgery patterns with global features extracted by CNN backbones for forgery detection. Due to inadequate exploration of content and texture features, they often suffer from overfitting method-specific forged regions, thus exhibiting limited generalization to increasingly realistic forgeries. In this paper, we propose a Wavelet-guided Texture-Content HiErarchical Relation (WATCHER) Learning framework to delve deeper into the relation-aware texture-content features. Specifically, we propose a Wavelet-guided AutoEncoder scheme to retrieve the general visual representation, which is aware of high-frequency details for understanding forgeries. To further excavate fine-grained counterfeit clues, a Texture-Content Attention Maps Learning module is presented to enrich the contextual information of content and texture features via multi-level attention maps in a hierarchical learning protocol. Finally, we propose a Progressive Multi-domain Feature Interaction module in pursuit to perform semantic reasoning on relationship-enhanced texture-content forgery features. Extensive experiments on popular benchmark datasets substantiate the superiority of our WATCHER model, consistently trumping state-of-the-art methods by a significant margin.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

Data Availibility Statement

The datasets underlying the results of this research are accessible upon reasonable request from the corresponding author or the first author.

References

  • Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). Mesonet: A compact facial video forgery detection network. In 2018 IEEE international workshop on information forensics and security (pp. 1–7).

  • Aujol, J.-F., & Chambolle, A. (2005). Dual norms and image decomposition models. International Journal of Computer Vision, 63, 85–104.

    Article  MATH  Google Scholar 

  • Cao, J., Ma, C., Yao, T., Chen, S., Ding, S., & Yang, X. (2022). End-to-end reconstruction-classification learning for face forgery detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4113–4122).

  • Chen, S., Yao, T., Chen, Y., Ding, S., Li, J., & Ji, R. (2021). Local relation learning for face forgery detection. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 1081–1088).

  • Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251–1258).

  • Dang, H., Liu, F., Stehouwer, J., Liu, X., & Jain, A.K. (2020). On the detection of digital face manipulation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5781–5790).

  • Dolhansky, B., Howes, R., Pflaum, B., Baram, N., & Ferrer, C.C. (2019). The deepfake detection challenge (dfdc) preview dataset. arXiv preprint arXiv:1910.08854

  • Dufour, N., & Gully, A. (2019). Contributing data to deepfake detection research. Google AI Blog 1(3).

  • Gao, P., Jiang, Z., You, H., Lu, P., Hoi, S.C.H., Wang, X., & Li, H. (2019). Dynamic fusion with intra- and inter-modality attention flow for visual question answering. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6639–6648).

  • Gao, Y., Wei, F., Bao, J., Gu, S., Chen, D., Wen, F., & Lian, Z. (2021). High-fidelity and arbitrary face editing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 16115–16124).

  • Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems.

  • Gu, Q., Chen, S., Yao, T., Chen, Y., Ding, S., & Yi, R. (2022). Exploiting fine-grained face forgery clues via progressive enhancement learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, pp. 735–743).

  • Haliassos, A., Vougioukas, K., Petridis, S., & Pantic, M. (2021). Lips don’t lie: A generalisable and robust approach to face forgery detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5039–5049).

  • He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 16000–16009).

  • He, Y., Gan, B., Chen, S., Zhou, Y., Yin, G., Song, L., Sheng, L., Shao, J., & Liu, Z. (2021). Forgerynet: A versatile benchmark for comprehensive forgery analysis. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4360–4369).

  • Hong, F.-T., Zhang, L., Shen, L., & Xu, D. (2022). Depth-aware generative adversarial network for talking head video generation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3397–3406).

  • Hu, J. (2023). Deepfakemae: Facial part consistency aware masked autoencoder for deepfake video detection. arXiv preprint arXiv:2303.01740

  • Huang, B., & Wang, Z. (2023). Implicit identity driven deepfake face swapping detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4490–4499).

  • Huang, H., He, R., Sun, Z., & Tan, T. (2017). Wavelet-SRNET: A wavelet-based CNN for multi-scale face super resolution. In Proceedings of the IEEE international conference on computer vision (pp. 1689–1697).

  • Jiang, L., Li, R., Wu, W., Qian, C., & Loy, C.C. (2020). Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2889–2898).

  • Khosla, P., Teterwak, P., Wang, C., Sarna, A., Tian, Y., Isola, P., Maschinot, A., Liu, C., & Krishnan, D. (2020). Supervised contrastive learning. Advances in Neural Information Processing Systems, 33, 18661–18673.

    Google Scholar 

  • Kim, H., Choi, Y., Kim, J., Yoo, S., & Uh, Y. (2021). Exploiting spatial dimensions of latent in GAN for real-time image editing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 852–861).

  • Kipf, T.N., & Welling, M. (2017). Semi-supervised classification with graph convolutional networks. In 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, conference track proceedings.

  • Kowalski, M. (2018). FaceSwap. https://github.com/marekkowalski/faceswap Accessed 01 August 2020.

  • Li, J., & Xie, H. (2022). Wavelet-enhanced weakly supervised local feature learning for face forgery detection. In Proceedings of the 30th ACM international conference on multimedia (pp. 1299–1308).

  • Li, D., & Yang, Y. (2018). Learning to generalize: Meta-learning for domain generalization. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32).

  • Li, L., Bao, J., Yang, H., Chen, D., & Wen, F. (2019). Faceshifter: Towards high fidelity and occlusion aware face swapping. arXiv preprint arXiv:1912.13457

  • Li, L., Bao, J., Zhang, T., Yang, H., Chen, D., Wen, F., & Guo, B. (2020). Face x-ray for more general face forgery detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5001–5010).

  • Li, Y., Chang, M.-C., & Lyu, S. (2018). In ictu oculi: Exposing ai created fake videos by detecting eye blinking. In 2018 IEEE international workshop on information forensics and security (pp. 1–7).

  • Li, J., Xie, H., Li, J., Wang, Z., & Zhang, Y. (2021). Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 6458–6467).

  • Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3207–3216).

  • Liang, J., Shi, H., & Deng, W. (2022). Exploring disentangled content information for face forgery detection. arXiv preprint arXiv:2207.09202

  • Lin, M. (2015). Bilinear CNN models for fine-grained visual recognition. In Proceedings of the IEEE international conference on computer vision (pp. 1449–1457).

  • Liu, H., Li, X., Zhou, W., Chen, Y., He, Y., Xue, H., Zhang, W., & Yu, N. (2021). Spatial-phase shallow learning: rethinking face forgery detection in frequency domain. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 772–781).

  • Liu, Z., Qi, X., & Torr, P. H. (2020). Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8060–8069).

  • Liu, L., Zhao, L., Long, Y., Kuang, G., & Fieguth, P. (2012). Extended local binary patterns for texture classification. Image and Vision Computing, 30(2), 86–99.

    Article  MATH  Google Scholar 

  • Luo, Y., Zhang, Y., Yan, J., & Liu, W. (2021). Generalizing face forgery detection with high-frequency features. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 16317–16326).

  • Miao, C., & Tan, Z. (2022). Hierarchical frequency-assisted interactive networks for face manipulation detection. IEEE Transactions on Information Forensics and Security, 17, 3008–3021.

    Article  MATH  Google Scholar 

  • Miao, C., & Tan, Z. (2023). F 2 trans: High-frequency fine-grained transformer for face forgery detection. IEEE Transactions on Information Forensics and Security, 18, 1039–1051.

    Article  MATH  Google Scholar 

  • Nguyen, H. H., Yamagishi, J., and Echizen, I. (2019). Capsule-forensics: Using capsule networks to detect forged images and videos. In 2019–2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 2307–2311).

  • Ojala, T., Pietikainen, M., & Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7), 971–987.

    Article  MATH  Google Scholar 

  • Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in pytorch.

  • Perov, I., Gao, D., Chervoniy, N., Liu, K., Marangonda, S., Umé, C., Dpfks, M., Facenheim, C. S., et al. (2020). Deepfacelab: A simple, flexible and extensible face swapping framework.

  • Qian, Y., Yin, G., Sheng, L., Chen, Z., & Shao, J. (2020). Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Proceedings of the IEEE conference on European conference on computer vision (pp. 86–103). Springer.

  • Rossler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE international conference on computer vision (pp. 1–11).

  • Sagonas, C., Antonakos, E., Tzimiropoulos, G., Zafeiriou, S., & Pantic, M. (2016). 300 faces in-the-wild challenge: Database and results. Image and Vision Computing, 47, 3–18.

    Article  MATH  Google Scholar 

  • Selvaraju, R.R., Das, A., Vedantam, R., Cogswell, M., Parikh, D., & Batra, D. (2016). Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization. arXiv e-prints.

  • Shen, Y., Gu, J., Tang, X., & Zhou, B. (2020). Interpreting the latent space of gans for semantic face editing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 9240–9249).

  • Song, L., Fang, Z., Li, X., Dong, X., Jin, Z., Chen, Y., & Lyu, S. (2022). Adaptive face forgery detection in cross domain. In Proceedings of the European conference on computer vision (pp. 467–484). Springer.

  • Song, H., Huang, S., Dong, Y., & Tu, W.-W. (2023). Robustness and generalizability of deepfake detection: A study with diffusion models. arXiv preprint arXiv:2309.02218

  • Song, L., Li, X., Fang, Z., Jin, Z., Chen, Y., & Xu, C. (2022). Face forgery detection via symmetric transformer. In Proceedings of the 30th ACM international conference on multimedia (pp. 4102–4111).

  • Song, J., Cho, H., Yoon, J., & Yoon, S. M. (2017). Structure adaptive total variation minimization-based image decomposition. IEEE Transactions on Circuits and Systems for Video Technology, 28(9), 2164–2176.

    Article  MATH  Google Scholar 

  • Starck, J.-L., Elad, M., & Donoho, D. L. (2005). Image decomposition via the combination of sparse representations and a variational approach. IEEE Transactions on Image Processing, 14(10), 1570–1582.

    Article  MathSciNet  MATH  Google Scholar 

  • Sun, K., & Liu, H. (2021). Domain general face forgery detection by learning to weight. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 2638–2646).

  • Sun, K., & Yao, T. (2022). Dual contrastive learning for general face forgery detection. In Proceedings of the AAAI conference on artificial intelligence (Vol. 36, pp. 2316–2324).

  • Suwajanakorn, S., & Seitz, S. M. (2017). Synthesizing Obama: Learning lip sync from audio. ACM Transactions on Graphics (TOG), 36(4), 1–13.

    Article  Google Scholar 

  • Tan, M., & Le, Q. (2019). Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105–6114). PMLR.

  • Tan, Z., Yang, Y., Wan, J., Guo, G., & Li, S.Z. (2020). Relation-aware pedestrian attribute recognition with graph convolutional networks. In Proceedings of the AAAI conference on artificial intelligence (vol. 34, pp. 12055–12062).

  • Tan, Z., Yang, Y., Wan, J., Hang, H., Guo, G., & Li, S. Z. (2019). Attention-based pedestrian attribute analysis. IEEE Transactions on Image Processing, 28(12), 6126–6140.

    Article  MathSciNet  MATH  Google Scholar 

  • Thies, J., Zollhofer, M., Stamminger, M., Theobalt, C., & Nießner, M. (2016). Face2face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2387–2395).

  • ThiesJ,. (2019). Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG), 38(4), 1–12.

  • Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., Yung, J., Steiner, A., Keysers, D., Uszkoreit, J., et al. (2021). MLP-mixer: An all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34, 24261–24272.

    Google Scholar 

  • Tora, M. (2018). Deepfakes. https://github.com/deepfakes/faceswap/tree/v2.0.0 Accessed 03 March 2021.

  • Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903

  • Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on machine learning (pp. 1096–1103).

  • Wang, C., & Deng, W. (2021). Representative forgery mining for fake face detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 14923–14932).

  • Wang, Y., & Yu, K. (2023). Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7278–7287).

  • Wang, J., Wu, Z., Ouyang, W., Han, X., Chen, J., Jiang, Y.-G., & Li, S.-N. (2022). M2tr: Multi-modal multi-scale transformers for deepfake detection. In Proceedings of the 2022 international conference on multimedia retrieval (pp. 615–623).

  • Wang, X., Yu, K., Wu, S., Gu, J., Liu, Y., Dong, C., Qiao, Y., & Change Loy, C. (2018). Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European conference on computer vision workshops (pp. 0–0).

  • Wang, J., Sun, Y., & Tang, J. (2022). Lisiam: Localization invariance Siamese network for Deepfake detection. IEEE Transactions on Information Forensics and Security, 17, 2425–2436.

    Article  MATH  Google Scholar 

  • Wei, X., Zhang, T., Li, Y., Zhang, Y., & Wu, F. (2020). Multi-modality cross attention network for image and sentence matching. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 10938–10947).

  • Woo, S., Park, J., Lee, J.-Y., & Kweon, I.S. (2018). CBAM: Convolutional block attention module. In Proceedings of the European conference on computer vision (pp. 3–19).

  • Xu, Y., Yin, Y., Jiang, L., Wu, Q., Zheng, C., Loy, C.C., Dai, B., & Wu, W. (2022). Transeditor: Transformer-based dual-space gan for highly controllable facial editing. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 7683–7692).

  • Yang, X., Li, Y., & Lyu, S. (2019). Exposing deep fakes using inconsistent head poses. In 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 8261–8265).

  • Yang, Z., & Liang, J. (2023). Masked relation learning for deepfake detection. IEEE Transactions on Information Forensics and Security, 18, 1696–1708.

    Article  MATH  Google Scholar 

  • Yu, Z., Zhao, C., Wang, Z., Qin, Y., Su, Z., Li, X., Zhou, F., & Zhao, G. (2020). Searching central difference convolutional networks for face anti-spoofing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 5295–5305).

  • Zhao, T., Xu, X., Xu, M., Ding, H., et al. (2021). Learning self-consistency for deepfake detection. In Proceedings of the IEEE international conference on computer vision (pp. 15023–15033).

  • Zhao, Y., Yan, K., Huang, F., & Li, J. (2021). Graph-based high-order relation discovery for fine-grained recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 15079–15088).

  • Zhao, H., Zhou, W., Chen, D., Wei, T., Zhang, W., & Yu, N. (2021). Multi-attentional deepfake detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2185–2194).

  • Zhuang, W., & Chu, Q. (2022). UIA-VIT: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection. In Proceedings of the European conference on computer vision (pp. 391–407). Springer.

  • Zi, B., Chang, M., Chen, J., Ma, X., & Jiang, Y.-G. (2020). Wilddeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM international conference on multimedia (pp. 2382–2390).

Download references

Acknowledgements

This work is supported by National Key R &D Program of China (2021YFF0602101) and the National Natural Science Foundation of China (62172227).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiyuan Hu.

Additional information

Communicated by Segio Escalera.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wang, Y., Chen, C., Zhang, N. et al.  WATCHER: Wavelet-Guided Texture-Content Hierarchical Relation Learning for Deepfake Detection. Int J Comput Vis (2024). https://doi.org/10.1007/s11263-024-02116-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11263-024-02116-5

Keywords

Navigation