Towards Privacy-Preserving Visual Recognition via Adversarial Training: A Pilot Study

  • Zhenyu Wu
  • Zhangyang Wang
  • Zhaowen Wang
  • Hailin Jin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


This paper aims to improve privacy-preserving visual recognition, an increasingly demanded feature in smart camera applications, by formulating a unique adversarial training framework. The proposed framework explicitly learns a degradation transform for the original video inputs in order to optimize the trade-off between target-task performance and the associated privacy budget on the degraded video. A notable challenge is that the privacy budget, often defined and measured in task-driven contexts, cannot be reliably indicated by the performance of any single model, because strong privacy protection has to hold against any possible model that tries to extract private information. This uncommon situation has motivated us to propose two strategies, budget model restarting and budget model ensemble, to improve how well the learned degradation protects privacy against unseen hacker models. Novel training strategies, evaluation protocols, and result visualization methods have been designed accordingly. Two experiments on privacy-preserving action recognition, with privacy budgets defined in various ways, demonstrate the effectiveness of the proposed framework in simultaneously maintaining high target-task (action recognition) performance and suppressing the risk of a privacy breach. The code is available at
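The abstract argues that the privacy budget cannot be measured by any single model and must hold against the strongest hacker model. The abstract does not spell out the losses, so the following is only a minimal sketch of the ensemble idea under stated assumptions: each budget (hacker) model outputs class probabilities on the degraded video, its leakage is scored by negative entropy (an assumed choice; 0 for a fully confident prediction, -log K for a uniform one), and the ensemble is judged by its worst case (the maximum over models). The names `negative_entropy`, `ensemble_budget_loss`, `degradation_objective`, `worst_case_privacy_accuracy` and the weight `gamma` are hypothetical, and budget model restarting is not shown.

```python
import math
from typing import Sequence


def negative_entropy(probs: Sequence[float]) -> float:
    """Return -H(p) = sum_k p_k * log(p_k) (natural log).

    0 for a one-hot (fully confident) prediction, -log(K) for a uniform
    prediction over K classes; lower means less privacy leakage.
    """
    return sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_budget_loss(model_probs: Sequence[Sequence[float]]) -> float:
    """Assumed worst-case aggregation over an ensemble of budget models:
    the most confident model determines the loss."""
    return max(negative_entropy(p) for p in model_probs)


def degradation_objective(target_loss: float,
                          model_probs: Sequence[Sequence[float]],
                          gamma: float = 1.0) -> float:
    """Hypothetical scalar objective minimized by the degradation transform:
    keep the target-task loss low while pushing every budget model toward
    uniform (uninformative) predictions; gamma trades the two terms off."""
    return target_loss + gamma * ensemble_budget_loss(model_probs)


def worst_case_privacy_accuracy(predictions: Sequence[Sequence[int]],
                                labels: Sequence[int]) -> float:
    """Evaluation side: leakage is the best privacy-attribute accuracy that
    any held-out hacker model achieves on the degraded inputs."""
    def accuracy(preds: Sequence[int]) -> float:
        return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
    return max(accuracy(preds) for preds in predictions)


if __name__ == "__main__":
    uninformed = [0.5, 0.5]  # budget model that learned nothing private
    confident = [0.9, 0.1]   # budget model that still leaks: it dominates
    print(ensemble_budget_loss([uninformed, confident])
          == negative_entropy(confident))  # True
```

Taking the maximum rather than the average means that a single still-successful hacker model cannot be hidden by several weak ones, which matches the abstract's point that protection must hold against any model.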


Keywords: Visual privacy · Adversarial training · Action recognition

Supplementary material

Supplementary material 1: 474218_1_En_37_MOESM1_ESM.pdf (PDF, 2.9 MB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Zhenyu Wu (1), corresponding author
  • Zhangyang Wang (1)
  • Zhaowen Wang (2)
  • Hailin Jin (2)
  1. Texas A&M University, College Station, USA
  2. Adobe Research, San Jose, USA