Secure Deep Learning Engineering: A Road Towards Quality Assurance of Intelligent Systems

  • Yang Liu
  • Lei Ma
  • Jianjun Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11852)


Over the past decades, deep learning (DL) systems have achieved tremendous success and gained great popularity in various applications, such as intelligent machines, image processing, speech processing, and medical diagnostics. Deep neural networks are the key driving force behind this recent success, but they still seem to be a magic black box that lacks interpretability and understanding. This raises many open safety and security issues and creates an enormous, urgent demand for rigorous methodologies and engineering practices for quality enhancement. A plethora of studies have shown that state-of-the-art DL systems suffer from defects and vulnerabilities that can lead to severe losses and tragedies, especially when applied to real-world safety-critical applications.

In this paper, we perform a large-scale study and construct a repository of 223 works relevant to the quality assurance, security, and interpretation of deep learning. Based on this repository, and from a software quality assurance perspective, we pinpoint challenges and future opportunities, with the aim of drawing the attention of the software engineering community to the pressing industrial demand for secure intelligent systems.


Artificial intelligence; Deep learning; Software engineering; Security; Quality assurance; Reliability; Deep learning engineering



We thank Felix Juefei-Xu, Xiaofei Xie, Minhui Xue, Qiang Hu, Xiaoning Du, Yi Li, Sen Chen, Bo Li, Jianxiong Yin, and Simon See for their contributions in initiating the early work of this paper. We also acknowledge the support of the NVIDIA AI Tech Center (NVAITC) for our research, which largely shaped the direction of this work. This research was supported (in part) by the National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program (Award No. NRF2018NCR-NCR005-0001) and National Satellite of Excellence in Trustworthy Software System (Award No. NRF2018NCR-NSOE003-0001), administered by the National Cybersecurity R&D Directorate; by JSPS KAKENHI Grant Nos. 19H04086 and 18H04097; and by Qdai-jump Research Program No. 01277.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Nanyang Technological University, Singapore, Singapore
  2. Kyushu University, Fukuoka, Japan
