SCVD: A New Semantics-Based Approach for Cloned Vulnerable Code Detection

  • Conference paper
Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10327)

Abstract

Copying existing code to reuse or modify its functionality is very common in software development. However, when developers clone existing code, they also clone any vulnerabilities it contains, which seriously affects the security of the system. In this paper, we propose SCVD, a novel semantics-based approach for cloned vulnerable code detection. We use a full path traversal algorithm to transform the Program Dependency Graph (PDG) into a tree structure that preserves all the semantic information carried by the PDG, and we apply this tree to cloned vulnerable code detection. We use an identifier name mapping technique to eliminate the impact of identifier name modification. Our key insights are converting the complex graph similarity problem into a simpler tree similarity problem and using identifier name mapping to improve the effectiveness of semantics-based cloned vulnerable code detection. We have developed a practical tool based on our approach and performed a large number of experiments to evaluate its performance in three respects: false positive rate, false negative rate, and time cost. The experimental results show that our approach significantly improves vulnerability detection effectiveness compared with existing approaches and has a lower time cost than subgraph isomorphism approaches.
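The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the SCVD implementation: the abstract does not give the details of the full path traversal algorithm, so the sketch assumes one plausible reading, in which the graph is unrolled from an entry node along every path, nodes reachable along several paths are duplicated, and an edge back to a node already on the current path is dropped. The names `unroll_to_tree`, `map_identifiers`, and `KEYWORDS` are hypothetical, and the identifier mapping here renames every non-keyword identifier (function names included) in order of first appearance.

```python
import re
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node -> successor nodes (dependence edges)
Tree = Tuple[str, tuple]      # (node label, tuple of child subtrees)


def unroll_to_tree(graph: Graph, root: str,
                   on_path: frozenset = frozenset()) -> Tree:
    """Unroll `graph` from `root` into a tree by following every path.

    Assumption of this sketch: a node reachable along several paths is
    duplicated once per path, and an edge to a node already on the
    current path is skipped so that cycles cannot recurse forever.
    """
    on_path = on_path | {root}
    children = tuple(
        unroll_to_tree(graph, succ, on_path)
        for succ in graph.get(root, [])
        if succ not in on_path
    )
    return (root, children)


# Small illustrative keyword set; a real tool would use the full
# keyword list of the target language.
KEYWORDS = {"if", "else", "while", "for", "return", "int", "char"}


def map_identifiers(statement: str, mapping: Dict[str, str]) -> str:
    """Replace each identifier with a placeholder (v0, v1, ...).

    Placeholders are assigned in order of first appearance, so two
    statements that differ only in identifier names map to the same text.
    """
    def rename(match: "re.Match[str]") -> str:
        name = match.group(0)
        if name in KEYWORDS:
            return name
        if name not in mapping:
            mapping[name] = f"v{len(mapping)}"
        return mapping[name]

    # \b keeps the pattern from matching inside literals such as 0x10.
    return re.sub(r"\b[A-Za-z_]\w*", rename, statement)


if __name__ == "__main__":
    # A diamond-shaped dependence graph: node "d" is reached via "b" and "c".
    pdg = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    print(unroll_to_tree(pdg, "a"))

    # Two statements that differ only in identifier names.
    print(map_identifiers("len = strlen(src)", {}))
    print(map_identifiers("n = strlen(input)", {}))
```

Once both code fragments are in this normalized tree form, comparing them becomes a tree matching problem rather than a subgraph isomorphism problem, which is the trade the abstract describes. Duplicating shared nodes can enlarge the tree considerably on dense graphs, so a practical tool would need to bound or prune that growth.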

Acknowledgments

This work was supported by the National Science Foundation of China under grant No. 61672249, the National Basic Research Program of China (973 Program) under grant No. 2014CB340600, the National Key Research & Development (R&D) Plan of China under grant No. 2016YFB0200300, and the Natural Science Foundation of Hebei Province under grant No. F2015201089.

Author information

Corresponding author

Correspondence to Zhen Li.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zou, D. et al. (2017). SCVD: A New Semantics-Based Approach for Cloned Vulnerable Code Detection. In: Polychronakis, M., Meier, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2017. Lecture Notes in Computer Science, vol 10327. Springer, Cham. https://doi.org/10.1007/978-3-319-60876-1_15

  • DOI: https://doi.org/10.1007/978-3-319-60876-1_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-60875-4

  • Online ISBN: 978-3-319-60876-1

  • eBook Packages: Computer Science, Computer Science (R0)
