Discovering Vulnerable Functions: A Code Similarity Based Approach

  • Aditya Chandran
  • Lokesh Jain (Email author)
  • Sanjay Rawat
  • Kannan Srinathan
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 625)


This paper extends recent work on vulnerability extrapolation. A surge in exploits against both old and new software underlines the importance of detecting vulnerabilities and possible attacks before an attacker does. However sophisticated an exploit may be, its underlying prerequisite remains the presence of at least one memory corruption bug, which serves as the entry point for the exploit. Several rigorous software testing techniques have therefore been borrowed to detect and eliminate software bugs as early as possible. Code similarity based bug detection is one such technique; in the parlance of software security, it is also termed vulnerability extrapolation. In this paper, we present a source code similarity based bug identification technique that considers code features relevant to security-related bugs. Our technique works by enriching (augmenting) the abstract syntax trees (ASTs) of functions with security-relevant properties of the code. We show the effectiveness of the augmented AST based similarity approach over existing methods by evaluating the proposed method on real-world applications.
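The general idea of AST-based similarity with security-oriented augmentation can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the feature set, the similarity measure, or the target language. The sketch assumes Python source parsed with the standard `ast` module, a hypothetical `RISKY_CALLS` set standing in for "security relevant properties", node-type counts as the base AST features, and cosine similarity as the measure. All function names are illustrative.

```python
import ast
import math
from collections import Counter

# Hypothetical stand-in for "security relevant properties"; the abstract
# does not list the actual features used in the paper.
RISKY_CALLS = {"eval", "exec", "system", "popen"}


def augmented_features(func_source: str) -> Counter:
    """Count AST node types and add one extra feature per risky call."""
    tree = ast.parse(func_source)
    features = Counter()
    for node in ast.walk(tree):
        # Base feature: how often each kind of AST node occurs.
        features[type(node).__name__] += 1
        # Augmentation: mark calls whose callee name is in RISKY_CALLS.
        if isinstance(node, ast.Call):
            callee = node.func
            if isinstance(callee, ast.Name):
                name = callee.id
            else:
                name = getattr(callee, "attr", None)
            if name in RISKY_CALLS:
                features["RISKY:" + name] += 1
    return features


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(known_vulnerable: str, candidates: dict) -> list:
    """Rank candidate functions by similarity to a known-vulnerable one."""
    reference = augmented_features(known_vulnerable)
    scored = [
        (name, cosine_similarity(reference, augmented_features(src)))
        for name, src in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    known = "def f(cmd):\n    return eval(cmd)\n"
    candidates = {
        "g": "def g(x):\n    return eval(x)\n",
        "h": "def h(a, b):\n    total = a + b\n    return total * 2\n",
    }
    for name, score in rank_by_similarity(known, candidates):
        print(name, round(score, 3))
```

In this sketch, functions ranked highest are those whose augmented AST profile most resembles the known-vulnerable function, which is the sense in which a known bug is "extrapolated" to other candidate functions for manual review.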


Software vulnerability, Abstract syntax tree, Vulnerability extrapolation, Code similarity



Copyright information

© Springer Nature Singapore Pte Ltd. 2016

Authors and Affiliations

  • Aditya Chandran (1)
  • Lokesh Jain (1), Email author
  • Sanjay Rawat (1)
  • Kannan Srinathan (1)
  1. International Institute of Information Technology, Hyderabad, India
