Binary Analysis Overview

  • Saed Alrabaee
  • Mourad Debbabi
  • Paria Shirani
  • Lingyu Wang
  • Amr Youssef
  • Ashkan Rahimian
  • Lina Nouh
  • Djedjiga Mouheb
  • He Huang
  • Aiman Hanna
Part of the Advances in Information Security book series (ADIS, volume 78)

Abstract

When source code is unavailable, security applications such as malware detection, software license infringement detection, vulnerability analysis, and digital forensics need to extract meaningful fingerprints from binary code efficiently. Such fingerprints make reverse engineering tasks more effective and efficient because they provide a range of insights into program binaries. However, much important information is likely to be lost during compilation, including variable and function names, the original control and data flow structures, comments, and layout. In this chapter, we provide a comprehensive review of existing binary code fingerprinting frameworks. We systematize the study of binary code fingerprints along its most important dimensions: the applications that motivate it, the approaches used and their implementations, the specific aspects of the fingerprinting framework, and how the results are evaluated.

References

  1. 26.
    Malheur: Automatic Analysis of Malware Behavior. http://www.mlsec.org/malheur/, 2015.
  2. 30.
    C++ refactoring tools for visual studio. http://www.wholetomato.com/, 2016. Accessed: February 2016.
  3. 41.
    Refactoring tool. https://www.devexpress.com/Products/CodeRush/, 2018. Accessed: February 2018.
  4. 43.
    EXEINFO PE. http://exeinfo.atwebpages.com/, 2019. Accessed: June 2019.
  5. 45.
    Hex-Rays IDA Pro. https://www.hex-rays.com/products/ida/, 2019. Accessed: June 2019.
  6. 46.
    HexRays: IDA Pro. https://www.hex-rays.com/products/ida/, 2019. Accessed: January 2019.
  7. 47.
    OllyDbg, a 32-bit Assembler Level Analysing Debugger for Microsoft Windows. http://ollydbg.de/, 2019. Accessed: June 2019.
  8. 48.
    PEfile. http://code.google.com/p/pefile/, 2019. Accessed: June 2019.
  9. 49.
    RDG_Packer_Detector. http://www.rdgsoft.net/, 2019. Accessed: June 2019.
  10. 50.
    The Paradyn Project. http://www.paradyn.org/html/dyninst9.0.0-features.html, 2019. Accessed: June 2019.
  11. 51.
    PlanetMath. Symmetric Difference. https://planetmath.org/symmetricdifference, 2019. Accessed: 2019.
  12. 52.
    Tigress, a Diversifying Virtualizer/Obfuscator for the C language. http://tigress.cs.arizona.edu/, 2019. Accessed: June 2019.
  13. 53.
    Zynamics, BinNavi: Binary Code Reverse Engineering Tool. http://www.zynamics.com/binnavi.html, 2019. Accessed: June 2019.
  14. 54.
    Laksono Adhianto, Sinchan Banerjee, Mike Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey, and Nathan R Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurrency and Computation: Practice and Experience, 22(6):685–701, 2010.Google Scholar
  15. 55.
    Hiralal Agrawal and Joseph R Horgan. Dynamic program slicing. In ACM SIGPLAN Notices, volume 25, pages 246–256. ACM, 1990.Google Scholar
  16. 56.
    Agrawal, Parag and Arasu, Arvind and Kaushik, Raghav. On indexing error-tolerant set containment. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, pages 927–938, 2010.Google Scholar
  17. 58.
    Shahinur Alam, R Nigel Horspool, and Issa Traore. MARD: a framework for metamorphic malware analysis and real-time detection. In The 28th International Conference on Advanced Information Networking and Applications (AINA), pages 480–489. IEEE, 2014.Google Scholar
  18. 59.
    Saed Alrabaee, Noman Saleem, Stere Preda, Lingyu Wang, and Mourad Debbabi. OBA2: an onion approach to binary code authorship attribution. Digital Investigation, 11:S94–S103, 2014.CrossRefGoogle Scholar
  19. 61.
    Saed Alrabaee, Paria Shirani, Lingyu Wang, and Mourad Debbabi. SIGMA: a semantic integrated graph matching approach for identifying reused functions in binary code. Digital Investigation, 12:S61–S71, 2015.CrossRefGoogle Scholar
  20. 64.
    Saed Alrabaee, Lingyu Wang, and Mourad Debbabi. BinGold: Towards robust binary analysis by extracting the semantics of binary code as semantic flow graphs (SFGs). Digital Investigation, 18:S11–S22, 2016.CrossRefGoogle Scholar
  21. 65.
    Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06)., pages 459–468. IEEE, 2006.Google Scholar
  22. 66.
    Dorian C Arnold, Dong H Ahn, Bronis R De Supinski, Gregory L Lee, Barton P Miller, and Martin Schulz. Stack trace analysis for large scale debugging. In IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pages 1–10. IEEE, 2007.Google Scholar
  23. 67.
    Thanassis Avgerinos, Sang Kil Cha, Alexandre Rebert, Edward J Schwartz, Maverick Woo, and David Brumley. Automatic exploit generation. Communications of the ACM, 57(2):74–84, 2014.CrossRefGoogle Scholar
  24. 68.
    Gogul Balakrishnan, Radu Gruian, Thomas Reps, and Tim Teitelbaum. CodeSurfer/x86—A platform for analyzing x86 executables. In Compiler Construction, pages 250–254. Springer, 2005.Google Scholar
  25. 69.
    Gogul Balakrishnan and Thomas Reps. WYSINWYX: What you see is not what you eXecute. ACM Transactions on Programming Languages and Systems (TOPLAS), 32(6):23, 2010.Google Scholar
  26. 72.
    Tiffany Bao, Jonathan Burket, Maverick Woo, Rafael Turner, and David Brumley. BYTEWEIGHT: Learning to Recognize Functions in Binary Code. In 23rd USENIX Security Symposium (USENIX Security 14), pages 845–860, 2014.Google Scholar
  27. 73.
    Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Sighireanu M., R. Tabary, T. Touili, and Aymeric Vincent. Description of the BINCOA Model. In Deliverable J1.1 part 2 of ANR Project BINCOA, 2009.Google Scholar
  28. 74.
    Sébastien Bardin, Philippe Herrmann, Jérôme Leroux, Olivier Ly, Renaud Tabary, and Aymeric Vincent. The BINCOA framework for binary code analysis. In International Conference on Computer Aided Verification, pages 165–170. Springer, 2011.Google Scholar
  29. 75.
    Mayank Bawa, Tyson Condie, and Prasanna Ganesan. LSH forest: self-tuning indexes for similarity search. In Proceedings of the 14th international conference on World Wide Web, pages 651–660. ACM, 2005.Google Scholar
  30. 78.
    Laszlo A. Belady and Meir M Lehman. A model of large program development. IBM Systems journal, 15(3):225–252, 1976.zbMATHCrossRefGoogle Scholar
  31. 84.
    Martial Bourquin, Andy King, and Edward Robbins. BinSlayer: accurate comparison of binary executables. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 4. ACM, 2013.Google Scholar
  32. 86.
    David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J Schwartz. BAP: A binary analysis platform. In International Conference on Computer Aided Verification, pages 463–469. Springer, 2011.Google Scholar
  33. 87.
    Danilo Bruschi, Lorenzo Martignoni, and Mattia Monga. Code normalization for self-mutating malware. IEEE Security & Privacy, (2):46–54, 2007.Google Scholar
  34. 88.
    Juan Caballero, Noah M Johnson, Stephen McCamant, and Dawn Song. Binary code extraction and interface identification for security applications. Technical report, University of California, Berkeley, Dept. of Electrical Engineering and Computer Science, 2009.Google Scholar
  35. 89.
    Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX conference on Operating Systems Design and Implementation, pages 209–224. USENIX Association, 2008.Google Scholar
  36. 90.
    Aylin Caliskan-Islam, Fabian Yamaguchi, Edwin Dauber, Richard Harang, Konrad Rieck, Rachel Greenstadt, and Arvind Narayanan. When coding style survives compilation: De-anonymizing programmers from executable binaries. The 25th Annual Network and Distributed System Security Symposium (NDSS), pages 255–270, 2018.Google Scholar
  37. 91.
    Joan Calvet, José M Fernandez, and Jean-Yves Marion. Aligot: cryptographic function identification in obfuscated binary programs. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS), pages 169–182. ACM, 2012.Google Scholar
  38. 93.
    Silvio Cesare, Yang Xiang, and Wanlei Zhou. Control flow-based malware variantdetection. IEEE Transactions on Dependable and Secure Computing (TDSC), 11(4):307–317, 2014.CrossRefGoogle Scholar
  39. 94.
    Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley. Unleashing mayhem on binary code. In IEEE Symposium on Security and Privacy (S&P), pages 380–394. IEEE, 2012.Google Scholar
  40. 95.
    Sang Kil Cha, Maverick Woo, and David Brumley. Program-adaptive mutational fuzzing. In IEEE Symposium on Security and Privacy (S&P), pages 725–741. IEEE, 2015.Google Scholar
  41. 96.
    Sagar Chaki, Cory Cohen, and Arie Gurfinkel. Supervised learning for provenance-similarity of binaries. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 15–23. ACM, 2011.Google Scholar
  42. 97.
    Chandra, Mahalanobis Prasanta and Others. On the generalised distance in statistics. Proceedings of the National Institute of Sciences of India, 2(1):49–55, 1936.Google Scholar
  43. 98.
    Mahinthan Chandramohan, Yinxing Xue, Zhengzi Xu, Yang Liu, Chia Yuan Cho, and Hee Beng Kuan Tan. BinGo: cross-architecture cross-OS binary search. In Proceedings of the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 678–689. ACM, 2016.Google Scholar
  44. 99.
    Eric Cheng. Binary Analysis and Symbolic Execution with angr. PhD thesis, The MITRE Corporation, 2016.Google Scholar
  45. 100.
    Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. The S2E platform: Design, implementation, and applications. ACM Transactions on Computer Systems (TOCS), 30(1):2, 2012.Google Scholar
  46. 101.
    Young Han Choi, Byoung Jin Han, Byung Chul Bae, Hyung Geun Oh, and Ki Wook Sohn. Toward extracting malware features for classification using static and dynamic analysis. In The 8th International Conference on Computing and Networking Technology (ICCNT), pages 126–129. IEEE, 2012.Google Scholar
  47. 103.
    Paolo Milani Comparetti, Guido Salvaneschi, Engin Kirda, Clemens Kolbitsch, Christopher Kruegel, and Stefano Zanero. Identifying dormant functionality in malware programs. In IEEE Symposium on Security and Privacy (S&P), pages 61–76. IEEE, 2010.Google Scholar
  48. 106.
    Christoph Csallner and Yannis Smaragdakis. Check‘n’crash: combining static checking and testing. In Proceedings of the 27th international conference on Software engineering, pages 422–431. ACM, 2005.Google Scholar
  49. 107.
    Ţăpuş, Cristian and Chung, I-Hsin and Hollingsworth, Jeffrey K and others. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11. IEEE Computer Society Press, 2002.Google Scholar
  50. 110.
    Yaniv David, Nimrod Partush, and Eran Yahav. Statistical similarity of binaries. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pages 266–280. ACM, 2016.Google Scholar
  51. 112.
    Yaniv David and Eran Yahav. Tracelet-based code search in executables. ACM SIGPLAN Notices, 49(6):349–360, 2014.CrossRefGoogle Scholar
  52. 115.
    De Maesschalck, Roy and Jouan-Rimbaud, Delphine, and Massart, Désiré L. The mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1): 1–18, 2000.CrossRefGoogle Scholar
  53. 116.
    Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008.Google Scholar
  54. 117.
    Alessandro Di Federico, Mathias Payer, and Giovanni Agosta. REV.NG: a unified binary analysis framework to recover CFGs and function boundaries. In Proceedings of the 26th International Conference on Compiler Construction, pages 131–141. ACM, 2017.Google Scholar
  55. 118.
    Steven HH Ding, Benjamin Fung, and Philippe Charland. Kam1n0: Mapreduce-based assembly clone search for reverse engineering. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 461–470. ACM, 2016.Google Scholar
  56. 120.
    Adel Djoudi and Sébastien Bardin. BINSEC: Binary Code Analysis with Low-Level Regions. In Tools and Algorithms for the Construction and Analysis of Systems, pages 212–217. Springer, 2015.Google Scholar
  57. 121.
    Tudor Dumitraş and Darren Shou. Toward a standard benchmark for computer security research: The Worldwide Intelligence Network Environment (WINE). In Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS workshop), pages 89–96. ACM, 2011.Google Scholar
  58. 124.
    Manuel Egele, Theodoor Scholte, Engin Kirda, and Christopher Kruegel. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6, 2012.Google Scholar
  59. 125.
    Manuel Egele, Maverick Woo, Peter Chapman, and David Brumley. Blanket execution: Dynamic similarity testing for program binaries and components. In 23rd USENIX Security Symposium (USENIX Security 14), pages 303–317, 2014.Google Scholar
  60. 129.
    Khaled ElWazeer, Kapil Anand, Aparna Kotha, Matthew Smithson, and Rajeev Barua. Scalable variable and data type detection in a binary rewriter. In ACM SIGPLAN Notices, volume 48, pages 51–60. ACM, 2013.Google Scholar
  61. 130.
    Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. discovRE: Efficient cross-architecture identification of bugs in binary code. In Proceedings of the 23rd Symposium on Network and Distributed System Security (NDSS), 2016.Google Scholar
  62. 131.
    Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. Journal of machine learning research, 9(Aug):1871–1874, 2008.zbMATHGoogle Scholar
  63. 132.
    Wenbin Fang, Barton P Miller, and James A Kupsch. Automated tracing and visualization of software security structure and properties. In Proceedings of the ninth international symposium on visualization for cyber security, pages 9–16. ACM, 2012.Google Scholar
  64. 133.
    Mohammad Reza Farhadi, Benjamin Fung, Philippe Charland, and Mourad Debbabi. BinClone: detecting code clones in malware. In Eighth International Conference on Software Security and Reliability (SERE), pages 78–87. IEEE, 2014.Google Scholar
  65. 136.
    Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. Scalable Graph-based Bug Search for Firmware Images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pages 480–491. ACM, 2016.Google Scholar
  66. 137.
    Jeanne Ferrante, Karl J Ottenstein, and Joe D Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS), 9(3):319–349, 1987.zbMATHCrossRefGoogle Scholar
  67. 139.
    Halvar Flake. Graph-based binary analysis. Blackhat Briefings 2002, 2002.Google Scholar
  68. 140.
    Martin Fowler. Refactoring: improving the design of existing code. Pearson Education India, 1999.zbMATHGoogle Scholar
  69. 143.
    Junhao Gan, Jianlin Feng, Qiong Fang, and Wilfred Ng. Locality-sensitive hashing scheme based on dynamic collision counting. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pages 541–552. ACM, 2012.Google Scholar
  70. 147.
    Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: directed automated random testing. In ACM Sigplan Notices, volume 40, pages 213–223. ACM, 2005.Google Scholar
  71. 148.
    Patrice Godefroid, Michael Y Levin, and David Molnar. SAGE: whitebox fuzzing for security testing. Communications of the ACM, 55(3):40–44, 2012.CrossRefGoogle Scholar
  72. 151.
    Ilfak Guilfanov. IDA fast library identification and recognition technology (FLIRT Technology): In-depth. https://www.hex\-rays.com/products/ida/tech/flirt/in_depth.shtml, 2012.
  73. 152.
    Sumit Gulwani and George C Necula. Precise interprocedural analysis using random interpretation. In ACM SIGPLAN Notices, volume 40, pages 324–337. ACM, 2005.Google Scholar
  74. 153.
    Archit Gupta, Pavan Kuppili, Aditya Akella, and Paul Barford. An empirical study of malware evolution. In First International Communication Systems and Networks and Workshops (COMSNETS), pages 1–10. IEEE, 2009.Google Scholar
  75. 155.
    Wook-Shin Han, Jinsoo Lee, and Jeong-Hoon Lee. TurboISO: towards ultrafast and robust subgraph isomorphism search in large graph databases. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 337–348. ACM, 2013.Google Scholar
  76. 156.
    Sean Heelan. Automatic generation of control flow hijacking exploits for software vulnerabilities. PhD thesis, University of Oxford, 2009.Google Scholar
  77. 157.
    Sean Heelan and Agustin Gianni. Augmenting vulnerability analysis of binary code. In Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC), pages 199–208. ACM, 2012.Google Scholar
  78. 158.
    Christian Heitman and Iván Arce. BARF: A multiplatform open source binary analysis and reverse engineering framework. In XX Congreso Argentino de Ciencias de la Computación (Buenos Aires, 2014), 2014.Google Scholar
  79. 159.
    Armijn Hemel, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Dolstra. Finding software license violations through binary code clone detection. In Proceedings of the 8th Working Conference on Mining Software Repositories, pages 63–72. ACM, 2011.Google Scholar
  80. 161.
    Susan Horwitz, Thomas Reps, and David Binkley. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems (TOPLAS), 12(1):26–60, 1990.CrossRefGoogle Scholar
  81. 164.
    Emily R Jacobson, Andrew R Bernat, William R Williams, and Barton P Miller. Detecting code reuse attacks with a model of conformant program execution. In Engineering Secure Software and Systems, pages 1–18. Springer, 2014.Google Scholar
  82. 165.
    Emily R Jacobson, Nathan Rosenblum, and Barton P Miller. Labeling library functions in stripped binaries. In Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools (PASTE), pages 1–8. ACM, 2011.Google Scholar
  83. 166.
    Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters, 31(8):651–666, 2010.CrossRefGoogle Scholar
  84. 167.
    Jiyong Jang, Abeer Agrawal, and David Brumley. ReDeBug: finding unpatched code clones in entire os distributions. In IEEE Symposium on Security and Privacy (S&P), pages 48–62. IEEE, 2012.Google Scholar
  85. 168.
    Jiyong Jang and David Brumley. Bitshred: Fast, scalable code reuse detection in binary code. CMU-CyLab-10-006, 16, 2009.Google Scholar
  86. 169.
    Jiyong Jang, Maverick Woo, and David Brumley. Towards automatic software lineage inference. In USENIX Security Symposium (USENIX Security 13), pages 81–96, 2013.Google Scholar
  87. 170.
    Yoon-Chan Jhi, Xinran Wang, Xiaoqi Jia, Sencun Zhu, Peng Liu, and Dinghao Wu. Value-based program characterization and its application to software plagiarism detection. In Proceedings of the 33rd International Conference on Software Engineering, pages 756–765. ACM, 2011.Google Scholar
  88. 171.
    Weiwei Jin, Sagar Chaki, Cory Cohen, Arie Gurfinkel, Jeffrey Havrilla, Charles Hines, and Priya Narasimhan. Binary function clustering using semantic hashes. In The 11th International Conference on Machine Learning and Applications (ICMLA), volume 1, pages 386–391. IEEE, 2012.Google Scholar
  89. 172.
    Jousselme, Anne-Laure and Maupin, Patrick. Distances in evidence theory: Comprehensive survey and generalizations. International Journal of Approximate Reasoning, 53(2), 118–145, 2012.MathSciNetzbMATHCrossRefGoogle Scholar
  90. 173.
    Pascal Junod, Julien Rinaldini, Johan Wehrli, and Julie Michielin. Obfuscator-LLVM: software protection for the masses. In Proceedings of the 1st International Workshop on Software PROtection (SPRO), pages 3–9. IEEE Press, 2015.Google Scholar
  91. 177.
    Md Enamul Karim, Andrew Walenstein, Arun Lakhotia, and Laxmi Parida. Malware phylogeny generation using permutations of code. Journal in Computer Virology, 1(1-2):13–23, 2005.CrossRefGoogle Scholar
  92. 179.
    Wei Ming Khoo, Alan Mycroft, and Ross Anderson. Rendezvous: a search engine for binary code. In Proceedings of the 10th Working Conference on Mining Software Repositories, pages 329–338. IEEE Press, 2013.Google Scholar
  93. 180.
    Johannes Kinder. Static analysis of x86 executables. PhD thesis, Technische Universität Darmstadt, 2010.Google Scholar
  94. 181.
    Johannes Kinder and Helmut Veith. Jakstab: A static analysis platform for binaries. In International Conference on Computer Aided Verification, pages 423–427. Springer, 2008.Google Scholar
  95. 185.
    Jonghoon Kwon and Heejo Lee. Bingraph: Discovering mutant malware using hierarchical semantic signatures. In Malicious and Unwanted Software (MALWARE), 2012 7th International Conference on, pages 104–111. IEEE, 2012.Google Scholar
  96. 187.
    Shuvendu K Lahiri, Chris Hawblitzel, Ming Kawaguchi, and Henrique Rebêlo. Symdiff: A language-agnostic semantic diff tool for imperative programs. In International Conference on Computer Aided Verification, pages 712–717. Springer, 2012.Google Scholar
  97. 188.
    Arun Lakhotia, Mila Dalla Preda, and Roberto Giacobazzi. Fast location of similar code fragments using semantic ‘juice’. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 5. ACM, 2013.Google Scholar
  98. 189.
    Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. Accessminer: using system-centric models for malware protection. In Proceedings of the 17th ACM conference on Computer and communications security (CCS), pages 399–412. ACM, 2010.Google Scholar
  99. 191.
    Meir M Lehman and Juan F Ramil. Rules and tools for software evolution planning and management. Annals of software engineering, 11(1):15–44, 2001.Google Scholar
  100. 193.
    Pierre Lestringant, Frédéric Guihéry, and Pierre-Alain Fouque. Automated identification of cryptographic primitives in binary code with data flow graph isomorphism. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, pages 203–214. ACM, 2015.Google Scholar
  101. 194.
    Yuping Li, Sathya Chandran Sundaramurthy, Alexandru G Bardas, Xinming Ou, Doina Caragea, Xin Hu, and Jiyong Jang. Experimental study of fuzzy hashing in malware clustering analysis. In 8th Workshop on Cyber Security Experimentation and Test (CSET 15), 2015.Google Scholar
  102. 195.
    Michael Ligh, Steven Adair, Blake Hartstein, and Matthew Richard. Malware analyst’s cookbook and DVD: tools and techniques for fighting malicious code. Wiley Publishing, 2010.Google Scholar
  103. 196.
    Da Lin and Mark Stamp. Hunting for undetectable metamorphic viruses. Journal in computer virology, 7(3):201–214, 2011.CrossRefGoogle Scholar
  104. 197.
    Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Automatic reverse engineering of data structures from binary execution. In Proceedings of the 11th Annual Information Security Symposium, page 5. CERIAS-Purdue University, 2010.Google Scholar
  105. 199.
    Yingfan Liu, Jiangtao Cui, Zi Huang, Hui Li, and Heng Tao Shen. Sk-lsh: An efficient index structure for approximate nearest neighbor search. Proceedings of the VLDB Endowment, 7(9):745–756, 2014.CrossRefGoogle Scholar
  106. 201.
    Fan Long, Stelios Sidiroglou-Douskos, and Martin Rinard. Automatic runtime error repair and containment via recovery shepherding. In ACM SIGPLAN Notices, volume 49, pages 227–238. ACM, 2014.Google Scholar
  107. 202.
    Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 389–400. ACM, 2014.Google Scholar
  108. 203.
    Matias Madou, Bertrand Anckaert, Bjorn De Sutter, and Koen De Bosschere. Hybrid static-dynamic attacks against software protection mechanisms. In Proceedings of the 5th ACM workshop on Digital rights management, pages 75–82. ACM, 2005.Google Scholar
  109. 207.
    Lorenzo Martignoni, Stephen McCamant, Pongsin Poosankam, Dawn Song, and Petros Maniatis. Path-exploration lifting: Hi-fi tests for lo-fi emulators. In ACM SIGARCH Computer Architecture News, volume 40, pages 337–348. ACM, 2012.Google Scholar
  110. 208.
    Sven Mattsen, Arne Wichmann, and Sibylle Schupp. A non-convex abstract domain for the value analysis of binaries. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pages 271–280. IEEE, 2015.Google Scholar
  111. 210.
    Eitan Menahem, Asaf Shabtai, Lior Rokach, and Yuval Elovici. Improving malware detection by applying multi-inducer ensemble. Computational Statistics & Data Analysis, 53(4):1483–1494, 2009.MathSciNetzbMATHCrossRefGoogle Scholar
  112. 211.
    Charith Mendis, Jeffrey Bosboom, Kevin Wu, Shoaib Kamil, Jonathan Ragan-Kelley, Sylvain Paris, Qin Zhao, and Saman Amarasinghe. Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide dsl code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 391–402. ACM, 2015.Google Scholar
  113. 212.
    Xiaozhu Meng. Fine-grained binary code authorship identification. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 1097–1099. ACM, 2016.Google Scholar
  114. 213.
    Xiaozhu Meng, Barton P Miller, and Kwang-Sung Jun. Identifying multiple authors in a binary program. In European Symposium on Research in Computer Security (ESORICS), pages 286–304. Springer, 2017.Google Scholar
  115. 214.
    Barton P Miller, Mark D Callaghan, Jonathan M Cargille, Jeffrey K Hollingsworth, R Bruce Irvin, Karen L Karavanic, Krishna Kunchithapadam, and Tia Newhall. The paradyn parallel performance measurement tool. Computer, 28(11):37–46, 1995.CrossRefGoogle Scholar
  116. 216.
    Jiang Ming, Meng Pan, and Debin Gao. iBinHunt: binary hunting with inter-procedural control flow. In Information Security and Cryptology–ICISC 2012, pages 92–109. Springer, 2012.Google Scholar
  117. 218.
    Mondaini, Rubem P. BIOMAT 2012: International Symposium on Mathematical and Computational Biology, Tempe, Arizona, USA, 6-10 November 2012. World Scientific, 2013.Google Scholar
  118. 221.
    James Munkres. Algorithms for the assignment and transportation problems. Journal of the Society for Industrial and Applied Mathematics, 5(1):32–38, 1957.MathSciNetzbMATHCrossRefGoogle Scholar
  119. 223.
    Lakshmanan Nataraj, Dhilung Kirat, BS Manjunath, and Giovanni Vigna. SARVAM: Search and retrieval of malware. In Worshop on Next Generation Malware Attacks and Defense (NGMAD), 2013.Google Scholar
  120. 225.
    Beng Heng Ng and Aravind Prakash. Exposé: discovering potential binary code re-use. In IEEE 37th Annual Computer Software and Applications Conference (COMPSAC), pages 492–501. IEEE, 2013.Google Scholar
  121. 227.
    Pádraig OáSullivan, Kapil Anand, Aparna Kotha, Matthew Smithson, Rajeev Barua, and Angelos D Keromytis. Retrofitting security in cots software with binary rewriting. In Future Challenges in Security and Privacy for Academia and Industry, pages 154–172. Springer, 2011.Google Scholar
  122. 228.
    Karl J Ottenstein and Linda M Ottenstein. The program dependence graph in a software development environment. In ACM Sigplan Notices, volume 19, pages 177–184. ACM, 1984.Google Scholar
  123. 231.
    Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. Cross-architecture bug search in binary executables. In IEEE Symposium on Security and Privacy (S&P), pages 709–724. IEEE, 2015.Google Scholar
  124. 232.
    Jannik Pewny, Felix Schuster, Lukas Bernhard, Thorsten Holz, and Christian Rossow. Leveraging semantic signatures for bug search in binary programs. In Proceedings of the 30th Annual Computer Security Applications Conference (ACSAC), pages 406–415. ACM, 2014.Google Scholar
  125. 233.
    Van-Thuan Pham, Wei Boon Ng, Konstantin Rubinov, and Abhik Roychoudhury. Hercules: reproducing crashes in real-world application binaries. In Proceedings of the 37th International Conference on Software Engineering-Volume 1, pages 891–901. IEEE Press, 2015.Google Scholar
  126. 235.
    Jing Qiu, Xiaohong Su, and Peijun Ma. Library functions identification in binary code by using graph isomorphism testings. In 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER), pages 261–270. IEEE, 2015.Google Scholar
  127. 236.
    Jing Qiu, Xiaohong Su, and Peijun Ma. Using reduced execution flow graph to identify library functions in binary code. IEEE Transactions on Software Engineering (TSE), 42(2):187–202, 2016.CrossRefGoogle Scholar
  128. 238.
    Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices, 48(6):519–530, 2013.CrossRefGoogle Scholar
  129. 240.
    Ashkan Rahimian, Paria Shirani, Saed Alrbaee, Lingyu Wang, and Mourad Debbabi. Bincomp: A stratified approach to compiler provenance attribution. Digital Investigation, 14:S146–S155, 2015.CrossRefGoogle Scholar
  130. 244.
    David A Ramos and Dawson Engler. Under-constrained symbolic execution: correctness checking for real code. In 24th USENIX Security Symposium (USENIX Security 15), pages 49–64, 2015.Google Scholar
  131. 245.
    Alexandre Rebert, Sang Kil Cha, Thanassis Avgerinos, Jonathan Foote, David Warren, Gustavo Grieco, and David Brumley. Optimizing seed selection for fuzzing. In 23rd USENIX Security Symposium (USENIX Security 14), pages 861–875, 2014.Google Scholar
  132. 246.
    Konrad Rieck, Philipp Trinius, Carsten Willems, and Thorsten Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, 19(4):639–668, 2011.CrossRefGoogle Scholar
  133. 249.
    Roman, Steven. Coding and Information Theory, vol. 134, Springer Science & Business Media, 1992.Google Scholar
  134. 250.
    Nathan Rosenblum, Barton P Miller, and Xiaojin Zhu. Recovering the toolchain provenance of binary code. In Proceedings of the International Symposium on Software Testing and Analysis, pages 100–110. ACM, 2011.Google Scholar
  135. 251.
    Nathan Rosenblum, Xiaojin Zhu, and Barton P Miller. Who wrote this code? identifying the authors of program binaries. In European Symposium on Research in Computer Security (ESORICS), pages 172–189. Springer, 2011.Google Scholar
  136. 252.
    Nathan E Rosenblum, Barton P Miller, and Xiaojin Zhu. Extracting compiler provenance from program binaries. In Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pages 21–28. ACM, 2010.Google Scholar
  137. 253.
    Kevin A Roundy and Barton P Miller. Hybrid analysis and control of malware. In Recent Advances in Intrusion Detection (RAID), pages 317–338. Springer, 2010.Google Scholar
  138. 254.
    Chanchal K Roy, James R Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of Computer Programming, 74(7):470–495, 2009.
  139. 255.
    Brian Ruttenberg, Craig Miles, Lee Kellogg, Vivek Notani, Michael Howard, Charles LeDoux, Arun Lakhotia, and Avi Pfeffer. Identifying shared software components to support malware forensics. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 21–40. Springer, 2014.
  140. 256.
    Andreas Sæbjørnsen, Jeremiah Willcock, Thomas Panas, Daniel Quinlan, and Zhendong Su. Detecting code clones in binary executables. In Proceedings of the eighteenth international symposium on Software testing and analysis, pages 117–128. ACM, 2009.
  141. 258.
    Saul Schleimer, Daniel S Wilkerson, and Alex Aiken. Winnowing: local algorithms for document fingerprinting. In Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 76–85. ACM, 2003.
  142. 259.
    Matthew G Schultz, Eleazar Eskin, Erez Zadok, and Salvatore J Stolfo. Data mining methods for detection of new malicious executables. In IEEE Symposium on Security and Privacy (S&P), pages 38–49. IEEE, 2001.
  143. 260.
    Farrukh Shahzad and Muddassar Farooq. ELF-Miner: using structural knowledge and data mining methods to detect new (Linux) malicious executables. Knowledge and information systems, 30(3):589–612, 2012.
  144. 262.
    Eui Chul Richard Shin, Dawn Song, and Reza Moazzezi. Recognizing functions in binaries with neural networks. In 24th USENIX Security Symposium (USENIX Security 15), pages 611–626, 2015.
  145. 264.
    Paria Shirani, Lingyu Wang, and Mourad Debbabi. BinShape: Scalable and robust binary library function identification using function shape. In International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), pages 301–324. Springer, 2017.
  146. 265.
    Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, et al. SoK: (State of) the art of war: Offensive techniques in binary analysis. In IEEE Symposium on Security and Privacy (SP), pages 138–157. IEEE, 2016.
  147. 267.
    Asia Slowinska, Traian Stancescu, and Herbert Bos. Howard: A dynamic excavator for reverse engineering data structures. In NDSS. Citeseer, 2011.
  148. 269.
    Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. BitBlaze: A new approach to computer security via binary analysis. In Information systems security, pages 1–25. Springer, 2008.
  149. 274.
    Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment, 5(9):788–799, 2012.
  150. 275.
    Johan AK Suykens and Joos Vandewalle. Least squares support vector machine classifiers. Neural processing letters, 9(3):293–300, 1999.
  151. 276.
    Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Quality and efficiency in high dimensional nearest neighbor search. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 563–576. ACM, 2009.
  152. 277.
    Yufei Tao, Ke Yi, Cheng Sheng, and Panos Kalnis. Efficient and accurate nearest neighbor and closest pair search in high-dimensional space. ACM Transactions on Database Systems (TODS), 35(3):20, 2010.
  153. 282.
    Julian R Ullmann. An algorithm for subgraph isomorphism. Journal of the ACM (JACM), 23(1):31–42, 1976.
  154. 283.
    Maarten Van Emmerik. Identifying library functions in executable file using patterns. In Software Engineering Conference, 1998. Proceedings. 1998 Australian, pages 90–97. IEEE, 1998.
  155. 285.
    William M Waite and Gerhard Goos. Compiler construction. Springer Science & Business Media, 2012.
  156. 286.
    Andrew Walenstein, Michael Venable, Matthew Hayes, Christopher Thompson, and Arun Lakhotia. Exploiting similarity between variants to defeat malware. In Proc. BlackHat DC Conf, 2007.
  157. 288.
    Xinran Wang, Chi-Chun Pan, Peng Liu, and Sencun Zhu. SigFree: A signature-free buffer overflow attack blocker. IEEE Transactions on Dependable and Secure Computing, 7(1):65–79, 2010.
  158. 289.
    Zheng Wang, Ken Pierce, and Scott McFarling. BMAT - a binary matching tool for stale profile propagation. The Journal of Instruction-Level Parallelism, 2:1–20, 2000.
  159. 290.
    Daniel Weise, Roger F Crew, Michael Ernst, and Bjarne Steensgaard. Value dependence graphs: Representation without taxation. In Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 297–310. ACM, 1994.
  160. 291.
    Tao Xie, Darko Marinov, Wolfram Schulte, and David Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In Tools and Algorithms for the Construction and Analysis of Systems, pages 365–381. Springer, 2005.
  161. 293.
    Fabian Yamaguchi, Alwin Maier, Hugo Gascon, and Konrad Rieck. Automatic inference of search patterns for taint-style vulnerabilities. In IEEE Symposium on Security and Privacy, pages 797–812. IEEE, 2015.
  162. 298.
    Junyuan Zeng, Yangchun Fu, Kenneth A Miller, Zhiqiang Lin, Xiangyu Zhang, and Dongyan Xu. Obfuscation resilient binary code reuse through trace-oriented programming. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (CCS), pages 487–498. ACM, 2013.
  163. 302.
    Viviane Zwanger and Felix C Freiling. Kernel mode API spectroscopy for incident response and digital forensics. In Proceedings of the 2nd ACM SIGPLAN Program Protection and Reverse Engineering Workshop, page 3. ACM, 2013.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Saed Alrabaee (1)
  • Mourad Debbabi (2)
  • Paria Shirani (2)
  • Lingyu Wang (2)
  • Amr Youssef (2)
  • Ashkan Rahimian (3)
  • Lina Nouh (4)
  • Djedjiga Mouheb (5)
  • He Huang (6)
  • Aiman Hanna (2)
  1. Information Systems & Security (CIT), United Arab Emirates University, Al Ain, UAE
  2. Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, Canada
  3. East Tower, Bay Adelaide Centre, Deloitte Canada, Toronto, Canada
  4. Deloitte Middle East, Riyadh, Saudi Arabia
  5. Department of Computer Science, University of Sharjah, Sharjah, UAE
  6. Moody’s Analytics, Toronto, Canada
