ELISA: ELiciting ISA of Raw Binaries for Fine-Grained Code and Data Separation

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10885)

Abstract

Static binary analysis techniques are widely used to reconstruct the behavior of software and to discover vulnerabilities in it when source code is not available. To avoid errors caused by misinterpreting data as machine instructions (or vice versa), disassemblers and static analysis tools must precisely infer the boundaries between code and data. However, this information is often not readily available. Worse, compilers may embed small chunks of data inside the code section. Most state-of-the-art approaches to separating code and data are rooted in recursive traversal disassembly, which has severe limitations when dealing with indirect control instructions. We propose ELISA, a technique to separate code from data and ease the static analysis of executable files. ELISA leverages supervised sequential learning techniques to locate the boundaries of the code section(s) in header-less binary files, and to predict the instruction boundaries inside the identified code section. As a preliminary step, if the Instruction Set Architecture (ISA) of the binary is unknown, ELISA uses a logistic regression model to identify the correct ISA from the file content. We provide a comprehensive evaluation on a dataset of executables compiled for different ISAs, and we show that our method is capable of identifying code sections with a byte-level accuracy (F1 score) ranging from \(98.13\%\) to over \(99.9\%\) depending on the ISA. Fine-grained separation of code from embedded data on x86, x86-64 and ARM executables is accomplished with an accuracy of over \(99.9\%\).
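
The byte-level F1 score mentioned above can be illustrated with a minimal sketch. It assumes each byte of a file carries a ground-truth label and a predicted label (1 for code, 0 for data). The function name `byte_level_f1` and the toy label sequences are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def byte_level_f1(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Return the F1 score for per-byte labels (1 = code, 0 = data)."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    # Count true positives, false positives and false negatives for "code".
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        # No code byte was correctly identified; define the score as zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 10-byte file: bytes 2..7 are code, the rest is data.
truth = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
# Hypothetical prediction that starts the code section one byte too early.
predicted = [0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
score = byte_level_f1(truth, predicted)
print(f"{score:.4f}")
```

In this toy case a single misplaced boundary byte gives a precision of 6/7 and a recall of 1, so the F1 score is 12/13. On real executables the code section spans many thousands of bytes, so a boundary error of a few bytes has a much smaller effect on the score.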

Notes

  1. While it is possible to integrate our methodology with ISA-dependent heuristics, we show that our methodology achieves good results without ISA-specific knowledge.

  2. The parameters m and n belong to the model and can be appropriately tuned; for example, in our evaluation we used grid search.

  3. http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SimpleLogistic.html.

  4. http://security.ece.cmu.edu/byteweight/.

  5. We determine that \(C=10000\) is the optimal value through grid search optimization.

  6. http://www.arduino.org/products/boards/arduino-uno.

  7. https://github.com/BinaryAnalysisPlatform/arm-binaries.

Acknowledgements

We would like to thank the anonymous reviewers for their suggestions that led to improving this work. This project has been supported by the Italian Ministry of University and Research under the FIRB project FACE (Formal Avenue for Chasing malwarE), grant agreement nr. RBFR13AJFT; and by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement nr. 690972.

Author information

Corresponding author

Correspondence to Pietro De Nicolao.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this paper

De Nicolao, P., Pogliani, M., Polino, M., Carminati, M., Quarta, D., Zanero, S. (2018). ELISA: ELiciting ISA of Raw Binaries for Fine-Grained Code and Data Separation. In: Giuffrida, C., Bardin, S., Blanc, G. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2018. Lecture Notes in Computer Science, vol 10885. Springer, Cham. https://doi.org/10.1007/978-3-319-93411-2_16

  • DOI: https://doi.org/10.1007/978-3-319-93411-2_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93410-5

  • Online ISBN: 978-3-319-93411-2

  • eBook Packages: Computer Science, Computer Science (R0)
