An Adaptable Infrastructure to Generate Training Datasets for Decompilation Issues

  • Conference paper
In: New Perspectives in Information Systems and Technologies, Volume 2

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 276)

Abstract

The conventional approach to decompilation combines heuristics and pattern matching. Its results depend on the processor architecture, the code generation templates used by the compiler, and the optimization level. In addition, there are specific scenarios where heuristics and pattern matching fail to infer high-level information, such as the return type of a function. Since AI has previously been used in similar scenarios, we have designed an adaptable infrastructure that facilitates the use of AI techniques to overcome the decompilation issues detected. The proposed infrastructure automatically generates training datasets. Its architecture follows the Pipes and Filters architectural pattern, which makes it easier both to adapt the infrastructure to different kinds of decompilation scenarios and to parallelize the implementation. The generated datasets can be processed by any AI engine to train a predictive model, which can then be added to the decompiler as a plug-in.
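
To make the Pipes and Filters idea concrete, the following is a minimal sketch of how a dataset-generation pipeline could be composed from independent filters. It is not the authors' implementation: the filter names (`extract_features`, `attach_label`, `to_row`), the record fields (`code`, `return_type`), and the two toy features are all hypothetical, chosen only to illustrate the pattern.

```python
from typing import Any, Callable, Iterable, Iterator

# A filter consumes a stream of items and yields a transformed stream.
Filter = Callable[[Iterable[Any]], Iterator[Any]]


def extract_features(functions: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical filter: derive toy features from a function's raw bytes."""
    for fn in functions:
        code = fn["code"]
        # 0xC3 is the x86 near-return opcode; used here only as a toy feature.
        yield {**fn, "size": len(code), "ends_with_ret": code.endswith(b"\xc3")}


def attach_label(functions: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical filter: use the known return type as the class label."""
    for fn in functions:
        yield {**fn, "label": fn["return_type"]}


def to_row(functions: Iterable[dict]) -> Iterator[tuple]:
    """Hypothetical filter: flatten each record into one dataset row."""
    for fn in functions:
        yield (fn["size"], fn["ends_with_ret"], fn["label"])


def pipeline(source: Iterable[Any], *filters: Filter) -> Iterable[Any]:
    """Connect the filters in order; each one reads the previous one's output."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream


# Assumed input: functions whose return type is known from the source code.
sample = [
    {"code": b"\x31\xc0\xc3", "return_type": "int"},
    {"code": b"\xc3", "return_type": "void"},
]
rows = list(pipeline(sample, extract_features, attach_label, to_row))
```

Because each filter only depends on the stream it receives, adapting the sketch to another decompilation scenario would mean swapping or adding filters rather than rewriting the pipeline, and independent inputs could in principle be processed in parallel.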

Author information

Corresponding author

Correspondence to Javier Escalada.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Escalada, J., Ortin, F. (2014). An Adaptable Infrastructure to Generate Training Datasets for Decompilation Issues. In: Rocha, Á., Correia, A., Tan, F., Stroetmann, K. (eds) New Perspectives in Information Systems and Technologies, Volume 2. Advances in Intelligent Systems and Computing, vol 276. Springer, Cham. https://doi.org/10.1007/978-3-319-05948-8_9

  • DOI: https://doi.org/10.1007/978-3-319-05948-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05947-1

  • Online ISBN: 978-3-319-05948-8

  • eBook Packages: Engineering, Engineering (R0)
