An Adaptable Infrastructure to Generate Training Datasets for Decompilation Issues

Escalada, Javier; Ortin, Francisco

doi:10.1007/978-3-319-05948-8_9


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 276)

Abstract

The conventional decompilation approach is based on a combination of heuristics and pattern matching. This approach depends on the processor architecture, the code generation templates used by the compiler, and the optimization level. In addition, there are specific scenarios where heuristics and pattern matching do not infer high-level information such as the return type of a function. Since AI has been previously used in similar scenarios, we have designed an adaptable infrastructure to facilitate the use of AI techniques for overcoming the decompilation issues detected. The proposed infrastructure is aimed at automatically generating training datasets. The architecture follows the Pipes and Filters architectural pattern, which facilitates adapting the infrastructure to different kinds of decompilation scenarios. It also makes it easier to parallelize the implementation. The generated datasets can be processed in any AI engine, and the predictive model trained on them can then be added to the decompiler as a plug-in.
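As an illustration of the Pipes and Filters idea mentioned in the abstract, the following is a minimal sketch in Python and not the authors' implementation. Every name in it (`pipeline`, `keep_non_empty`, `extract_features`, `to_csv`, the `SAMPLE` records and their fields) is hypothetical and chosen only for this example. Each filter consumes a stream of function records and yields a transformed stream, so filters can be added, removed, or reordered to suit a different decompilation scenario.

```python
import csv
import io
from functools import reduce
from typing import Callable, Iterable, Iterator

# A filter maps one stream of items to another stream of items.
Filter = Callable[[Iterable], Iterator]


def pipeline(*filters: Filter) -> Filter:
    """Compose filters left to right into a single filter."""
    def run(items: Iterable) -> Iterator:
        return reduce(lambda stream, f: f(stream), filters, iter(items))
    return run


def keep_non_empty(functions: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical filter: drop functions that have no instructions."""
    for fn in functions:
        if fn["instructions"]:
            yield fn


def extract_features(functions: Iterable[dict]) -> Iterator[dict]:
    """Hypothetical filter: turn each function into a labelled feature row."""
    for fn in functions:
        last = fn["instructions"][-1]
        yield {
            "n_instructions": len(fn["instructions"]),
            "last_mnemonic": last.split()[0],
            "label": fn["return_type"],  # the value a model would learn to predict
        }


def to_csv(rows: Iterable[dict]) -> str:
    """Sink: serialise the rows as CSV text, a format most AI engines can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["n_instructions", "last_mnemonic", "label"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Toy input, invented for this sketch: function records with a known return type.
SAMPLE = [
    {"name": "f", "return_type": "int", "instructions": ["mov eax, 1", "ret"]},
    {"name": "g", "return_type": "void", "instructions": []},
    {"name": "h", "return_type": "float", "instructions": ["fld dword [esp+4]", "ret"]},
]

build_dataset = pipeline(keep_non_empty, extract_features)
dataset_csv = to_csv(build_dataset(SAMPLE))
print(dataset_csv)
```

Because each filter depends only on the items it receives, independent inputs could in principle be processed by separate workers, which is one way to read the abstract's remark about parallelization.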

Author information

Authors and Affiliations

Computer Science Department, University of Oviedo, Calvo Sotelo s/n, 33007, Oviedo, Spain
Javier Escalada & Francisco Ortin

Corresponding author

Correspondence to Javier Escalada.

Editor information

Editors and Affiliations

Universidade de Coimbra & LIACC, Rio Tinto, Portugal
Álvaro Rocha
Universidade Nova de Lisboa, Instituto Superior de Estatística e Gestão de Informação, Lisboa, Portugal
Ana Maria Correia
Department of Business Information Systems, Auckland University of Technology, Auckland, New Zealand
Felix B. Tan
Empirica GmbH, Bonn, Germany
Karl A. Stroetmann

About this paper

Cite this paper

Escalada, J., Ortin, F. (2014). An Adaptable Infrastructure to Generate Training Datasets for Decompilation Issues. In: Rocha, Á., Correia, A., Tan, F., Stroetmann, K. (eds) New Perspectives in Information Systems and Technologies, Volume 2. Advances in Intelligent Systems and Computing, vol 276. Springer, Cham. https://doi.org/10.1007/978-3-319-05948-8_9

DOI: https://doi.org/10.1007/978-3-319-05948-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05947-1
Online ISBN: 978-3-319-05948-8
eBook Packages: Engineering, Engineering (R0)
