Abstract
The conventional decompilation approach is based on a combination of heuristics and pattern matching. This approach depends on the processor architecture, the code generation templates used by the compiler, and the optimization level. In addition, there are specific scenarios where heuristics and pattern matching fail to infer high-level information, such as the return type of a function. Since AI has previously been used in similar scenarios, we have designed an adaptable infrastructure to facilitate the use of AI techniques for overcoming the decompilation issues detected. The proposed infrastructure is aimed at automatically generating training datasets. Its architecture follows the Pipes and Filters architectural pattern, which makes it easier both to adapt the infrastructure to different kinds of decompilation scenarios and to parallelize the implementation. The generated datasets can be processed by any AI engine to train a predictive model, which can then be added to the decompiler as a plug-in.
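As an illustration of the Pipes and Filters pattern mentioned in the abstract, the following is a minimal sketch in Python, not the authors' implementation. Every name in it (pipeline, deduplicate, extract_features, to_csv_rows, SAMPLES) and the two toy features are assumptions made for this example; the abstract does not specify the actual filters or features used.

```python
from typing import Callable, Iterable, Iterator

# A filter consumes one stream and yields another (hypothetical type alias).
Filter = Callable[[Iterable], Iterator]


def pipeline(source: Iterable, *filters: Filter) -> Iterator:
    """Connect filters with pipes: each filter reads the previous stream."""
    stream = iter(source)
    for f in filters:
        stream = f(stream)
    return stream


# Assumed input: (machine code of a function, known return type) pairs.
# In practice the labels would come from source code with known types.
SAMPLES = [
    (b"\x31\xc0\xc3", "int"),    # xor eax, eax; ret
    (b"\xd9\xee\xc3", "float"),  # fldz; ret
    (b"\x31\xc0\xc3", "int"),    # duplicate of the first sample
    (b"\xc3", "void"),           # ret
]


def deduplicate(stream):
    """Filter: drop samples that were already seen."""
    seen = set()
    for sample in stream:
        if sample not in seen:
            seen.add(sample)
            yield sample


def extract_features(stream):
    """Filter: turn each sample into a feature row (toy features only)."""
    for code, label in stream:
        yield {
            "length": len(code),
            "starts_with_fpu_op": int(code[:1] == b"\xd9"),
            "label": label,
        }


def to_csv_rows(stream):
    """Filter: serialize each feature row as a CSV line."""
    for row in stream:
        yield f'{row["length"]},{row["starts_with_fpu_op"]},{row["label"]}'


dataset = list(pipeline(SAMPLES, deduplicate, extract_features, to_csv_rows))
```

With the sample data above, dataset holds the three lines "3,0,int", "3,1,float" and "1,0,void". Because each filter only depends on the stream it receives, a filter can be swapped or reordered for a different decompilation scenario, and independent chunks of the input can be run through the same chain in parallel.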
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Escalada, J., Ortin, F. (2014). An Adaptable Infrastructure to Generate Training Datasets for Decompilation Issues. In: Rocha, Á., Correia, A., Tan, F., Stroetmann, K. (eds) New Perspectives in Information Systems and Technologies, Volume 2. Advances in Intelligent Systems and Computing, vol 276. Springer, Cham. https://doi.org/10.1007/978-3-319-05948-8_9
DOI: https://doi.org/10.1007/978-3-319-05948-8_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05947-1
Online ISBN: 978-3-319-05948-8
eBook Packages: Engineering (R0)