Journal of Automated Reasoning

, Volume 60, Issue 3, pp 257–277 | Cite as

Bidirectional Grammars for Machine-Code Decoding and Encoding

Article
  • 44 Downloads

Abstract

Binary analysis, which analyzes machine code, requires a decoder for converting bits into abstract syntax of machine instructions. Binary rewriting requires an encoder for converting instructions to bits. We propose a domain-specific language that enables the specification of both decoding and encoding in a single bidirectional grammar. With dependent types, a bigrammar enables the extraction of an executable decoder and encoder as well as a correctness proof showing their consistency. The bigrammar DSL is embedded in Coq with machine-checked proofs. We have used the bigrammar DSL to specify the decoding and encoding of subsets of the x86-32 and MIPS instruction sets. We have also extracted an executable decoder and encoder from the x86 bigrammar with competitive performance.

Keywords

Certified programs Parsing Grammars Domain-specific languages 

Notes

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This research is supported by NSF Grants CCF-1624124 and CNS-1624126. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of any of the above organizations or any person connected with them.

References

  1. 1.
    Abadi, M., Budiu, M., Erlingsson, Ú., Ligatti, J.: Control-flow integrity. In: 12th ACM Conference on Computer and Communications Security (CCS), pp. 340–353 (2005)Google Scholar
  2. 2.
    Alimarine, A., Smetsers, S., van Weelden, A., van Eekelen, M.C.J.D., Plasmeijer, R.: There and back again: arrows for invertible programming. In: Proceedings of the ACM SIGPLAN Workshop on Haskell, pp. 86–97 (2005)Google Scholar
  3. 3.
    Bergeron, J., Debbabi, M., Desharnais, J., Erhioui, M., Lavoie, Y., Tawbi, N.: Static detection of malicious code in executable programs. Int. J. Requir. Eng. (2001)Google Scholar
  4. 4.
    Bohannon, A., Foster, J.N., Pierce, B.C., Pilkiewicz, A., Schmitt, A.: Boomerang: resourceful lenses for string data. In: 35th ACM Symposium on Principles of Programming Languages (POPL), pp. 407–419 (2008)Google Scholar
  5. 5.
    Brabrand, C., Møller, A., Schwartzbach, M.I.: Dual syntax for XML languages. In: 10th International Symposium on Database Programming Languages (DBPL), pp. 27–41 (2005)Google Scholar
  6. 6.
    Brumley, D., Jager, I., Avgerinos, T., Schwartz, E.J.: BAP: a binary analysis platform. In: Computer Aided Verification (CAV), pp. 463–469 (2011)Google Scholar
  7. 7.
    Brzozowski, J.A.: Derivatives of regular expressions. J. ACM 11, 481–494 (1964)MathSciNetCrossRefMATHGoogle Scholar
  8. 8.
    Christodorescu, M., Jha, S.: Static analysis of executables to detect malicious patterns. In: 12th Usenix Security Symposium, pp. 169–186 (2003)Google Scholar
  9. 9.
    The Coq proof assistant. https://coq.inria.fr/
  10. 10.
    Ford, B.: Parsing expression grammars: A recognition-based syntactic foundation. In: 31st ACM Symposium on Principles of Programming Languages (POPL), pp. 111–122 (2004)Google Scholar
  11. 11.
    Fox, A.C.J.: Improved tool support for machine-code decompilation in HOL4. In: 6th International Conference on Interactive Theorem Proving (ITP), pp. 187–202 (2015)Google Scholar
  12. 12.
    Godefroid, P., Levin, M.Y., Molnar, D.A.: Automated whitebox fuzz testing. In: Network and Distributed System Security Symposium (NDSS) (2008)Google Scholar
  13. 13.
    Jansson, P., Jeuring, J.: Polytypic compact printing and parsing. In: 8th European Symposium on Programming (ESOP), pp. 273–287 (1999)Google Scholar
  14. 14.
    Jourdan, J.H., Pottier, F., Leroy, X.: Validating LR(1) parsers. In: European Symposium on Programming (ESOP), pp. 397–416 (2012)Google Scholar
  15. 15.
    Kawanaka, S., Hosoya, H.: biXid: a bidirectional transformation language for XML. In: ACM International Conference on Functional programming (ICFP), pp. 201–214 (2006)Google Scholar
  16. 16.
    Kennedy, A.: Pickler combinators. J. Funct. Program. 14(6), 727–739 (2004)CrossRefMATHGoogle Scholar
  17. 17.
    Kotha, A., Anand, K., Smithson, M., Yellareddy, G., Barua, R.: Automatic parallelization in a binary rewriter. In: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2010, pp. 547–557 (2010)Google Scholar
  18. 18.
    Kroll, J., Dean, D.: BakerSFIeld: Bringing software fault isolation to x64. www.jkroll.com/papers/bakersfield_sfi.pdf
  19. 19.
    McCamant, S., Morrisett, G.: Evaluating SFI for a CISC architecture. In: 15th Usenix Security Symposium (2006)Google Scholar
  20. 20.
    Might, M., Darais, D., Spiewak, D.: Parsing with derivatives: a functional pearl. In: ACM International Conference on Functional programming (ICFP), pp. 189–195 (2011)Google Scholar
  21. 21.
    Morrisett, G., Tan, G., Tassarotti, J., Tristan, J.B., Gan, E.: Rocksalt: better, faster, stronger SFI for the x86. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 395–404 (2012)Google Scholar
  22. 22.
    Owens, S., Reppy, J., Turon, A.: Regular-expression derivatives re-examined. J. Funct. Program. 19, 173–190 (2009)MathSciNetCrossRefMATHGoogle Scholar
  23. 23.
    Ramsey, N., Fernández, M.F.: Specifying representations of machine instructions. ACM Trans. Program. Lang. Syst. 19(3), 492–524 (1997)CrossRefGoogle Scholar
  24. 24.
    Rendel, T., Ostermann, K.: Invertible syntax descriptions: unifying parsing and pretty printing. In: Third ACM Haskell Symposium on Haskell, pp. 1–12 (2010)Google Scholar
  25. 25.
    Reps, T., Lim, J., Thakur, A., Balakrishnan, G., Lal, A.: There’s plenty of room at the bottom: analyzing and verifying machine code. In: Computer Aided Verification (CAV), pp. 41–56 (2010)Google Scholar
  26. 26.
    Sepp, A., Kranz, J., Simon, A.: GDSL: a generic decoder specification language for interpreting machine language. In: Tools for Automatic Program Analysis, pp. 53–64 (2012)Google Scholar
  27. 27.
    Song, D., Brumley, D., Yin, H., Caballero, J., Jager, I., Kang, M.G., Liang, Z., Newsome, J., Poosankam, P., Saxena, P.: BitBlaze: A new approach to computer security via binary analysis. In: Proceedings of the 4th International Conference on Information Systems Security (2008)Google Scholar
  28. 28.
    Tan, G., Morrisett, G.: Bidirectional grammars for machine-code decoding and encoding. In: 8th International Conference on Verified Software: Theories, Tools, and Experiments (VSTTE), pp. 73–89 (2016)Google Scholar
  29. 29.
    Wahbe, R., Lucco, S., Anderson, T., Graham, S.: Efficient software-based fault isolation. In: ACM SIGOPS Symposium on Operating Systems Principles (SOSP), pp. 203–216. ACM Press, New York (1993)Google Scholar
  30. 30.
    Wartell, R., Mohan, V., Hamlen, K.W., Lin, Z.: Securing untrusted code via compiler-agnostic binary rewriting. In: Proceedings of the 28th Annual Computer Security Applications Conference, pp. 299–308 (2012)Google Scholar
  31. 31.
    Xu, Z., Miller, B., Reps, T.: Safety checking of machine code. In: ACM Conference on Programming Language Design and Implementation (PLDI), pp. 70–82 (2000)Google Scholar
  32. 32.
    Yee, B., Sehr, D., Dardyk, G., Chen, B., Muth, R., Ormandy, T., Okasaka, S., Narula, N., Fullagar, N.: Native client: A sandbox for portable, untrusted x86 native code. In: IEEE Symposium on Security and Privacy (S&P) (2009)Google Scholar

Copyright information

© Springer Science+Business Media B.V. 2017

Authors and Affiliations

  1. 1.Department of Computer Science and EngineeringThe Pennsylvania State UniversityUniversity ParkUSA
  2. 2.Computing and Information ScienceCornell UniversityIthacaUSA

Personalised recommendations