Skip to main content

BinTrimmer: Towards Static Binary Debloating Through Abstract Interpretation

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 11543))

Abstract

The increasing complexity of modern programs motivates software engineers to often rely on the support of third-party libraries. Although this practice allows application developers to achieve a compelling time-to-market, it often makes the final product bloated with conspicuous chunks of unused code. Other than making a program unnecessarily large, this dormant code could be leveraged by willful attackers to harm users. As a consequence, several techniques have been recently proposed to perform program debloating and remove (or secure) dead code from applications. However, state-of-the-art approaches are either based on unsound strategies, thus producing unreliable results, or pose too strict assumptions on the program itself.

In this work, we propose a novel abstract domain, called Signedness-Agnostic Strided Interval, which we use as the cornerstone to design a novel and sound static technique, based on abstract interpretation, to reliably perform program debloating. Throughout the paper, we detail the specifics of our approach and show its effectiveness and usefulness by implementing it in a tool, called BinTrimmer, to perform static program debloating on binaries.

Our evaluation shows that BinTrimmer can remove up to 65.6% of a library’s code and that our domain is, on average, 98% more precise than the related work.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   79.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    https://github.com/ucsb-seclab/sasi.

  2. 2.

    https://github.com/angr/claripy/blob/master/claripy.

  3. 3.

    https://sourceware.org/binutils/docs/binutils/objdump.html.

  4. 4.

    https://www.spec.org/cpu2000/.

  5. 5.

    https://www.spec.org/cpu2006/.

  6. 6.

    https://code.google.com/archive/p/range-analysis/.

  7. 7.

    http://archive.darpa.mil/cybergrandchallenge/.

  8. 8.

    https://github.com/codeplea/tinyexpr.

  9. 9.

    https://github.com/littlstar/b64.c.

  10. 10.

    http://lcamtuf.coredump.cx/afl/.

  11. 11.

    https://github.com/JonathanSalwan/ROPgadget.

References

  1. Balakrishnan, G., Reps, T.: WYSINWYX: what you see is not what you execute. ACM Trans. Program. Lang. Syst. 32(6), 23:1–23:84 (2010)

    Article  Google Scholar 

  2. Bhattacharya, S., Nanda, M.G., Gopinath, K., Gupta, M.: Reuse, recycle to de-bloat software. In: Mezini, M. (ed.) ECOOP 2011. LNCS, vol. 6813, pp. 408–432. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22655-7_19

    Chapter  Google Scholar 

  3. Bruening, D.L.: Efficient, transparent, and comprehensive runtime code manipulation. Ph.D. dissertation, Cambridge, MA, USA (2004). aAI0807735

    Google Scholar 

  4. Caballero, J., Lin, Z.: Type inference on executables. ACM Comput. Surv. (CSUR) 48(4), 65 (2016)

    Article  Google Scholar 

  5. Chen, Y., Lan, T., Venkataramani, G.: DamGate: dynamic adaptive multi-feature gating in program binaries. In: Proceedings of the Workshop on Forming an Ecosystem Around Software Transformation, FEAST 2017, Dallas, Texas, USA (2017)

    Google Scholar 

  6. Cifuentes, C., Van Emmerik, M.: Recovery of jump table case statements from binary code. In: Proceedings of the International Workshop on Program Comprehension, IWPC 1999, Washington, DC, USA (1999)

    Google Scholar 

  7. Cifuentes, C., Gough, K.J.: Decompilation of binary programs. Softw. Pract. Exp. 25(7), 811–829 (1995)

    Article  Google Scholar 

  8. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 2077, Los Angeles, California (1977)

    Google Scholar 

  9. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 1978, Tucson, Arizona (1978)

    Google Scholar 

  10. Csallner, C., Smaragdakis, Y.: Check ‘n’ crash: combining static checking and testing. In: Proceedings of the International Conference on Software Engineering, ICSE 2005, St. Louis, MO, USA (2005)

    Google Scholar 

  11. Dutertre, B., Gehani, A., Saidi, H., Schäf, M., Tiwari, A.: Beyond binary program transformation

    Google Scholar 

  12. Feret, J.: The arithmetic-geometric progression abstract domain. In: Cousot, R. (ed.) VMCAI 2005. LNCS, vol. 3385, pp. 42–58. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30579-8_3

    Chapter  MATH  Google Scholar 

  13. Granger, P.: Static analysis of arithmetical congruences. Int. J. Comput. Math. 30, 165–190 (1989)

    Article  Google Scholar 

  14. Harris, L.C., Miller, B.P.: Practical analysis of stripped binary code. SIGARCH Comput. Arch. News 33, 63–68 (2005)

    Article  Google Scholar 

  15. Heo, K., Lee, W., Pashakhanloo, P., Naik, M.: Effective program debloating via reinforcement learning. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, Canada (2018)

    Google Scholar 

  16. Hind, M.: Pointer analysis: Haven’t we solved this problem yet? In: Proceedings of the ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE 2001, Snowbird, Utah, USA (2001)

    Google Scholar 

  17. Jain, N.C.: Disassembler using high level processor models (1999)

    Google Scholar 

  18. Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–667. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_52

    Chapter  Google Scholar 

  19. Jiang, Y., Bao, Q., Wang, S., Liu, X., Wu, D.: RedDroid: Android application redundancy customization based on static analysis. In: Proceedings of the IEEE International Symposium on Software Reliability Engineering, ISSRE 2018, Berlin, Germany (2018)

    Google Scholar 

  20. Jiang, Y., Wu, D., Liu, P.: JRed: program customization and bloatware mitigation based on static analysis. In: Proceedings of the IEEE Computer Software and Applications Conference, COMPSAC 2016, Atlanta, GA, USA (2016)

    Google Scholar 

  21. Kinder, J., Zuleger, F., Veith, H.: An abstract interpretation-based framework for control flow reconstruction from binaries. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 214–228. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-93900-9_19

    Chapter  MATH  Google Scholar 

  22. Kruegel, C., Robertson, W., Valeur, F., Vigna, G.: Static disassembly of obfuscated binaries. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2004, San Diego, CA, USA (2004)

    Google Scholar 

  23. Landi, W.: Undecidability of static analysis. ACM Lett. Program. Lang. Syst. 1(4), 323–337 (1992)

    Article  Google Scholar 

  24. Lee, J., Avgerinos, T., Brumley, D.: TIE: principled reverse engineering of types in binary programs. In: Proceedings of the Network and Distributed Systems Security, NDSS 2011, San Diego, CA, USA (2011)

    Google Scholar 

  25. Malecha, G., Gehani, A., Shankar, N.: Automated software winnowing. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2015 (2015)

    Google Scholar 

  26. Miné, A.: Abstract domains for bit-level machine integer and floating-point operations. In: Proceedings of the International Workshop on invariant Generation, WING 2012, Manchester, UK (2012)

    Google Scholar 

  27. Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Signedness-agnostic program analysis: precise integer bounds for low-level code. In: Jhala, R., Igarashi, A. (eds.) APLAS 2012. LNCS, vol. 7705, pp. 115–130. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35182-2_9

    Chapter  Google Scholar 

  28. Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007), vol. 42, no. 6, pp. 89–100, June 2007

    Google Scholar 

  29. Quach, A., Prakash, A., Yan, L.K.: Debloating software through piece-wise compilation and loading. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2018, Baltimore, MD, USA (2018)

    Google Scholar 

  30. Ramalingam, G.: The undecidability of aliasing. ACM Trans. Program. Lang. Syst. (TOPLAS 1994) 16(5), 1467–1471 (1994)

    Article  Google Scholar 

  31. Sankaranarayanan, S., Ivančić, F., Gupta, A.: Program analysis using symbolic ranges. In: Nielson, H.R., Filé, G. (eds.) SAS 2007. LNCS, vol. 4634, pp. 366–383. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74061-2_23

    Chapter  Google Scholar 

  32. Schwarz, B., Debray, S., Andrews, G.: Disassembly of executable code revisited. In: Proceedings of the Working Conference on Reverse Engineering, WCRE 2002, Richmond, VA, USA (2002)

    Google Scholar 

  33. Sharif, H., Abubakar, M., Gehani, A., Zaffar, F.: TRIMMER: application specialization for code debloating. In: Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Corum, Montpellier, France (2018)

    Google Scholar 

  34. Shoshitaishvili, Y., et al.: SoK: (state of) the art of war: offensive techniques in binary analysis. In: Proceedings of the IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA (2016)

    Google Scholar 

  35. Simon, A.: Value-Range Analysis of C Programs: Towards Proving the Absence of Buffer Overflow Vulnerabilities, 1st edn. Springer, London (2010). https://doi.org/10.1007/978-1-84800-017-9

    Book  MATH  Google Scholar 

  36. Song, L., Xing, X.: Fine-grained library customization. arXiv preprint arXiv:1810.11128 (2018)

  37. Venet, A., Brat, G.: Precise and efficient static array bound checking for large embedded C programs. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2004, Washington DC, USA (2004)

    Google Scholar 

  38. Wagner, D.A., Foster, J.S., Brewer, E.A., Aiken, A.: A first step towards automated detection of buffer overrun vulnerabilities. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2000, San Diego, CA, USA (2000)

    Google Scholar 

  39. Wagner, G., Gal, A., Franz, M.: Slimming’ a Java virtual machine by way of cold code removal and optimistic partial program loading. Sci. Comput. Program. 76(11), 1037–1053 (2011)

    Article  Google Scholar 

  40. Wang, R., et al.: Ramblr: making reassembly great again. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2017, San Diego, CA, USA (2017)

    Google Scholar 

  41. Wang, S., Wang, P., Wu, D.: Reassembleable disassembling. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2015, Washington, D.C. (2015)

    Google Scholar 

  42. Warren, H.S.J.: Hacker’s Delight. Addison-Wesley, Boston (2003)

    Google Scholar 

  43. West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River (1996)

    MATH  Google Scholar 

  44. Xu, L., Sun, F., Su, Z.: Constructing precise control flow graphs from binaries. University of California, Davis, Technical report (2009)

    Google Scholar 

  45. Yong, S.H., Horwitz, S.: Protecting C programs from attacks via invalid pointer dereferences. SIGSOFT Softw. Eng. Notes 28(5), 307–316 (2003)

    Article  Google Scholar 

  46. Zhang, M., Qiao, R., Hasabnis, N., Sekar, R.: A platform for secure static binary instrumentation (2014)

    Google Scholar 

Download references

Acknowledgements

We would like to thank our reviewers for their valuable comments and input to improve our paper. This material is based on research sponsored by the Office of Naval Research under grant number N00014-17-1-2897, the NSF under Award number CNS-1704253, and the DARPA under agreement number FA8750-15-2-0084 and FA8750-19-C-0003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, or the U.S. Government.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nilo Redini .

Editor information

Editors and Affiliations

A Signedness-Agnostic Strided Interval Operations

A Signedness-Agnostic Strided Interval Operations

We provided our abstract domain with every mathematical and logical operation included in today architectures’ instruction sets. However, due to space constraints, we detail here only the or bitwise operation.

figure d

1.1 A.1 Bitwise or

To define a precise and sound bitwise or operation we leverage the unsigned version of Warren’s algorithm [42], which performs the or operation on classic non-wrapping ranges of values. Given two generic SASIs \(r = s_r[a,b]w\) and \(t = s_t[c,d]w\), the algorithm used to calculate the bitwise or operation is shown in Algorithm 2.

First, we split r and t on the south poles, thus avoiding any wrapping intervals (i.e., intervals might include the values \(1^w\) and \(0^w\)). Then, for each u and v resulting from the split, we create a new SASI calculating its stride (\(s_z\)) and bounds (lb and ub) separately. For the stride, we retrieve the number of trailing zeros (function ntz) in the bit-vector representations of \(s_u\) and \(s_v\) both, and consider the minimum of them to set the stride \(s_z\) (Lines 5 and 6). In fact, as the strides \(s_u\) and \(s_v\) have t low-order bits unset, all the values represented by the SASI resulting from \(u\ |_w^{wr}\ v\) share the same t low-order bits. Therefore, the choice of a stride equal to \(2^t\) is a sound choice (line 6). The value of these t-lower bits is \( k = (e \& m) | (g \& m)\) (where \(m = (1 \ll t) - 1\)). On the other hand, the (\(w - t\)) high-order bits are handled by masking out the obtained t low-order bits and then applying unsigned version of Warren’s or algorithm to find the bounds for the SASI resulting from \(u |_w v\) (from line 9 to 11). Finally, the SASI resulting from \(r\ |_w\ t\) is obtained by applying the generalized join on the list of SASIs collected by applying the algorithm just explained. Since Warren’s algorithm employed is sound, the or operation is sound.

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Redini, N., Wang, R., Machiry, A., Shoshitaishvili, Y., Vigna, G., Kruegel, C. (2019). BinTrimmer: Towards Static Binary Debloating Through Abstract Interpretation. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019. Lecture Notes in Computer Science(), vol 11543. Springer, Cham. https://doi.org/10.1007/978-3-030-22038-9_23

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-22038-9_23

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-22037-2

  • Online ISBN: 978-3-030-22038-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics