Abstract
The increasing complexity of modern programs motivates software engineers to often rely on the support of third-party libraries. Although this practice allows application developers to achieve a compelling time-to-market, it often makes the final product bloated with conspicuous chunks of unused code. Other than making a program unnecessarily large, this dormant code could be leveraged by willful attackers to harm users. As a consequence, several techniques have been recently proposed to perform program debloating and remove (or secure) dead code from applications. However, state-of-the-art approaches are either based on unsound strategies, thus producing unreliable results, or pose too strict assumptions on the program itself.
In this work, we propose a novel abstract domain, called Signedness-Agnostic Strided Interval, which we use as the cornerstone to design a novel and sound static technique, based on abstract interpretation, to reliably perform program debloating. Throughout the paper, we detail the specifics of our approach and show its effectiveness and usefulness by implementing it in a tool, called BinTrimmer, to perform static program debloating on binaries.
Our evaluation shows that BinTrimmer can remove up to 65.6% of a library’s code and that our domain is, on average, 98% more precise than the related work.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
References
Balakrishnan, G., Reps, T.: WYSINWYX: what you see is not what you execute. ACM Trans. Program. Lang. Syst. 32(6), 23:1–23:84 (2010)
Bhattacharya, S., Nanda, M.G., Gopinath, K., Gupta, M.: Reuse, recycle to de-bloat software. In: Mezini, M. (ed.) ECOOP 2011. LNCS, vol. 6813, pp. 408–432. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22655-7_19
Bruening, D.L.: Efficient, transparent, and comprehensive runtime code manipulation. Ph.D. dissertation, Cambridge, MA, USA (2004). aAI0807735
Caballero, J., Lin, Z.: Type inference on executables. ACM Comput. Surv. (CSUR) 48(4), 65 (2016)
Chen, Y., Lan, T., Venkataramani, G.: DamGate: dynamic adaptive multi-feature gating in program binaries. In: Proceedings of the Workshop on Forming an Ecosystem Around Software Transformation, FEAST 2017, Dallas, Texas, USA (2017)
Cifuentes, C., Van Emmerik, M.: Recovery of jump table case statements from binary code. In: Proceedings of the International Workshop on Program Comprehension, IWPC 1999, Washington, DC, USA (1999)
Cifuentes, C., Gough, K.J.: Decompilation of binary programs. Softw. Pract. Exp. 25(7), 811–829 (1995)
Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 2077, Los Angeles, California (1977)
Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: Proceedings of the ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL 1978, Tucson, Arizona (1978)
Csallner, C., Smaragdakis, Y.: Check ‘n’ crash: combining static checking and testing. In: Proceedings of the International Conference on Software Engineering, ICSE 2005, St. Louis, MO, USA (2005)
Dutertre, B., Gehani, A., Saidi, H., Schäf, M., Tiwari, A.: Beyond binary program transformation
Feret, J.: The arithmetic-geometric progression abstract domain. In: Cousot, R. (ed.) VMCAI 2005. LNCS, vol. 3385, pp. 42–58. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-30579-8_3
Granger, P.: Static analysis of arithmetical congruences. Int. J. Comput. Math. 30, 165–190 (1989)
Harris, L.C., Miller, B.P.: Practical analysis of stripped binary code. SIGARCH Comput. Arch. News 33, 63–68 (2005)
Heo, K., Lee, W., Pashakhanloo, P., Naik, M.: Effective program debloating via reinforcement learning. In: Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, Canada (2018)
Hind, M.: Pointer analysis: Haven’t we solved this problem yet? In: Proceedings of the ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE 2001, Snowbird, Utah, USA (2001)
Jain, N.C.: Disassembler using high level processor models (1999)
Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–667. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-02658-4_52
Jiang, Y., Bao, Q., Wang, S., Liu, X., Wu, D.: RedDroid: Android application redundancy customization based on static analysis. In: Proceedings of the IEEE International Symposium on Software Reliability Engineering, ISSRE 2018, Berlin, Germany (2018)
Jiang, Y., Wu, D., Liu, P.: JRed: program customization and bloatware mitigation based on static analysis. In: Proceedings of the IEEE Computer Software and Applications Conference, COMPSAC 2016, Atlanta, GA, USA (2016)
Kinder, J., Zuleger, F., Veith, H.: An abstract interpretation-based framework for control flow reconstruction from binaries. In: Jones, N.D., Müller-Olm, M. (eds.) VMCAI 2009. LNCS, vol. 5403, pp. 214–228. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-93900-9_19
Kruegel, C., Robertson, W., Valeur, F., Vigna, G.: Static disassembly of obfuscated binaries. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2004, San Diego, CA, USA (2004)
Landi, W.: Undecidability of static analysis. ACM Lett. Program. Lang. Syst. 1(4), 323–337 (1992)
Lee, J., Avgerinos, T., Brumley, D.: TIE: principled reverse engineering of types in binary programs. In: Proceedings of the Network and Distributed Systems Security, NDSS 2011, San Diego, CA, USA (2011)
Malecha, G., Gehani, A., Shankar, N.: Automated software winnowing. In: Proceedings of the ACM Symposium on Applied Computing, SAC 2015 (2015)
Miné, A.: Abstract domains for bit-level machine integer and floating-point operations. In: Proceedings of the International Workshop on invariant Generation, WING 2012, Manchester, UK (2012)
Navas, J.A., Schachte, P., Søndergaard, H., Stuckey, P.J.: Signedness-agnostic program analysis: precise integer bounds for low-level code. In: Jhala, R., Igarashi, A. (eds.) APLAS 2012. LNCS, vol. 7705, pp. 115–130. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35182-2_9
Nethercote, N., Seward, J.: Valgrind: a framework for heavyweight dynamic binary instrumentation. In: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2007), vol. 42, no. 6, pp. 89–100, June 2007
Quach, A., Prakash, A., Yan, L.K.: Debloating software through piece-wise compilation and loading. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2018, Baltimore, MD, USA (2018)
Ramalingam, G.: The undecidability of aliasing. ACM Trans. Program. Lang. Syst. (TOPLAS 1994) 16(5), 1467–1471 (1994)
Sankaranarayanan, S., Ivančić, F., Gupta, A.: Program analysis using symbolic ranges. In: Nielson, H.R., Filé, G. (eds.) SAS 2007. LNCS, vol. 4634, pp. 366–383. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74061-2_23
Schwarz, B., Debray, S., Andrews, G.: Disassembly of executable code revisited. In: Proceedings of the Working Conference on Reverse Engineering, WCRE 2002, Richmond, VA, USA (2002)
Sharif, H., Abubakar, M., Gehani, A., Zaffar, F.: TRIMMER: application specialization for code debloating. In: Proceedings of the ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Corum, Montpellier, France (2018)
Shoshitaishvili, Y., et al.: SoK: (state of) the art of war: offensive techniques in binary analysis. In: Proceedings of the IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA (2016)
Simon, A.: Value-Range Analysis of C Programs: Towards Proving the Absence of Buffer Overflow Vulnerabilities, 1st edn. Springer, London (2010). https://doi.org/10.1007/978-1-84800-017-9
Song, L., Xing, X.: Fine-grained library customization. arXiv preprint arXiv:1810.11128 (2018)
Venet, A., Brat, G.: Precise and efficient static array bound checking for large embedded C programs. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2004, Washington DC, USA (2004)
Wagner, D.A., Foster, J.S., Brewer, E.A., Aiken, A.: A first step towards automated detection of buffer overrun vulnerabilities. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2000, San Diego, CA, USA (2000)
Wagner, G., Gal, A., Franz, M.: Slimming’ a Java virtual machine by way of cold code removal and optimistic partial program loading. Sci. Comput. Program. 76(11), 1037–1053 (2011)
Wang, R., et al.: Ramblr: making reassembly great again. In: Proceedings of the Network and Distributed System Security Symposium, NDSS 2017, San Diego, CA, USA (2017)
Wang, S., Wang, P., Wu, D.: Reassembleable disassembling. In: Proceedings of the USENIX Conference on Security Symposium, SEC 2015, Washington, D.C. (2015)
Warren, H.S.J.: Hacker’s Delight. Addison-Wesley, Boston (2003)
West, D.B., et al.: Introduction to Graph Theory, vol. 2. Prentice Hall, Upper Saddle River (1996)
Xu, L., Sun, F., Su, Z.: Constructing precise control flow graphs from binaries. University of California, Davis, Technical report (2009)
Yong, S.H., Horwitz, S.: Protecting C programs from attacks via invalid pointer dereferences. SIGSOFT Softw. Eng. Notes 28(5), 307–316 (2003)
Zhang, M., Qiao, R., Hasabnis, N., Sekar, R.: A platform for secure static binary instrumentation (2014)
Acknowledgements
We would like to thank our reviewers for their valuable comments and input to improve our paper. This material is based on research sponsored by the Office of Naval Research under grant number N00014-17-1-2897, the NSF under Award number CNS-1704253, and the DARPA under agreement number FA8750-15-2-0084 and FA8750-19-C-0003. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA, or the U.S. Government.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
A Signedness-Agnostic Strided Interval Operations
A Signedness-Agnostic Strided Interval Operations
We provided our abstract domain with every mathematical and logical operation included in today architectures’ instruction sets. However, due to space constraints, we detail here only the or bitwise operation.
1.1 A.1 Bitwise or
To define a precise and sound bitwise or operation we leverage the unsigned version of Warren’s algorithm [42], which performs the or operation on classic non-wrapping ranges of values. Given two generic SASIs \(r = s_r[a,b]w\) and \(t = s_t[c,d]w\), the algorithm used to calculate the bitwise or operation is shown in Algorithm 2.
First, we split r and t on the south poles, thus avoiding any wrapping intervals (i.e., intervals might include the values \(1^w\) and \(0^w\)). Then, for each u and v resulting from the split, we create a new SASI calculating its stride (\(s_z\)) and bounds (lb and ub) separately. For the stride, we retrieve the number of trailing zeros (function ntz) in the bit-vector representations of \(s_u\) and \(s_v\) both, and consider the minimum of them to set the stride \(s_z\) (Lines 5 and 6). In fact, as the strides \(s_u\) and \(s_v\) have t low-order bits unset, all the values represented by the SASI resulting from \(u\ |_w^{wr}\ v\) share the same t low-order bits. Therefore, the choice of a stride equal to \(2^t\) is a sound choice (line 6). The value of these t-lower bits is \( k = (e \& m) | (g \& m)\) (where \(m = (1 \ll t) - 1\)). On the other hand, the (\(w - t\)) high-order bits are handled by masking out the obtained t low-order bits and then applying unsigned version of Warren’s or algorithm to find the bounds for the SASI resulting from \(u |_w v\) (from line 9 to 11). Finally, the SASI resulting from \(r\ |_w\ t\) is obtained by applying the generalized join on the list of SASIs collected by applying the algorithm just explained. Since Warren’s algorithm employed is sound, the or operation is sound.
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Redini, N., Wang, R., Machiry, A., Shoshitaishvili, Y., Vigna, G., Kruegel, C. (2019). BinTrimmer: Towards Static Binary Debloating Through Abstract Interpretation. In: Perdisci, R., Maurice, C., Giacinto, G., Almgren, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019. Lecture Notes in Computer Science(), vol 11543. Springer, Cham. https://doi.org/10.1007/978-3-030-22038-9_23
Download citation
DOI: https://doi.org/10.1007/978-3-030-22038-9_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22037-2
Online ISBN: 978-3-030-22038-9
eBook Packages: Computer ScienceComputer Science (R0)