Abstract
We present a method for automatically learning an effective strategy for clustering variables for the Octagon analysis from a given codebase. This learned strategy works as a preprocessor of Octagon. Given a program to be analyzed, the strategy is first applied to the program and clusters variables in it. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that although the method is based on supervised learning, it does not require manually-labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead it uses the impact pre-analysis for Octagon from our previous work and automatically labels variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100KLOC and is 33x faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%.
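The core idea of the abstract — running Octagon only on variable pairs that share a cluster — can be illustrated with a minimal sketch. This is not the paper's implementation; the function and variable names below are purely illustrative, and the "clusters" would in practice come from the learned strategy.

```python
# Hypothetical sketch of the "partial Octagon" idea: a learned strategy
# partitions program variables into clusters, and the analysis then keeps
# a relational (octagon) constraint for a pair of variables only when
# both fall in the same cluster. All names here are illustrative.

from itertools import combinations

def tracked_pairs(clusters):
    """Return the set of variable pairs for which the partial analysis
    maintains relational facts (pairs within the same cluster)."""
    pairs = set()
    for cluster in clusters:
        for x, y in combinations(sorted(cluster), 2):
            pairs.add((x, y))
    return pairs

# Example: variables clustered as {i, n} and {p}. Only (i, n) is tracked
# relationally, so a constraint like i - n <= 0 can be kept, while any
# i/p or n/p relationship is abstracted away (as in interval analysis).
clusters = [{"i", "n"}, {"p"}]
print(tracked_pairs(clusters))
```

The cost saving comes from the quadratic nature of Octagon: tracking only within-cluster pairs shrinks the number of maintained relations from quadratic in the total number of variables to quadratic in the (much smaller) cluster sizes.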
Notes
1. Because the pre-analysis uses \(\bigstar \) cautiously, only a small portion of variable pairs is marked with \(\oplus \) (that is, 5,864 out of 258,165,546) in our experiments. Replacing “some” by “all” reduces this portion by half (2,230 out of 258,165,546) and makes the learning task more difficult.
2. By nontrivial, we mean finite bounds that are neither \(\infty \) nor \(-\infty \).
3. In practice, eliminating these false alarms is extremely challenging in a sound yet non-domain-specific static analyzer for full C. The false alarms arise from a variety of reasons, e.g., recursive calls, unknown library calls, complex loops, etc.
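Note 1 above describes how training labels are obtained without human annotation: a variable pair becomes a positive example exactly when the impact pre-analysis marks it \(\oplus \). A minimal sketch of that labeling step, under the assumption that the pre-analysis reports, per pair, whether it proved a precise result (all names below are hypothetical):

```python
# Hypothetical sketch of the automatic labeling described in Note 1:
# the impact pre-analysis marks each variable pair either "precise"
# (the paper's ⊕) or "unknown" (★), and precisely-analyzed pairs become
# positive training examples with no manual annotation. The function
# name and data layout are illustrative, not the paper's code.

def label_pairs(pre_analysis_results):
    """Map each variable pair to 1 (positive) if the pre-analysis
    proved a precise octagon fact for it, else 0 (negative)."""
    return {pair: (1 if precise else 0)
            for pair, precise in pre_analysis_results.items()}

# Toy example mirroring the class imbalance reported in Note 1: only a
# small fraction of all pairs end up labeled positive.
results = {("i", "n"): True, ("i", "p"): False, ("n", "p"): False}
labels = label_pairs(results)
print(sum(labels.values()), len(labels))  # 1 positive out of 3 pairs
```

As Note 1 observes, the positive class is tiny relative to the hundreds of millions of candidate pairs, which is part of what makes the learning task nontrivial.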
References
Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A static analyzer for large safety-critical software. In: PLDI (2003)
Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)
Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: POPL (1978)
Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: POPL, pp. 499–512 (2016)
Grigore, R., Yang, H.: Abstraction refinement guided by a learnt probabilistic model. In: POPL (2016)
Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–667. Springer, Heidelberg (2009)
Mangal, R., Zhang, X., Nori, A.V., Naik, M.: A user-guided approach to program analysis. In: ESEC/FSE, pp. 462–473 (2015)
Miné, A.: The octagon abstract domain. Higher-Order and Symbolic Computation (2006)
Mitchell, T.M.: Machine Learning. McGraw-Hill, Inc., New York (1997)
Murphy, K.P.: Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge (2012)
Nori, A.V., Sharma, R.: Termination proofs from tests. In: FSE, pp. 246–256 (2013)
Octeau, D., Jha, S., Dering, M., McDaniel, P., Bartel, A., Li, L., Klein, J., Traon, Y.L.: Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In: POPL, pp. 469–484 (2016)
Oh, H., Heo, K., Lee, W., Lee, W., Park, D., Kang, J., Yi, K.: Global sparse analysis framework. ACM Trans. Program. Lang. Syst. 36(3), 8:1–8:44 (2014)
Oh, H., Heo, K., Lee, W., Lee, W., Yi, K.: Design and implementation of sparse global analyses for C-like languages. In: PLDI (2012)
Oh, H., Lee, W., Heo, K., Yang, H., Yi, K.: Selective context-sensitivity guided by impact pre-analysis. In: PLDI (2014)
Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via bayesian optimisation. In: OOPSLA (2015)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Raychev, V., Bielik, P., Vechev, M.T., Krause, A.: Learning programs from noisy data. In: POPL, pp. 761–774 (2016)
Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: ISSTA, pp. 295–306 (2008)
Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: ICSE, pp. 131–140 (2008)
Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., Nori, A.V.: A data driven approach for algebraic loop invariants. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 574–592. Springer, Heidelberg (2013)
Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Nori, A.V.: Verification as learning geometric concepts. In: Logozzo, F., Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 388–411. Springer, Heidelberg (2013)
Sharma, R., Nori, A.V., Aiken, A.: Interpolants as classifiers. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 71–87. Springer, Heidelberg (2012)
Singh, G., Püschel, M., Vechev, M.: Making numerical program analysis fast. In: PLDI (2015)
Sparrow. http://ropas.snu.ac.kr/sparrow
Venet, A., Brat, G.: Precise and efficient static array bound checking for large embedded C programs. In: PLDI (2004)
Yi, K., Choi, H., Kim, J., Kim, Y.: An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett. 102(2–3), 118–123 (2007)
Acknowledgements
We thank the anonymous reviewers for their helpful comments. We also thank Kwangkeun Yi, Chung-Kil Hur, and all members of the SoFA group at Seoul National University for their helpful comments and suggestions. This work was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1502-07 and Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0190-16-2011, Development of Vulnerability Discovery Technologies for IoT Software Security). This research was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2016R1C1B2014062).
Copyright information
© 2016 Springer-Verlag GmbH Germany
Heo, K., Oh, H., Yang, H. (2016). Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis. In: Rival, X. (eds) Static Analysis. SAS 2016. Lecture Notes in Computer Science(), vol 9837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53413-7_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53412-0
Online ISBN: 978-3-662-53413-7