Skip to main content

Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 9837))

Abstract

We present a method for automatically learning an effective strategy for clustering variables for the Octagon analysis from a given codebase. This learned strategy works as a preprocessor of Octagon. Given a program to be analyzed, the strategy is first applied to the program and clusters variables in it. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that although the method is based on supervised learning, it does not require manually-labeled data. The method does not ask human to indicate which pairs of program variables in the given codebase should be tracked. Instead it uses the impact pre-analysis for Octagon from our previous work and automatically labels variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100KLOC and is 33x faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2 %.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    Because the pre-analysis uses \(\bigstar \) cautiously, only a small portion of variable pairs is marked with \(\oplus \) (that is, 5864 / 258, 165, 546) in our experiments. Replacing “some” by “all” reduces this portion by half (2230 / 258, 165, 546) and makes the learning task more difficult.

  2. 2.

    By nontrivial, we mean finite bounds that are neither \(\infty \) nor \(-\infty \).

  3. 3.

    In practice, eliminating these false alarms is extremely challenging in a sound yet non-domain-specific static analyzer for full C. The false alarms arise from a variety of reasons, e.g., recursive calls, unknown library calls, complex loops, etc.

References

  1. Blanchet, B., Cousot, P., Cousot, R., Feret, J., Mauborgne, L., Miné, A., Monniaux, D., Rival, X.: A static analyzer for large safety-critical software. In: PLDI (2003)

    Google Scholar 

  2. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001)

    Article  MATH  Google Scholar 

  3. Cousot, P., Halbwachs, N.: Automatic discovery of linear restraints among variables of a program. In: POPL (1978)

    Google Scholar 

  4. Garg, P., Neider, D., Madhusudan, P., Roth, D.: Learning invariants using decision trees and implication counterexamples. In: POPL, pp. 499–512 (2016)

    Google Scholar 

  5. Grigore, R., Yang, H.: Abstraction refinement guided by a learnt probabilistic model. In: POPL (2016)

    Google Scholar 

  6. Jeannet, B., Miné, A.: Apron: a library of numerical abstract domains for static analysis. In: Bouajjani, A., Maler, O. (eds.) CAV 2009. LNCS, vol. 5643, pp. 661–667. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  7. Mangal, R., Zhang, X., Nori, A.V., Naik, M.: A user-guided approach to program analysis. In: ESEC/FSE, pp. 462–473 (2015)

    Google Scholar 

  8. Miné, A.: The octagon abstract domain. Higher-order and symbolic computation (2006)

    Google Scholar 

  9. Mitchell, T.M.: Machine Learning. McGraw-Hill, Inc., New York (1997)

    MATH  Google Scholar 

  10. Murphy, K.P.: Machine Learning: A Probabilistic Perspective. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge (2012)

    MATH  Google Scholar 

  11. Nori, A.V., Sharma, R.: Termination proofs from tests. In: FSE, pp. 246–256 (2013)

    Google Scholar 

  12. Octeau, D., Jha, S., Dering, M., McDaniel, P., Bartel, A., Li, L., Klein, J., Traon, Y.L.: Combining static analysis with probabilistic models to enable market-scale android inter-component analysis. In: POPL, pp. 469–484 (2016)

    Google Scholar 

  13. Oh, H., Heo, K., Lee, W., Lee, W., Park, D., Kang, J., Yi, K.: Global sparse analysis framework. ACM Trans. Program. Lang. Syst. 36(3), 8:1–8:44 (2014)

    Article  Google Scholar 

  14. Oh, H., Heo, K., Lee, W., Lee, W., Yi, K.: Design and implementation of sparse global analyses for C-like languages. In: PLDI (2012)

    Google Scholar 

  15. Oh, H., Lee, W., Heo, K., Yang, H., Yi, K.: Selective context-sensitivity guided by impact pre-analysis. In: PLDI (2014)

    Google Scholar 

  16. Oh, H., Yang, H., Yi, K.: Learning a strategy for adapting a program analysis via bayesian optimisation. In: OOPSLA (2015)

    Google Scholar 

  17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, É.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  18. Raychev, V., Bielik, P., Vechev, M.T., Krause, A.: Learning programs from noisy data. In: POPL, pp. 761–774 (2016)

    Google Scholar 

  19. Sankaranarayanan, S., Chaudhuri, S., Ivančić, F., Gupta, A.: Dynamic inference of likely data preconditions over predicates by tree learning. In: ISSTA, pp. 295–306 (2008)

    Google Scholar 

  20. Sankaranarayanan, S., Ivančić, F., Gupta, A.: Mining library specifications using inductive logic programming. In: ICSE, pp. 131–140 (2008)

    Google Scholar 

  21. Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Liang, P., Nori, A.V.: A data driven approach for algebraic loop invariants. In: Felleisen, M., Gardner, P. (eds.) ESOP 2013. LNCS, vol. 7792, pp. 574–592. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  22. Sharma, R., Gupta, S., Hariharan, B., Aiken, A., Nori, A.V.: Verification as learning geometric concepts. In: Logozzo, F., Fähndrich, M. (eds.) SAS 2013. LNCS, vol. 7935, pp. 388–411. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  23. Sharma, R., Nori, A.V., Aiken, A.: Interpolants as classifiers. In: Madhusudan, P., Seshia, S.A. (eds.) CAV 2012. LNCS, vol. 7358, pp. 71–87. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  24. Singh, G., Püschel, M., Vechev, M.: Making numerical program analysis fast. In: PLDI (2015)

    Google Scholar 

  25. Sparrow. http://ropas.snu.ac.kr/sparrow

  26. Venet, A., Brat, G.: Precise and efficient static array bound checking for large embedded C programs. In: PLDI (2004)

    Google Scholar 

  27. Yi, K., Choi, H., Kim, J., Kim, Y.: An empirical study on classification methods for alarms from a bug-finding static C analyzer. Inf. Process. Lett. 102(2–3), 118–123 (2007)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

We thank the anonymous reviewers for their helpful comments. We also thank Kwangkeun Yi, Chung-Kil Hur, and all members of SoFA group members in Seoul National University for their helpful comments and suggestions. This work was supported by Samsung Research Funding Center of Samsung Electronics under Project Number SRFC-IT1502-07 and Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. R0190-16-2011, Development of Vulnerability Discovery Technologies for IoT Software Security). This research was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2016R1C1B2014062).

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Kihong Heo or Hakjoo Oh .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer-Verlag GmbH Germany

About this paper

Cite this paper

Heo, K., Oh, H., Yang, H. (2016). Learning a Variable-Clustering Strategy for Octagon from Labeled Data Generated by a Static Analysis. In: Rival, X. (eds) Static Analysis. SAS 2016. Lecture Notes in Computer Science(), vol 9837. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53413-7_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-53413-7_12

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-53412-0

  • Online ISBN: 978-3-662-53413-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics