Advertisement

Formal Aspects of Computing

, Volume 31, Issue 2, pp 165–206 | Cite as

Hybrid statistical estimation of mutual information and its application to information flow

  • Fabrizio Biondi
  • Yusuke KawamotoEmail author
  • Axel Legay
  • Louis-Marie Traonouez
Original Article

Abstract

Analysis of a probabilistic system often requires to learn the joint probability distribution of its random variables. The computation of the exact distribution is usually an exhaustive precise analysis on all executions of the system. To avoid the high computational cost of such an exhaustive search, statistical analysis has been studied to efficiently obtain approximate estimates by analyzing only a small but representative subset of the system’s behavior. In this paper we propose a hybrid statistical estimation method that combines precise and statistical analyses to estimate mutual information, Shannon entropy, and conditional entropy, together with their confidence intervals. We show how to combine the analyses on different components of a discrete system with different accuracy to obtain an estimate for the whole system. The new method performs weighted statistical analysis with different sample sizes over different components and dynamically finds their optimal sample sizes. Moreover, it can reduce sample sizes by using prior knowledge about systems and a new abstraction-then-sampling technique based on qualitative analysis. To apply the method to the source code of a system, we show how to decompose the code into components and to determine the analysis method for each component by overviewing the implementation of those techniques in the HyLeak tool. We demonstrate with case studies that the new method outperforms the state of the art in quantifying information leakage.

Keywords

Mutual information Statistical estimation Quantitative information flow Hybrid method Confidence interval Statistical model checking 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

The authors are thankful to anonymous reviewers for helpful comments. This work was supported by JSPS KAKENHI Grant Number JP17K12667, by JSPS and Inria under the project LOGIS of the Japan-France AYAME Program, by the MSR-Inria Joint Research Center, by the Sensation European grant, and by région Bretagne.

References

  1. ACKP17.
    Alvim MS, Chatzikokolakis K, Kawamoto Y, Palamidessi C (2017) Information leakage games. In: 8th international conference on decision and game theory for security (GameSec 2017), volume 10575 of Lecture notes in computer science. SpringerGoogle Scholar
  2. ACKP18.
    Alvim MS, Chatzikokolakis K, Kawamoto Y, Palamidessi C (2018) A game-theoretic approach to information-flow control via protocol composition. Entropy 20(5): 382–143Google Scholar
  3. Ada04.
    Adami C (2004) Information theory in molecular biology. Phys Life Rev 1(1): 3–22CrossRefGoogle Scholar
  4. BGP+16.
    Bouissou O, Goubault E, Putot S, Chakarov A, Sankaranarayanan S (2016) Uncertainty propagation using probabilistic affine forms and concentration of measure inequalities. In: Chechik M, Raskin J-F (eds) Tools and algorithms for the construction and analysis of systems—22nd international conference, TACAS 2016, Held as Part of the European joint conferences on theory and practice of software, ETAPS 2016, Eindhoven, The Netherlands, April 2–8, 2016, Proceedings, volume 9636 of Lecture notes in computer science. Springer, pp 225–243Google Scholar
  5. BHP12.
    Barbot B, Haddad S, Picaronny C (2012) Coupling and importance sampling for statistical model checking. In: Flanagan C, König B (eds) Tools and algorithms for the construction and analysis of systems—18th international conference, TACAS 2012, Held as Part of the European joint conferences on theory and practice of software, ETAPS 2012, Tallinn, Estonia, March 24–April 1, 2012. Proceedings, volume 7214 of Lecture notes in computer science. Springer, pp 331–346Google Scholar
  6. BK11.
    Barthe G, Köpf B (2011) Information-theoretic bounds for differentially private mechanisms. In: Proceedings of the 24th IEEE computer security foundations symposium, CSF 2011, Cernay-la-Ville, France, 27–29 June, 2011. IEEE Computer Society, pp 191–204Google Scholar
  7. BKLT.
    Biondi F, Kawamoto Y, Legay A, Traonouez L-M. HyLeak. https://project.inria.fr/hyleak/
  8. BKLT17.
    Biondi F, Kawamoto Y, Legay A, Traonouez L-M (2017) Hyleak: hybrid analysis tool for information leakage. In 15th international symposium on automated technology for verification and analysis (ATVA’17), volume 10482 of Lecture notes in computer science. SpringerGoogle Scholar
  9. BKR09.
    Backes M, Köpf B, Rybalchenko A (2009) Automatic discovery and quantification of information leaks. In: 30th IEEE symposium on security and privacy (S&P 2009), 17–20 May 2009, Oakland, California, USA. IEEE Computer Society, pp 141–153Google Scholar
  10. BLMW15.
    Biondi F, Legay A, Malacaria P, Wasowski A (2015) Quantifying information leakage of randomized protocols. Theor Comput Sci 597: 62–87MathSciNetCrossRefzbMATHGoogle Scholar
  11. BLQ15.
    Biondi F, Legay A, Quilbeuf J (2015) Comparative analysis of leakage tools on scalable case studies. In: Bernd F, Jaco G (eds) Model checking software—22nd international symposium, SPIN 2015, Stellenbosch, South Africa, August 24–26, 2015, Proceedings, volume 9232 of Lecture notes in computer science. Springer, pp 263–281Google Scholar
  12. BLTW.
    Biondi F, Legay A, Traonouez L-M, Wasowski A. QUAIL. https://project.inria.fr/quail/
  13. BLTW13.
    Biondi F, Legay A, Traonouez L-M, Wasowski A (2013) QUAIL: a quantitative security analyzer for imperative code. In: Sharygina N, Veith H (eds) Computer aided verification—25th international conference, CAV 2013, Saint Petersburg, Russia, July 13–19, 2013. Proceedings, volume 8044 of Lecture notes in computer science. Springer, pp 702–707Google Scholar
  14. BP14.
    Boreale M, Paolini M (2014) On formally bounding information leakage by statistical estimation. In: Chow SSM, Camenisch J, Chi Kwong HL, Yiu S-M (eds) Information security—17th international conference, ISC 2014, Hong Kong, China, October 12–14, 2014. Proceedings, volume 8783 of Lecture notes in computer science. Springer, pp 216–236Google Scholar
  15. Bri04.
    Brillinger DR (2004) Some data analysis using mutual information. Braz J Probab Stat 18(6): 163–183MathSciNetzbMATHGoogle Scholar
  16. CCG10.
    Chatzikokolakis K, Chothia T, Guha A (2010) Statistical measurement of information leakage. In: Esparza J, Majumdar R (eds) Tools and algorithms for the construction and analysis of systems, 16th international conference, TACAS 2010, Held as part of the joint European conferences on theory and practice of software, ETAPS 2010, Paphos, Cyprus, March 20–28, 2010. Proceedings, volume 6015 of Lecture notes in computer science. Springer, pp 390–404Google Scholar
  17. CFM+15.
    Chakraborty S, Fremont DJ, Meel KS, Seshia SA, Vardi MY (2015) On parallel scalable uniform SAT witness generation. In: Baier C, Tinelli C (eds) Tools and algorithms for the construction and analysis of systems—21st international conference TACAS 2015, held as part of the European joint conferences on theory and practice of software, ETAPS 2015, London, UK, April 11–18, 2015. Proceedings, volume 9035 of Lecture notes in computer science. Springer, pp 304–319Google Scholar
  18. Cha88.
    Chaum D (1998) The dining cryptographers problem: unconditional sender and recipient untraceability. J Cryptol 1: 65–75MathSciNetzbMATHGoogle Scholar
  19. CHM01.
    Clark D, Hunt S, Malacaria P (2001) Quantitative analysis of the leakage of confidential data. Electr Notes Theor Comput Sci 59(3): 238–251CrossRefGoogle Scholar
  20. CHM07.
    Clark D, Hunt S, Malacaria P (2007) A static analysis for quantifying information flow in a simple imperative language. J Comput Secur 15(3): 321–371CrossRefGoogle Scholar
  21. CK14.
    Chothia T, Kawamoto Y (2014) Statistical estimation of min-entropy leakage, ManuscriptGoogle Scholar
  22. CKNa.
    Chothia T, Kawamoto Y, Novakovic C. leakiEst. http://www.cs.bham.ac.uk/research/projects/infotools/leakiest/
  23. CKNb.
    Chothia T, Kawamoto Y, Novakovic C. LeakWatch. http://www.cs.bham.ac.uk/research/projects/infotools/leakwatch/
  24. CKN13.
    Chothia T, Kawamoto Y, Novakovic C (2013) A tool for estimating information leakage. In: Sharygina N, Veith H (eds) Computer aided verification—25th international conference, CAV 2013, Saint Petersburg, Russia, July 13–19, 2013. Proceedings, volume 8044 of Lecture notes in computer science. Springer, pp 690–695Google Scholar
  25. CKN14.
    Chothia T, Kawamoto Y, Novakovic C (2014) Leakwatch: estimating information leakage from java programs. In: Kutylowski M, Vaidya J (eds) Computer security—ESORICS 2014—19th European symposium on research in computer security, Wroclaw, Poland, September 7–11, 2014. Proceedings, part II, volume 8713 of Lecture notes in computer science. Springer, pp 219–236Google Scholar
  26. CKNP13.
    Chothia T, Kawamoto Y, Novakovic C, Parker D (2013) Probabilistic point-to-point information leakage. In: 2013 IEEE 26th computer security foundations symposium, New Orleans, LA, USA, June 26–28, 2013. IEEE Computer Society, pp 193–205Google Scholar
  27. CMS14.
    Chadha R, Mathur U, Schwoon S (2014) Computing information flow using symbolic model-checking. In: Raman V, Suresh SP (eds) 34th international conference on foundation of software technology and theoretical computer science, FSTTCS 2014, December 15–17, 2014, New Delhi, India, volume 29 of LIPIcs. Schloss Dagstuhl—Leibniz-Zentrum fuer Informatik, pp 505–516Google Scholar
  28. CMV13.
    Chakraborty S, Meel KS, Vardi MY (2013) A scalable approximate model counter. In: Schulte C (ed.), Principles and practice of constraint programming—19th international conference, CP 2013, Uppsala, Sweden, September 16–20, 2013. Proceedings, volume 8124 of Lecture notes in computer science. Springer, pp 200–216Google Scholar
  29. CPP08.
    Chatzikokolakis K, Palamidessi C, Panangaden P (2008) Anonymity protocols as noisy channels. Inf Comput 206(2–4): 378–401MathSciNetCrossRefzbMATHGoogle Scholar
  30. CS10.
    Clarkson MR, Schneider FB (2010) Hyperproperties. J Comput Secur 18(6): 1157–1210CrossRefGoogle Scholar
  31. CT06.
    Cover TM, Thomas JA (2006) Elements of information theory (2nd edn). A Wiley-Interscience publication. WileyGoogle Scholar
  32. CZ11.
    Clarke EM, Zuliani P (2011) Statistical model checking for cyber-physical systems. In: Bultan T, Hsiung P-A (eds) Automated technology for verification and analysis, 9th international symposium, ATVA 2011, Taipei, Taiwan, October 11–14, 2011. Proceedings, volume 6996 of Lecture notes in computer science. Springer, pp 1–12Google Scholar
  33. Den76.
    Denning DE (1976) A lattice model of secure information flow. Commun ACM, 19(5): 236–243MathSciNetCrossRefzbMATHGoogle Scholar
  34. ES13.
    Espinoza B, Smith G (2013) Min-entropy as a resource. Inf Comput 226: 57–75MathSciNetCrossRefzbMATHGoogle Scholar
  35. ESB09.
    Escolano F, Suau P, Bonev B (2009) Information theory in computer vision and pattern recognition, 1st edn. Springer Publishing Company, IncorporatedGoogle Scholar
  36. FS14.
    Fremont DJ, Seshia SA (2014) Speeding up smt-based quantitative program analysis. In: Rümmer P, Wintersteiger CM (eds) Proceedings of the 12th international workshop on satisfiability modulo theories, SMT 2014, affiliated with the 26th international conference on computer aided verification (CAV 2014), the 7th international joint conference on automated reasoning (IJCAR 2014), and the 17th international conference on theory and applications of Satisfiability Testing (SAT 2014), Vienna, Austria, July 17–18, 2014., volume 1163 of CEUR workshop proceedings. CEUR-WS.org, 2014, pp 3–13Google Scholar
  37. Gal68.
    Gallager RG (1968) Information theory and reliable communication. Wiley, New YorkzbMATHGoogle Scholar
  38. III91.
    Gray JW (1991) III. Toward a mathematical foundation for information flow security. In: IEEE symposium on security and privacy, pp 21–35Google Scholar
  39. Jen96.
    Jensen FV (1996) Introduction to Bayesian networks, 1st edn. Springer-Verlag New York, Inc., SecaucusGoogle Scholar
  40. KB07.
    Köpf B, Basin DA (2007) An information-theoretic model for adaptive side-channel attacks. In: Ning P, De Capitani di Vimercati S, Syverson PF (eds) Proceedings of the 2007 ACM conference on computer and communications security, CCS 2007, Alexandria, Virginia, USA, October 28–31, 2007. ACM, pp 286–296Google Scholar
  41. KBL16.
    Kawamoto Y, Biondi F, Legay A(2016) Hybrid statistical estimation of mutual information for quantifying information flow. In: Fitzgerald JS, Heitmeyer CL, Gnesi S, Philippou A(eds) FM 2016: Formal methods—21st international symposium, Limassol, Cyprus, November 9–11, 2016, Proceedings, volume 9995 of Lecture notes in computer science, pp 406–425Google Scholar
  42. KCP14.
    Kawamoto Y, Chatzikokolakis K, Palamidessi C (2014) Compositionality results for quantitative information flow. In: Norman G, Sanders WH (eds) Quantitative evaluation of systems—11th international conference, QEST 2014, Florence, Italy, September 8–10, 2014. Proceedings, volume 8657 of Lecture notes in computer science. Springer, pp 368–383Google Scholar
  43. KCP17.
    Kawamoto Y, Chatzikokolakis K, Palamidessi C (2017) On the compositionality of quantitative information flow. Log Methods Comput Sci 13(3:11): 1–31MathSciNetzbMATHGoogle Scholar
  44. KG15.
    Kawamoto Y, Given-Wilson T (2015) Quantitative information flow for scheduler-dependent systems. In: Bertrand N, Tribastone M (eds) Proceedings thirteenth workshop on quantitative aspects of programming languages and systems, QAPL 2015, London, UK, 11th–12th April 2015. volume 194 of EPTCS, pp 48–62Google Scholar
  45. KMPS11.
    Kang MG, McCamant S, Poosankam P, Song D (2011) DTA++: dynamic taint analysis with targeted control-flow propagation. In: Proceedings of the network and distributed system security symposium, NDSS 2011, San Diego, California, USA, 6th February–9th February 2011. The Internet SocietyGoogle Scholar
  46. KR10.
    Köpf B, Rybalchenko A(2010) Approximation and randomization for quantitative information-flow analysis. In: Proceedings of the 23rd IEEE computer security foundations symposium, CSF 2010, Edinburgh, United Kingdom, July 17–19, 2010. IEEE Computer Society, pp 3–14Google Scholar
  47. LCFS14.
    Liu Z, Chen Z, Fang C, Shi Q (2014) Hybrid test data generation. In: Jalote P, Briand LC, van der Hoek A (eds) 36th international conference on software engineering, ICSE ’14, Companion proceedings, Hyderabad, India, May 31–June 07, 2014. ACM, pp 630–631Google Scholar
  48. LDB10.
    Legay A, Delahaye B, Bensalem S (2010) Statistical model checking: an overview. In: Barringer H, Falcone Y, Finkbeiner B, Havelund K, Lee I, Pace GJ, Rosu G, Sokolsky O, Tillmann N (eds) Runtime verification—first international conference, RV 2010, St. Julians, Malta, November 1–4, 2010. Proceedings, volume 6418 of Lecture notes in computer science. Springer, pp 122–135Google Scholar
  49. Mac02.
    MacKay DJC (2002) Information theory, inference & learning algorithms. Cambridge University Press, New YorkGoogle Scholar
  50. Mal07.
    Malacaria P (2007) Assessing security threats of looping constructs. In: Hofmann M, Felleisen M (eds) Proceedings of the 34th ACM SIGPLAN-SIGACT symposium on principles of programming languages, POPL 2007, Nice, France, January 17–19, 2007. ACM, pp 225–235Google Scholar
  51. ME08.
    McCamant S, Ernst MD (2008) Quantitative information flow as network flow capacity. In: Gupta R, Amarasinghe SP (eds) Proceedings of the ACM SIGPLAN 2008 conference on programming language design and implementation, Tucson, AZ, USA, June 7–13, 2008. ACM, pp 193–205Google Scholar
  52. MKP+18.
    Malacaria P, Khouzani MHR, Pasareanu CS, Phan Q-S, Luckow K S (2018) Symbolic side-channel analysis for probabilistic programs. In Proceedings of the 31st IEEE computer security foundations symposium, CSF 2018. To appearGoogle Scholar
  53. Mod89.
    Moddemeijer R (1989) On estimation of entropy and mutual information of continuous distributions. Signal Process 16: 233–248MathSciNetCrossRefGoogle Scholar
  54. MS07.
    Majumdar R, Sen K (2007) Hybrid concolic testing. In: 29th international conference on software engineering (ICSE 2007), Minneapolis, MN, USA, May 20–26, 2007. IEEE Computer Society, pp 416–426Google Scholar
  55. NMS09.
    Newsome J, McCamant S, Song D (2009) Measuring channel capacity to distinguish undue influence. In: Chong S, Naumann DA (eds) Proceedings of the 2009 workshop on programming languages and analysis for security, PLAS 2009, Dublin, Ireland, 15–21 June, 2009. ACM, pp 73–85Google Scholar
  56. Par07.
    Parr T (2007) The definitive ANTLR reference: building domain specific languagesGoogle Scholar
  57. PM14.
    Phan Q-S, Malacaria P (2014) Abstract model counting: a novel approach for quantification of information leaks. In: Moriai S, Jaeger T, Sakurai K (eds) 9th ACM symposium on information, computer and communications security, ASIA CCS ’14, Kyoto, Japan—June 03–06, 2014. ACM, pp 283–292Google Scholar
  58. PMPd14.
    Phan Q-S, Malacaria P, Pasareanu CS, d’Amorim M (2014) Quantifying information leaks using reliability analysis. In: Rungta N, Tkachuk O (eds) 2014 International symposium on model checking of software, SPIN 2014, Proceedings, San Jose, CA, USA, July 21–23, 2014. ACM, pp 105–108Google Scholar
  59. PMTP12.
    Phan Q-S, Malacaria P, Tkachuk O, Pasareanu CS (2012) Symbolic quantitative information flow. ACM SIGSOFT Softw Eng Notes 37(6): 1–5CrossRefGoogle Scholar
  60. Smi09.
    Smith G (2009) On the foundations of quantitative information flow. In: de Alfaro L (ed) Foundations of software science and computational structures, 12th international conference FOSSACS 2009, held as part of the joint European conferences on theory and practice of software, ETAPS 2009, York, UK, March 22–29, 2009. Proceedings, volume 5504 of Lecture notes in computer science. Springer, pp 288–302Google Scholar
  61. STS+18.
    Sweet I, Trilla JMC, Scherrer C, Hicks M, Magill S (2018) What’s the over/under? Probabilistic bounds on information leakage. In Lujo B, Ralf K (eds) Principles of security and trust - 7th international conference, POST 2018, Held as part of the European joint conferences on theory and practice of software, ETAPS 2018, Thessaloniki, Greece, April 14–20, 2018, Proceedings, volume 10804 of Lecture notes in computer science. Springer, pp 3–27Google Scholar
  62. VEB+16.
    Val CG, Enescu MA, Bayless S, Aiello W, Hu AJ (2016) Precisely measuring quantitative information flow: 10k lines of code and beyond. In: IEEE European symposium on security and privacy, EuroS&P 2016, Saarbrücken, Germany, March 21–24, 2016. IEEE, pp 31–46Google Scholar
  63. Vit85.
    Vitter JS (1985) Random sampling with a reservoir. ACM Trans Math Softw 11(1): 37–57MathSciNetCrossRefzbMATHGoogle Scholar
  64. Wei16.
    Weigl A (2016) Efficient sat-based pre-image enumeration for quantitative information flow in programs. In: Livraga G, Torra V, Aldini A, Martinelli F, Suri N (eds) Data privacy management and security assurance—11th international workshop, DPM 2016 and 5th international workshop, QASA 2016, Heraklion, Crete, Greece, September 26–27, 2016, Proceedings, volume 9963 of Lecture notes in computer science. Springer, pp 51–58Google Scholar
  65. Wil13.
    Wilde MM (2013) Quantum information theory, 1st edn. Cambridge University Press, New YorkGoogle Scholar
  66. YT14.
    Yasuoka H, Terauchi T (2014) Quantitative information flow as safety and liveness hyperproperties. Theor Comput Sci 538: 167–182MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© British Computer Society 2018

Authors and Affiliations

  1. 1.CentraleSupélec RennesRennesFrance
  2. 2.AISTTsukubaJapan
  3. 3.InriaRennesFrance

Personalised recommendations