Abstract
A rise in Advanced Persistent Threats (APTs) has introduced a need for robustness against long-running, stealthy attacks that circumvent existing cryptographic security guarantees. \(\mathsf {FlipIt}\) is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work extensively analyzed non-adaptive strategies in \(\mathsf {FlipIt}\), but adaptive strategies arise naturally in practical interactions as players receive feedback during the game. We model the \(\mathsf {FlipIt}\) game as a Markov Decision Process and introduce \(\mathsf {QFlip}\), an adaptive strategy for \(\mathsf {FlipIt}\) based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing a Periodic strategy. We confirm our analysis experimentally through an extensive evaluation of \(\mathsf {QFlip}\) against specific opponents. \(\mathsf {QFlip}\) converges to the optimal adaptive strategy for Periodic and Exponential opponents using associated state spaces. Finally, we introduce a generalized \(\mathsf {QFlip}\) strategy with a composite state space that outperforms a Greedy strategy for several distributions, including Periodic and Uniform, without prior knowledge of the opponent's strategy. We also release an OpenAI Gym environment for \(\mathsf {FlipIt}\) to facilitate future research.
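To make the abstract's setup concrete, the following is a minimal sketch of tabular temporal-difference (Q-learning) play against a Periodic opponent in a simplified, discrete-time \(\mathsf {FlipIt}\) variant. Everything here is an illustrative assumption rather than the paper's exact model or the released Gym environment's API: the constants (`PERIOD`, `FLIP_COST`, learning hyperparameters) are hypothetical, and the learner's state is taken to be only the time since its own last flip.

```python
import random

# Simplified discrete-time FlipIt variant (illustrative assumptions):
# a Periodic defender flips every PERIOD ticks; the Q-learning attacker's
# state is the time since its own last flip, capped at MAX_TAU.
PERIOD, MAX_TAU = 5, 20
FLIP_COST = 2.0                      # hypothetical per-move cost
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration
ACTIONS = (0, 1)                     # 0 = wait, 1 = flip

Q = {(tau, a): 0.0 for tau in range(MAX_TAU + 1) for a in ACTIONS}

def greedy(tau):
    """Action with the highest current Q-value in state tau."""
    return max(ACTIONS, key=lambda a: Q[(tau, a)])

random.seed(0)
tau, attacker_controls = 0, False
for t in range(1, 200_000):
    # Epsilon-greedy action selection.
    a = random.choice(ACTIONS) if random.random() < EPS else greedy(tau)
    if t % PERIOD == 0:              # Periodic defender moves, taking control
        attacker_controls = False
    if a == 1:                       # attacker flips, seizing control
        attacker_controls = True
    # Reward: benefit of controlling the resource this tick, minus move cost.
    reward = (1.0 if attacker_controls else 0.0) - (FLIP_COST if a else 0.0)
    next_tau = 0 if a == 1 else min(tau + 1, MAX_TAU)
    # Standard temporal-difference (Q-learning) update.
    best_next = max(Q[(next_tau, b)] for b in ACTIONS)
    Q[(tau, a)] += ALPHA * (reward + GAMMA * best_next - Q[(tau, a)])
    tau = next_tau

# Learned wait/flip decision for each value of tau.
policy = [greedy(tau) for tau in range(MAX_TAU + 1)]
print(policy)
```

Because the sketched state omits any feedback about the defender's moves, the learner can only recover a coarse timing policy; the paper's state spaces built from observed opponent behavior are what allow convergence to the optimal adaptive strategy.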
Acknowledgements
We would like to thank Ronald Rivest, Marten van Dijk, Ari Juels, and Sang Chin for discussions about reinforcement learning in \(\mathsf {FlipIt}\). We thank Matthew Jagielski, Tina Eliassi-Rad, and Lucianna Kiffer for discussing the theoretical analysis. This project was funded by NSF under grant CNS-1717634. This research was also sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Oakley, L., Oprea, A. (2019). \(\mathsf {QFlip}\): An Adaptive Reinforcement Learning Strategy for the \(\mathsf {FlipIt}\) Security Game. In: Alpcan, T., Vorobeychik, Y., Baras, J., Dán, G. (eds) Decision and Game Theory for Security. GameSec 2019. Lecture Notes in Computer Science(), vol 11836. Springer, Cham. https://doi.org/10.1007/978-3-030-32430-8_22
Print ISBN: 978-3-030-32429-2
Online ISBN: 978-3-030-32430-8