Abstract
A rise in Advanced Persistent Threats (APTs) has introduced a need for robustness against long-running, stealthy attacks that circumvent existing cryptographic security guarantees. \(\mathsf {FlipIt}\) is a security game that models attacker-defender interactions in advanced scenarios such as APTs. Previous work extensively analyzed non-adaptive strategies in \(\mathsf {FlipIt}\), but adaptive strategies arise naturally in practical interactions as players receive feedback during the game. We model the \(\mathsf {FlipIt}\) game as a Markov Decision Process and introduce \(\mathsf {QFlip}\), an adaptive strategy for \(\mathsf {FlipIt}\) based on temporal difference reinforcement learning. We prove theoretical results on the convergence of our new strategy against an opponent playing a Periodic strategy. We confirm our analysis experimentally through an extensive evaluation of \(\mathsf {QFlip}\) against specific opponents. \(\mathsf {QFlip}\) converges to the optimal adaptive strategy for Periodic and Exponential opponents using associated state spaces. Finally, we introduce a generalized \(\mathsf {QFlip}\) strategy with a composite state space that outperforms a Greedy strategy for several distributions, including Periodic and Uniform, without prior knowledge of the opponent's strategy. We also release an OpenAI Gym environment for \(\mathsf {FlipIt}\) to facilitate future research.
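To make the abstract's setup concrete, the following is a minimal sketch of tabular temporal-difference (Q-learning) play against a Periodic opponent in a simplified, discrete-time \(\mathsf {FlipIt}\) variant. Everything here is an illustrative assumption rather than the paper's exact model or the released Gym environment's API: the constants (`PERIOD`, `FLIP_COST`, learning hyperparameters) are hypothetical, and the learner's state is taken to be only the time since its own last flip.

```python
import random

# Simplified discrete-time FlipIt variant (illustrative assumptions):
# a Periodic defender flips every PERIOD ticks; the Q-learning attacker's
# state is the time since its own last flip, capped at MAX_TAU.
PERIOD, MAX_TAU = 5, 20
FLIP_COST = 2.0                      # hypothetical per-move cost
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1  # learning rate, discount, exploration
ACTIONS = (0, 1)                     # 0 = wait, 1 = flip

Q = {(tau, a): 0.0 for tau in range(MAX_TAU + 1) for a in ACTIONS}

def greedy(tau):
    """Action with the highest current Q-value in state tau."""
    return max(ACTIONS, key=lambda a: Q[(tau, a)])

random.seed(0)
tau, attacker_controls = 0, False
for t in range(1, 200_000):
    # Epsilon-greedy action selection.
    a = random.choice(ACTIONS) if random.random() < EPS else greedy(tau)
    if t % PERIOD == 0:              # Periodic defender moves, taking control
        attacker_controls = False
    if a == 1:                       # attacker flips, seizing control
        attacker_controls = True
    # Reward: benefit of controlling the resource this tick, minus move cost.
    reward = (1.0 if attacker_controls else 0.0) - (FLIP_COST if a else 0.0)
    next_tau = 0 if a == 1 else min(tau + 1, MAX_TAU)
    # Standard temporal-difference (Q-learning) update.
    best_next = max(Q[(next_tau, b)] for b in ACTIONS)
    Q[(tau, a)] += ALPHA * (reward + GAMMA * best_next - Q[(tau, a)])
    tau = next_tau

# Learned wait/flip decision for each value of tau.
policy = [greedy(tau) for tau in range(MAX_TAU + 1)]
print(policy)
```

Because the sketched state omits any feedback about the defender's moves, the learner can only recover a coarse timing policy; the paper's state spaces built from observed opponent behavior are what allow convergence to the optimal adaptive strategy.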
Acknowledgements
We would like to thank Ronald Rivest, Marten van Dijk, Ari Juels, and Sang Chin for discussions about reinforcement learning in \(\mathsf {FlipIt}\). We thank Matthew Jagielski, Tina Eliassi-Rad, and Lucianna Kiffer for discussing the theoretical analysis. This project was funded by NSF under grant CNS-1717634. This research was also sponsored by the U.S. Army Combat Capabilities Development Command Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Combat Capabilities Development Command Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Oakley, L., Oprea, A. (2019). \(\mathsf {QFlip}\): An Adaptive Reinforcement Learning Strategy for the \(\mathsf {FlipIt}\) Security Game. In: Alpcan, T., Vorobeychik, Y., Baras, J., Dán, G. (eds) Decision and Game Theory for Security. GameSec 2019. Lecture Notes in Computer Science(), vol 11836. Springer, Cham. https://doi.org/10.1007/978-3-030-32430-8_22
Print ISBN: 978-3-030-32429-2
Online ISBN: 978-3-030-32430-8