Data Poisoning Attacks in Contextual Bandits

Ma, Yuzhe; Jun, Kwang-Sung; Li, Lihong; Zhu, Xiaojin

doi:10.1007/978-3-030-01554-1_11

Conference paper
First Online: 26 September 2018

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11199)

Abstract

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
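
The reward-manipulation idea in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's formulation. It assumes a greedy learner that fits one ridge regression per arm and has no exploration bonus, an attacker who modifies only the target arm's logged rewards, and an L2 cost on the modification. Under those assumptions the convex program reduces to a single linear constraint, so the minimum-norm solution has a closed form. The names `ridge_estimate`, `poison_target_arm`, `eps`, and `lam` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def ridge_estimate(X, r, lam=1.0):
    """Ridge estimate theta = (X^T X + lam * I)^{-1} X^T r for one arm."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ r)

def poison_target_arm(X_by_arm, r_by_arm, x_star, a_star, eps=0.1, lam=1.0):
    """Smallest L2 change to arm a_star's rewards such that a greedy ridge
    learner prefers a_star at context x_star by a margin of at least eps.
    Assumption: only the target arm's rewards are modified."""
    # Best predicted reward among the other arms; their data is untouched.
    best_other = max(
        x_star @ ridge_estimate(X_by_arm[a], r_by_arm[a], lam)
        for a in range(len(X_by_arm)) if a != a_star
    )
    X, r = X_by_arm[a_star], r_by_arm[a_star]
    A = X.T @ X + lam * np.eye(X.shape[1])
    # The prediction at x_star is linear in the rewards: x_star^T theta = w^T r.
    w = X @ np.linalg.solve(A, x_star)
    gap = best_other + eps - w @ r
    if gap <= 0:
        return np.zeros_like(r)  # the target arm already wins by the margin
    if w @ w < 1e-12:
        raise ValueError("attack infeasible: rewards cannot move the prediction")
    # Minimum-norm delta with w^T delta >= gap (projection onto a half-space).
    return gap * w / (w @ w)

# Synthetic usage example; all data below is made up for illustration.
rng = np.random.default_rng(0)
d, n, K, eps = 3, 50, 3, 0.1
X_by_arm = [rng.normal(size=(n, d)) for _ in range(K)]
true_theta = [rng.normal(size=d) for _ in range(K)]
r_by_arm = [X @ t + 0.1 * rng.normal(size=n) for X, t in zip(X_by_arm, true_theta)]
x_star, a_star = rng.normal(size=d), 0

delta = poison_target_arm(X_by_arm, r_by_arm, x_star, a_star, eps=eps)
r_poisoned = list(r_by_arm)
r_poisoned[a_star] = r_by_arm[a_star] + delta
preds = [x_star @ ridge_estimate(X_by_arm[a], r_poisoned[a]) for a in range(K)]
print("greedy choice at x_star:", int(np.argmax(preds)))
print("L2 norm of reward change:", float(np.linalg.norm(delta)))
```

A learner with confidence bounds, an attacker who also poisons other arms, or a different norm would each give a multi-constraint convex program with no closed form, so a general-purpose solver would be needed there. The `ValueError` branch is a toy counterpart to the feasibility question the abstract raises.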

Notes

1. In this paper we restrict the poisoning to modifying rewards for ease of exposition. More generally, the attacker can add, remove, or modify both the rewards and the context vectors. Our optimization-based attack framework can be generalized to such stronger attacks, though the optimization could become combinatorial.
2. The choice of norm is application dependent; see, e.g., [15, Fig. 3]. Any norm works for the attack formulation.
3. Even if some context \(x^*\) cannot be strongly attacked, the attacker might be able to weakly attack it. A weak attack is sufficient for the attacker to force an arm pull of \(a^*\). However, as \(\epsilon \rightarrow 0\), a strong attack approaches a weak attack. Thus we only need to characterize strong attacks.
4. URL: https://webscope.sandbox.yahoo.com/catalog.php?datatype=r.

References

1. Abbasi-Yadkori, Y., Pál, D., Szepesvári, C.: Improved algorithms for linear stochastic bandits. In: Advances in Neural Information Processing Systems (NIPS), pp. 2312–2320 (2011)
2. Agarwal, A., et al.: Making contextual decisions with low technical debt (2016). CoRR abs/1606.03966
3. Alfeld, S., Zhu, X., Barford, P.: Data poisoning attacks against autoregressive models. In: The 30th AAAI Conference on Artificial Intelligence (2016)
4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
5. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1467–1474 (2012)
6. Chapelle, O., Manavoglu, E., Rosales, R.: Simple and scalable response prediction for display advertising. ACM Trans. Intell. Syst. Technol. 5(4), 61:1–61:34 (2014)
7. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)
8. Greenewald, K., Tewari, A., Murphy, S.A., Klasnja, P.V.: Action centered contextual bandits. In: Advances in Neural Information Processing Systems 30 (NIPS), pp. 5979–5987 (2017)
9. Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: poisoning attacks and countermeasures for regression learning. arXiv preprint arXiv:1804.00308 (2018)
10. Joseph, A.D., Nelson, B., Rubinstein, B.I.P., Tygar, J.: Adversarial Machine Learning. Cambridge University Press, Cambridge (2018)
11. Kuleshov, V., Precup, D.: Algorithms for multi-armed bandit problems (2014). CoRR abs/1402.6028
12. Li, B., Wang, Y., Singh, A., Vorobeychik, Y.: Data poisoning attacks on factorization-based collaborative filtering. In: Advances in Neural Information Processing Systems, pp. 1885–1893 (2016)
13. Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web (WWW), pp. 661–670 (2010)
14. Mei, S., Zhu, X.: The security of latent Dirichlet allocation. In: The 18th International Conference on Artificial Intelligence and Statistics (AISTATS) (2015)
15. Mei, S., Zhu, X.: Using machine teaching to identify optimal training-set attacks on machine learners. In: The 29th AAAI Conference on Artificial Intelligence (2015)
16. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the 16th International Conference on Machine Learning (ICML), pp. 278–287 (1999)
17. Zhao, M., An, B., Yu, Y., Liu, S., Pan, S.J.: Data poisoning attacks on multi-task relationship learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 2628–2635 (2018)
18. Zhu, X.: Machine teaching: an inverse problem to machine learning and an approach toward optimal education. In: The 29th AAAI Conference on Artificial Intelligence (AAAI “Blue Sky” Senior Member Presentation Track) (2015)
19. Zhu, X., Singla, A., Zilles, S., Rafferty, A.N.: An overview of machine teaching. arXiv e-prints, January 2018. https://arxiv.org/abs/1801.05927


Acknowledgment

This work is supported in part by NSF 1545481, 1704117, 1623605, 1561512, and the MADLab AF Center of Excellence FA9550-18-1-0166.

Author information

Authors and Affiliations

University of Wisconsin-Madison, Madison, USA
Yuzhe Ma, Kwang-Sung Jun & Xiaojin Zhu
Google Brain, Kirkland, WA, USA
Lihong Li

Corresponding author

Correspondence to Yuzhe Ma.

Editor information

Editors and Affiliations

University of Washington, Seattle, WA, USA
Linda Bushnell
University of Washington, Seattle, WA, USA
Radha Poovendran
University of Illinois at Urbana–Champaign, Urbana, IL, USA
Tamer Başar

About this paper

Cite this paper

Ma, Y., Jun, KS., Li, L., Zhu, X. (2018). Data Poisoning Attacks in Contextual Bandits. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_11


DOI: https://doi.org/10.1007/978-3-030-01554-1_11
Published: 26 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01553-4
Online ISBN: 978-3-030-01554-1
eBook Packages: Computer Science, Computer Science (R0)
