Data Poisoning Attacks in Contextual Bandits

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11199)

Abstract

We study offline data poisoning attacks in contextual bandits, a class of reinforcement learning problems with important applications in online recommendation and adaptive medical treatment, among others. We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector. The target arm and target contextual vector are both chosen by the attacker. That is, the attacker can hijack the behavior of a contextual bandit. We also investigate the feasibility and the side effects of such attacks, and identify future directions for defense. Experiments on both synthetic and real-world data demonstrate the efficiency of the attack algorithm.
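
As a rough illustration of the reward-manipulation idea, the sketch below works under simplifying assumptions that the abstract does not state: the learner fits a separate ridge regression per arm and pulls the arm with the highest predicted reward (a greedy rule, with no exploration bonus), and the attacker perturbs only the target arm's logged rewards. In that special case the convex program reduces to a least-norm problem with one linear constraint, which has a closed-form solution. All names (`ridge`, `greedy_arm`, `poison_target_rewards`, `eps`, `lam`) and the toy data are hypothetical, and this is not presented as the paper's algorithm.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def gram(X, lam):
    """Regularized Gram matrix X^T X + lam * I."""
    d = len(X[0])
    return [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def ridge(X, y, lam=1.0):
    """Ridge estimate theta = (X^T X + lam I)^{-1} X^T y."""
    d = len(X[0])
    Xty = [sum(x[i] * r for x, r in zip(X, y)) for i in range(d)]
    return solve(gram(X, lam), Xty)

def greedy_arm(data, x, lam=1.0):
    """Assumed learner: pull the arm with the highest predicted reward at x."""
    scores = [dot(x, ridge(X, y, lam)) for X, y in data]
    return max(range(len(data)), key=scores.__getitem__)

def poison_target_rewards(data, target, x_star, eps=0.1, lam=1.0):
    """Smallest L2 change to the target arm's rewards that makes it win
    at x_star by a margin of eps (other arms' data are left untouched)."""
    X, y = data[target]
    # The target's prediction at x_star is linear in its rewards: c^T y,
    # with c_i = x_i^T (X^T X + lam I)^{-1} x_star.
    w = solve(gram(X, lam), x_star)
    c = [dot(x, w) for x in X]
    best_other = max(dot(x_star, ridge(Xa, ya, lam))
                     for a, (Xa, ya) in enumerate(data) if a != target)
    gap = best_other + eps - dot(c, y)
    if gap <= 0:
        return [0.0] * len(y)  # target already wins by the margin
    cc = dot(c, c)
    if cc == 0:
        raise ValueError("target arm's prediction at x_star cannot be moved")
    # Least-norm solution of c^T delta >= gap is delta = gap * c / ||c||^2.
    return [gap / cc * ci for ci in c]

# Toy data: two arms, two-dimensional contexts (made up for illustration).
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
data = [(contexts, [1.0, 0.2, 1.1]),   # arm 0
        (contexts, [0.1, 0.3, 0.5])]   # arm 1, the attacker's target
x_star = [1.0, 0.0]

delta = poison_target_rewards(data, target=1, x_star=x_star)
poisoned = [data[0], (contexts, [r + d for r, d in zip(data[1][1], delta)])]
print(greedy_arm(data, x_star), greedy_arm(poisoned, x_star))
```

A learner with confidence bounds, or an attacker who may alter every arm's rewards, would lead to several coupled constraints; in that case a general-purpose convex solver would replace the closed form used here.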

Notes

  1. In this paper we restrict the poisoning to modifying rewards for ease of exposition. More generally, the attacker can add, remove, or modify both the rewards and the context vectors. Our optimization-based attack framework can be generalized to such stronger attacks, though the optimization could become combinatorial.

  2. The choice of norm is application dependent, see e.g., [15, Fig. 3]. Any norm works for the attack formulation.

  3. Even if some context \(x^*\) cannot be strongly attacked, the attacker might be able to weakly attack it. Weak attack is sufficient for the attacker to force an arm pull of \(a^*\). However, as \(\epsilon \rightarrow 0\) strong attack approaches weak attack. Thus we only need to characterize strong attacks.

  4. URL: https://webscope.sandbox.yahoo.com/catalog.php?datatype=r.

References

  1. Abbasi-Yadkori, Y., Pál, D., Szepesvári, C.: Improved algorithms for linear stochastic bandits. In: Advances in Neural Information Processing Systems (NIPS), pp. 2312–2320 (2011)

  2. Agarwal, A., et al.: Making contextual decisions with low technical debt (2016). CoRR abs/1606.03966

  3. Alfeld, S., Zhu, X., Barford, P.: Data poisoning attacks against autoregressive models. In: The 30th AAAI Conference on Artificial Intelligence (2016)

  4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)

  5. Biggio, B., Nelson, B., Laskov, P.: Poisoning attacks against support vector machines. In: Proceedings of the 29th International Conference on Machine Learning (ICML), pp. 1467–1474 (2012)

  6. Chapelle, O., Manavoglu, E., Rosales, R.: Simple and scalable response prediction for display advertising. ACM Trans. Intell. Syst. Technol. 5(4), 61:1–61:34 (2014)

  7. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: International Conference on Learning Representations (2015)

  8. Greenewald, K., Tewari, A., Murphy, S.A., Klasnja, P.V.: Action centered contextual bandits. In: Advances in Neural Information Processing Systems 30 (NIPS), pp. 5979–5987 (2017)

  9. Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., Li, B.: Manipulating machine learning: poisoning attacks and countermeasures for regression learning. arXiv preprint arXiv:1804.00308 (2018)

  10. Joseph, A.D., Nelson, B., Rubinstein, B.I.P., Tygar, J.: Adversarial Machine Learning. Cambridge University Press, Cambridge (2018)

  11. Kuleshov, V., Precup, D.: Algorithms for multi-armed bandit problems (2014). CoRR abs/1402.6028

  12. Li, B., Wang, Y., Singh, A., Vorobeychik, Y.: Data poisoning attacks on factorization-based collaborative filtering. In: Advances in Neural Information Processing Systems, pp. 1885–1893 (2016)

  13. Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th International Conference on World Wide Web (WWW), pp. 661–670 (2010)

  14. Mei, S., Zhu, X.: The security of latent Dirichlet allocation. In: The 18th International Conference on Artificial Intelligence and Statistics (AISTATS) (2015)

  15. Mei, S., Zhu, X.: Using machine teaching to identify optimal training-set attacks on machine learners. In: The 29th AAAI Conference on Artificial Intelligence (2015)

  16. Ng, A.Y., Harada, D., Russell, S.J.: Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the 16th International Conference on Machine Learning (ICML), pp. 278–287 (1999)

  17. Zhao, M., An, B., Yu, Y., Liu, S., Pan, S.J.: Data poisoning attacks on multi-task relationship learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 2628–2635 (2018)

  18. Zhu, X.: Machine teaching: an inverse problem to machine learning and an approach toward optimal education. In: The 29th AAAI Conference on Artificial Intelligence (AAAI “Blue Sky” Senior Member Presentation Track) (2015)

  19. Zhu, X., Singla, A., Zilles, S., Rafferty, A.N.: An overview of machine teaching. arXiv e-prints, January 2018. https://arxiv.org/abs/1801.05927

Acknowledgment

This work is supported in part by NSF 1545481, 1704117, 1623605, 1561512, and the MADLab AF Center of Excellence FA9550-18-1-0166.

Corresponding author

Correspondence to Yuzhe Ma.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, Y., Jun, KS., Li, L., Zhu, X. (2018). Data Poisoning Attacks in Contextual Bandits. In: Bushnell, L., Poovendran, R., Başar, T. (eds) Decision and Game Theory for Security. GameSec 2018. Lecture Notes in Computer Science, vol 11199. Springer, Cham. https://doi.org/10.1007/978-3-030-01554-1_11

  • DOI: https://doi.org/10.1007/978-3-030-01554-1_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01553-4

  • Online ISBN: 978-3-030-01554-1

  • eBook Packages: Computer Science, Computer Science (R0)
