Robust Contextual Bandit via the Capped-\(\ell _{2}\) Norm for Mobile Health Intervention

  • Feiyun Zhu
  • Xinliang Zhu
  • Sheng Wang
  • Jiawen Yao
  • Zhichun Xiao
  • Junzhou Huang (email author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11046)


This paper considers the actor-critic contextual bandit for mobile health (mHealth) intervention. State-of-the-art decision-making methods in mHealth generally assume that the noise in the dynamic system follows a Gaussian distribution. Those methods estimate the expected reward with least-squares-based algorithms, which are sensitive to outliers. To deal with outliers, we are the first to propose a robust actor-critic contextual bandit method for mHealth intervention. In the critic update, the capped-\(\ell _{2}\) norm is used to measure the approximation error, which prevents outliers from dominating the objective. The critic update also yields a set of weights, one per sample, which are used to form a weighted objective for the actor update. Samples that are ineffective in the critic update receive zero weight in the actor update. As a result, the robustness of both the actor and the critic updates is enhanced. The capped-\(\ell _{2}\) norm has one key parameter, the capping threshold. We provide a reliable method to set it properly by making use of one of the most fundamental definitions of outliers in statistics. Extensive experimental results demonstrate that our method achieves almost identical results to the state-of-the-art methods on the dataset without outliers and dramatically outperforms them on the datasets corrupted by outliers.
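The capping idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names are hypothetical, the errors are treated as scalar residuals, the weights are simplified to a 0/1 indicator, and the threshold rule (Tukey's boxplot rule, Q3 + 1.5 × IQR) is only an assumption about which "fundamental definition of outliers" is meant.

```python
import statistics


def capped_l2_loss(residuals, eps):
    """Sum of min(|e_i|, eps): no single sample contributes more than eps."""
    return sum(min(abs(e), eps) for e in residuals)


def sample_weights(residuals, eps):
    """Simplified weights: 1.0 if the error is below the cap, else 0.0."""
    return [1.0 if abs(e) < eps else 0.0 for e in residuals]


def iqr_threshold(residuals, k=1.5):
    """Assumed cap: Q3 + k * IQR of the absolute errors (boxplot rule)."""
    q1, _, q3 = statistics.quantiles([abs(e) for e in residuals], n=4)
    return q3 + k * (q3 - q1)


# Seven small approximation errors and one gross outlier (made-up numbers).
residuals = [0.1, -0.2, 0.15, -0.05, 0.3, -0.25, 0.2, 8.0]

eps = iqr_threshold(residuals)
weights = sample_weights(residuals, eps)
loss = capped_l2_loss(residuals, eps)

print("threshold:", eps)
print("weights:", weights)
print("capped loss:", loss, "uncapped:", sum(abs(e) for e in residuals))
```

In this toy run the outlier's contribution to the loss is limited to the threshold and its weight is zero, so it would be ignored by a weighted actor objective built from these weights.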



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Feiyun Zhu (1, 3)
  • Xinliang Zhu (1)
  • Sheng Wang (1)
  • Jiawen Yao (1)
  • Zhichun Xiao (3)
  • Junzhou Huang (1, 2), email author
  1. Department of CSE, University of Texas at Arlington, Arlington, USA
  2. Tencent AI Lab, Shenzhen, China
  3. Walmart (Sam’s Club) Technology, Dallas, USA
