Hierarchical Active Learning with Proportion Feedback on Regions

  • Zhipeng Luo
  • Milos Hauskrecht
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11052)


Learning classification models in practice often relies on human annotation, in which humans assign class labels to data instances. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost is critical for building such models. To address this problem, instead of soliciting instance-based annotation we explore region-based annotation as the feedback. A region is defined as a hyper-cubic subspace of the input feature space and covers the subpopulation of data instances that fall into it. Each region is labeled with a number in [0, 1] (in the binary classification setting), representing a human estimate of the positive (or negative) class proportion in the subpopulation. To learn a classifier from region-based feedback, we develop an active learning framework that hierarchically divides the input space into smaller and smaller regions. In each iteration we split the region with the highest potential to improve the classification model. This iterative process allows us to gradually learn more refined classification models from more specific regions with more accurate proportions. Through experiments on numerous datasets we demonstrate that our approach offers a new and promising active learning direction that can outperform existing active learning approaches, especially when the labeling budget is small. Code related to this paper is available at:
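The region-splitting loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Region`, `proportion_oracle`, `split`, `score`, and `refine` are hypothetical, the human annotator is simulated from ground-truth labels, the split rule (halve the widest feature) and the selection score (region size times p(1 - p)) are assumptions chosen for illustration, and the step that fits a classifier from the proportion labels is omitted.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Region:
    lower: np.ndarray  # per-feature lower bounds of the hyper-rectangle
    upper: np.ndarray  # per-feature upper bounds of the hyper-rectangle
    idx: np.ndarray    # indices of the instances that fall into the region


def proportion_oracle(region, y):
    """Simulated annotator (assumption): true fraction of positives in the region."""
    return float(np.mean(y[region.idx])) if region.idx.size else 0.0


def split(region, X):
    """Halve the region along its widest feature (illustrative rule, an assumption)."""
    d = int(np.argmax(region.upper - region.lower))
    mid = (region.lower[d] + region.upper[d]) / 2.0
    go_left = X[region.idx, d] <= mid
    left_upper = region.upper.copy()
    left_upper[d] = mid
    right_lower = region.lower.copy()
    right_lower[d] = mid
    left = Region(region.lower.copy(), left_upper, region.idx[go_left])
    right = Region(right_lower, region.upper.copy(), region.idx[~go_left])
    return left, right


def score(region, p):
    """Hypothetical selection score: large regions with mixed proportions rank first."""
    return region.idx.size * p * (1.0 - p)


def refine(X, y, budget):
    """Split the highest-scoring leaf until the query budget is used up."""
    root = Region(X.min(axis=0).astype(float), X.max(axis=0).astype(float),
                  np.arange(len(X)))
    leaves = [(root, proportion_oracle(root, y))]
    queries = 1  # the root region costs one proportion query
    while queries + 2 <= budget:
        i = max(range(len(leaves)), key=lambda k: score(*leaves[k]))
        region, p = leaves[i]
        if score(region, p) == 0.0:  # every leaf is already pure or empty
            break
        leaves.pop(i)
        for child in split(region, X):
            leaves.append((child, proportion_oracle(child, y)))
        queries += 2  # one proportion query per child
    return leaves
```

Calling `refine(X, y, budget)` on a feature matrix `X` and binary label vector `y` returns a list of `(region, proportion)` pairs for the current leaves of the hierarchy. In a real setting the oracle would be a human estimate rather than a value computed from `y`.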


Keywords: Active learning, Proportion label, Classification



The work presented in this paper was supported by NIH grants R01GM088224 and R01LM010019. The content of the paper is solely the responsibility of the authors and does not necessarily represent the official views of NIH.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Pittsburgh, Pittsburgh, USA
