Hierarchical Active Learning with Proportion Feedback on Regions
Learning classification models in practice often relies on human annotation, in which humans assign class labels to data instances. Because this process can be very time-consuming and costly, finding effective ways to reduce the annotation cost is critical for building such models. To address this problem, instead of soliciting instance-based annotation, we explore region-based annotation as feedback. A region is defined as a hyper-cubic subspace of the input feature space, and it covers the subpopulation of data instances that fall into it. In the binary classification setting, each region is labeled with a number in [0, 1] representing a human estimate of the proportion of positive (or negative) instances in that subpopulation. To learn a classifier from region-based feedback, we develop an active learning framework that hierarchically divides the input space into smaller and smaller regions. In each iteration, we split the region with the highest potential to improve the classification model. This iterative process lets us gradually learn more refined classification models from more specific regions with more accurate proportions. Through experiments on numerous datasets, we demonstrate that our approach offers a new and promising active learning direction that can outperform existing active learning approaches, especially when the labeling budget is small. Code related to this paper is available at: https://github.com/patrick-luo/hierarchical-active-learning.git.
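The region-based feedback and hierarchical splitting described in the abstract can be made concrete with a small data structure. The sketch below rests on several assumptions: a region is an axis-aligned box with half-open bounds, the human annotator is simulated by the true positive fraction inside the box, and each split costs two new proportion queries. The selection rule (split the mixed region whose proportion is closest to 0.5, at the midpoint of its widest feature) is a hypothetical stand-in for the paper's criterion of "highest potential to improve the classification model". `Region`, `oracle_proportion`, and `hierarchical_split` are illustrative names, not the released code's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass(eq=False)  # identity comparison, so list.remove() drops the exact region
class Region:
    """Axis-aligned box with half-open bounds lower[j] <= x[j] < upper[j]."""
    lower: List[float]
    upper: List[float]
    proportion: Optional[float] = None  # estimated positive fraction in [0, 1]

    def contains(self, x: Point) -> bool:
        return all(lo <= v < hi for lo, v, hi in zip(self.lower, x, self.upper))

    def split(self, dim: int, threshold: float):
        # Cut the box in two along feature `dim`; both children start unlabeled.
        left_upper = list(self.upper)
        left_upper[dim] = threshold
        right_lower = list(self.lower)
        right_lower[dim] = threshold
        return Region(list(self.lower), left_upper), Region(right_lower, list(self.upper))


def oracle_proportion(region: Region, X, y) -> Optional[float]:
    # Simulated annotator (assumption): the true positive fraction in the region.
    labels = [yi for xi, yi in zip(X, y) if region.contains(xi)]
    return sum(labels) / len(labels) if labels else None


def hierarchical_split(X, y, budget: int) -> List[Region]:
    """Refine regions until `budget` proportion queries are spent (sketch)."""
    d = len(X[0])
    root = Region([float("-inf")] * d, [float("inf")] * d)
    root.proportion = oracle_proportion(root, X, y)
    leaves, queries = [root], 1
    while queries + 2 <= budget:  # each split asks for two new proportions
        candidates = []
        for r in leaves:
            if r.proportion is None or r.proportion in (0.0, 1.0):
                continue  # empty or pure regions gain nothing from splitting
            pts = [x for x in X if r.contains(x)]
            widths = [max(p[j] for p in pts) - min(p[j] for p in pts) for j in range(d)]
            if max(widths) > 0:
                candidates.append((abs(r.proportion - 0.5), r, pts, widths))
        if not candidates:
            break
        # Hypothetical rule: pick the most mixed region (proportion nearest 0.5).
        _, target, pts, widths = min(candidates, key=lambda c: c[0])
        dim = widths.index(max(widths))
        lo = min(p[dim] for p in pts)
        hi = max(p[dim] for p in pts)
        left, right = target.split(dim, (lo + hi) / 2)  # midpoint of widest feature
        left.proportion = oracle_proportion(left, X, y)
        right.proportion = oracle_proportion(right, X, y)
        queries += 2
        leaves.remove(target)
        leaves.extend([left, right])
    return leaves
```

For example, `hierarchical_split(X, y, budget=7)` performs at most three splits and returns the current leaf regions with their proportions. Querying both children keeps the sketch simple; with known instance counts, one child's proportion could instead be derived from the parent and its sibling, at the cost of propagating annotation noise.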
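The abstract does not spell out how the classifier is fit to proportion labels. One simple stand-in, sketched below as an assumption rather than the paper's model, gives every instance the proportion of the region that contains it as a soft target and trains logistic regression by minimizing cross-entropy against those targets. `soft_targets`, `fit_soft_logreg`, and `predict` are hypothetical names, and regions are plain `(lower, upper, proportion)` tuples with half-open bounds.

```python
import math
from typing import List, Sequence, Tuple

# A region is (lower bounds, upper bounds, proportion); bounds are half-open.
RegionSpec = Tuple[Sequence[float], Sequence[float], float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_targets(X, regions: List[RegionSpec]) -> List[float]:
    """Assign each instance the proportion of the (disjoint) region containing it."""
    targets = []
    for x in X:
        for lower, upper, p in regions:
            if all(lo <= v < hi for lo, v, hi in zip(lower, x, upper)):
                targets.append(p)
                break
        else:
            raise ValueError(f"instance {x} is not covered by any region")
    return targets


def fit_soft_logreg(X, targets, lr: float = 0.5, epochs: int = 500):
    """Batch gradient descent on cross-entropy between soft targets and predictions."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, t in zip(X, targets):
            # d(loss)/d(logit) = prediction - target, also for fractional targets.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - t
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, x) -> int:
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)
```

With a fractional proportion (say 0.25 for a mostly negative region), this loss pulls the predicted probability of every instance in that region toward 0.25 rather than forcing a hard label, so finer regions from later splits supply sharper targets.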
Keywords: Active learning, Proportion label, Classification
The work presented in this paper was supported by NIH grants R01GM088224 and R01LM010019. The content of the paper is solely the responsibility of the authors and does not necessarily represent the official views of NIH.