Journal of Zhejiang University SCIENCE C, Volume 15, Issue 2, pp 107–118

Transfer active learning by querying committee


Abstract

In real applications of inductive learning for classification, labeled instances are often scarce, and having an oracle label them is expensive and time-consuming. Active learning on a single task aims to select only informative unlabeled instances for querying, so as to improve classification accuracy while reducing the querying cost. An inevitable problem in active learning, however, is that the informativeness measures used to select queries are commonly based on initial hypotheses sampled from only a few labeled instances. In such a circumstance, the initial hypotheses are not reliable and may deviate from the true distribution underlying the target task; consequently, the informativeness measures may select irrelevant instances. A promising way to compensate for this problem is to borrow useful knowledge from other sources with abundant labeled information, which is called transfer learning. A significant challenge in transfer learning, however, is how to measure the similarity between the source and the target tasks. One must account for differing distributions or label assignments in unrelated source tasks; otherwise, they will degrade performance during transfer. How to design an effective strategy that avoids querying irrelevant samples also remains an open question. To tackle these issues, we propose a hybrid algorithm for active learning aided by transfer learning, which adopts a divergence measure to alleviate the negative transfer caused by distribution differences. To avoid querying irrelevant instances, we also present an adaptive strategy that eliminates unnecessary instances in the input space and unnecessary models in the model space. Extensive experiments on both synthetic and real data sets show that the proposed algorithm queries fewer instances with higher accuracy and converges faster than state-of-the-art methods.
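The paper itself is not reproduced on this page, but the abstract names the two core ingredients: a committee-based criterion for selecting informative instances to query, and a divergence measure for discarding or down-weighting source knowledge whose distribution does not match the target task. The Python sketch below illustrates only that general idea under assumptions of our own, not the authors' actual algorithm: it uses vote entropy as the disagreement score, the Jensen-Shannon divergence between hypothetical class histograms as the similarity test, and scikit-learn-style classifiers exposing a `predict` method; the function names, the 0.5 threshold, and the pruning rule are illustrative choices.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions p and q."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def vote_entropy(committee, x):
    """Committee disagreement on one instance x (higher means more informative)."""
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in committee]
    _, counts = np.unique(votes, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log(probs)))

def select_query(source_models, target_model, pool, class_hist_target,
                 class_hists_source, divergence_threshold=0.5):
    """Pick the index of the pool instance the pruned committee disagrees on most.

    Source models whose class histogram diverges too much from the target's
    current estimate are dropped (a crude guard against negative transfer);
    the remaining models vote together with the target model.
    """
    committee = [target_model]
    for clf, hist in zip(source_models, class_hists_source):
        if js_divergence(hist, class_hist_target) < divergence_threshold:
            committee.append(clf)
    scores = [vote_entropy(committee, x) for x in pool]
    return int(np.argmax(scores))
```

In practice the divergence test could instead be applied to feature distributions or to model predictions on the unlabeled pool; the sketch only shows where such a test would plug into committee-based query selection.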

Key words

Active learning; Transfer learning; Classification

CLC number

TP3 

Copyright information

© Journal of Zhejiang University Science Editorial Office and Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. School of WTO Research & Education, Shanghai University of International Business and Economics, Shanghai, China
  2. School of Business, East China University of Science and Technology, Shanghai, China
  3. School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
