Batch Mode Active Learning for Networked Data with Optimal Subset Selection

  • Haihui Xu
  • Pengpeng Zhao
  • Victor S. Sheng
  • Guanfeng Liu
  • Lei Zhao
  • Jian Wu
  • Zhiming Cui
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9098)


Active learning has become an increasingly important paradigm for the classification of networked data, where instances are connected by links to form a network. In this paper, we propose a novel batch mode active learning method for networked data (BMALNeT). The method selects the best subset of instances from the unlabeled set based on a correlation matrix that we construct from a dedicated informativeness evaluation of each unlabeled instance. To evaluate the informativeness of each unlabeled instance accurately, we simultaneously exploit content information and network structure to capture the uncertainty and representativeness of each instance and the disparity between any two instances. Experimental results on three real-world datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art methods.
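The abstract describes the selection pipeline only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions; it is not the authors' BMALNeT implementation. Every choice in it is hypothetical:

- uncertainty is measured as the entropy of predicted class probabilities;
- representativeness is approximated by degree centrality;
- disparity is Euclidean content distance, shrunk for linked pairs;
- a plain greedy heuristic stands in for whatever optimization the paper actually uses to find the best subset.

The function names `entropy`, `degree_centrality`, `build_matrix`, `subset_score` and `greedy_select`, and the parameters `alpha`, `beta` and `link_shrink`, are likewise illustrative only.

```python
import math
from itertools import combinations

# Illustrative sketch only: all scoring choices below are assumptions, not BMALNeT.


def entropy(probs):
    """Assumed uncertainty measure: Shannon entropy of predicted class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def degree_centrality(adj, i):
    """Assumed representativeness stand-in: fraction of other nodes linked to node i."""
    n = len(adj)
    return len(adj[i]) / (n - 1) if n > 1 else 0.0


def build_matrix(probs, features, adj, alpha=0.5, beta=0.5, link_shrink=0.5):
    """Hypothetical correlation matrix: diagonal = informativeness, off-diagonal = disparity."""
    n = len(probs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = alpha * entropy(probs[i]) + (1 - alpha) * degree_centrality(adj, i)
    raw = {}
    for i, j in combinations(range(n), 2):
        d = math.dist(features[i], features[j])  # content disparity (Euclidean)
        if j in adj[i]:
            d *= link_shrink  # assumption: linked instances are likely similar
        raw[i, j] = d
    scale = max(raw.values(), default=0.0) or 1.0  # normalise disparities to [0, 1]
    for (i, j), d in raw.items():
        m[i][j] = m[j][i] = beta * d / scale
    return m


def subset_score(m, subset):
    """Sum of diagonal informativeness plus pairwise disparity inside the subset."""
    diag = sum(m[i][i] for i in subset)
    pairs = sum(m[i][j] for i, j in combinations(subset, 2))
    return diag + pairs


def greedy_select(m, k):
    """Greedy heuristic: repeatedly add the instance that most increases subset_score."""
    selected = []
    remaining = list(range(len(m)))
    for _ in range(min(k, len(m))):
        best = max(remaining, key=lambda c: subset_score(m, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.55, 0.45], [0.8, 0.2], [0.99, 0.01]]
    features = [[0, 0], [0, 1], [5, 5], [5, 6]]
    adj = [{1}, {0}, {3}, {2}]  # two linked pairs: (0, 1) and (2, 3)
    m = build_matrix(probs, features, adj)
    print(greedy_select(m, 2))  # [0, 2]
```

In this toy example, ranking by informativeness alone would pick instances 0 and 1. Those two are linked neighbours with nearly identical content. The disparity term pushes the second pick to instance 2 in the other cluster instead. Greedy selection is not guaranteed to return the optimal subset. Exhaustive search over all size-k subsets finds the optimum but is only feasible for very small unlabeled pools.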


Keywords: Active learning, Batch mode, Correlation matrix, Optimal subset




  1. 1.
    Baldridge, J., Osborne, M.: Active learning and the total cost of annotation. In: EMNLP 2004, A meeting of SIGDAT, pp. 9–16 (2004)Google Scholar
  2. 2.
    Cohn, D.A., Ghahramani, Z., Jordan, M.I.: Active learning with statistical models. J. Artif. Intell. Res. (JAIR) 4, 129–145 (1996)zbMATHGoogle Scholar
  3. 3.
    Macskassy, S.A.: Using graph-based metrics with empirical risk minimization to speed up active learning on networked data. In: KDDM 2009, pp. 597–606. ACM (2009)Google Scholar
  4. 4.
    Shi, L., Zhao, Y., Tang, J.: Batch mode active learning for networked data. ACM Transactions on Intelligent Systems and Technology (TIST) 3(2), 33 (2012)Google Scholar
  5. 5.
    Yang, Z., Tang, J., Xu, B., Xing, C.: Active learning for networked data based on non-progressive diffusion model. In: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pp. 363–372. ACM (2014)Google Scholar
  6. 6.
    Joshi, A.J., Porikli, F., Papanikolopoulos, N.: Multi-class active learning for image classification. In: CVPR 2009, pp. 2372–2379. IEEE (2009)Google Scholar
  7. 7.
    Melville, P., Mooney, R.J.: Diverse ensembles for active learning. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 74. ACM (2004)Google Scholar
  8. 8.
    Jensen, D., Neville, J., Gallagher, B.: Why collective inference improves relational classification. In: KDDM 2004, pp. 593–598. ACM (2004)Google Scholar
  9. 9.
    Hu, X., Tang, J., Gao, H., Liu, H.: Actnet: Active learning for networked texts in microblogging. In: SDM, pp. 306–314. SIAM (2013)Google Scholar
  10. 10.
    Cesa-Bianchi, N., Gentile, C., Vitale, F., Zappella, G.: Active learning on trees and graphs. arXiv preprint arXiv:1301.5112 (2013)
  11. 11.
    Fang, M., Yin, J., Zhang, C., Zhu, X., Fang, M., Yin, J., Zhu, X., Zhang, C.: Active class discovery and learning for networked data. In: SDM, pp. 315–323. SIAM (2013)Google Scholar
  12. 12.
    Bilgic, M., Mihalkova, L., Getoor, L.: Active learning for networked data. In: ICML 2010, pp. 79–86 (2010)Google Scholar
  13. 13.
    Zhuang, H., Tang, J., Tang, W., Lou, T., Chin, A., Wang, X.: Actively learning to infer social ties. Data Mining and Knowledge Discovery 25(2), 270–297 (2012)zbMATHMathSciNetCrossRefGoogle Scholar
  14. 14.
    Newman, M.: Networks: an introduction. Oxford University Press (2010)Google Scholar
  15. 15.
    Freeman, L.C.: A set of measures of centrality based on betweenness. Sociometry, 35–41 (1977)Google Scholar
  16. 16.
    Brandes, U.: On variants of shortest-path betweenness centrality and their generic computation. Social Networks 30(2), 136–145 (2008)MathSciNetCrossRefGoogle Scholar
  17. 17.
    Fu, Y., Zhu, X., Elmagarmid, A.K.: Active learning with optimal instance subset selection. IEEE Transactions on Cybernetics 43(2), 464–475 (2013)CrossRefGoogle Scholar
  18. 18.
    Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM) 42(6), 1115–1145 (1995)zbMATHMathSciNetCrossRefGoogle Scholar
  19. 19.
    Fujisawa, K., Kojima, M., Nakata, K.: Sdpa (semidefinite programming algorithm) user manual-version 4.10. Department of Mathematical and Computing Science, Tokyo Institute of Technology, Research Report, Tokyo (1998)Google Scholar
  20. 20.
    Sen, P., Namata, G.M., Bilgic, M., Getoor, L., Gallagher, B., Eliassi-Rad, T.: Collective classification in network data. AI Magazine 29(3), 93–106 (2008)Google Scholar
  21. 21.
    Chang, C.C., Lin, C.J.: Libsvm: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3), 27 (2011)Google Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Haihui Xu (1)
  • Pengpeng Zhao (1, corresponding author)
  • Victor S. Sheng (2)
  • Guanfeng Liu (1)
  • Lei Zhao (1)
  • Jian Wu (1)
  • Zhiming Cui (1)

  1. School of Computer Science and Technology, Soochow University, Suzhou, People’s Republic of China
  2. Computer Science Department, University of Central Arkansas, Conway, USA
