Volume 29, Issue 1, pp 1–33

On active learning methods for manifold data

  • Hang Li
  • Enrique Del Castillo
  • George Runger
Invited Paper


Abstract

Active learning is a major area of interest within the field of machine learning, especially when labeled instances are difficult, time-consuming, or expensive to obtain. In this paper, we review active learning methods for manifold data, in which the intrinsic manifold structure of the data is incorporated into the active learning query strategy. In addition, we present a new manifold-based active learning algorithm for Gaussian process classification. The method uses a data-dependent kernel derived from a semi-supervised model that considers both labeled and unlabeled data, and it regularizes the smoothness of the fitted function with respect to both the ambient space and the manifold on which the data lie. The regularization parameter is treated as an additional kernel (covariance) parameter and estimated from the data, allowing the kernel to adapt to the manifold geometry of the given dataset. In our empirical experiments, the method learns faster than other active learning methods for manifold data. MATLAB code that reproduces all examples is provided as supplementary material.
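The data-dependent kernel sketched in the abstract follows the standard manifold-regularization construction (Sindhwani, Niyogi, and Belkin), in which a base kernel is deformed by a graph Laplacian built from labeled and unlabeled points. The Python sketch below is an illustrative reconstruction under that assumption, not the authors' MATLAB code: the function names, the kNN graph construction, and the default parameter values are our own choices.

```python
import numpy as np

def rbf_gram(X, Z, length_scale=1.0):
    """Squared-exponential Gram matrix between rows of X and rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def graph_laplacian(X, knn=5):
    """Unnormalized Laplacian of a symmetrized kNN graph with 0/1 weights."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(d2[i])[1:knn + 1]  # skip the point itself
        W[i, neighbors] = 1.0
    W = np.maximum(W, W.T)                        # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W

def deformed_kernel(X_all, r=1.0, length_scale=1.0, knn=5):
    """Laplacian-deformed Gram matrix over all (labeled + unlabeled) points.

    K_tilde = K - K (I + M K)^{-1} M K  with  M = r * L,
    so r controls how strongly the manifold structure warps the base kernel.
    """
    K = rbf_gram(X_all, X_all, length_scale)
    M = r * graph_laplacian(X_all, knn)
    n = len(X_all)
    return K - K @ np.linalg.solve(np.eye(n) + M @ K, M @ K)
```

Setting r = 0 recovers the plain RBF kernel, so the deformation is a smooth departure from an ordinary Gaussian process; in the paper, r is estimated from the data as an additional covariance parameter, whereas here it is fixed purely for illustration.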


Keywords

Active learning · Gaussian process · Classification · Optimal design

Mathematics Subject Classification




We thank the anonymous referees and the editors for useful suggestions that have significantly improved the presentation of this paper.


Funding was provided by National Science Foundation (US) (Grant No. 1537987).



Copyright information

© Sociedad de Estadística e Investigación Operativa 2019

Authors and Affiliations

  1. Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, The Pennsylvania State University, State College, USA
  2. Department of Statistics, The Pennsylvania State University, State College, USA
  3. Department of Biomedical Informatics, Arizona State University, Tempe, USA
