Semi-supervised Smoothing for Large Data Problems

  • Mark Vere Culp
  • Kenneth Joseph Ryan
  • George Michailidis
Part of the Springer Handbooks of Computational Statistics book series (SHCS)


This chapter describes recent developments in non-parametric semi-supervised regression. It is intended for readers with a background in statistics, computer science, or data science who are familiar with local kernel smoothing (Hastie et al., The elements of statistical learning (data mining, inference and prediction), chapter 6. Springer, Berlin, 2009). In many applications, response data require substantially more effort to obtain than feature data. Semi-supervised learning approaches are designed to train a classifier or regressor using all the available responses together with the full feature data. The presentation focuses on local kernel regression methods in semi-supervised learning and provides a good starting point for understanding semi-supervised methods in general.


Computational statistics, Machine learning, Non-parametric regression



The work of Mark Vere Culp was supported by NSF CAREER/DMS-1255045. The opinions and views expressed in this chapter are those of the authors and do not reflect the opinions or views of the NSF.


  1. Abney S (2004) Understanding the Yarowsky algorithm. Comput Linguist 30(3):365–395
  2. Abney S (2008) Semisupervised learning for computational linguistics. Chapman and Hall, CRC, Boca Raton
  3. Belkin M, Matveeva I, Niyogi P (2004) Regularization and semi-supervised learning on large graphs. In: COLT, pp 624–638
  4. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434
  5. Blum A, Mitchell T (1998) Combining labeled and unlabeled data with co-training. In: Computational learning theory, pp 92–100
  6. Bredel M, Jacoby E (2004) Chemogenomics: an emerging strategy for rapid target and drug discovery. Nat Rev Genet 5(4):262–275
  7. Chapelle O, Schölkopf B, Zien A (2006) Semi-supervised learning. MIT Press, Cambridge
  8. Chapelle O, Sindhwani V, Keerthi S (2008) Optimization techniques for semi-supervised support vector machines. J Mach Learn Res 9:203–233
  9. Culp M, Michailidis G (2008) An iterative algorithm for extending learners to a semi-supervised setting. J Comput Graph Stat 17(3):545–571
  10. Culp M, Ryan K (2013) Joint harmonic functions and their supervised connections. J Mach Learn Res 14:3721–3752
  11. Gong C, Liu T, Tao D, Fu K, Tu E, Yang J (2015) Deformed graph Laplacian for semisupervised learning. IEEE Trans Neural Nets Learn Syst 26:2261–2274
  12. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning (data mining, inference and prediction). Springer, Berlin
  13. Jebara T, Wang J, Chang S (2009) Graph construction and b-matching for semi-supervised learning. In: International conference of machine learning
  14. Koprinska I, Poon J, Clark J, Chan J (2007) Learning to classify e-mail. Inf Sci 177(10):2167–2187
  15. Lafferty J, Wasserman L (2007) Statistical analysis of semi-supervised regression. In: Advances in NIPS. MIT Press, Cambridge, pp 801–808
  16. Liu W, He J, Chang S (2010) Large graph construction for scalable semi-supervised learning. In: International conference of machine learning
  17. Lundblad R (2004) Chemical reagents for protein modification. CRC Press, Boca Raton
  18. McCallum A, Nigam K, Rennie J, Seymore K (2000) Automating the construction of internet portals with machine learning. Inf Retr J 3:127–163
  19. Shilang S (2013) A survey of multi-view machine learning. Neural Comput Appl 7–8(28):2013–2038
  20. Wang J, Shen X (2007) Large margin semi-supervised learning. J Mach Learn Res 8:1867–1897
  21. Wang J, Jebara T, Chang S (2013) Semi-supervised learning using greedy max-cut. J Mach Learn Res 14:771–800
  22. Yamanishi Y, Vert J, Kanehisa M (2004) Protein network inference from multiple genomic data: a supervised approach. Bioinformatics 20:363–370
  23. Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. In: Advances in neural information processing systems 16
  24. Zhu X (2008) Semi-supervised learning literature survey. Technical report, Computer Sciences, University of Wisconsin-Madison

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Mark Vere Culp (1), corresponding author
  • Kenneth Joseph Ryan (1)
  • George Michailidis (2)
  1. Department of Statistics, West Virginia University, Morgantown, USA
  2. Department of Statistics, University of Florida, Gainesville, USA
