Optimization Approaches to Semi-Supervised Learning

  • Ayhan Demiriz
  • Kristin P. Bennett
Part of the Applied Optimization book series (APOP, volume 50)

Abstract

We examine mathematical models for semi-supervised support vector machines (S3VM). Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transductive inference problem posed by Vapnik. In transduction, the task is to estimate the value of a classification function only at the given points of the working set; this contrasts with inductive inference, which estimates the classification function over all possible inputs. We propose a general S3VM model that minimizes both the misclassification error and the function capacity based on all the available data. Depending on how errors on the unlabeled data are penalized, different mathematical models result. We examine several practical algorithms for solving these models. The first approach converts the S3VM model for the 1-norm linear support vector machine into a mixed-integer program (MIP), whose global solution is found using a commercial integer programming solver. The second approach formulates the problem as a nonconvex quadratic program and uses variations of block-coordinate-descent algorithms to find local solutions. Using the MIP within a local learning algorithm produced the best results. Our experimental study of these statistical learning methods indicates that incorporating unlabeled working data can improve generalization.
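To make the first, MIP-based approach concrete, the following is a minimal sketch (in LaTeX) of a big-M style mixed-integer formulation of the 1-norm linear S3VM. The specific symbols are illustrative notation rather than taken from the chapter itself: C is the error/capacity trade-off, M is the big-M constant, and d_j is a binary indicator that plays the role of the unknown label of working-set point x_j.

\begin{align*}
\min_{w,\,b,\,\eta,\,\xi,\,z,\,d}\quad
  & C\Big[\sum_{i=1}^{\ell}\eta_i + \sum_{j=\ell+1}^{\ell+k}\bigl(\xi_j + z_j\bigr)\Big] + \lVert w\rVert_1 \\
\text{subject to}\quad
  & y_i\,(w\cdot x_i - b) + \eta_i \ge 1, \quad \eta_i \ge 0,
      && i = 1,\dots,\ell \quad \text{(labeled training points)} \\
  & w\cdot x_j - b + \xi_j + M(1 - d_j) \ge 1, \quad \xi_j \ge 0,
      && j = \ell+1,\dots,\ell+k \quad \text{(working point assigned to class $+1$)} \\
  & -(w\cdot x_j - b) + z_j + M\,d_j \ge 1, \quad z_j \ge 0,
      && j = \ell+1,\dots,\ell+k \quad \text{(working point assigned to class $-1$)} \\
  & d_j \in \{0,1\},
      && j = \ell+1,\dots,\ell+k
\end{align*}

Each labeled point pays the usual soft-margin error \eta_i; each unlabeled point pays only the error term corresponding to the class its indicator d_j assigns it, while \lVert w\rVert_1 controls capacity. Fixing all d_j and re-solving for (w, b) and the slack variables is also the kind of step that block-coordinate-descent local methods would alternate with re-estimating the labels.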

Keywords

Support Vector Machine · Mixed Integer Program · Unlabeled Data · Misclassification Error · Local Learning

References

  1. C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11:11–73, 1997.
  2. K. P. Bennett. Global tree optimization: a non-greedy decision tree algorithm. Computing Science and Statistics, 26:156–160, 1994.
  3. K. P. Bennett. Combining support vector and mathematical programming methods for classification. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 307–326, Cambridge, MA, 1999. MIT Press.
  4. K. P. Bennett and E. J. Bredensteiner. Geometry in learning. Web manuscript, Rensselaer Polytechnic Institute, http://www.rpi.edu/~bennek/geometry2.ps, 1996. Accepted for publication in Geometry at Work, C. Gorini et al., editors, MAA Press.
  5. K. P. Bennett and A. Demiriz. Semi-supervised support vector machines. In M. Kearns, S. Solla, and D. Cohn, editors, Advances in Neural Information Processing Systems, pages 368–374, Cambridge, MA, 1999. MIT Press.
  6. K. P. Bennett and O. L. Mangasarian. Robust linear programming discrimination of two linearly inseparable sets. Optimization Methods and Software, 1:23–34, 1992.
  7. K. P. Bennett and O. L. Mangasarian. Bilinear separation in n-space. Computational Optimization and Applications, 4(4):207–227, 1993.
  8. D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Cambridge, MA, 1996.
  9. J. Blue. A hybrid of tabu search and local descent algorithms with applications in artificial intelligence. PhD thesis, Rensselaer Polytechnic Institute, Troy, NY, 1998.
  10. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the 1998 Conference on Computational Learning Theory, Madison, WI, 1998. ACM.
  11. E. J. Bredensteiner and K. P. Bennett. Feature minimization within decision trees. Computational Optimization and Applications, 10:110–126, 1997.
  12. V. Castelli and T. M. Cover. On the exponential value of labeled samples. Pattern Recognition Letters, 16:105–111, 1995.
  13. Z. Cataltepe and M. Magdon-Ismail. Incorporating test inputs into learning. In Advances in Neural Information Processing Systems 10, Cambridge, MA, 1997. MIT Press.
  14. C. Cortes and V. N. Vapnik. Support vector networks. Machine Learning, 20:273–297, 1995.
  15. CPLEX Optimization Incorporated, Incline Village, Nevada. Using the CPLEX Callable Library, 1994.
  16. R. Fourer, D. Gay, and B. Kernighan. AMPL: A Modeling Language for Mathematical Programming. Boyd and Fraser, Danvers, MA, 1993.
  17. T. Hastie and R. Tibshirani. Discriminant adaptive nearest neighbor classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18:607–616, 1996.
  18. T. Joachims. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning (ECML), 1998.
  19. T. Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning, 1999.
  20. S. Lawrence, A. C. Tsoi, and A. D. Back. Function approximation with neural networks and local methods: Bias, variance and smoothness. In Peter Bartlett, Anthony Burkitt, and Robert Williamson, editors, Australian Conference on Neural Networks, ACNN 96, pages 16–21. Australian National University, 1996.
  21. O. L. Mangasarian. Arbitrary norm separating plane. Operations Research Letters, 24(1–2), 1999.
  22. O. L. Mangasarian. Generalized support vector machines. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 135–146, Cambridge, MA, 2000. MIT Press. ftp://ftp.cs.wisc.edu/math-prog/tech-reports/98-14.ps.
  23. A. McCallum and K. Nigam. Employing EM and pool-based active learning for text classification. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), 1998.
  24. P. M. Murphy and D. W. Aha. UCI repository of machine learning databases. Department of Information and Computer Science, University of California, Irvine, California, 1992.
  25. D. R. Musser and A. Saini. STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library. Addison-Wesley, 1996.
  26. K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Learning to classify text from labeled and unlabeled documents. In Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), 1998.
  27. S. Odewahn, E. Stockwell, R. Pennington, R. Humphreys, and W. Zumach. Automated star/galaxy discrimination with neural networks. Astronomical Journal, 103(1):318–331, 1992.
  28. V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer, New York, 1982. English translation; Russian version 1979.
  29. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, New York, 1995.
  30. V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998.
  31. V. N. Vapnik and A. Ja. Chervonenkis. Theory of Pattern Recognition. Nauka, Moscow, 1974. In Russian.

Copyright information

© Springer Science+Business Media Dordrecht 2001

Authors and Affiliations

  • Ayhan Demiriz (1)
  • Kristin P. Bennett (2)
  1. Department of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, USA
  2. Department of Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, USA
