Prediction with the SVM Using Test Point Margins
Support vector machines (SVMs) carry out binary classification by constructing a maximal margin hyperplane between the two classes of observed (training) examples and then classifying test points according to the half-space in which they lie, irrespective of how far each test point is from the hyperplane. Model selection by cross-validation searches for the single SVM model, together with its optimal parameters, that minimizes the training error and generalizes well to future data. In contrast, in this chapter we collect all of the models found in the model selection phase and make each prediction according to the model whose hyperplane achieves the maximum separation from the test point. This corresponds directly to using the L∞ norm to choose among SVM models at the testing stage. We also investigate more general techniques corresponding to different Lp norms and show how these methods allow us to avoid the complex and time-consuming paradigm of cross-validation. Experimental results demonstrate this advantage, showing significant decreases in computational time as well as competitive generalization error.
Keywords: Support Vector Machine; Test Point; Support Vector Machine Model; Generalization Error; Covariance Matrix Adaptation Evolution Strategy
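The test-point margin rule described in the abstract can be illustrated with a minimal sketch. It assumes that every candidate SVM has already been trained and that its signed decision values on the test points are available as a matrix; the function name `predict_by_margin`, the matrix layout, and the toy numbers are hypothetical and not taken from the chapter. For the L∞ case the sketch predicts with the model whose output has the largest magnitude. For finite p it uses one plausible reading of the Lp variants, comparing the p-norm of the positive outputs with the p-norm of the negative outputs; the chapter's exact formulation may differ.

```python
import numpy as np

def predict_by_margin(decision_values, p=np.inf):
    """Combine several SVM models using their test-point margins.

    decision_values: array of shape (n_models, n_points) holding the signed
    output f_k(x) of each model k on each test point x (hypothetical layout).
    p: which L_p norm to use; np.inf selects the single model with the
    largest margin on each test point.
    Returns an array of +1 / -1 predictions, one per test point.
    """
    F = np.asarray(decision_values, dtype=float)
    # Split the outputs into the magnitudes voting for each class.
    pos = np.where(F > 0, F, 0.0)
    neg = np.where(F < 0, -F, 0.0)
    if np.isinf(p):
        # L_inf: the largest margin on either side decides the label,
        # which is the same as trusting the model farthest from the point.
        pos_score = pos.max(axis=0)
        neg_score = neg.max(axis=0)
    else:
        # Assumed L_p variant: compare p-norms of the two groups of margins.
        pos_score = (pos ** p).sum(axis=0) ** (1.0 / p)
        neg_score = (neg ** p).sum(axis=0) ** (1.0 / p)
    # Ties are broken in favour of the positive class (an arbitrary choice).
    return np.where(pos_score >= neg_score, 1, -1)

# Toy decision values for 3 models (rows) on 3 test points (columns).
F = np.array([
    [ 1.1, -1.5,  0.7],
    [-0.9,  0.3,  0.8],
    [ 0.1,  0.2, -1.2],
])
labels_inf = predict_by_margin(F, p=np.inf)
labels_l1 = predict_by_margin(F, p=1)
```

On the third toy test point the two rules disagree: one model is confidently negative, while two models are moderately positive, so the L∞ rule follows the single confident model and the L1 rule follows the combined positive margins. No cross-validation is involved in either case; all candidate models are kept and the choice is made per test point.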