Basics of Machine Learning by Support Vector Machines
Here, we talk about (machine) learning from empirical data (i.e., examples, samples, measurements, records, patterns or observations) by applying support vector machines (SVMs), a.k.a. kernel machines. The basic aim of this chapter is to give, as far as possible, a condensed (but systematic) presentation of the novel learning paradigm embodied in SVMs. Our focus will be on constructive learning algorithms for both classification (pattern recognition) and regression (function approximation) problems. Consequently, we will not go into all the subtleties and details of statistical learning theory (SLT) and structural risk minimization (SRM), which are the theoretical foundations for the learning algorithms presented below. This approach seems more appropriate for application-oriented readers. The theoretically minded and interested reader may find an extensive presentation of both SLT and SRM in [4,15,23,31,33]. Instead of diving into the theory, a quadratic programming (QP) based learning leading to parsimonious SVMs will be presented in a gentle way - starting with linearly separable problems, through classification tasks with overlapping classes but still a linear separation boundary, beyond the linearity assumptions to nonlinear separation boundaries, and finally to linear and nonlinear regression problems. Here, the adjective “parsimonious” denotes an SVM with a small number of support vectors (“hidden layer neurons”). The sparsity of the model results from a sophisticated, QP based, learning that matches the model capacity to the data complexity, ensuring good generalization, i.e., good performance of the SVM on future data not seen during training.
Keywords: Support Vector Machine · Training Data · Support Vector · Feature Space · Input Space