In recent years, Support Vector Machines (SVMs), proposed by Cortes and Vapnik [80, 7], have become a state-of-the-art classifier for supervised classification problems and have demonstrated great success in a broad range of tasks, including document categorization, character recognition, image classification, and many more. SVMs are known for their strong generalization guarantees, derived from the max-margin property, and for their ability to work in very high-dimensional feature spaces via the kernel trick. This combination distinguishes SVMs from most other classifiers.
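As a small illustration of the kernel trick mentioned above, the following sketch checks that a degree-2 polynomial kernel equals an ordinary dot product in an explicit, larger feature space. The function names (`poly2_kernel`, `poly2_features`, `dot`) and the sample vectors are illustrative assumptions, not taken from the chapter.

```python
from itertools import product
from math import isclose


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x . z)^2.

    Costs O(d) for d-dimensional inputs.
    """
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for the same kernel (hypothetical helper).

    Lists all d^2 ordered pairwise products x_i * x_j.
    """
    return [a * b for a, b in product(x, repeat=2)]


# Arbitrary sample inputs chosen for illustration.
x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

k_implicit = poly2_kernel(x, z)                          # never builds features
k_explicit = dot(poly2_features(x), poly2_features(z))   # works in 9 dimensions

print(k_implicit, k_explicit)
```

Both values agree, but the kernel evaluation touches only the 3 original coordinates while the explicit map works with 9 features. For higher degrees or dimensions this gap grows quickly, which is why kernels make very high-dimensional feature spaces practical.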
Despite many impressive successes, SVMs also have significant limitations: they assign only one label at a time, and their running time is polynomial in the number of classes. As a result, SVMs cannot jointly classify correlated instances in a systematic way, and cannot exploit the valuable information available in problems with rich structure.
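To see why a running time polynomial in the number of classes becomes a problem for joint classification, consider treating every joint labeling of a group of correlated instances as a single class. The sketch below counts those classes; the function name and the word-recognition numbers (26 letters, 5-letter words) are illustrative assumptions rather than figures from the chapter.

```python
def num_joint_labelings(num_labels: int, num_instances: int) -> int:
    """Number of distinct joint label assignments for a group of instances.

    Each of the num_instances items can take any of num_labels labels,
    so the count is num_labels ** num_instances.
    """
    return num_labels ** num_instances


# Hypothetical setting: label every letter of a 5-letter handwritten word.
num_labels = 26
word_length = 5

# Independent classification: one 26-way decision per letter.
independent_scores = num_labels * word_length

# Joint classification as one multiclass problem: one class per whole word.
joint_classes = num_joint_labelings(num_labels, word_length)

print(independent_scores, joint_classes)
```

The number of joint classes grows exponentially with the number of instances, so an algorithm whose cost is polynomial in the class count is still exponential in the size of the structured problem.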
SVMs and graphical models thus offer complementary strengths and weaknesses. The SVM approach can exploit very high-dimensional feature spaces with strong generalization guarantees, but can only classify instances independently, whereas the graphical model approach can model correlations and dependencies among different instances in principled and efficient ways, but does not provide the same level of generalization ability as SVMs. A natural question is whether the two approaches can be unified to get the best of both. The answer is yes: the framework proposed by Taskar, Guestrin and Koller, called Max-Margin Markov Networks (M3-nets for short), does exactly this. It is a major breakthrough in machine learning in recent years because it makes it possible to apply the SVM principles to a whole new set of problems.
In this chapter, we first provide an overview of SVMs, introducing the concepts of margin, kernel, and generalization bound, and presenting an SVM training algorithm, namely Sequential Minimal Optimization (SMO). In the second half of the chapter, we present the max-margin Markov network framework, which unifies the ideas of the SVM and graphical model approaches. We also compare the M3-net with other graphical models, and provide some intuitive insights into why the M3-net is superior to them.
Keywords: Support Vector Machine, True Label, Markov Network, Kernel Trick, Hinge Loss