Abstract
We explore an algorithm for training kernel SVMs that can represent the learned rule using arbitrary basis vectors, not just the support vectors (SVs) from the training set. This added flexibility yields two benefits. First, it makes it possible to find sparser solutions of good quality, substantially speeding up prediction. Second, the improved sparsity can also make training of kernel SVMs more efficient, especially for high-dimensional and sparse data (e.g., text classification). This has the potential to make training of kernel SVMs tractable for large training sets, where conventional methods scale quadratically due to the linear growth of the number of SVs. In addition to a theoretical analysis of the algorithm, we also present an empirical evaluation.
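To see why the choice of basis vectors matters for prediction speed, recall that a kernel SVM's decision rule is a kernel expansion whose cost is linear in the number of expansion vectors. The following minimal Python sketch illustrates this form; the function names, the RBF kernel choice, and the toy values are illustrative assumptions, not the authors' implementation.

import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    """Gaussian RBF kernel between two feature vectors."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def predict(x, basis_vectors, weights, bias, kernel=rbf_kernel):
    """Evaluate the kernel expansion f(x) = sum_i w_i * K(b_i, x) + bias.

    A conventional kernel SVM must draw basis_vectors from the training
    set (its support vectors), so prediction cost grows with the number
    of SVs. The algorithm in this paper instead learns a small set of
    arbitrary basis vectors, shrinking this sum and speeding up
    prediction proportionally.
    """
    return sum(w * kernel(b, x) for w, b in zip(weights, basis_vectors)) + bias

# Toy usage: a rule with only two basis vectors in R^2 scores a test point.
basis = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
weights = [0.7, -0.3]
print(predict(np.array([0.5, 0.5]), basis, weights, bias=0.1))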
Additional information
Editors: Aleksander Kołcz, Dunja Mladenić, Wray Buntine, Marko Grobelnik, and John Shawe-Taylor.
Cite this article
Joachims, T., Yu, C.-N. J. Sparse kernel SVMs via cutting-plane training. Mach Learn 76, 179–193 (2009). https://doi.org/10.1007/s10994-009-5126-6