Journal of Intelligent Information Systems, Volume 37, Issue 2, pp 155–186

Operators for transforming kernels into quasi-local kernels that improve SVM accuracy


Abstract

Motivated by the crucial role that locality plays in various learning approaches, we present, in the framework of kernel machines for classification, a novel family of operators on kernels that integrate local information into any kernel, producing quasi-local kernels. Quasi-local kernels maintain the possibly global properties of the input kernel while increasing the kernel value as points get closer in the feature space of the input kernel, mixing the effect of the input kernel with a kernel that is local in the feature space of the input one. Applied to a local kernel, the operators introduce an additional level of locality, equivalent to using a local kernel with non-stationary kernel width. The operators accept two parameters that regulate the width of the exponential influence of points in the locality-dependent component and the balance between the feature-space local component and the input kernel. We address the choice of these parameters with a data-dependent strategy. Experiments with SVM, applying the operators to traditional kernel functions on a total of 43 datasets with different characteristics and application domains, achieve very good results supported by statistical significance.
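The operator family described above can be sketched in code. The following is a minimal illustration, not the paper's exact definition: it assumes a convex combination of the input kernel with an exponential component that is local in the feature space of the input kernel, where the squared feature-space distance is obtained via the kernel trick, ||φ(x) − φ(y)||² = K(x,x) + K(y,y) − 2K(x,y). The parameter names `sigma` (width of the exponential influence) and `beta` (balance between the two components) are illustrative assumptions.

```python
import numpy as np

def quasi_local(K, sigma=1.0, beta=0.5):
    """Build a quasi-local kernel from a base kernel K (illustrative sketch).

    The base kernel is mixed with an exponential component that is local
    in the feature space induced by K itself, using the kernel trick for
    the squared feature-space distance:
        ||phi(x) - phi(y)||^2 = K(x, x) + K(y, y) - 2 * K(x, y)
    sigma regulates the width of the exponential influence of points;
    beta regulates the balance between the local component and K.
    """
    def k(x, y):
        base = K(x, y)
        # Squared distance between x and y in the feature space of K.
        d2 = K(x, x) + K(y, y) - 2.0 * base
        local = np.exp(-d2 / (2.0 * sigma ** 2))
        return (1.0 - beta) * base + beta * local
    return k

# Example: apply the operator to a plain linear kernel.
lin = lambda x, y: float(np.dot(x, y))
qk = quasi_local(lin, sigma=1.0, beta=0.5)
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
print(qk(x, x))  # → 1.0 (zero feature-space distance, local term at maximum)
print(qk(x, y))  # orthogonal points: only the damped local term remains
```

Because the local component depends on distances in the feature space of the input kernel rather than in input space, the mixed kernel preserves whatever global structure K encodes while boosting similarity for points that K already maps close together.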

Keywords

SVM · Locality · Kernel methods · Operators on kernels · Local SVM


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Dipartimento di Ingegneria e Scienza dell’Informazione, University of Trento, Trento, Italy
  2. Biostatistics Department, Harvard School of Public Health, Harvard University, Boston, USA
