
Kernel Methods

  • Francesco Camastra
  • Alessandro Vinciarelli
Chapter
Part of the Advanced Information and Knowledge Processing book series (AI&KP)

Abstract

What the reader should know to understand this chapter:

  • Notions of calculus.
  • Chapters 5, 6, and 7.
  • Although the reading of Appendix D is not mandatory, it is an advantage for understanding the chapter.

References

  1. M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
  2. F. R. Bach and M. I. Jordan. Learning spectral clustering. Technical report, EECS Department, University of California, 2003.
  3. A. Barla, E. Franceschi, F. Odone, and F. Verri. Image kernels. In Proceedings of SVM2002, pages 83–96, 2002.
  4. A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. A support vector method for clustering. In Advances in Neural Information Processing Systems, volume 12, pages 125–137, 2000.
  5. A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. Support vector clustering. Journal of Machine Learning Research, 2(2):125–137, 2001.
  6. Y. Bengio, O. Delalleau, N. Le Roux, J.-F. Paiement, P. Vincent, and M. Ouimet. Learning eigenfunctions links spectral embedding and kernel PCA. Neural Computation, 16(10):2197–2219, 2004.
  7. Y. Bengio, P. Vincent, and J.-F. Paiement. Spectral clustering and kernel PCA are learning eigenfunctions. Technical report, CIRANO, 2003.
  8. C. Berg, J.P.R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups. Springer-Verlag, 1984.
  9. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
  10. M. Brand and K. Huang. A unifying theorem for spectral embedding and clustering. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003.
  11. L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.
  12. F. Camastra and A. Verri. A novel kernel method for clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5):801–805, 2005.
  13. N. Cancedda, E. Gaussier, C. Goutte, and J.-M. Renders. Word-sequence kernels. Journal of Machine Learning Research, 3:1059–1082, 2003.
  14. S. Canu, Y. Grandvalet, V. Guigue, and A. Rakotomamonjy. SVM and kernel methods Matlab toolbox. Technical report, Perception Systèmes et Information, INSA de Rouen, 2005.
  15. Y. Censor. Row-action methods for huge and sparse systems and their applications. SIAM Review, 23(4):444–467, 1981.
  16. Y. Censor and A. Lent. An iterative row-action method for interval convex programming. Journal of Optimization Theory and Applications, 34(3):321–353, 1981.
  17. P.K. Chan, M. Schlag, and J.Y. Zien. Spectral k-way ratio-cut partitioning and clustering. In Proceedings of the 1993 International Symposium on Research on Integrated Systems, pages 123–142. MIT Press, 1993.
  18. J.H. Chiang. A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems, 11(4):518–527, 2003.
  19. F.R.K. Chung. Spectral Graph Theory. American Mathematical Society, 1997.
  20. R. Collobert and S. Bengio. SVMTorch: Support vector machines for large-scale regression problems. Journal of Machine Learning Research, 1:143–160, 2001.
  21. R. Collobert, S. Bengio, and J. Mariethoz. Torch: a modular machine learning software library. Technical report, IDIAP, 2002.
  22. C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
  23. N. Cressie. Statistics for Spatial Data. John Wiley, 1993.
  24. N. Cristianini, J. Shawe-Taylor, and J. Kandola. Spectral kernel methods for clustering. In Advances in Neural Information Processing Systems 14, pages 649–655. MIT Press, 2001.
  25. A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
  26. I.S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the \(10^{th}\) ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 551–556. ACM Press, 2004.
  27. I.S. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph partitioning. Technical report, UTCS, 2005.
  28. I.S. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(11):1944–1957, 2007.
  29. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley, 2001.
  30. T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13(1):1–50, 2000.
  31. R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918, 2005.
  32. P. Fermat. Methodus ad disquirendam maximam et minimam. In Oeuvres de Fermat. MIT Press, 1891 (first edition 1679).
  33. M. Ferris and T. Munson. Interior point method for massive support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.
  34. M. Ferris and T. Munson. Semi-smooth support vector machines. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, 2000.
  35. M. Fiedler. Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(98):298–305, 1973.
  36. M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of kernel and spectral methods for clustering. Pattern Recognition, 41(1):176–190, 2008.
  37. I. Fischer and I. Poland. New methods for spectral clustering. Technical report, IDSIA, 2004.
  38. R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188, 1936.
  39. J. Friedman. Regularized discriminant analysis. Journal of the American Statistical Association, 84(405):165–175, 1989.
  40. T.T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple learning procedure for support vector machines. In Proceedings of the \(15^{th}\) International Conference on Machine Learning, pages 188–196. Morgan Kaufmann Publishers, 1998.
  41. K. Fukunaga. An Introduction to Statistical Pattern Recognition. Academic Press, 1990.
  42. T. Gärtner, J.W. Lloyd, and P.A. Flach. Kernels and distances for structured data. Machine Learning, 57(3):205–232, 2004.
  43. M. Girolami. Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks, 13(3):780–784, 2002.
  44. F. Girosi, M. Jones, and T. Poggio. Regularization theory and neural network architectures. Neural Computation, 7(2):219–269, 1995.
  45. G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
  46. T. Graepel and K. Obermayer. Fuzzy topographic kernel clustering. In Proceedings of the Fifth GI Workshop Fuzzy Neuro Systems ’98, pages 90–97, 1998.
  47. J. Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Bull. Univ. Princeton, 13:49–52, 1902.
  48. T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
  49. R. Herbrich. Learning Kernel Classifiers: Theory and Algorithms. MIT Press, 2004.
  50. R. Inokuchi and S. Miyamoto. LVQ clustering and SOM using a kernel function. In Proceedings of the IEEE International Conference on Fuzzy Systems, pages 367–373, 2004.
  51. T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods, pages 169–184. MIT Press, 1999.
  52. T. Joachims, N. Cristianini, and J. Shawe-Taylor. Composite kernels for hypertext classification. In Proceedings of the \(18^{th}\) International Conference on Machine Learning, pages 250–257. IEEE Press, 2001.
  53. R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. In Proceedings of the \(41^{st}\) Annual Symposium on the Foundations of Computer Science, pages 367–380. IEEE Press, 2000.
  54. A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – an S4 package for kernel methods in R. Journal of Statistical Software, 11(9):1–20, 2004.
  55. S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Technical report, Department of CSA, Bangalore, India, 1999.
  56. S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy. A fast iterative nearest point algorithm for support vector machine design. IEEE Transactions on Neural Networks, 11(1):124–136, 2000.
  57. B.W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(1):291–307, 1970.
  58. G.A. Korn and T.M. Korn. Mathematical Handbook for Scientists and Engineers. McGraw-Hill, 1968.
  59. R. Krishnapuram and J.M. Keller. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems, 1(2):98–110, 1993.
  60. R. Krishnapuram and J.M. Keller. The possibilistic c-means algorithm: insights and recommendations. IEEE Transactions on Fuzzy Systems, 4(3):385–393, 1996.
  61. H.W. Kuhn and A.W. Tucker. Nonlinear programming. In Proceedings of the \(2^{nd}\) Berkeley Symposium on Mathematical Statistics and Probability, pages 367–380. University of California Press, 1951.
  62. J.-L. Lagrange. Mécanique analytique. Chez La Veuve Desaint Libraire, 1788.
  63. D. Lee. An improved cluster labeling method for support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(3):461–464, 2005.
  64. C. Leslie, E. Eskin, A. Cohen, J. Weston, and A. Noble. Mismatch string kernels for discriminative protein classification. Bioinformatics, 20(4):467–476, 2004.
  65. D. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, 1984.
  66. D. Macdonald and C. Fyfe. The kernel self-organizing map. In Fourth International Conference on Knowledge-Based Intelligent Engineering Systems and Allied Technologies, pages 317–320, 2000.
  67. D.J.C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4(3):448–472, 1992.
  68. O.L. Mangasarian. Linear and nonlinear separation of patterns by linear programming. Operations Research, 13(3):444–452, 1965.
  69. O.L. Mangasarian and D. Musicant. Lagrangian support vector regression. Technical report, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, June 2000.
  70. G. Matheron. Principles of geostatistics. Economic Geology, 58:1246–1266, 1963.
  71. M. Meila and J. Shi. Spectral methods for clustering. In Advances in Neural Information Processing Systems 12, pages 873–879. MIT Press, 2000.
  72. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller. Fisher discriminant analysis with kernels. In Proceedings of the IEEE Neural Networks for Signal Processing Workshop, pages 41–48. IEEE Press, 2001.
  73. M.L. Minsky and S.A. Papert. Perceptrons. MIT Press, 1969.
  74. J. Moody and C. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281–294, 1989.
  75. R. Neal. Bayesian Learning for Neural Networks. Springer-Verlag, 1996.
  76. A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems 14, pages 849–856. MIT Press, 2002.
  77. E. Osuna, R. Freund, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing VII, Proceedings of the 1997 IEEE Workshop, pages 276–285. IEEE Press, 1997.
  78. E. Osuna and F. Girosi. Reducing the run-time complexity in support vector machines. In Advances in Kernel Methods, pages 271–284. MIT Press, 1999.
  79. A. Paccanaro, C. Chennubhotla, J.A. Casbon, and M.A.S. Saqi. Spectral clustering of protein sequences. In Proceedings of the International Joint Conference on Neural Networks, pages 3083–3088. IEEE Press, 2003.
  80. J.C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods, pages 185–208. MIT Press, 1999.
  81. J.C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In Advances in Neural Information Processing Systems 12, pages 547–553. MIT Press, 2000.
  82. T. Poggio and F. Girosi. Networks for approximation and learning. Proceedings of the IEEE, 78(9):1481–1497, 1990.
  83. M. Pontil and A. Verri. Support vector machines for 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(6):637–646, 1998.
  84. M.J.D. Powell. Radial basis functions for multivariable interpolation: A review. In Algorithms for Approximation, pages 143–167. Clarendon Press, 1987.
  85. A.K. Qin and P.N. Suganthan. Kernel neural gas algorithms with application to cluster analysis. In ICPR 2004 – Proceedings of the 17th International Conference on Pattern Recognition, pages 617–620. IEEE, 2004.
  86. C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
  87. K. Rose. Deterministic annealing for clustering, compression, classification, regression, and related optimization problems. Proceedings of the IEEE, 86(11):2210–2239, 1998.
  88. R. Rosipal and M. Girolami. An expectation-maximization approach to nonlinear component analysis. Neural Computation, 13(3):505–510, 2001.
  89. V. Roth, J. Laub, M. Kawanabe, and J.M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1540–1551, 2003.
  90. B. Schölkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002.
  91. B. Schölkopf, A.J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299–1319, 1998.
  92. B. Schölkopf, A.J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Technical report, Max Planck Institut für Biologische Kybernetik, 1998.
  93. B. Schölkopf, R.C. Williamson, A.J. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In Advances in Neural Information Processing Systems 12, pages 526–532. MIT Press, 2000.
  94. J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.
  95. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
  96. D.M.J. Tax and R.P.W. Duin. Support vector domain description. Pattern Recognition Letters, 20(11–13):1191–1199, 1999.
  97. A.N. Tikhonov. On solving ill-posed problems and the method of regularization. Dokl. Akad. Nauk USSR, 153:501–504, 1963.
  98. A.N. Tikhonov and V.Y. Arsenin. Solutions of Ill-Posed Problems. W.H. Winston, 1977.
  99. I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of ICML 2004. ACM Press, 2004.
  100. C.J. Twining and C.J. Taylor. The use of kernel principal component analysis to model data distributions. Pattern Recognition, 36(1):217–227, 2003.
  101. V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
  102. V.N. Vapnik. Statistical Learning Theory. John Wiley, 1998.
  103. V.N. Vapnik and A.Ya. Chervonenkis. A note on one class of perceptrons. Automation and Remote Control, 25:103–109, 1964.
  104. V.N. Vapnik and A. Lerner. Pattern recognition using generalized portrait method. Automation and Remote Control, 24:774–780, 1963.
  105. S. Vishwanathan and A.J. Smola. Fast kernels for string and tree matching. In Advances in Neural Information Processing Systems 15, pages 569–576. MIT Press, 2003.
  106. U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Technical report, Max Planck Institut für Biologische Kybernetik, 2004.
  107. U. von Luxburg, M. Belkin, and O. Bousquet. Limits of spectral clustering. In Advances in Neural Information Processing Systems 17. MIT Press, 2005.
  108. D. Wagner and F. Wagner. Between min cut and graph bisection. In Proceedings of Mathematical Foundations of Computer Science 1993, pages 744–750, 1993.
  109. G. Wahba. Spline Models for Observational Data. SIAM, 1990.
  110. J. Weston, A. Gammerman, M. Stitson, V. Vapnik, V. Vovk, and C. Watkins. Support vector density estimation. In Advances in Kernel Methods, pages 293–306. MIT Press, 1999.
  111. J. Weston and C. Watkins. Multi-class support vector machines. In Proceedings of ESANN 99, pages 219–224. D-Facto Press, 1999.
  112. C.K.I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.
  113. W.H. Wolberg and O. Mangasarian. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proceedings of the National Academy of Sciences, USA, 87:9193–9196, 1990.
  114. Z.D. Wu, W.X. Xie, and J.P. Yu. Fuzzy c-means clustering algorithm based on kernel method. In Proceedings of the Fifth International Conference on Computational Intelligence and Multimedia Applications, ICCIMA 2003, pages 49–54. IEEE, 2003.
  115. J. Yang, V. Estivill-Castro, and S.K. Chalup. Support vector clustering through proximity graph modelling. In Neural Information Processing 2002, ICONIP ’02, pages 898–903, 2002.
  116. S.X. Yu and J. Shi. Multiclass spectral clustering. In ICCV ’03: Proceedings of the Ninth IEEE International Conference on Computer Vision. IEEE Computer Society, 2003.
  117. D.-Q. Zhang and S.-C. Chen. Fuzzy clustering using kernel method. In The 2002 International Conference on Control and Automation, pages 162–163, 2002.
  118. D.-Q. Zhang and S.-C. Chen. Kernel-based fuzzy and possibilistic c-means clustering. In Proceedings of the Fifth International Conference on Artificial Neural Networks, ICANN 2003, pages 122–125, 2003.
  119. D.-Q. Zhang and S.-C. Chen. A novel kernelized fuzzy c-means algorithm with applications in image segmentation. Artificial Intelligence in Medicine, 32(1):37–50, 2004.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Science and Technology, Parthenope University of Naples, Naples, Italy
  2. School of Computing Science and the Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
